Investigation: RobotUnit publishing performance
@paus.fabian and I had a look at the `RobotUnitModulePublisher`.

At the moment, the `RobotUnitModulePublisher` appears to do the following in `publishUnitUpdates()`:
- For each sub unit:
  - call `update()` on the sub unit (blocking):

```cpp
for (const RobotUnitSubUnitPtr& rsu : UnitsAttorneyForPublisher::GetSubUnits(this))
{
    if (rsu && rsu->getObjectScheduler() && rsu->getProxy())
    {
        const auto begInner = TimeUtil::GetTime(true);
        rsu->update(controlThreadOutputBuffer, activatedControllers);
        const auto endInner = TimeUtil::GetTime(true);
        timingMap["UnitUpdate_" + rsu->getName()] =
            new TimedVariant{TimestampVariant{endInner - begInner}, lastControlThreadTimestamp};
    }
}
```
The sub units perform (multiple) network calls, sometimes while holding a local data mutex (e.g., the platform and kinematic sub units). The Ice calls made in the `update()` functions are always synchronous (to IceStorm) and not always batched (the kinematic unit, for example, is batched). The fact that only the kinematic unit is batched may indicate that this problem has occurred in the past as well, but was only addressed in one place.
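For illustration, a minimal sketch of how batching works in Ice; the proxy parameter is a placeholder, and `ice_ping()` stands in for the generated Slice operations a real sub unit would invoke:

```cpp
#include <Ice/Ice.h>

// Sketch only: 'topic' stands for whatever publisher proxy a sub unit uses.
void publishBatched(const Ice::ObjectPrx& topic)
{
    // Invocations on a batch-oneway proxy are buffered client-side
    // instead of being sent immediately.
    Ice::ObjectPrx batched = topic->ice_batchOneway();

    batched->ice_ping(); // queued locally, no wire traffic yet
    batched->ice_ping(); // queued as well

    // A single network message carries all queued invocations.
    batched->ice_flushBatchRequests();
}
```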
This could potentially be improved by:

- Always batching the network calls.
- Running the network call asynchronously, i.e. using `begin_ice_flushBatchRequests()`. The resulting async result should be kept and checked on the next iteration. At the least, we could check whether the previous network call is done and, if not, emit a warning.
The updates of all sub units would then run in parallel instead of sequentially one after another, so their durations would no longer add up within a single update (think 3 × 50 ms × number of sub units).
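A minimal sketch of this idea, assuming the Ice C++98 AMI mapping; the member name `lastFlush` and the warning mechanism are placeholders:

```cpp
#include <Ice/Ice.h>
#include <iostream>

// Sketch only: 'lastFlush' would be a member of the publisher,
// kept across update iterations.
Ice::AsyncResultPtr lastFlush;

void flushAsync(const Ice::ObjectPrx& batchedProxy)
{
    // Check whether the flush started in the previous iteration has finished.
    if (lastFlush && !lastFlush->isCompleted())
    {
        // The previous network call is still in flight; warn instead of
        // blocking the publisher thread (real code would use ARMARX_WARNING).
        std::cerr << "Previous batch flush has not completed yet" << std::endl;
        return;
    }
    // Start the next flush without waiting for it to complete.
    // (Eventually, end_ice_flushBatchRequests() should be called on the
    // result to surface any transport errors.)
    lastFlush = batchedProxy->begin_ice_flushBatchRequests();
}
```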
The RobotUnitModulePublisher
seems to log the timing information for each sub unit (using the format "UnitUpdate_" + rsu->getName()
) in the robot unit observer. (The observer queue, filled by robotUnitObserver->offerOrUpdateDataFieldsFlatCopy_async()
, may also fill up, but this is not tracked at the moment.)
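The fill level could be made trackable, e.g. with a counter of outstanding updates. This is a purely hypothetical sketch (names and threshold invented), independent of the concrete observer API:

```cpp
#include <atomic>

// Hypothetical: incremented when an async observer update is started,
// decremented when its completion callback fires.
std::atomic<int> pendingObserverUpdates{0};

void onObserverUpdateStarted()   { ++pendingObserverUpdates; }
void onObserverUpdateCompleted() { --pendingObserverUpdates; }

bool observerQueueFillingUp()
{
    // Arbitrary threshold: a steadily growing count means the observer
    // cannot keep up with the publishing rate of the control thread.
    return pendingObserverUpdates.load() > 10;
}
```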
FYI @dreher @fabian.reister