It's time to kill statistical profiling

From: Poul-Henning Kamp <phk_at_phk.freebsd.dk>
Date: Fri, 18 Jun 2021 07:36:33 +0000
Warner's work to document the kernel timers in D30802 brought stathz up again.

To give a representative result, statistical profiling needs to
sample no less than approx 0.1% of instructions.

On a VAX that meant running the statistical profiling at O(1kHz).

On my 4-CPU, two-thread, 2 GHz laptop that means statistical profiling
needs to run at O(10 MHz), which is barely doable.
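
The arithmetic behind those two rates can be sketched as follows. This
is only a back-of-the-envelope illustration: the figures of about
1 MIPS for a VAX and roughly one instruction per cycle per hardware
thread on the laptop are assumptions, not measurements, and the
function name is made up for the example.

```python
def required_sample_rate_hz(instr_per_sec, one_in=1000):
    """Samples per second needed to hit 1 in `one_in` instructions (0.1%)."""
    return instr_per_sec // one_in

# Assumption: a VAX executes on the order of 1 million instructions/sec.
vax_rate = required_sample_rate_hz(1_000_000)

# Assumption: 4 CPUs x 2 threads x 2 GHz, one instruction per cycle each.
laptop_ips = 4 * 2 * 2_000_000_000
laptop_rate = required_sample_rate_hz(laptop_ips)

print(f"VAX:    {vax_rate:>12,} Hz")     # order of 1 kHz
print(f"laptop: {laptop_rate:>12,} Hz")  # order of 10 MHz
```

Under these assumptions the laptop needs about 16 million samples per
second, which is where the O(10 MHz) figure comes from.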

But it is worse:

The samples must be unbiased with respect to the system activity,
which was already a problem on the VAX and which is totally impossible
on modern hardware, with message-based interrupts, deep pipelines
and telegraphic distance memory[1].
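
Why bias matters can be shown with a toy simulation. This is a
deliberately simplified sketch, not a model of real hardware: the
workload, the names "work" and "irq_handler", and the tick counts are
all hypothetical. It only shows that a sample clock correlated with
system activity reports a very different profile than an independent
one.

```python
import random

# Hypothetical workload: every 4-tick period spends 3 ticks in "work"
# and 1 tick in "irq_handler", so the true split is 75% / 25%.
PERIOD = ["work", "work", "work", "irq_handler"]
TOTAL_TICKS = 400_000

def running_at(tick):
    """Name of the code running at a given tick."""
    return PERIOD[tick % len(PERIOD)]

def profile(sample_ticks):
    """Fraction of samples attributed to each name."""
    counts = {"work": 0, "irq_handler": 0}
    for tick in sample_ticks:
        counts[running_at(tick)] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Biased: the sample clock is locked to the workload's own period,
# so every sample lands on the same phase (tick 3 of each period).
locked = profile(range(3, TOTAL_TICKS, len(PERIOD)))

# Unbiased: sample times are chosen independently of the workload.
rng = random.Random(0)
independent = profile(rng.randrange(TOTAL_TICKS) for _ in range(100_000))

print("locked:     ", locked)
print("independent:", independent)
```

The locked sampler attributes all of the time to the handler, while
the independent sampler comes out close to the true 75% / 25% split.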

Therefore statistical profiling is worse than useless: it is downright
misleading, which is why modern CPUs have hardware performance counters.

Instead of documenting stathz, I suggest we retire statistical
profiling and convert the profiled libraries to code-coverage
profiling (-fprofile-arcs and -ftest-coverage).

Poul-Henning

[1] One could *possibly* approach unbiased samples by locking the
stathz code path in L1 cache and disabling L1 updates, but then
the results would be from an entirely different system.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk_at_FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Received on Fri Jun 18 2021 - 07:36:33 UTC
