[Bug 254040] AMD 5950X hyperthreading strange performance swings

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 19 Dec 2021 16:07:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254040

--- Comment #3 from Stefan Eßer <se@FreeBSD.org> ---
Update after booting a kernel with SCHED_4BSD:

This really appears to be a SCHED_ULE issue, as expected. With SCHED_4BSD the
results are quite homogeneous for each run:

$ t # repeated runs ...
1048576000 bytes transferred in 6.720101 secs (156035746 bytes/sec)
1048576000 bytes transferred in 7.037620 secs (148995835 bytes/sec)
1048576000 bytes transferred in 7.031251 secs (149130790 bytes/sec)
1048576000 bytes transferred in 7.032412 secs (149106159 bytes/sec)
1048576000 bytes transferred in 7.202687 secs (145581229 bytes/sec)
1048576000 bytes transferred in 6.918272 secs (151566177 bytes/sec)
1048576000 bytes transferred in 6.391244 secs (164064463 bytes/sec)
1048576000 bytes transferred in 6.686209 secs (156826683 bytes/sec)
1048576000 bytes transferred in 6.668778 secs (157236600 bytes/sec)
1048576000 bytes transferred in 6.914906 secs (151639959 bytes/sec)
1048576000 bytes transferred in 6.835760 secs (153395678 bytes/sec)
1048576000 bytes transferred in 6.755348 secs (155221619 bytes/sec)

$ t 4
1048576000 bytes transferred in 7.574533 secs (138434415 bytes/sec)
1048576000 bytes transferred in 7.776473 secs (134839540 bytes/sec)
1048576000 bytes transferred in 7.839487 secs (133755689 bytes/sec)
1048576000 bytes transferred in 7.856730 secs (133462132 bytes/sec)
$ t 4
1048576000 bytes transferred in 7.481412 secs (140157499 bytes/sec)
1048576000 bytes transferred in 7.557077 secs (138754182 bytes/sec)
1048576000 bytes transferred in 7.676920 secs (136588110 bytes/sec)
1048576000 bytes transferred in 8.040430 secs (130412922 bytes/sec)
$ t 4
1048576000 bytes transferred in 7.484386 secs (140101810 bytes/sec)
1048576000 bytes transferred in 7.581198 secs (138312698 bytes/sec)
1048576000 bytes transferred in 7.710614 secs (135991248 bytes/sec)
1048576000 bytes transferred in 7.736921 secs (135528856 bytes/sec)

$ t 16
1048576000 bytes transferred in 9.879846 secs (106132833 bytes/sec)
1048576000 bytes transferred in 10.087562 secs (103947418 bytes/sec)
1048576000 bytes transferred in 10.232516 secs (102474894 bytes/sec)
1048576000 bytes transferred in 10.237664 secs (102423370 bytes/sec)
1048576000 bytes transferred in 10.320563 secs (101600658 bytes/sec)
1048576000 bytes transferred in 10.375758 secs (101060179 bytes/sec)
1048576000 bytes transferred in 10.443859 secs (100401199 bytes/sec)
1048576000 bytes transferred in 10.481844 secs (100037355 bytes/sec)
1048576000 bytes transferred in 10.494019 secs (99921294 bytes/sec)
1048576000 bytes transferred in 10.510178 secs (99767674 bytes/sec)
1048576000 bytes transferred in 10.559435 secs (99302286 bytes/sec)
1048576000 bytes transferred in 10.638978 secs (98559840 bytes/sec)
1048576000 bytes transferred in 10.678294 secs (98196963 bytes/sec)
1048576000 bytes transferred in 10.808183 secs (97016860 bytes/sec)
1048576000 bytes transferred in 11.084706 secs (94596647 bytes/sec)
1048576000 bytes transferred in 11.231343 secs (93361587 bytes/sec)
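
To put a number on "quite homogeneous", the run times can be reduced to a mean
and a relative spread. The following is only a minimal sketch in Python. It
assumes the lines above are dd(1)-style summary lines (the "t" script itself is
not shown in this comment); run_times() and spread() are illustrative names.

```python
import re
import statistics

# Matches a dd(1)-style summary line, e.g.
# "1048576000 bytes transferred in 6.720101 secs (156035746 bytes/sec)"
DD_LINE = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs")


def run_times(text):
    """Return the elapsed seconds from every summary line found in text."""
    return [float(m.group(2)) for m in DD_LINE.finditer(text)]


def spread(times):
    """Return (mean, min, max, coefficient of variation) of the run times."""
    mean = statistics.mean(times)
    cv = statistics.stdev(times) / mean  # sample standard deviation / mean
    return mean, min(times), max(times), cv


# First "t 4" run from the SCHED_4BSD results above.
sample = """\
1048576000 bytes transferred in 7.574533 secs (138434415 bytes/sec)
1048576000 bytes transferred in 7.776473 secs (134839540 bytes/sec)
1048576000 bytes transferred in 7.839487 secs (133755689 bytes/sec)
1048576000 bytes transferred in 7.856730 secs (133462132 bytes/sec)
"""

mean, lo, hi, cv = spread(run_times(sample))
print(f"mean={mean:.3f}s min={lo:.3f}s max={hi:.3f}s cv={cv:.1%}")
```

A small coefficient of variation within one invocation is what "homogeneous"
means here; the same function can be fed the output of any other run.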

It seems that my BIOS settings were not optimal during the tests documented in
Comment 2. After loading the "Optimal Default" settings and re-enabling the
energy efficiency options, I now get the following sysctl output:

$ sysctl dev.cpu.0
dev.cpu.0.temperature: 29,6C
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 180 70723
dev.cpu.0.cx_usage: 0.25% 99.74% last 9261us
dev.cpu.0.cx_lowest: C8
dev.cpu.0.cx_supported: C1/1/1 C2/2/18
dev.cpu.0.freq_levels: 3400/3740 2800/2800 2200/1980
dev.cpu.0.freq: 3400
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PLTF.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

Anyway, after rebooting back into SCHED_ULE and repeating the tests with the
new BIOS settings, I see:

$ t 1 # multiple runs again ...
1048576000 bytes transferred in 7.640816 secs (137233502 bytes/sec)
1048576000 bytes transferred in 6.225996 secs (168418995 bytes/sec)
1048576000 bytes transferred in 4.852763 secs (216078118 bytes/sec)
1048576000 bytes transferred in 4.832574 secs (216980866 bytes/sec)
1048576000 bytes transferred in 4.819031 secs (217590617 bytes/sec)

$ t 4
1048576000 bytes transferred in 4.956440 secs (211558309 bytes/sec)
1048576000 bytes transferred in 7.634614 secs (137344998 bytes/sec)
1048576000 bytes transferred in 7.788965 secs (134623280 bytes/sec)
1048576000 bytes transferred in 7.819442 secs (134098574 bytes/sec)
$ t 4
1048576000 bytes transferred in 4.853663 secs (216038083 bytes/sec)
1048576000 bytes transferred in 4.857143 secs (215883287 bytes/sec)
1048576000 bytes transferred in 7.721846 secs (135793430 bytes/sec)
1048576000 bytes transferred in 7.792580 secs (134560821 bytes/sec)
$ t 4
1048576000 bytes transferred in 4.908570 secs (213621479 bytes/sec)
1048576000 bytes transferred in 7.742029 secs (135439433 bytes/sec)
1048576000 bytes transferred in 7.784341 secs (134703245 bytes/sec)
1048576000 bytes transferred in 7.794096 secs (134534649 bytes/sec)

Very similar to the results in the initial bug report ...
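
The SCHED_ULE "t 4" runs above fall into two groups, one near 4.9 seconds and
one near 7.7 seconds. A rough way to separate them is to sort the times and cut
at the largest gap. This is only an illustrative sketch, not a proper
clustering method; split_at_largest_gap() is a made-up helper name.

```python
import statistics


def split_at_largest_gap(times):
    """Sort the times and split them into (fast, slow) at the widest gap."""
    ordered = sorted(times)
    if len(ordered) < 2:
        return ordered, []
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Elapsed seconds of the three SCHED_ULE "t 4" invocations listed above.
ule_t4 = [
    4.956440, 7.634614, 7.788965, 7.819442,
    4.853663, 4.857143, 7.721846, 7.792580,
    4.908570, 7.742029, 7.784341, 7.794096,
]

fast, slow = split_at_largest_gap(ule_t4)
ratio = statistics.mean(slow) / statistics.mean(fast)
print(f"{len(fast)} fast runs, {len(slow)} slow runs, slow/fast = {ratio:.2f}")
```

Applied to the SCHED_4BSD numbers, the widest gap is small compared to the
mean, which is another way of saying that those results are not bimodal.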

And it really shows that SCHED_ULE causes the inconsistent performance.
But the throughput with SCHED_ULE is a lot higher than with SCHED_4BSD (the
best single-task run reaches about 217 MB/s versus about 164 MB/s), probably
due to the immensely higher system overhead of the latter with a large number
of cores.

While the CPU% with SCHED_ULE is on the order of 2%, with SCHED_4BSD I have
seen values up to 60% with 32 parallel tasks.

Another observation: WCPU of the bzip2 processes on the kernel with SCHED_4BSD
was displayed as far beyond 100% (on the order of 300% in one run, as far as I
remember). Since bzip2 is not multi-threaded (AFAIK), this appears to be a
measurement error. The CPU% for the bzip2 processes is always near 100% with
SCHED_ULE.
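
One way to cross-check such a value independently of the WCPU column is to
compare the CPU time charged to a process with the elapsed wall-clock time. The
sketch below is an assumption about how one might do that, not what was done
for this report: cpu_percent() is a hypothetical helper, and a short Python
busy loop stands in for bzip2. It yields an average over the whole run, not the
instantaneous figure that top displays.

```python
import resource
import subprocess
import sys
import time


def cpu_percent(argv):
    """Run argv to completion; return (user + sys CPU time) / wall time in %."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return 100.0 * cpu / wall


# Stand-in single-threaded workload; replace with the real command to test.
busy = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
print(f"{cpu_percent(busy):.0f}%")
```

For a single-threaded process this ratio should not be significantly above
100%, whichever scheduler is in use.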
