Is kern.sched.preempt_thresh=0 a sensible default? (was: Re: Extremely low disk throughput under high compute load)
Stefan Esser
se at freebsd.org
Wed Apr 4 13:19:52 UTC 2018
On 2 Apr 2018 at 00:18, Stefan Esser wrote:
> On 1 Apr 2018 at 18:33, Warner Losh wrote:
>> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser <se at freebsd.org> wrote:
>>
>> My i7-2600K based system with 24 GB RAM was in the midst of a buildworld -j8
>> (starting from a clean state) which caused a load average of 12 for more than
>> 1 hour, when I decided to move a directory structure holding some 10 GB to its
>> own ZFS file system. File sizes varied, but were mostly in the range of 500 KB.
>>
>> I had just thrown away /usr/obj, but /usr/src was cached in ARC and thus there
>> was nearly no disk activity caused by the buildworld.
>>
>> The copying proceeded at a rate of at most 10 MB/s, but most of the time less
>> than 100 KB/s were transferred. The "cp" process had a PRI of 20 and thus a
>> much better priority than the compute-bound compiler processes, but it got
>> just 0.2% to 0.5% of 1 CPU core. Apparently, the copy process was scheduled
>> at such a low rate that it only managed to issue a few controller writes per
>> second.
>>
>> The system is healthy and does not show any problems or anomalies under
>> normal use (e.g., file copies are fast when there is no high compute load).
>>
>> This was with SCHED_ULE on a -CURRENT without WITNESS or malloc debugging.
>>
>> Is this a regression in -CURRENT?
>>
>> Does 'sync' push a lot of I/O to the disk?
>
> Each sync takes 0.7 to 1.5 seconds to complete, but since reading is so
> slow, not much is written.
>
> Typical gstat output for the 3 drives that make up the RAIDZ1:
>
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy  Name
>     0      2      2     84   39.1      0      0    0.0    7.8  ada0
>     0      4      4     92   66.6      0      0    0.0   26.6  ada1
>     0      6      6    259   66.9      0      0    0.0   36.2  ada3
> dT: 1.058s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy  Name
>     0      1      1     60   70.6      0      0    0.0    6.7  ada0
>     0      3      3     68   71.3      0      0    0.0   20.2  ada1
>     0      6      6    242   65.5      0      0    0.0   28.8  ada3
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy  Name
>     0      5      5    192   44.8      0      0    0.0   22.4  ada0
>     0      6      6    160   61.9      0      0    0.0   26.5  ada1
>     0      6      6    172   43.7      0      0    0.0   26.2  ada3
>
> This includes the copy process and the reads caused by "make -j 8 world"
> (but I assume that all the source files are already cached in ARC).

I have identified the cause of the extremely low I/O performance (2 to 6 read
operations scheduled per second).

The default value of kern.sched.preempt_thresh=0 does not give any CPU to the
I/O-bound process unless a (long) time slice expires (kern.sched.quantum=94488
microseconds, i.e. about 94 ms, on my system with HZ=1000) or one of the
CPU-bound processes voluntarily gives up the CPU (or exits).

Any non-zero value of preempt_thresh lets the system perform I/O in parallel
with the CPU-bound processes again.
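
To make the role of the threshold concrete, here is a minimal sketch in C of a threshold-based preemption check. It is modeled loosely on the structure of sched_shouldpreempt() in sys/kern/sched_ule.c as I understand it, but the function name model_should_preempt(), both constant values, and the simplified parameters are hypothetical and chosen only for illustration (lower numbers mean better priority, as in the kernel). Please check the sources of your own tree, since the details may differ between versions. The point it shows: with a threshold of 0, the check returns false before any threshold comparison, so a better-priority I/O-bound thread has to wait until the running CPU-bound thread's slice ends.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative constants only (assumptions, not copied from
 * <sys/priority.h>). Lower numbers mean better priority.
 */
#define MODEL_PRI_MAX_INTERACT 151 /* worst "interactive" priority */
#define MODEL_PRI_MIN_IDLE     224 /* first idle-class priority */

/*
 * Hypothetical model of a threshold-based preemption check.
 *   pri    - priority of the thread that just became runnable (e.g. cp)
 *   cpri   - priority of the thread currently running (e.g. a compiler)
 *   thresh - value of kern.sched.preempt_thresh
 *   remote - true if the running thread is on a different CPU than
 *            the one making the decision
 * Returns true if the new thread should preempt the running one.
 */
static bool
model_should_preempt(int pri, int cpri, int thresh, bool remote)
{
	/* Priorities are small non-negative integers (0..255 in this model). */
	assert(pri >= 0 && pri <= 255 && cpri >= 0 && cpri <= 255);

	/* A thread that is not strictly better never preempts. */
	if (pri >= cpri)
		return false;
	/* An idle-class thread on the CPU is always preempted. */
	if (cpri >= MODEL_PRI_MIN_IDLE)
		return true;
	/* A threshold of 0 disables all remaining preemption paths. */
	if (thresh == 0)
		return false;
	/* Preempt if the new thread's priority is at or better than the threshold. */
	if (pri <= thresh)
		return true;
	/*
	 * An interactive (or better) thread may preempt a non-interactive
	 * one, but only on a remote CPU.
	 */
	if (remote && pri <= MODEL_PRI_MAX_INTERACT &&
	    cpri > MODEL_PRI_MAX_INTERACT)
		return true;
	return false;
}
```

In this model, model_should_preempt(120, 180, 0, true) is false, while the same call with a threshold of 100 is true through the interactive/remote branch, even though 120 is worse than the threshold. If the real scheduler behaves like this model, that would be consistent with the observation that any non-zero value helped.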

I'm not sure about the bias relative to the PRI values displayed by top, but
in my view any process shown with a PRI above 72 in top should be eligible
for preemption. What value of preempt_thresh should I use to get that
behavior?

And, more importantly: is preempt_thresh=0 a reasonable default???

It prevents I/O-bound processes from making reasonable progress if all CPU
cores/threads are busy. In my case, throughput dropped from more than 10 MB/s
to just a few hundred KB/s, i.e. by a factor of about 30. (The %busy values in
my previous mail are misleading: at 10 MB/s the disk was about 70% busy ...)

Should preempt_thresh be set to some non-zero value instead (possibly a high
one, so that only long-running processes get preempted)?

Regards, Stefan