NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)
Colin Percival
cperciva at tarsnap.com
Sat Jun 3 05:01:45 UTC 2017
On 05/28/17 13:16, Rick Macklem wrote:
> cperciva@ is running a highly parallelized buildworld and he sees slightly
> better elapsed times and much lower system CPU for SCHED_ULE.
>
> As such, I suspect it is the single threaded, processes mostly sleeping waiting
> for I/O case that is broken.
> I suspect this is how many people use NFS, since a highly parallelized make would
> not be a typical NFS client task, I think?
Running `make buildworld -j36` on an EC2 "c4.8xlarge" instance (36 vCPUs, 60
GB RAM, 10 GbE) with GENERIC-NODEBUG, ULE has a slight edge over 4BSD:
GENERIC-NODEBUG, SCHED_4BSD:
1h14m12.48s real 6h25m44.59s user 1h4m53.42s sys
1h15m25.48s real 6h25m12.20s user 1h4m34.23s sys
1h13m34.02s real 6h25m14.44s user 1h4m09.55s sys
1h13m44.04s real 6h25m08.60s user 1h4m40.21s sys
1h14m59.69s real 6h25m53.13s user 1h4m55.20s sys
1h14m24.00s real 6h24m59.29s user 1h5m37.31s sys
GENERIC-NODEBUG, SCHED_ULE:
1h13m00.61s real 6h02m47.59s user 26m45.89s sys
1h12m30.18s real 6h01m39.97s user 26m16.45s sys
1h13m08.43s real 6h01m46.94s user 26m39.20s sys
1h12m18.94s real 6h02m26.80s user 27m39.71s sys
1h13m21.38s real 6h00m46.13s user 27m14.96s sys
1h12m01.80s real 6h02m24.48s user 27m18.37s sys
Running `make buildworld -j2` on an EC2 "m4.large" instance (2 vCPUs, 8 GB RAM,
~ 500 Mbps network), 4BSD has a slight edge over ULE on real and sys
time but is slightly worse on user time:
GENERIC-NODEBUG, SCHED_4BSD:
6h29m25.17s real 7h2m56.02s user 14m52.63s sys
6h29m36.82s real 7h2m58.19s user 15m14.21s sys
6h28m27.61s real 7h1m38.24s user 14m56.91s sys
6h27m05.42s real 7h1m38.57s user 15m04.31s sys
GENERIC-NODEBUG, SCHED_ULE:
6h34m19.41s real 6h59m43.99s user 18m8.62s sys
6h33m55.08s real 6h58m44.91s user 18m4.31s sys
6h34m49.68s real 6h56m03.58s user 17m49.83s sys
6h35m22.14s real 6h58m12.62s user 17m52.05s sys
Note that in both cases there is lots of idle time (although far more in the
-j36 case); this is partly due to a lack of parallelism in buildworld, but
largely due to having /usr/obj mounted on Amazon EFS.
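As a rough check of the idle-time remark, the idle fraction can be estimated from the figures above as 1 - (user + sys) / (vCPUs * real). The Python sketch below does this for the first run of each configuration. The helper names are made up for illustration, and it assumes each vCPU counts as one full CPU's worth of capacity, which is only an approximation on hyperthreaded instances.

```python
import re

def to_seconds(t):
    """Convert a time(1)-style duration such as '1h14m12.48s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)

def idle_fraction(line, ncpu):
    """Estimate idle CPU fraction from a '<real> real <user> user <sys> sys' line.

    Assumption: ncpu vCPUs provide ncpu * real seconds of CPU capacity.
    """
    fields = line.split()
    real = to_seconds(fields[0])
    user = to_seconds(fields[2])
    sys_time = to_seconds(fields[4])
    return 1.0 - (user + sys_time) / (ncpu * real)

# First run of each configuration, copied from the timings above.
runs = {
    ("4BSD", 36): "1h14m12.48s real 6h25m44.59s user 1h4m53.42s sys",
    ("ULE", 36): "1h13m00.61s real 6h02m47.59s user 26m45.89s sys",
    ("4BSD", 2): "6h29m25.17s real 7h2m56.02s user 14m52.63s sys",
    ("ULE", 2): "6h34m19.41s real 6h59m43.99s user 18m8.62s sys",
}

for (sched, ncpu), line in runs.items():
    print(f"{sched:5s} -j{ncpu:<2d} idle fraction ~ {idle_fraction(line, ncpu):.0%}")
```

Under that assumption, the -j36 runs come out well over three-quarters idle and the -j2 runs a bit under half idle.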
These differences all seem within the range which could result from cache
effects due to threads staying on one CPU rather than bouncing around; so
whatever Rick is tripping over, it doesn't seem to be affecting these tests.
--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid