Benchmarking mpsafevfs with parallel tarball extraction
Kris Kennaway
kris at obsecurity.org
Fri May 6 11:35:31 PDT 2005
Here are my benchmark numbers for parallel tarball extraction
with and without mpsafevfs on a 12-processor E4500 running an up-to-date 6.0.
The kernel was built without INVARIANTS or other debugging options,
without ADAPTIVE_GIANT (which in my testing causes about a 200% penalty
in system time, with marginal impact on real or user time), and with
the 4BSD scheduler (ULE causes spontaneous reboots on this machine).
The E4500 uses the esp SCSI controller, whose driver runs without
Giant.
The test is this:
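#!/bin/sh
# Extract 12 copies of the tarball concurrently, one per directory.
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
	mkdir $i
	tar xfC /var/portbuild/sparc64/5/tarballs/bindist.tar $i &
done
# Wait for every background tar, so that timing this script covers
# all 12 extractions.
wait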
This is run on a 2000 MB preallocated, malloc-backed md disk (the
machine has 5 GB of RAM). Before each test I unmount, run newfs with
default options (i.e. no -U: soft updates reduce performance on md by
a factor of several) and mount again.
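The per-test reset can be scripted roughly as below. This is a minimal sketch, not the script behind these numbers: the md unit (md0), the mount point /mnt/bench and the use of mdconfig's `-o reserve` option to preallocate the malloc-backed disk are all assumptions. Setting DRYRUN=1 only prints the commands.

```shell
#!/bin/sh
# Minimal sketch (assumptions, not from the original post): md unit 0,
# mount point /mnt/bench, and "-o reserve" to preallocate the storage.
MDUNIT=0               # hypothetical md unit number
MNT=/mnt/bench         # hypothetical mount point

# Print a command, then execute it unless DRYRUN=1.
run() {
	echo "$*"
	if [ "${DRYRUN:-0}" = 1 ]; then
		return 0
	fi
	"$@"
}

# One-time setup: attach a 2000 MB malloc-backed md device with all of
# its storage allocated up front.
create_md() {
	run mdconfig -a -t malloc -o reserve -s 2000m -u "$MDUNIT"
}

# Before each test: unmount, recreate the file system with newfs's
# default options (no -U, so no soft updates), and mount it again.
reset_fs() {
	run umount "$MNT"    # fails harmlessly if nothing is mounted yet
	run newfs "/dev/md$MDUNIT" || return 1
	run mount "/dev/md$MDUNIT" "$MNT"
}
```

The extraction script should then be run from inside the mount point, since it creates its numbered directories relative to the current directory.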
The tarball is
# ls -l /var/portbuild/sparc64/5/tarballs/bindist.tar
-rw-r--r-- 1 kris kris 133231104 Apr 28 12:18 /var/portbuild/sparc64/5/tarballs/bindist.tar
# tar tvf /var/portbuild/sparc64/5/tarballs/bindist.tar | wc -l
5664
(It is a copy of a sparc64 5.4-STABLE world that I use to populate
package build chroots.)
A single extraction (with the tarball already cached) with mpsafevfs=1 takes:
14.85 real 1.31 user 10.43 sys
14.90 real 1.31 user 10.40 sys
15.03 real 1.26 user 10.55 sys
14.49 real 1.35 user 10.47 sys
14.50 real 1.36 user 10.42 sys
14.50 real 1.28 user 10.52 sys
14.52 real 1.33 user 10.48 sys
14.44 real 1.38 user 10.36 sys
14.54 real 1.37 user 10.39 sys
14.63 real 1.29 user 10.56 sys
mean=14.64 seconds real time
without mpsafevfs:
14.72 real 1.39 user 10.45 sys
14.70 real 1.40 user 10.47 sys
14.99 real 1.41 user 10.54 sys
15.13 real 1.48 user 10.45 sys
15.18 real 1.40 user 10.50 sys
14.87 real 1.64 user 10.38 sys
14.66 real 1.42 user 10.37 sys
14.69 real 1.49 user 10.30 sys
14.87 real 1.45 user 10.60 sys
14.75 real 1.47 user 10.43 sys
mean=14.86 seconds real time
x !mpsafevfs
+ mpsafevfs
    N           Min           Max        Median           Avg        Stddev
x  10         14.66         15.18         14.87        14.856    0.18810163
+  10         14.44         15.03         14.54         14.64     0.2081666
Difference at 95.0% confidence
-0.216 +/- 0.186404
-1.45396% +/- 1.25474%
(Student's t, pooled s = 0.198388)
So mpsafevfs gives a small but measurable benefit (about 1.5% less real
time, significant at 95% confidence) even for a single, non-concurrent
extraction.
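The "Difference at 95.0% confidence" lines that ministat prints come from a pooled two-sample Student's t interval. The sketch below recomputes them with awk. The function name `tcompare` is hypothetical, and the two-sided 95% critical value has to be passed in by hand: 2.101 for the 18 degrees of freedom of a 10-versus-10 comparison, 2.093 for the 19 of an 11-versus-10 one.

```shell
#!/bin/sh
# Hypothetical helper: ministat-style comparison of two samples.
#   $1  baseline samples ("x"), whitespace separated
#   $2  comparison samples ("+"), whitespace separated
#   $3  two-sided 95% Student's t critical value for n1 + n2 - 2 d.f.
# Prints the difference of means (+ minus x) with its confidence
# half-width, the same as a percentage of the baseline mean, and the
# pooled standard deviation.
tcompare() {
	printf '%s | %s\n' "$1" "$2" | awk -v t="$3" '
	{
		g = 1
		for (i = 1; i <= NF; i++) {
			if ($i == "|") { g = 2; continue }
			n[g]++; sum[g] += $i; sq[g] += $i * $i
		}
	}
	END {
		for (g = 1; g <= 2; g++) {
			mean[g] = sum[g] / n[g]
			# unbiased sample variance
			var[g] = (sq[g] - n[g] * mean[g] * mean[g]) / (n[g] - 1)
		}
		# pooled standard deviation over n1 + n2 - 2 degrees of freedom
		sp = sqrt(((n[1] - 1) * var[1] + (n[2] - 1) * var[2]) / (n[1] + n[2] - 2))
		half = t * sp * sqrt(1 / n[1] + 1 / n[2])
		d = mean[2] - mean[1]
		printf "%.3f +/- %.3f\n", d, half
		printf "%.2f%% +/- %.2f%%\n", 100 * d / mean[1], 100 * half / mean[1]
		printf "pooled s = %.4f\n", sp
	}'
}
```

Passing the ten non-mpsafevfs single-extraction times as $1, the ten mpsafevfs times as $2 and 2.101 as $3 reproduces the comparison above, rounded: -0.216 +/- 0.186, i.e. -1.45% +/- 1.25%, with pooled s = 0.1984.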
The 12-way parallel extraction without mpsafevfs:
319.42 real 35.70 user 1547.38 sys
317.80 real 35.41 user 1532.87 sys
318.49 real 35.35 user 1542.23 sys
321.82 real 35.51 user 1559.50 sys
317.66 real 35.51 user 1566.16 sys
318.63 real 35.64 user 1552.48 sys
319.51 real 35.69 user 1548.99 sys
317.79 real 35.34 user 1542.89 sys
319.89 real 35.70 user 1536.34 sys
318.76 real 35.24 user 1545.21 sys
with mpsafevfs:
80.24 real 27.70 user 475.54 sys
83.13 real 27.94 user 491.55 sys
87.66 real 28.45 user 500.68 sys
81.88 real 28.12 user 463.51 sys
83.23 real 27.87 user 483.62 sys
82.20 real 28.07 user 482.57 sys
83.82 real 28.29 user 473.70 sys
84.54 real 27.95 user 472.12 sys
80.29 real 28.24 user 461.87 sys
87.77 real 28.34 user 482.03 sys
82.10 real 27.79 user 475.31 sys
system time (x = mpsafevfs, + = !mpsafevfs):
    N           Min           Max        Median           Avg        Stddev
x  10        461.87        500.68        482.03       478.719     11.975802
+  10       1532.87       1566.16       1547.38      1547.405     10.066401
Difference at 95.0% confidence
1068.69 +/- 10.3942
223.239% +/- 2.17124%
(Student's t, pooled s = 11.0624)
wall clock time (x = mpsafevfs, + = !mpsafevfs):
    N           Min           Max        Median           Avg        Stddev
x  11         80.24         87.77         83.13     83.350909     2.5277241
+  10        317.66        321.82        318.76       318.977     1.2618333
Difference at 95.0% confidence
235.626 +/- 1.85556
282.692% +/- 2.2262%
(Student's t, pooled s = 2.02905)
i.e. mpsafevfs gives an enormous improvement in both cases: roughly 3.2
times less system time and 3.8 times less wall clock time.
Compared with the mean time for a single extraction with mpsafevfs
(14.64 s), 12 simultaneous extractions take as long as 5.69 single
extractions with mpsafevfs and 21.79 without it. This is an effective
concurrency of 2.11 (12/5.69) extractions with mpsafevfs and 0.55
(12/21.79) without, i.e. without mpsafevfs it is nearly twice as slow
as simply running the extractions one after another.
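The effective-concurrency arithmetic can be checked with a small awk calculation. The function name `effconc` is hypothetical; it expresses the parallel wall clock mean in units of a single-extraction mean and divides the number of extractions by that ratio.

```shell
#!/bin/sh
# Hypothetical helper: effective concurrency of N simultaneous jobs.
#   $1  number of simultaneous extractions
#   $2  mean wall clock time of the parallel run (s)
#   $3  mean wall clock time of one extraction on its own (s)
effconc() {
	awk -v n="$1" -v tp="$2" -v ts="$3" 'BEGIN {
		r = tp / ts    # parallel run expressed in single-extraction times
		printf "%.2f single extractions, effective concurrency %.2f\n", r, n / r
	}'
}

effconc 12 83.350909 14.64   # prints: 5.69 single extractions, effective concurrency 2.11
effconc 12 318.977 14.64     # prints: 21.79 single extractions, effective concurrency 0.55
```

Using the non-mpsafevfs single-extraction mean (14.856 s) as the baseline for the second case instead gives 21.47 single extractions and an effective concurrency of 0.56, so the conclusion is unchanged.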
I might be bumping into the bandwidth limit of md here: when I ran less
rigorous tests with fewer simultaneous extractions, I seemed to get
marginally better performance (an effective concurrency of about 2.2
for both 3 and 10 simultaneous extractions, so at least it does not
seem to degrade badly). Alternatively, this might reflect VFS lock
contention, of which there is certainly a lot according to mutex
profiling traces.
Certainly, for package builds on this machine I get much better
performance and lower CPU utilization when each package build runs in
its own (swap-backed) md than when they all share a single large md,
which tells me it's not hard to saturate a single md.
Even if I am hitting another limit here that is placing an upper bound
on the performance, filesystem performance with mpsafevfs is clearly
much better than without, and we are now seeing clear benefits from
SMP on 6.0 compared to earlier versions of FreeBSD.
Kris
P.S. Big props to Jeff Roberson for making this work! Thanks also to
Hiroki Sato for donating the E4500 and other machine resources.
More information about the freebsd-smp mailing list