Moving forward with vm page lock
K. Macy
kmacy at freebsd.org
Sat Apr 17 22:26:42 UTC 2010
Last February Jeff Roberson first shared his vm page lock patch with me.
The general premise is that modification of a vm_page_t is no longer
protected by the global "vm page queue mutex" but is instead protected
by an entry in an array of locks to which each vm_page_t is hashed by
its physical address. This complicates pmap somewhat because it
increases the number of cases where retry logic is required when we
need to drop the pmap lock in order to first acquire the page lock
(see pa_tryrelock).
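To make the premise concrete, here is a rough sketch of the scheme as I
understand it; the array size and the PA_LOCK*/page_lock_or_retry names
below are illustrative only, not the identifiers used in the patch:

	/*
	 * Sketch, not code from the patch: each page is hashed by its
	 * physical address to one mutex in a fixed-size array, so
	 * unrelated pages rarely contend on the same lock.
	 */
	#define	PA_LOCK_COUNT	256	/* assumed power of two */

	static struct mtx pa_locks[PA_LOCK_COUNT];

	#define	PA_LOCKPTR(pa)	\
	    (&pa_locks[((pa) >> PAGE_SHIFT) & (PA_LOCK_COUNT - 1)])
	#define	PA_LOCK(pa)	mtx_lock(PA_LOCKPTR(pa))
	#define	PA_UNLOCK(pa)	mtx_unlock(PA_LOCKPTR(pa))

	/*
	 * The retry case mentioned above: if the page lock cannot be
	 * taken while the pmap lock is held, drop the pmap lock, take
	 * both in a fixed order, and tell the caller to revalidate any
	 * pmap state it read earlier (this is the role pa_tryrelock
	 * plays in the patch).
	 */
	static int
	page_lock_or_retry(pmap_t pmap, vm_paddr_t pa)
	{

		if (mtx_trylock(PA_LOCKPTR(pa)))
			return (0);		/* no retry needed */
		PMAP_UNLOCK(pmap);
		PA_LOCK(pa);
		PMAP_LOCK(pmap);
		return (EAGAIN);		/* caller must re-check */
	}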
I've continued refining Jeff's initial page lock patch by resolving
lock ordering issues in vm_pageout, eliminating pv_lock, and eliminating the
need for pmap_collect on amd64. Rather than exposing ourselves to a race
condition by dropping the locks in pmap_collect, I pre-allocate any
necessary pv_entrys before changing any pmap state. This complicated calls
to demote slightly, but that can probably be simplified later. Currently
only amd64 supports this. Other platforms map vm_page_lock(m) to the
vm page queue mutex.
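For clarity, the fallback on the other platforms amounts to something
like the following; the macro bodies are my own sketch of what is
described above, not copied from the patch:

	#ifdef __amd64__
	/* Per-page lock keyed off the physical address, as sketched above. */
	#define	vm_page_lock(m)		PA_LOCK(VM_PAGE_TO_PHYS(m))
	#define	vm_page_unlock(m)	PA_UNLOCK(VM_PAGE_TO_PHYS(m))
	#else
	/* Unchanged behaviour: fall back to the global vm page queue mutex. */
	#define	vm_page_lock(m)		mtx_lock(&vm_page_queue_mtx)
	#define	vm_page_unlock(m)	mtx_unlock(&vm_page_queue_mtx)
	#endif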
The current version of the patch can be found at:
http://people.freebsd.org/~kmacy/diffs/head_page_lock.diff
I've been refining it in a subversion branch at:
svn://svn.freebsd.org/base/user/kmacy/head_page_lock
On my workloads at a CDN startup I've seen as much as a 50% increase in
lighttpd throughput (3.2Gbps -> 4.8Gbps). At Jeff's request I've
done some basic measurements with buildkernel to demonstrate that,
at least on my hardware (a dual 4-core
"CPU: Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (2500.01-MHz K8-class CPU)"
with 64GB of RAM), there is no performance regression.
I did two warm-up runs followed by 10 samples of
"time make -j16 buildkernel KERNCONF=GENERIC -DNO_MODULES
-DNO_KERNELCONFIG -DNO_KERNELDEPEND" on a ZFS file system on a
twa-based RAID device, both with and without page_lock. Wall clock time
is consistently just under a second lower (faster build time) for the
page_lock kernel. The bulk of the time is actually spent in user time,
so it is more meaningful to compare system times. I've attached the logs
of the runs and the two files I fed to ministat.
ministat -c 95 -w 72 base page_lock
x base
+ page_lock
+------------------------------------------------------------------------+
| + ++ |
|+ ++ +++ + x xxxx xxxxx|
| |__AM__| |___AM__| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 47.35 49.09 48.64 48.417 0.53416706
+ 10 40.04 41.52 40.98 40.844 0.41494846
Difference at 95.0% confidence
-7.573 +/- 0.449396
-15.6412% +/- 0.928179%
(Student's t, pooled s = 0.478287)
For reference, here is the top of the lock profiling output from a
page_lock run, sorted by wait_total:
ramsan2.lab1# head -2 prof.out
debug.lock.prof.stats:
max wait_max total wait_total count avg wait_avg
cnt_hold cnt_lock name
ramsan2.lab1# sort -nrk 4 prof.out | head
1592 243918 1768980 12026988 287680 6 41
0 112005 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1065 (sleep
mutex:vm page queue free mutex)
3967 750285 1678130 9447247 276594 6 34
0 104952 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1388 (sleep
mutex:vm page queue mutex)
18234 163969 5417360 9213400 282459 19 32
0 6548 /usr/home/kmacy/head_page_lock/sys/amd64/amd64/pmap.c:3372
(sleep mutex:page lock)
173094 134890 18226507 8195920 49757 366 164
0 625 /usr/home/kmacy/head_page_lock/sys/kern/vfs_subr.c:2091
(lockmgr:zfs)
254 167136 38222 5153728 2736 13 1883
0 2333 /usr/home/kmacy/head_page_lock/sys/amd64/amd64/pmap.c:550
(sleep mutex:page lock)
1160 104774 1624269 4380034 279240 5 15
0 107998 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1508 (sleep
mutex:vm page queue free mutex)
1107 80128 1581048 3377896 274341 5 12
0 100130 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1300 (sleep
mutex:vm page queue mutex)
104802 284128 14712290 2970729 259423 56 11
0 1900 /usr/home/kmacy/head_page_lock/sys/vm/vm_object.c:721 (sleep
mutex:page lock)
84339 158037 1455568 2875384 85147 17 33
0 292 /usr/home/kmacy/head_page_lock/sys/kern/vfs_cache.c:390
(rw:Name Cache)
9 995901 236 2468160 46 5 53655
0 45 /usr/home/kmacy/head_page_lock/sys/kern/sched_ule.c:2552
(spin mutex:sched lock 4)
Both Giovanni Trematerra and I have run stress2 on it for extended
periods with no problems in evidence.
I'd like to see this go into HEAD by the end of this month. Once
this change has proven stable for a wider audience I will
extend it to i386.
Thanks,
Kip