Moving forward with vm page lock

K. Macy kmacy at freebsd.org
Sat Apr 17 22:26:42 UTC 2010


Last February Jeff Roberson first shared his vm page lock patch with me.
The general premise is that modification of a vm_page_t is no longer
protected by the global "vm page queue mutex" but by one lock in an
array of locks, selected by hashing the page's physical address. This
complicates pmap somewhat: whenever we need to drop the pmap lock in
order to acquire the page lock first, the caller needs retry logic, and
there are now more such cases (see pa_tryrelock).
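
As a rough illustration of the scheme described above, here is a
userspace sketch using pthread mutexes. It is not the code from the
patch: the sketch_ names, the lock count of 256, and the 2 MB hashing
granularity are all assumptions made for the example, and the real
pa_tryrelock may differ in its details.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, chosen for illustration only. */
#define SKETCH_LOCK_COUNT 256	/* number of locks in the array */
#define SKETCH_PA_SHIFT   21	/* pages in the same 2 MB region share a lock */

static pthread_mutex_t sketch_pa_lock[SKETCH_LOCK_COUNT];

/* Initialize every lock in the array; call once before use. */
static void
sketch_pa_lock_init(void)
{
	for (size_t i = 0; i < SKETCH_LOCK_COUNT; i++)
		pthread_mutex_init(&sketch_pa_lock[i], NULL);
}

/* Hash a physical address to an index into the lock array. */
static size_t
sketch_pa_index(uint64_t pa)
{
	return (size_t)((pa >> SKETCH_PA_SHIFT) % SKETCH_LOCK_COUNT);
}

/*
 * Switch from the page lock currently held (*heldp, may be NULL) to the
 * page lock covering pa. "outer" stands in for the pmap lock and must be
 * held on entry; it is held again on return.
 *
 * Returns true if "outer" had to be dropped to take the new page lock in
 * the right order. In that case the caller must assume the state guarded
 * by "outer" may have changed, and retry.
 */
static bool
sketch_pa_tryrelock(pthread_mutex_t *outer, uint64_t pa,
    pthread_mutex_t **heldp)
{
	pthread_mutex_t *want = &sketch_pa_lock[sketch_pa_index(pa)];

	if (*heldp == want)
		return false;		/* already holding the right lock */
	if (*heldp != NULL)
		pthread_mutex_unlock(*heldp);
	*heldp = want;
	if (pthread_mutex_trylock(want) == 0)
		return false;		/* uncontended, nothing was dropped */

	/* Contended: drop the outer lock, then take page lock before it. */
	pthread_mutex_unlock(outer);
	pthread_mutex_lock(want);
	pthread_mutex_lock(outer);
	return true;
}
```

A caller would loop on a true return, revalidating whatever it looked
up under the outer lock before continuing.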


I've continued refining Jeff's initial page lock patch by resolving
lock ordering issues in vm_pageout, eliminating pv_lock, and eliminating the
need for pmap_collect on amd64. Rather than exposing ourselves to a race
condition by dropping the locks in pmap_collect, I pre-allocate any
necessary pv_entrys before changing any pmap state. This complicated calls
to demote slightly, but that can probably be simplified later. Currently
only amd64 supports this. Other platforms map vm_page_lock(m) to the
vm page queue mutex.
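
The pre-allocation idea can be sketched in userspace as a two-phase
update: reserve everything that might be needed, and only then start
modifying shared state. This is an illustration under assumptions, not
the patch's code: the sketch_ types and functions are hypothetical, with
an "entry" standing in for a pv_entry and a "map" standing in for pmap
state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a pv_entry and for the pmap state. */
struct sketch_entry {
	struct sketch_entry *next;
	int payload;
};

struct sketch_map {
	struct sketch_entry *head;
	size_t count;
};

/* Free every entry on a list. */
static void
sketch_release(struct sketch_entry *list)
{
	while (list != NULL) {
		struct sketch_entry *next = list->next;
		free(list);
		list = next;
	}
}

/* Phase 1: allocate n entries onto a private list, all or nothing. */
static bool
sketch_reserve(size_t n, struct sketch_entry **out)
{
	struct sketch_entry *list = NULL;

	for (size_t i = 0; i < n; i++) {
		struct sketch_entry *e = malloc(sizeof(*e));
		if (e == NULL) {
			sketch_release(list);
			return false;
		}
		e->next = list;
		list = e;
	}
	*out = list;
	return true;
}

/*
 * Phase 2: consume the reserve. Once the reserve succeeded, nothing in
 * this loop can fail, so the map is never left half-updated.
 */
static bool
sketch_insert_all(struct sketch_map *map, const int *values, size_t n)
{
	struct sketch_entry *reserve = NULL;

	if (!sketch_reserve(n, &reserve))
		return false;	/* the map has not been touched */

	for (size_t i = 0; i < n; i++) {
		struct sketch_entry *e = reserve;
		reserve = e->next;
		e->payload = values[i];
		e->next = map->head;
		map->head = e;
		map->count++;
	}
	return true;
}
```

The point of the split is that any allocation failure, or any need to
drop locks in order to allocate, happens before the first modification
rather than in the middle of it.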


The current version of the patch can be found at:
http://people.freebsd.org/~kmacy/diffs/head_page_lock.diff

I've been refining it in a subversion branch at:
svn://svn.freebsd.org/base/user/kmacy/head_page_lock



On my workloads at a CDN startup I've seen as much as a 50% increase in
lighttpd throughput (3.2Gbps -> 4.8Gbps). At Jeff's request I've
done some basic measurements with buildkernel to demonstrate that there
is no performance regression, at least on my hardware: a dual 4-core
"CPU: Intel(R) Xeon(R) CPU L5420  @ 2.50GHz (2500.01-MHz K8-class CPU)"
with 64GB of RAM.

I did 2 warm-up runs followed by 10 samples of
"time make -j16 buildkernel KERNCONF=GENERIC -DNO_MODULES
-DNO_KERNELCONFIG -DNO_KERNELDEPEND" on a ZFS file system on a
twa-based RAID device, for kernels both with and without page_lock.
Wall clock time is consistently just under a second lower (faster
build time) for the page_lock kernel. The bulk of the time is actually
spent in user mode, so it is more meaningful to compare system times.
I attached the logs of the runs and the two files I fed to ministat.



ministat -c 95 -w 72 base page_lock
x base
+ page_lock
+------------------------------------------------------------------------+
|    +  ++                                                               |
|+   ++ +++  +                                            x    xxxx xxxxx|
|   |__AM__|                                                   |___AM__| |
+------------------------------------------------------------------------+
  N           Min           Max        Median           Avg        Stddev
x  10         47.35         49.09         48.64        48.417    0.53416706
+  10         40.04         41.52         40.98        40.844    0.41494846
Difference at 95.0% confidence
      -7.573 +/- 0.449396
      -15.6412% +/- 0.928179%
      (Student's t, pooled s = 0.478287)
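
The figures in the ministat summary can be reproduced by hand from the
two rows above (presumably system time in seconds). This assumes the
usual pooled two-sample Student's t interval with 18 degrees of freedom
and a critical value of about 2.101; the critical value is an assumption
here, since the output does not print it.

```latex
\begin{align*}
\bar{x}_{\mathrm{pl}} - \bar{x}_{\mathrm{base}}
  &= 40.844 - 48.417 = -7.573 \\[4pt]
s_p &= \sqrt{\frac{(n_{\mathrm{base}} - 1)\,s_{\mathrm{base}}^2
                 + (n_{\mathrm{pl}} - 1)\,s_{\mathrm{pl}}^2}
                {n_{\mathrm{base}} + n_{\mathrm{pl}} - 2}}
   = \sqrt{\frac{9\,(0.53416706)^2 + 9\,(0.41494846)^2}{18}}
   \approx 0.478287 \\[4pt]
\text{half-width}
  &= t_{0.975,\,18}\; s_p
     \sqrt{\frac{1}{n_{\mathrm{base}}} + \frac{1}{n_{\mathrm{pl}}}}
   \approx 2.101 \times 0.478287 \times \sqrt{0.2}
   \approx 0.4494 \\[4pt]
\text{relative difference}
  &= \frac{-7.573}{48.417} \approx -15.64\%,
  \qquad
  \frac{0.4494}{48.417} \approx 0.93\%
\end{align*}
```

The interval is far from zero, which is why ministat reports a
difference at 95% confidence.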




ramsan2.lab1# head -2 prof.out
debug.lock.prof.stats:
   max wait_max      total wait_total    count   avg wait_avg cnt_hold cnt_lock name
ramsan2.lab1# sort -nrk 4 prof.out | head
  1592   243918    1768980   12026988   287680     6       41        0   112005 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1065 (sleep mutex:vm page queue free mutex)
  3967   750285    1678130    9447247   276594     6       34        0   104952 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1388 (sleep mutex:vm page queue mutex)
 18234   163969    5417360    9213400   282459    19       32        0     6548 /usr/home/kmacy/head_page_lock/sys/amd64/amd64/pmap.c:3372 (sleep mutex:page lock)
173094   134890   18226507    8195920    49757   366      164        0      625 /usr/home/kmacy/head_page_lock/sys/kern/vfs_subr.c:2091 (lockmgr:zfs)
   254   167136      38222    5153728     2736    13     1883        0     2333 /usr/home/kmacy/head_page_lock/sys/amd64/amd64/pmap.c:550 (sleep mutex:page lock)
  1160   104774    1624269    4380034   279240     5       15        0   107998 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1508 (sleep mutex:vm page queue free mutex)
  1107    80128    1581048    3377896   274341     5       12        0   100130 /usr/home/kmacy/head_page_lock/sys/vm/vm_page.c:1300 (sleep mutex:vm page queue mutex)
104802   284128   14712290    2970729   259423    56       11        0     1900 /usr/home/kmacy/head_page_lock/sys/vm/vm_object.c:721 (sleep mutex:page lock)
 84339   158037    1455568    2875384    85147    17       33        0      292 /usr/home/kmacy/head_page_lock/sys/kern/vfs_cache.c:390 (rw:Name Cache)
     9   995901        236    2468160       46     5    53655        0       45 /usr/home/kmacy/head_page_lock/sys/kern/sched_ule.c:2552 (spin mutex:sched lock 4)


Both Giovanni Trematerra and I have run stress2 on it for extended
periods with no problems in evidence.

I'd like to see this go into HEAD by the end of this month. Once
this change has been proven stable by a wider audience I will
extend it to i386.

Thanks,
Kip


More information about the freebsd-arch mailing list