Too many page faults on sparc64?

Sat Dec 10 17:06:57 PST 2005

I have been benchmarking jasone's new malloc implementation on a
14-cpu sparc64 (it performs pretty well, but that's not the point of
this mail).  With MUTEX_PROFILING enabled, a benchmark that does lots
of malloc()/free() in a threaded application shows the following
acquisitions (column 3, "count", is the relevant one):

   max        total       count   avg     cnt_hold     cnt_lock name
   646     13006535     2153037     6        15742        96076 kern/kern_condvar.c:135 (lockbuilder mtxpool)
   330      5397096     2154156     2          950         1035 vm/vm_fault.c:849 (vm page queue mutex)
   390     10672286     2154220     4         1068         1328 vm/vm_fault.c:344 (vm page queue mutex)
   378      6064832     2154870     2        33658        33444 sparc64/sparc64/trap.c:439 (process lock)
   423      7178990     2154870     3          638          591 sparc64/sparc64/trap.c:449 (process lock)
   312      4746241     2154903     2         1135         1280 vm/vm_fault.c:907 (vm page queue mutex)
   476      5494516     2154903     2          519         1194 vm/vm_fault.c:929 (process lock)
   701     66305500     2154903    30            0          189 vm/vm_fault.c:295 (vm object)
   735     21517443     2154903     9            0            0 vm/vm_fault.c:906 (vm object)
   615     55351240     2155743    25         2219         2158 sparc64/sparc64/pmap.c:1288 (vm page queue mutex)
   620     64873979     2155743    30            0            0 sparc64/sparc64/pmap.c:1289 (pmap)
   206      3527815     2155838     1            0            0 vm/vm_object.c:452 (vm object)
   365      8532728     2165256     3        33002        55736 kern/kern_sx.c:157 (lockbuilder mtxpool)
   675     79063301     2165256    36        93683        16249 kern/kern_sx.c:245 (lockbuilder mtxpool)

These seem to all be associated with processing of page faults.

Measuring with /usr/bin/time -l shows:

      246.48 real       176.50 user       419.00 sys
      1536  maximum resident set size
         7  average shared memory size
    323710  average unshared data size
       115  average unshared stack size
   1795339  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
   1794932  voluntary context switches
     10759  involuntary context switches

i.e. the number of mutex acquisitions correlates with the number of
page reclaims and voluntary context switches (verified with other
numbers of threads).  We're not sure why so many page faults are being
seen, though: this workload should touch on the order of 100 pages,
and indeed that is what is observed on i386.  Also, time -l reports 0
under "page faults" (is the counter broken?)
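
To make the correlation concrete, here is a rough sketch that divides
a few of the "count" values from the MUTEX_PROFILING table by the page
reclaim count from time -l.  The numbers are copied from the output
above; the function name per_reclaim is made up for illustration, and
the sketch assumes the two measurements come from comparable runs,
which the output itself does not show.

```python
# Counts copied from the MUTEX_PROFILING table and the time -l output above.
page_reclaims = 1795339
voluntary_cs = 1794932

lock_counts = {
    "kern/kern_condvar.c:135 (lockbuilder mtxpool)": 2153037,
    "sparc64/sparc64/trap.c:439 (process lock)": 2154870,
    "vm/vm_fault.c:295 (vm object)": 2154903,
    "sparc64/sparc64/pmap.c:1289 (pmap)": 2155743,
    "kern/kern_sx.c:245 (lockbuilder mtxpool)": 2165256,
}


def per_reclaim(counts, reclaims):
    """Return acquisitions per page reclaim for each lock site."""
    return {name: n / reclaims for name, n in counts.items()}


if __name__ == "__main__":
    for name, ratio in per_reclaim(lock_counts, page_reclaims).items():
        print(f"{ratio:6.3f}  {name}")
    print(f"{voluntary_cs / page_reclaims:6.3f}  voluntary context switches")
```

Each listed lock site comes out at roughly 1.2 acquisitions per page
reclaim, and the voluntary context switch count is within 0.1% of the
reclaim count.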

I enabled KTR_TRAP and it shows that most of the traps are type
T_DATA_MISS.

Why should there be so many of these traps (and why aren't they
measured by rusage)?
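
For reference, time -l prints the rusage field ru_minflt as "page
reclaims" and ru_majflt as "page faults", so faults serviced without
I/O are counted under the former.  A minimal sketch of reading both
counters around a loop that touches freshly mapped pages follows.  The
helper names (fault_counts, touch_pages, measure) are hypothetical,
and how many faults the loop actually produces depends on the
platform and its VM system, so no particular number is assumed.

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) fault counts for this process from getrusage."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_minflt, ru.ru_majflt


def touch_pages(npages):
    """Map npages of anonymous memory and write one byte to each page."""
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, npages * page)  # anonymous mapping, not file-backed
    try:
        for i in range(npages):
            buf[i * page] = 1  # first write to a page may take a fault
    finally:
        buf.close()


def measure(npages):
    """Return the (minor, major) fault deltas caused by touching npages."""
    min0, maj0 = fault_counts()
    touch_pages(npages)
    min1, maj1 = fault_counts()
    return min1 - min0, maj1 - maj0


if __name__ == "__main__":
    minor, major = measure(100)
    print(f"page reclaims (ru_minflt): {minor}")
    print(f"page faults   (ru_majflt): {major}")
```

Running the same loop on sparc64 and on i386 would be one way to
compare the per-page fault count directly, independent of the malloc
benchmark.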

Kris
