Fine-grained locking for POSIX local sockets (UNIX domain sockets)

Kris Kennaway kris at
Sun May 7 21:04:28 UTC 2006

On Sat, May 06, 2006 at 06:19:08PM -0400, Kris Kennaway wrote:

> x norwatson-8
> + rwatson-8
> +------------------------------------------------------------+
> |                                 +                          |
> |                               + + +               x       x|
> |+                           +  +++ +          x   xxxxx x  x|
> |                   |_________A___M______|        |___A___|  |
> +------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10       2601.46       2700.26       2650.52      2653.441     30.758034
> +  10       2240.86       2516.87      2496.085      2468.468     81.868576
> Difference at 95.0% confidence
>         -184.973 +/- 58.1052
>         -6.97106% +/- 2.1898%
>         (Student's t, pooled s = 61.8406)
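
The difference and interval reported above can be re-derived from the two summary rows alone. What follows is a minimal sketch, not ministat's own code: the function name `pooled_t_diff` is made up for illustration, and the critical value 2.101 is the usual tabulated two-sided 95% Student's t value for 18 degrees of freedom, hard-coded here as an assumption.

```python
import math

def pooled_t_diff(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Return (difference, CI half-width, pooled stddev) for sample 2 minus sample 1."""
    dof = n1 + n2 - 2
    # Pooled standard deviation of the two samples.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / dof)
    diff = mean2 - mean1
    # Half-width of the confidence interval for the difference of means.
    half_width = t_crit * pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, half_width, pooled_sd

# Assumed table value: two-sided 95%, 18 degrees of freedom.
T_CRIT_95_DF18 = 2.101

# "x" row (norwatson-8) is sample 1, "+" row (rwatson-8) is sample 2.
diff, half, pooled = pooled_t_diff(10, 2653.441, 30.758034,
                                   10, 2468.468, 81.868576,
                                   T_CRIT_95_DF18)
pct = 100.0 * diff / 2653.441
pct_half = 100.0 * half / 2653.441

print(f"{diff:.3f} +/- {half:.4f}")
print(f"{pct:.5f}% +/- {pct_half:.4f}%")
print(f"pooled s = {pooled:.4f}")
```

The percentages are taken relative to the mean of the unpatched ("x") sample.
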
>
> We see the drop in performance in both cases indicating that we are in
> the "overloaded" regime.  The fact that your patch seems to give worse
> performance is puzzling at first sight.
>
> Running mutex profiling (and only keeping the unp mutex entries and
> the 10 most contended for clarity) shows the following:
>
> norwatson, 8 clients:
>    max        total       count   avg     cnt_hold     cnt_lock name
>      5           40           9     4            0            3 kern/uipc_usrreq.c:170 (unp)
>      8            8           1     8            0            0 vm/uma_core.c:2101 (unpcb)
>     13          283          52     5            0            0 vm/uma_core.c:890 (unpcb)
>     14         1075         200     5            0            0 vm/uma_core.c:1885 (unpcb)
>      4           52          18     2            4            6 kern/uipc_usrreq.c:577 (unp)
>      5           39           9     4            4            2 kern/uipc_usrreq.c:534 (unp)
>      5           35          11     3            6            6 kern/uipc_usrreq.c:974 (unp)
>      5           45          11     4            7            4 kern/uipc_usrreq.c:210 (unp)
>    171         1164           9   129            7            2 kern/uipc_usrreq.c:917 (unp)
>     14           78          20     3           11      2872481 kern/uipc_usrreq.c:709 (unp)
>     70          156          11    14           13            4 kern/uipc_usrreq.c:895 (unp)
>     43          581          20    29           24            6 kern/uipc_usrreq.c:239 (unp)
>     44          429          18    23           26            8 kern/uipc_usrreq.c:518 (unp)
>     55          491          12    40           30           10 kern/uipc_usrreq.c:251 (unp)
> ...
>    449     20000519      320038    62        15158            0 kern/uipc_usrreq.c:431 (so_rcv)
>    459     86616085     2880079    30        15699         4944 kern/uipc_usrreq.c:319 (so_snd)
>    146      2273360      640315     3        27918        29789 kern/kern_sig.c:1002 (process lock)
>    387      3325481      640099     5        38143        47670 kern/kern_descrip.c:420 (filedesc structure)
>    150      1881990      640155     2        64111        49033 kern/kern_descrip.c:368 (filedesc structure)
>    496     13792853     3685885     3       101692       132480 kern/kern_descrip.c:1988 (filedesc structure)
>    207      4061793      551604     7       115427       118242 kern/kern_synch.c:220 (process lock)
>    391     10332282     3685885     2       194387       129547 kern/kern_descrip.c:1967 (filedesc structure)
>    465     25504709      320042    79      1632192       294498 kern/uipc_usrreq.c:364 (unp)
>    470    124263922     2880084    43     13222757      2685853 kern/uipc_usrreq.c:309 (unp)
>
> i.e. there is indeed heavy contention on the unp lock (column 5,
> cnt_hold, counts the number of times we tried to acquire it and failed
> because someone else had the lock) - in fact, at uipc_usrreq.c:309
> there are about 4.6 times as many contentions as successful
> acquisitions.
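
The contention ratio can be read straight off a profile row. The sketch below assumes the column order given in the header line (max, total, count, avg, cnt_hold, cnt_lock, name) and follows the interpretation used in this message, treating cnt_hold as the contention count. The helper names are hypothetical.

```python
def parse_profile_line(line):
    """Split one mutex-profiling row into named fields (quote markers are stripped)."""
    fields = line.lstrip("> ").split(None, 6)
    keys = ("max", "total", "count", "avg", "cnt_hold", "cnt_lock")
    row = {key: int(value) for key, value in zip(keys, fields[:6])}
    row["name"] = fields[6].strip()
    return row

def contention_ratio(row):
    """Contentions (cnt_hold) per acquisition (count)."""
    return row["cnt_hold"] / row["count"]

# Most contended unp row without the patch, copied from the table above.
unpatched = parse_profile_line(
    ">    470    124263922     2880084    43     13222757      2685853 "
    "kern/uipc_usrreq.c:309 (unp)")

# Most contended unp_global_mtx row with the patch, copied from the second table.
patched = parse_profile_line(
    ">    794     45176699      320038   141        83252        72140 "
    "kern/uipc_usrreq.c:617 (unp_global_mtx)")

for row in (unpatched, patched):
    print(f"{row['name']}: {contention_ratio(row):.2f} contentions per acquisition")
```
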
>
> With your patch and the same load (rwatson, 8 clients):
>      3           20           9     2            0            0 kern/uipc_usrreq.c:1028 (unp_mtx)
>      3           22           9     2            0            0 kern/uipc_usrreq.c:1161 (unp_mtx)
>      5           29           9     3            0            2 kern/uipc_usrreq.c:1065 (unp_global_mtx)
>      5           53          18     2            0        76488 kern/uipc_usrreq.c:287 (unp_global_mtx)
>      6           33           9     3            0            0 kern/uipc_usrreq.c:236 (unp_mtx)
>      6           37           9     4            0            0 kern/uipc_usrreq.c:819 (unp_mtx)
>      7            7           1     7            0            0 vm/uma_core.c:2101 (unpcb)
>      8           49           9     5            0            0 kern/uipc_usrreq.c:1101 (unp_mtx)
>     11          136          18     7            0            1 kern/uipc_usrreq.c:458 (unp_global_mtx)
>     32          143           9    15            0            1 kern/uipc_usrreq.c:1160 (unp_global_mtx)
>     44          472          18    26            0            0 kern/uipc_usrreq.c:801 (unp_mtx)
>    123          310           9    34            0            0 kern/uipc_usrreq.c:1100 (unp_mtx)
>    147          452           9    50            0            0 kern/uipc_usrreq.c:1099 (unp_mtx)
>    172          748           9    83            0            0 kern/uipc_usrreq.c:473 (unp_mtx)
>    337         1592           9   176            0            0 kern/uipc_usrreq.c:1147 (unp_mtx)
>    350         1790           9   198            0            0 kern/uipc_usrreq.c:1146 (unp_mtx)
>    780     39405928      320038   123            0            0 kern/uipc_usrreq.c:618 (unp_mtx)
>     18          140           9    15            1            0 kern/uipc_usrreq.c:235 (unp_global_mtx)
>     70          717          18    39            1            3 kern/uipc_usrreq.c:800 (unp_global_mtx)
>    528         2444           9   271            1            1 kern/uipc_usrreq.c:1089 (unp_global_mtx)
>    158          616           9    68            2            2 kern/uipc_usrreq.c:476 (unp_mtx)
>    794    175382857     2880084    60            2         7686 kern/uipc_usrreq.c:574 (unp_mtx)
>      4           25           9     2            3            2 kern/uipc_usrreq.c:422 (unp_global_mtx)
>    186          874           9    97            3            3 kern/uipc_usrreq.c:472 (unp_global_mtx)
>    768     33783759      320038   105         7442            0 kern/uipc_usrreq.c:696 (unp_mtx)
> ...
>    465       913127      320045     2        43130        35046 kern/uipc_socket.c:1101 (so_snd)
>    483      2453927      628737     3        44768        46177 kern/kern_sig.c:1002 (process lock)
>    767    124298544     2880082    43        70037        59994 kern/uipc_usrreq.c:581 (so_snd)
>    794     45176699      320038   141        83252        72140 kern/uipc_usrreq.c:617 (unp_global_mtx)
>    549      9858281     3200210     3       579269       712643 kern/kern_resource.c:1172 (sleep mtxpool)
>    554     17122245      631715    27       641888       268243 kern/kern_descrip.c:420 (filedesc structure)
>    388      3009912      631753     4       653540       260590 kern/kern_descrip.c:368 (filedesc structure)
>    642     49626755     3681446    13      1642954       682669 kern/kern_descrip.c:1988 (filedesc structure)
>    530     13802687     3681446     3      1663244       616899 kern/kern_descrip.c:1967 (filedesc structure)
>    477     23472709     2810986     8      5671248      1900047 kern/kern_synch.c:220 (process lock)
>
> The 10 most heavily contended mutexes are very different (but note that
> the number of mutex acquisitions, column 3, is about the same).
>
> There is no longer much contention on unp_global_mtx, but there
> is a lot more on some of the other mutexes, especially the process
> lock via msleep().  Offhand I don't know what the cause of this
> bottleneck is (note: libthr is used as the threading library;
> libpthread is not ported to sparc64).
>
> Also, a lot of the contention that used to be on the unp lock seems to
> have fallen through onto *two* of the filedesc locks (each with
> about 1.6 million contentions).  This may also help to explain the
> performance drop.

I profiled the wait channels in msleep and found:

Without the patch:

    1 getblk
    3 wait
 1113 sigwait
 2228 umtxq_busy
 4505 fdesc
 9405 umtx
14264 sbwait
31515 total

With the patch:

    1 getblk
    1 wait
 1413 sigwait
 2794 umtxq_busy
 4638 fdesc
10779 umtx
11663 sbwait
31287 total

i.e. roughly the same number of msleep calls in both cases, but the
patch shifts them from sbwait onto everything else.

Relative change:
sbwait     -18%
umtx       +15%
fdesc      +3%
umtxq_busy +25%
sigwait    +27%
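
For reference, the relative changes can be recomputed from the two sets of wait-channel counts. This is a small sketch under one assumption: the first set of counts is the unpatched run and the second is the patched run. The dictionary and function names are illustrative only, and rounding may differ slightly from hand-computed figures.

```python
# msleep wait-channel counts, copied from the two listings above.
# Assumption: first listing is unpatched, second is patched.
unpatched = {"getblk": 1, "wait": 3, "sigwait": 1113, "umtxq_busy": 2228,
             "fdesc": 4505, "umtx": 9405, "sbwait": 14264}
patched = {"getblk": 1, "wait": 1, "sigwait": 1413, "umtxq_busy": 2794,
           "fdesc": 4638, "umtx": 10779, "sbwait": 11663}

def relative_change(old, new):
    """Percentage change from old to new, relative to old."""
    return 100.0 * (new - old) / old

changes = {chan: relative_change(unpatched[chan], patched[chan])
           for chan in unpatched}

# Skip the channels with only a handful of samples.
for chan in ("sbwait", "umtx", "fdesc", "umtxq_busy", "sigwait"):
    print(f"{chan:<10} {changes[chan]:+.1f}%")
```
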

This may help explain why performance drops at high load when
unp locking is moved out of the way.


More information about the freebsd-current mailing list