Fine-grained locking for POSIX local sockets (UNIX domain sockets)
Kris Kennaway
kris at obsecurity.org
Sun May 7 21:04:28 UTC 2006
On Sat, May 06, 2006 at 06:19:08PM -0400, Kris Kennaway wrote:
> x norwatson-8
> + rwatson-8
> +------------------------------------------------------------+
> | + |
> | + + + x x|
> |+ + +++ + x xxxxx x x|
> | |_________A___M______| |___A___| |
> +------------------------------------------------------------+
> N Min Max Median Avg Stddev
> x 10 2601.46 2700.26 2650.52 2653.441 30.758034
> + 10 2240.86 2516.87 2496.085 2468.468 81.868576
> Difference at 95.0% confidence
> -184.973 +/- 58.1052
> -6.97106% +/- 2.1898%
> (Student's t, pooled s = 61.8406)
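
As an aside, the interval ministat prints is just the difference of
the means +/- t * pooled s * sqrt(1/n1 + 1/n2); plugging in the
numbers above (with the t quantile for 18 degrees of freedom
hard-coded) gives back the same -184.973 +/- 58.1052:

/*
 * Recompute ministat's 95% confidence interval from the summary
 * statistics quoted above.  t(0.975, 18 d.f.) ~= 2.101 is hard-coded.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double avg_x = 2653.441, avg_plus = 2468.468;
	double s_pooled = 61.8406, t = 2.101;	/* t(0.975, 18 d.f.) */
	int n = 10;				/* samples per data set */
	double diff = avg_plus - avg_x;
	double half = t * s_pooled * sqrt(1.0 / n + 1.0 / n);

	printf("%.3f +/- %.4f\n", diff, half);
	printf("%.5f%% +/- %.4f%%\n",
	    100.0 * diff / avg_x, 100.0 * half / avg_x);
	return (0);
}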
>
> We see the drop in performance in both cases indicating that we are in
> the "overloaded" regime. The fact that your patch seems to give worse
> performance is puzzling at first sight.
>
> Running mutex profiling (keeping only the unp mutex entries and the
> 10 most contended, for clarity) shows the following:
>
> norwatson, 8 clients:
>
> debug.mutex.prof.stats:
> max total count avg cnt_hold cnt_lock name
> 5 40 9 4 0 3 kern/uipc_usrreq.c:170 (unp)
> 8 8 1 8 0 0 vm/uma_core.c:2101 (unpcb)
> 13 283 52 5 0 0 vm/uma_core.c:890 (unpcb)
> 14 1075 200 5 0 0 vm/uma_core.c:1885 (unpcb)
> 4 52 18 2 4 6 kern/uipc_usrreq.c:577 (unp)
> 5 39 9 4 4 2 kern/uipc_usrreq.c:534 (unp)
> 5 35 11 3 6 6 kern/uipc_usrreq.c:974 (unp)
> 5 45 11 4 7 4 kern/uipc_usrreq.c:210 (unp)
> 171 1164 9 129 7 2 kern/uipc_usrreq.c:917 (unp)
> 14 78 20 3 11 2872481 kern/uipc_usrreq.c:709 (unp)
> 70 156 11 14 13 4 kern/uipc_usrreq.c:895 (unp)
> 43 581 20 29 24 6 kern/uipc_usrreq.c:239 (unp)
> 44 429 18 23 26 8 kern/uipc_usrreq.c:518 (unp)
> 55 491 12 40 30 10 kern/uipc_usrreq.c:251 (unp)
> ...
> 449 20000519 320038 62 15158 0 kern/uipc_usrreq.c:431 (so_rcv)
> 459 86616085 2880079 30 15699 4944 kern/uipc_usrreq.c:319 (so_snd)
> 146 2273360 640315 3 27918 29789 kern/kern_sig.c:1002 (process lock)
> 387 3325481 640099 5 38143 47670 kern/kern_descrip.c:420 (filedesc structure)
> 150 1881990 640155 2 64111 49033 kern/kern_descrip.c:368 (filedesc structure)
> 496 13792853 3685885 3 101692 132480 kern/kern_descrip.c:1988 (filedesc structure)
> 207 4061793 551604 7 115427 118242 kern/kern_synch.c:220 (process lock)
> 391 10332282 3685885 2 194387 129547 kern/kern_descrip.c:1967 (filedesc structure)
> 465 25504709 320042 79 1632192 294498 kern/uipc_usrreq.c:364 (unp)
> 470 124263922 2880084 43 13222757 2685853 kern/uipc_usrreq.c:309 (unp)
>
> i.e. there is indeed heavy contention on the unp lock (column 5 counts
> the number of times we tried to acquire it and failed because someone
> else held the lock) - in fact about 5 times as many contentions as
> successful acquisitions (roughly 13.2M vs 2.9M at uipc_usrreq.c:309).
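
In case anyone wants to reproduce this: the stats come from the mutex
profiling sysctls (MUTEX_PROFILING kernel option).  A minimal userland
sketch for dumping them is below; debug.mutex.prof.stats is the name
shown above, while the "enable" knob name is from memory and may be
spelled differently on your tree.

#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	size_t len = 0;
	char *buf;
	int on = 1;

	/* Turn profiling on; knob name is from memory, check your tree. */
	if (sysctlbyname("debug.mutex.prof.enable", NULL, NULL, &on,
	    sizeof(on)) == -1)
		warn("debug.mutex.prof.enable");

	/* ... run the benchmark here ... */

	/* Size the stats text, then fetch and print it. */
	if (sysctlbyname("debug.mutex.prof.stats", NULL, &len, NULL, 0) == -1)
		err(1, "debug.mutex.prof.stats (size)");
	if ((buf = malloc(len + 1)) == NULL)
		err(1, "malloc");
	if (sysctlbyname("debug.mutex.prof.stats", buf, &len, NULL, 0) == -1)
		err(1, "debug.mutex.prof.stats");
	buf[len] = '\0';
	fputs(buf, stdout);
	free(buf);
	return (0);
}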
>
> With your patch and the same load:
>
> 3 20 9 2 0 0 kern/uipc_usrreq.c:1028 (unp_mtx)
> 3 22 9 2 0 0 kern/uipc_usrreq.c:1161 (unp_mtx)
> 5 29 9 3 0 2 kern/uipc_usrreq.c:1065 (unp_global_mtx)
> 5 53 18 2 0 76488 kern/uipc_usrreq.c:287 (unp_global_mtx)
> 6 33 9 3 0 0 kern/uipc_usrreq.c:236 (unp_mtx)
> 6 37 9 4 0 0 kern/uipc_usrreq.c:819 (unp_mtx)
> 7 7 1 7 0 0 vm/uma_core.c:2101 (unpcb)
> 8 49 9 5 0 0 kern/uipc_usrreq.c:1101 (unp_mtx)
> 11 136 18 7 0 1 kern/uipc_usrreq.c:458 (unp_global_mtx)
> 32 143 9 15 0 1 kern/uipc_usrreq.c:1160 (unp_global_mtx)
> 44 472 18 26 0 0 kern/uipc_usrreq.c:801 (unp_mtx)
> 123 310 9 34 0 0 kern/uipc_usrreq.c:1100 (unp_mtx)
> 147 452 9 50 0 0 kern/uipc_usrreq.c:1099 (unp_mtx)
> 172 748 9 83 0 0 kern/uipc_usrreq.c:473 (unp_mtx)
> 337 1592 9 176 0 0 kern/uipc_usrreq.c:1147 (unp_mtx)
> 350 1790 9 198 0 0 kern/uipc_usrreq.c:1146 (unp_mtx)
> 780 39405928 320038 123 0 0 kern/uipc_usrreq.c:618 (unp_mtx)
> 18 140 9 15 1 0 kern/uipc_usrreq.c:235 (unp_global_mtx)
> 70 717 18 39 1 3 kern/uipc_usrreq.c:800 (unp_global_mtx)
> 528 2444 9 271 1 1 kern/uipc_usrreq.c:1089 (unp_global_mtx)
> 158 616 9 68 2 2 kern/uipc_usrreq.c:476 (unp_mtx)
> 794 175382857 2880084 60 2 7686 kern/uipc_usrreq.c:574 (unp_mtx)
> 4 25 9 2 3 2 kern/uipc_usrreq.c:422 (unp_global_mtx)
> 186 874 9 97 3 3 kern/uipc_usrreq.c:472 (unp_global_mtx)
> 768 33783759 320038 105 7442 0 kern/uipc_usrreq.c:696 (unp_mtx)
> ...
> 465 913127 320045 2 43130 35046 kern/uipc_socket.c:1101 (so_snd)
> 483 2453927 628737 3 44768 46177 kern/kern_sig.c:1002 (process lock)
> 767 124298544 2880082 43 70037 59994 kern/uipc_usrreq.c:581 (so_snd)
> 794 45176699 320038 141 83252 72140 kern/uipc_usrreq.c:617 (unp_global_mtx)
> 549 9858281 3200210 3 579269 712643 kern/kern_resource.c:1172 (sleep mtxpool)
> 554 17122245 631715 27 641888 268243 kern/kern_descrip.c:420 (filedesc structure)
> 388 3009912 631753 4 653540 260590 kern/kern_descrip.c:368 (filedesc structure)
> 642 49626755 3681446 13 1642954 682669 kern/kern_descrip.c:1988 (filedesc structure)
> 530 13802687 3681446 3 1663244 616899 kern/kern_descrip.c:1967 (filedesc structure)
> 477 23472709 2810986 8 5671248 1900047 kern/kern_synch.c:220 (process lock)
>
> The top 10 heavily contended mutexes are very different (but note the
> number of mutex acquisitions, column 3, is about the same).
>
> There is not much contention on unp_global_mtx any longer, but there
> is a lot more on some of the other mutexes, especially the process
> lock via msleep().  Off-hand I don't know what the cause of this
> bottleneck is (note: libthr is used as the threading library;
> libpthread is not ported to sparc64).
>
> Also, a lot of the contention that used to be on the unp lock seems to
> have fallen through onto *two* of the filedesc locks (each with about
> 1.6 million contentions).  This may also help to explain the
> performance drop.
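
To recap what the patch changes, going by the lock names in the
profile: the single unp lock becomes a global linkage lock
(unp_global_mtx) plus a mutex per unpcb.  A rough userland sketch of
that kind of two-level pattern - pthreads and made-up names rather
than the real mtx(9) code - just to show the lock ordering involved:

/*
 * Userland illustration only (pthreads, made-up names), not the patch
 * itself: a global linkage lock plus one mutex per connection.  The
 * global lock is held just long enough to find and pin the peer, and
 * the per-connection work happens under the per-object lock.
 */
#include <pthread.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

struct conn {
	pthread_mutex_t	lock;		/* protects bytes_queued */
	struct conn	*peer;		/* linkage, protected by global_lock */
	long		bytes_queued;
};

static void
conn_send(struct conn *c, long n)
{
	struct conn *peer;

	pthread_mutex_lock(&global_lock);
	peer = c->peer;				/* peer pointer is stable here */
	pthread_mutex_lock(&peer->lock);
	pthread_mutex_unlock(&global_lock);	/* drop the global lock early */

	peer->bytes_queued += n;		/* per-connection work */
	pthread_mutex_unlock(&peer->lock);
}

The point is that traffic on unrelated socket pairs no longer
serializes on a single lock; the flip side, as the numbers above
suggest, is that the contention surfaces elsewhere.
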
I profiled the wait channels in msleep and found:
Before:
1 getblk
3 wait
1113 sigwait
2228 umtxq_busy
4505 fdesc
9405 umtx
14264 sbwait
-----
31515 total
After:
1 getblk
1 wait
1413 sigwait
2794 umtxq_busy
4638 fdesc
10779 umtx
11663 sbwait
-----
31287 total
i.e. about the same number of msleep calls in both cases (31515 vs
31287), but the patch shuffles them from sbwait onto everything else.
Relative change:
sbwait -18%
umtx +15%
fdesc +3%
umtxq_busy +25%
sigwait +27%
This may help to understand why performance drops at high load when
unp locking is moved out of the way.
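
Those percentages are just (after - before) / before over the wait
channel counts above; a trivial recheck, to one decimal place:

/*
 * Recompute the relative changes from the before/after wait channel
 * counts listed above.
 */
#include <stdio.h>

int
main(void)
{
	static const struct {
		const char *chan;
		int before, after;
	} w[] = {
		{ "sbwait",	14264, 11663 },
		{ "umtx",	 9405, 10779 },
		{ "fdesc",	 4505,  4638 },
		{ "umtxq_busy",	 2228,  2794 },
		{ "sigwait",	 1113,  1413 },
	};
	unsigned i;

	for (i = 0; i < sizeof(w) / sizeof(w[0]); i++)
		printf("%-11s %+6.1f%%\n", w[i].chan,
		    100.0 * (w[i].after - w[i].before) / w[i].before);
	return (0);
}
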
Kris