Lockless uidinfo.

Pawel Jakub Dawidek pjd at FreeBSD.org
Sat Aug 18 05:44:55 PDT 2007


On Sat, Aug 18, 2007 at 02:00:56PM +0200, Pawel Jakub Dawidek wrote:
> Hi.
> 
> The patch below remove per-uidinfo locks:
> 
> 	http://people.freebsd.org/~pjd/patches/uidinfo_lockless.patch
> 
> With the patch uidinfo is handled using atomics only and no locks
> (except for the global hash lock, which is not really important, as it's
> not used in the fast paths).
> 
> I needed to change ui_sbsize from rlim_t (64bit) to long, because we
> don't have 64bit atomics on all archs, and because sbsize represents
> size in bytes, it can't go beyond 32bit on 32bit archs (PAE might be a
> bit of a problem).
> 
> I changed maxval argument in chgproccnt() from int to rlim_t, as this is
> what is passed to the function.
> 
> In simple ping-pong test on unix domain socket, uidinfo lock was highly
> contented:
> 
>    max        total   wait_total       count   avg wait_avg  cnt_hold  cnt_lock name
>   1508      3242859     96267052     1467476     2    65       374553	1024743	/usr/src/sys/kern/kern_resource.c:1339 (sleep mutex:sleep mtxpool)
> 
> The ping-pong program you can find here:
> 
> 	http://people.freebsd.org/~pjd/misc/unixpingpong.c
> 
> At the end we reduced uidinfo structure size by 8 bytes and gain no
> measurable performance improvements:)
> Yes, I wasn't able to measure anything interesting, unfortunately, but I
> still believe the patch should be committed, as I'm sure there are
> workloads that will see improvements - note that uidinfo structure is
> per-uid, so if there are thousands of processes running with the same
> uid, they all need to fight for this one lock.
> Not only contention is important, but also the fact that number of
> atomic operations in chgsbsize() and chgproccnt() functions was reduced
> from 2 to 1.
> 
> Ok, while writting this e-mail I came up with a better benchmark:
> 
> 	http://people.freebsd.org/~pjd/misc/unixpingpong2.c
> 
> This one doesn't do ping-pong between two processes, but within one
> proccess only. This way we eliminate cost of context switches. I was
> running 8 such processes (I tested it on a 8way machine) and here are
> the results:
> 
> x ./uidinfo1.txt
> + ./uidinfo2.txt
> +------------------------------------------------------------------------------+
> |x                                                                            +|
> |xx                                                                          ++|
> |xx                                                                          ++|
> |A|                                                                          |A|
> +------------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   5        402742        416417        406216      408269.2     5388.3485
> +   5       1561566       1575987       1568964       1569767     5853.1399
> Difference at 95.0% confidence
>         1.1615e+06 +/- 8204.54
>         284.493% +/- 2.00959%
>         (Student's t, pooled s = 5625.55)
> 
> As you can see this was just a matter of good benchmark - here we can see
> 284% performance improvement, yay:)

Kris asked me to verify just in case that the unit cost is not higher
and here are the results from running only one ping-pong process (so no
contention):

x ./uidinfo_up1.txt
+ ./uidinfo_up2.txt
+------------------------------------------------------------------------------+
|x            x  x  x    x                 +                +   + +           +|
|     |________A_M_______|                        |___________A_M__________|   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        415023        420059        418405      418016.6     1878.5011
+   5        423731        430984        428084      427735.8     2623.4883
Difference at 95.0% confidence
        9719.2 +/- 3327.59
        2.32508% +/- 0.796043%
        (Student's t, pooled s = 2281.61)

We can see 2.3% of improvement most likely because of atomic operations
reduction.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20070818/8131a8c5/attachment.pgp


More information about the freebsd-arch mailing list