PERFORCE change 65074 for review
David Xu
davidxu at freebsd.org
Tue Nov 16 16:01:46 PST 2004
John Baldwin wrote:
>On Monday 15 November 2004 06:41 pm, David Xu wrote:
>
>
>>John Baldwin wrote:
>>
>>
>>>On Sunday 14 November 2004 12:13 am, David Xu wrote:
>>>
>>>
>>>>http://perforce.freebsd.org/chv.cgi?CH=65074
>>>>
>>>>Change 65074 by davidxu at davidxu_alona on 2004/11/14 05:12:40
>>>>
>>>> 1. Fix a race between signal delivery and umtx_unlock: a
>>>> waiter may be resumed by a signal and have already left
>>>> the queue or exited; a heavily loaded test causes the
>>>> kernel to crash.
>>>> 2. Use distributed queue locks instead of a single giant
>>>> lock.
>>>>
>>>>Affected files ...
>>>>
>>>>.. //depot/projects/davidxu_ksedbg/src/sys/kern/kern_umtx.c#4 edit
>>>>
>>>>Differences ...
>>>>
>>>>==== //depot/projects/davidxu_ksedbg/src/sys/kern/kern_umtx.c#4 (text+ko)
>>>>====
>>>>
>>>>@@ -49,25 +49,48 @@
>>>> pid_t uq_pid; /* Pid key component. */
>>>>};
>>>>
>>>>#define UMTX_QUEUES 128
>>>>#define UMTX_HASH(pid, umtx) \
>>>>- (((uintptr_t)pid + ((uintptr_t)umtx & ~65535)) % UMTX_QUEUES)
>>>>+ ((((uintptr_t)pid << 16) + ((uintptr_t)umtx & 65535)) % UMTX_QUEUES)
>>>>
>>>>
>>>I'm curious why you changed the hash macro here? Low order bits of
>>>pointers tend to be zero due to alignment, so I think this will result in
>>>fewer "useful" bits and more collisions and longer chains.
>>>
>>>
>>FYI, I changed back to the original hash code.
>>I simulated it in userland:
>>
>>printf("thread lock: %p hash=%d\n", (void *)&thread->lock,
>>    (int)((getpid() + ((uintptr_t)&thread->lock & ~0xFFFF)) % 128));
>>
>>get result:
>>
>>thread lock: 0x804e014 hash=39
>>thread lock: 0x8053014 hash=39
>>thread lock: 0x8056014 hash=39
>>thread lock: 0x805a014 hash=39
>>thread lock: 0x805d014 hash=39
>>thread lock: 0x8060014 hash=39
>>thread lock: 0x8063014 hash=39
>>thread lock: 0x8067014 hash=39
>>thread lock: 0x806a014 hash=39
>>thread lock: 0x806d014 hash=39
>>thread lock: 0x8070014 hash=39
>>
>>So my version put all the locks on chain 0, and the original
>>version put them all on chain 39. :( Both seem like bad algorithms.
>>Note this is my private version of libpthread using thr and umtx,
>>1:1 only.
>>
>>
>
>Humm, try changing the hash to not mask off bits from the umtx pointer. The
>pid is useful because if locks are allocated at the same address in different
>processes (e.g. if you run the same program N times) it will keep them from
>all colliding, but if you mask off all the bits with only 128 queues it will
>currently stick all locks from a process on the same queue. Maybe use
>something similar to the 4BSD sleep queue hash (still used in
>subr_sleepqueue.c) that looks like:
>
>/*
> * Constants for the hash table of sleep queue chains. These constants are
> * the same ones that 4BSD (and possibly earlier versions of BSD) used.
> * Basically, we ignore the lower 8 bits of the address since most wait
> * channel pointers are aligned and only look at the next 7 bits for the
> * hash. SC_TABLESIZE must be a power of two for SC_MASK to work properly.
> */
>#define SC_TABLESIZE 128 /* Must be power of 2. */
>#define SC_MASK (SC_TABLESIZE - 1)
>#define SC_SHIFT 8
>#define SC_HASH(wc) (((uintptr_t)(wc) >> SC_SHIFT) & SC_MASK)
>#define SC_LOOKUP(wc) &sleepq_chains[SC_HASH(wc)]
>
>So maybe use something like:
>
>#define UMTX_HASH(pid, umtx) \
> ((((uintptr_t)(umtx) >> 8) + (uintptr_t)(pid)) % UMTX_QUEUES)
>
>
>
It looks better, but I found that with only a few locks it still
puts two or more locks on the same chain.
I am thinking that we could use some code like this:

	unsigned char *p = (unsigned char *)&umtx_p;
	unsigned k = 0;

	k = ((k << 2) + k) + p[0];
	k = ((k << 2) + k) + p[1];
	k = ((k << 2) + k) + p[2];
	k = ((k << 2) + k) + p[3];

Each step multiplies k by 5 (a prime number; (k << 2) + k is k * 5)
and then adds in one byte of the umtx pointer.
With this, all umtxes are well distributed in the hash table.
I also calculate the hash number only once, in umtx_lock or
umtx_unlock, instead of repeatedly recalculating it in the loop as
the current code does.
David
More information about the p4-projects
mailing list