PERFORCE change 65074 for review

David Xu davidxu at freebsd.org
Tue Nov 16 16:01:46 PST 2004


John Baldwin wrote:

>On Monday 15 November 2004 06:41 pm, David Xu wrote:
>  
>
>>John Baldwin wrote:
>>    
>>
>>>On Sunday 14 November 2004 12:13 am, David Xu wrote:
>>>      
>>>
>>>>http://perforce.freebsd.org/chv.cgi?CH=65074
>>>>
>>>>Change 65074 by davidxu at davidxu_alona on 2004/11/14 05:12:40
>>>>
>>>>	1. Fix a race between signal and umtx_unlock. a waiter
>>>>	   may be resumed by signal and left or exited, heavily
>>>>	   loaded test causes kernel to crash.
>>>>	2. Use distributed queue locks instead of single giant
>>>>	   lock.
>>>>
>>>>Affected files ...
>>>>
>>>>.. //depot/projects/davidxu_ksedbg/src/sys/kern/kern_umtx.c#4 edit
>>>>
>>>>Differences ...
>>>>
>>>>==== //depot/projects/davidxu_ksedbg/src/sys/kern/kern_umtx.c#4 (text+ko) ====
>>>>
>>>>@@ -49,25 +49,48 @@
>>>>	pid_t		uq_pid;		/* Pid key component. */
>>>>};
>>>>
>>>>#define	UMTX_QUEUES	128
>>>>#define	UMTX_HASH(pid, umtx)						\
>>>>-    (((uintptr_t)pid + ((uintptr_t)umtx & ~65535)) % UMTX_QUEUES)
>>>>+    ((((uintptr_t)pid << 16) + ((uintptr_t)umtx & 65535)) % UMTX_QUEUES)
>>>>        
>>>>
>>>I'm curious why you changed the hash macro here?  Low order bits of
>>>pointers tend to be zero due to alignment, so I think this will result in
>>>fewer "useful" bits and more collisions and longer chains.
>>>      
>>>
>>FYI, I changed back to the original hash code and
>>simulated it in userland:
>>
>>printf("thread lock: %p hash=%d\n", &thread->lock,
>>           ((getpid() + ((unsigned)(&thread->lock) & ~0xFFFF))) % 128);
>>
>>get result:
>>
>>thread lock: 0x804e014 hash=39
>>thread lock: 0x8053014 hash=39
>>thread lock: 0x8056014 hash=39
>>thread lock: 0x805a014 hash=39
>>thread lock: 0x805d014 hash=39
>>thread lock: 0x8060014 hash=39
>>thread lock: 0x8063014 hash=39
>>thread lock: 0x8067014 hash=39
>>thread lock: 0x806a014 hash=39
>>thread lock: 0x806d014 hash=39
>>thread lock: 0x8070014 hash=39
>>
>>So my version puts all locks on chain 0, and the original version puts all
>>locks on chain 39. :( Both seem to be bad algorithms.
>>Note this is my private version of libpthread using thr and umtx,
>>1:1 only.
>>    
>>
>
>Humm, try changing the hash to not mask off bits from the umtx pointer.  The 
>pid is useful because if locks are allocated at the same address in different 
>processes (e.g. if you run the same program N times) it will keep them from 
>all colliding, but if you mask off all the bits with only 128 queues it will 
>currently stick all locks from a process on the same queue.  Maybe use 
>something similar to the 4BSD sleep queue hash (still used in 
>subr_sleepqueue.c) that looks like:
>
>/*
> * Constants for the hash table of sleep queue chains.  These constants are
> * the same ones that 4BSD (and possibly earlier versions of BSD) used.
> * Basically, we ignore the lower 8 bits of the address since most wait
> * channel pointers are aligned and only look at the next 7 bits for the
> * hash.  SC_TABLESIZE must be a power of two for SC_MASK to work properly.
> */
>#define	SC_TABLESIZE	128			/* Must be power of 2. */
>#define	SC_MASK		(SC_TABLESIZE - 1)
>#define	SC_SHIFT	8
>#define	SC_HASH(wc)	(((uintptr_t)(wc) >> SC_SHIFT) & SC_MASK)
>#define	SC_LOOKUP(wc)	&sleepq_chains[SC_HASH(wc)]
>
>So maybe use something like:
>
>#define UMTX_HASH(pid, umtx)	\
>	((((uintptr_t)(umtx) >> 8) + (uintptr_t)pid) % UMTX_QUEUES)
>
>  
>
It looks better, but I found that even with only a few locks, two or more of
them still end up on the same chain.
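For example, a quick userland check of that hash against the same lock
addresses as in my earlier run still shows several of them sharing a chain
(the pid below is an arbitrary stand-in; the real pid only shifts every
index by the same constant):

#include <stdio.h>
#include <stdint.h>

#define	UMTX_QUEUES	128
/* Suggested hash: drop the low 8 bits of the pointer, then add the pid. */
#define	UMTX_HASH(pid, umtx) \
	((((uintptr_t)(umtx) >> 8) + (uintptr_t)(pid)) % UMTX_QUEUES)

int
main(void)
{
	/* Lock addresses copied from the earlier printf output. */
	uintptr_t locks[] = {
		0x804e014, 0x8053014, 0x8056014, 0x805a014, 0x805d014,
		0x8060014, 0x8063014, 0x8067014, 0x806a014, 0x806d014,
		0x8070014
	};
	unsigned pid = 12345;	/* arbitrary stand-in pid */
	unsigned i;

	for (i = 0; i < sizeof(locks) / sizeof(locks[0]); i++)
		printf("lock %#lx hash=%lu\n", (unsigned long)locks[i],
		    (unsigned long)UMTX_HASH(pid, locks[i]));
	return (0);
}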
I am thinking that we can use some code like this:

/* Hash the four bytes of the umtx pointer value, one byte at a time. */
unsigned char *p = (unsigned char *)&umtx_p;
unsigned k = 0;
k = ((k << 2) + k) + p[0];	/* k = k * 5 + next byte */
k = ((k << 2) + k) + p[1];
k = ((k << 2) + k) + p[2];
k = ((k << 2) + k) + p[3];

The above code multiplies k by 5 (a prime number) and adds one
byte of the umtx pointer at each step.
I found all umtxes are now well distributed in the hash table (a complete
userland sketch of the check is below).
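Roughly the sketch I mean, assuming the address list from my earlier run
(the loop simply hashes every byte of the pointer value instead of spelling
out p[0]..p[3]):

#include <stdio.h>

#define	UMTX_QUEUES	128

/* Multiply-by-5 byte hash over the bytes of the pointer value itself. */
static unsigned
umtx_byte_hash(void *umtx_p)
{
	unsigned char *p = (unsigned char *)&umtx_p;
	unsigned k = 0;
	unsigned i;

	for (i = 0; i < sizeof(umtx_p); i++)
		k = ((k << 2) + k) + p[i];	/* k = k * 5 + next byte */
	return (k % UMTX_QUEUES);
}

int
main(void)
{
	/* Same lock addresses as in the earlier printf output. */
	void *locks[] = {
		(void *)0x804e014, (void *)0x8053014, (void *)0x8056014,
		(void *)0x805a014, (void *)0x805d014, (void *)0x8060014,
		(void *)0x8063014, (void *)0x8067014, (void *)0x806a014,
		(void *)0x806d014, (void *)0x8070014
	};
	unsigned i;

	for (i = 0; i < sizeof(locks) / sizeof(locks[0]); i++)
		printf("lock %p hash=%u\n", locks[i], umtx_byte_hash(locks[i]));
	return (0);
}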
I calculate the hash value only once in umtx_lock or umtx_unlock,
unlike the current code, which recomputes it on every iteration of the loop.

David


