4.8 ffs_dirpref problem

Fri Nov 28 03:07:30 PST 2003

On 24 Nov, Ken Marx wrote:
> 
> 
> Don Lewis wrote:

>> Index: sys/kern/vfs_bio.c
>> ===================================================================
>> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
>> retrieving revision 1.242.2.21
>> diff -u -r1.242.2.21 vfs_bio.c
>> --- sys/kern/vfs_bio.c	9 Aug 2003 16:21:19 -0000	1.242.2.21
>> +++ sys/kern/vfs_bio.c	18 Nov 2003 02:10:55 -0000
>> @@ -140,6 +140,7 @@
>>  	&bufreusecnt, 0, "");
>>  
>>  static int bufhashmask;
>> +static int bufhashshift;
>>  static LIST_HEAD(bufhashhdr, buf) *bufhashtbl, invalhash;
>>  struct bqueues bufqueues[BUFFER_QUEUES] = { { 0 } };
>>  char *buf_wmesg = BUF_WMESG;
>> @@ -160,7 +161,20 @@
>>  struct bufhashhdr *
>>  bufhash(struct vnode *vnp, daddr_t bn)
>>  {
>> -	return(&bufhashtbl[(((uintptr_t)(vnp) >> 7) + (int)bn) & bufhashmask]);
>> +	u_int64_t hashkey64;
>> +	int hashkey; 
>> +	
>> +	/*
>> +	 * Fibonacci hash, see Knuth's
>> +	 * _Art of Computer Programming, Volume 3 / Sorting and Searching_
>> +	 *
>> +         * We reduce the argument to 32 bits before doing the hash to
>> +	 * avoid the need for a slow 64x64 multiply on 32 bit platforms.
>> +	 */
>> +	hashkey64 = (u_int64_t)(uintptr_t)vnp + (u_int64_t)bn;
>> +	hashkey = (((u_int32_t)(hashkey64 + (hashkey64 >> 32)) * 2654435769u) >>
>> +	    bufhashshift) & bufhashmask;
>> +	return(&bufhashtbl[hashkey]);
>>  }
>>  
>>  /*
>> @@ -319,8 +333,9 @@
>>  bufhashinit(caddr_t vaddr)
>>  {
>>  	/* first, make a null hash table */
>> +	bufhashshift = 29;
>>  	for (bufhashmask = 8; bufhashmask < nbuf / 4; bufhashmask <<= 1)
>> -		;
>> +		bufhashshift--;
>>  	bufhashtbl = (void *)vaddr;
>>  	vaddr = vaddr + sizeof(*bufhashtbl) * bufhashmask;
>>  	--bufhashmask;
>> 
>> 
> 
> Well, I'm mildly beflummoxed - I tried to compare hashtable preformance
> between all three known versions of the hashing - legacy power of 2,
> the Vicor ^= hash, and Don's fibonacci hash.
> 
> Running with 
> 
>        minifree = max( 1, avgifree / 4 );
>        minbfree = max( 1, avgbfree );
> 
> all perform about the same, with no performance problems all
> the way up to 100% disk capacity (didn't test into reserved space).
> 
> Looking at instrumentation to show freq and avg depth of the
> hash buckets, everything seems very calm (mainly because
> we're not hitting the linear searching very often, I'd presume).
> 
> I can't explain why I seemlingly got performance problems
> with similar (identical) minbfree code previously.
> 
> So, out of spite, I went back to 
> 
> 	minbfree = max( 1, avgbfree/4 );
> 
> This does hit the hashtable harder for the legacy version
> and not so much for either new flavor. Here are a few
> samplings of calling my dump routine from the debugger.
> "avgdepth" really means 'search depth' since we use
> the depth reached after finding a bp in gbincore.
> 
> The line below such as,
> 
> 	 0: avgdepth[1] cnt=801
> 
> means that 801 of the hashtable buckets had an avg search
> depth of 1 at the time the debug routine was called.
> The 'N:' prefix means the N-th unique non-zero such value.
> So large cnt's for small []'d depth values means an efficient hash.
> 
> I've edited out the details as much as possible.
> 
> LEGACY:
> --------
> Nov 24 13:34:54 oos0b /kernel: bh[442/0x1ba]: freq=2706110, avgdepth = 154
> ...
> Nov 24 13:34:54 oos0b /kernel: 0: avgdepth[1] cnt=1015
> Nov 24 13:34:54 oos0b /kernel: 1: avgdepth[2] cnt=7
> Nov 24 13:34:54 oos0b /kernel: 2: avgdepth[154] cnt=1	<- !!
> Nov 24 13:34:54 oos0b /kernel: 3: avgdepth[3] cnt=1
>  -----------
> 
> Nov 24 13:36:49 oos0b /kernel: bh[442/0x1ba]: freq=3416953, avgdepth = 141
> ...
> Nov 24 13:36:49 oos0b /kernel: 0: avgdepth[1] cnt=1017
> Nov 24 13:36:49 oos0b /kernel: 1: avgdepth[141] cnt=1
> Nov 24 13:36:49 oos0b /kernel: 2: avgdepth[2] cnt=6
> 
> VICOR x-or hashtable:
> ---------------------
> Nov 24 13:07:24 oos0b /kernel: 0: avgdepth[1] cnt=762
> Nov 24 13:07:24 oos0b /kernel: 1: avgdepth[2] cnt=259
> Nov 24 13:07:24 oos0b /kernel: 2: avgdepth[3] cnt=3
>  -----------
> 
> Nov 24 13:08:07 oos0b /kernel: 0: avgdepth[1] cnt=744
> Nov 24 13:08:07 oos0b /kernel: 1: avgdepth[2] cnt=275
> Nov 24 13:08:07 oos0b /kernel: 2: avgdepth[3] cnt=5
> 
> FIBONACCI:
> ----------
> Nov 24 11:56:50 oos0b /kernel: 0: avgdepth[1] cnt=811
> Nov 24 11:56:50 oos0b /kernel: 1: avgdepth[3] cnt=88
> Nov 24 11:56:50 oos0b /kernel: 2: avgdepth[2] cnt=124
> Nov 24 11:56:50 oos0b /kernel: 3: avgdepth[0] cnt=1
>  -----------
> 
> Nov 24 11:57:48 oos0b /kernel: 0: avgdepth[1] cnt=801
> Nov 24 11:57:48 oos0b /kernel: 1: avgdepth[3] cnt=93
> Nov 24 11:57:48 oos0b /kernel: 2: avgdepth[2] cnt=130
> 
> So, while this is far from analytically eshaustive,
> it almost appears the fibonacci hash has more entries
> of depth 3, while the Vicor one has more at depth 2.
> 
> I'm happy to run more tests if you have ideas. I'm also fine
> to cut bait and go with whatever you decide. It *seems* like
> putting the fibonacci hash is prudent since the current hash
> has been observed to be expensive. I had trouble proving this
> unequivocally though. So, perhaps Don's minbfree fix is sufficient
> after all. I'm tempted at this point to go with the 100% flavor.

I think we're running into one of the weaknesses in the Fibonacci hash.
There are a large number of hash entries for the cylinder group blocks,
which are located at offsets which are multiples of 89 * 2^10 in your
example, or something on the order of 2^16.  The effect of this is for
the cylinder group number to be hashed using the least significant bits
of the hash multiplier, which don't work as well for distributing the
hash values.  I tried some of Knuth's suggestions, and got better
results with the hash multiplier 0x9E376DB1u.  The most significant 16
bits of the multplier are the same as the original constant, and the
least significant bits act as a fraction in the desirable range of 1/3
to 3/7.  Please give this new hash multiplier a try.