4.8 ffs_dirpref problem

Ken Marx kmarx at vicor.com
Tue Oct 28 18:38:03 PST 2003



Kirk McKusick wrote:
>>Date: Thu, 23 Oct 2003 17:58:54 -0700
>>From: Ken Marx <kmarx at vicor.com>
>>To: Kirk McKusick <mckusick at mckusick.com>
>>CC: Julian Elischer <julian at vicor.com>, cburrell at vicor.com, davep at vicor.com,
>>       Ken Marx <kmarx at vicor.com>, gluk at ptci.ru, jpl at vicor.com, jrh at vicor.com,
>>       julian at vicor-nb.com, VicPE at aol.com
>>Subject: Re: 4.8 ffs_dirpref problem
>>
>>Hi Kirk,
>>
>>I had a few minutes before heading out, so tried getting a list
>>of block numbers in the bufferhash bucket that seemed to have
>>lots of hits. The depth changes of course, but I caught it at
>>one point at a depth of 600 or so:
>> 
>>/kernel: dumpbh( 250 )
>>/kernel: bp[1]: b_vp=0xcfa3d480, b_lblkno=52561, b_flags=0x20100020
>>/kernel: bp[2]: b_vp=0xcf3c5d00, b_lblkno=345047104, b_flags=0x200000a0
>>...
>>
>>For no good reason, I sorted by block number and looked at the differences
>>between consecutive values. It varies a bit, but of 522 block numbers,
>>494 differ from their predecessor by exactly 65536.
>>
>>Er, some duplicates also show up, but the b_flags values differ.
>>
>>I'm not cc'ing fs at freebsd on this just in case it's being seen
>>as getting out of control. Feel free to fold them back in.
>>
>>Thanks again,
>>k.
> 
> 
> It does look like the hash function is having some trouble.
> It has been completely revamped in 5.0, but is still using
> a "power-of-2" hashing scheme in 4.X. I highly recommend
> trying a scheme with a non-power-of-2 base. Perhaps something
> as simple as changing the hashing to use modulo rather than
> bitwise & (e.g., in bufhash change from & bufhashmask to
> % bufhashmask).
> 
> 	Kirk McKusick
> 
> 
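The 65536 stride and Kirk's suggestion can be checked with a small user-space model. This sketch assumes a simplified bucket index of (per-vnode key + logical block number), reduced either with & mask or with % mask; vpkey and the helper names are hypothetical, and the real 4.x bufhash() expression may differ in detail. With 1024 buckets (mask 1023), blocks 65536 apart always share one bucket under &, because 65536 is a multiple of 1024 and the low ten bits never change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUCKETS 65536  /* size of the scratch table below */

/* Reduce with a power-of-2 mask: only the low bits of the key matter. */
static size_t bucket_and(uint64_t vpkey, uint64_t blkno, size_t mask)
{
    return (size_t)((vpkey + blkno) & mask);
}

/* Reduce modulo the (odd) mask value: high bits now affect the result. */
static size_t bucket_mod(uint64_t vpkey, uint64_t blkno, size_t mask)
{
    return (size_t)((vpkey + blkno) % mask);
}

/*
 * Hash n blocks of one vnode, spaced `stride` apart starting at `first`,
 * and return how many distinct buckets they occupy.
 */
static size_t distinct_buckets(size_t (*hash)(uint64_t, uint64_t, size_t),
                               uint64_t vpkey, uint64_t first,
                               uint64_t stride, size_t n, size_t mask)
{
    static unsigned char seen[MAX_BUCKETS];
    size_t i, count = 0;

    assert(mask > 0 && mask < MAX_BUCKETS);
    for (i = 0; i < MAX_BUCKETS; i++)
        seen[i] = 0;
    for (i = 0; i < n; i++) {
        size_t b = hash(vpkey, first + i * stride, mask);
        if (!seen[b]) {
            seen[b] = 1;
            count++;
        }
    }
    return count;
}
```

For 500 blocks starting at 52561 with mask 1023, the & version puts all of them in a single bucket, while the % version spreads them over 500 buckets: 65536 % 1023 = 64, and since 64 and 1023 share no common factor, strided blocks visit every residue before one repeats. The trade-off is that % costs a division on each lookup, where & is a single instruction.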

Hi,

I hope this isn't seen as spamming the list; this should
be the last of it.

I'll summarize findings briefly. More details at:
	http://www.bigshed.com/kernel/raid_full_problem

and/or you can find our patches for what we finally did at:
	http://www.bigshed.com/kernel/ffs_vfsbio.diff

We did re-newfs our RAID as Kirk suggested. Stupidly,
our data file and some test results were lost in the
process (doh!), so we had to use a slightly different
data file for re-testing. It is still 1.5Gb of mixed file
and directory sizes.

Anyway, it would appear that the new fs settings
(average file size = 48k, average files per directory = 1500)
help some, but performance still suffers as the disk fills.
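For a sense of scale, the two newfs averages multiply into an expected per-directory footprint, which is the kind of quantity a dirpref-style heuristic can use to decide how many new directories to pack into one cylinder group. This sketch assumes 48k means 48 * 1024 bytes and uses a simplified rule (free bytes in the group divided by the expected directory size, clamped to between 1 and an assumed cap of 255). The helper names are hypothetical and this is not the exact kernel formula.

```c
#include <assert.h>

/* Expected bytes consumed by one directory and its files. */
static long long expected_dir_bytes(long long avgfilesize, long long avgfpdir)
{
    assert(avgfilesize > 0 && avgfpdir > 0);
    return avgfilesize * avgfpdir;
}

/*
 * Simplified model: how many directories of `dirsize` bytes fit in the
 * free space of one cylinder group, clamped to [1, cap]. The cap value
 * and the exact rule are assumptions for illustration only.
 */
static long long max_contig_dirs(long long cg_free_bytes, long long dirsize,
                                 long long cap)
{
    long long n;

    assert(dirsize > 0 && cap >= 1);
    n = cg_free_bytes / dirsize;
    if (n > cap)
        n = cap;
    if (n < 1)
        n = 1;
    return n;
}
```

With 48k files and 1500 files per directory, each directory is budgeted at 73,728,000 bytes. In this model a group with 512 MB free has room for 7 such directories, and once a group's free space drops below one directory's budget the clamp allows only one at a time.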

We have a sample 'fix' for the hash table in vfs_bio.c
that uses all the blkno bits. It's in the diff link above.
Use as you see fit. However, it doesn't significantly address
our symptoms either. Darn. We still see throughput bog down
to 1Mb/sec with over 90% system time.
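The diff linked above contains the actual vfs_bio.c change; the sketch below is not that patch. It illustrates one common way to make every block-number bit count, multiplicative (Fibonacci) hashing: multiply the key by a large odd constant and keep the top bits of the product. The function name and the XOR of a per-vnode key with blkno are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bucket function for a table of 2^nbits buckets.
 * Each key bit can influence every higher bit of the product, so the
 * top nbits, which select the bucket, can change when any key bit
 * changes, including bits above the table size (e.g. a 65536 stride).
 */
static size_t bucket_fib(uint64_t vpkey, uint64_t blkno, unsigned nbits)
{
    uint64_t h;

    assert(nbits >= 1 && nbits <= 32);
    /* 0x9E3779B97F4A7C15 is roughly 2^64 divided by the golden ratio. */
    h = (vpkey ^ blkno) * UINT64_C(0x9E3779B97F4A7C15);
    return (size_t)(h >> (64 - nbits));
}
```

Blocks 52561 and 52561 + 65536 land in the same bucket under & 1023, but in different buckets with bucket_fib(…, 10), because the 65536 step shifts the top ten bits of the product by a nonzero amount.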

The only thing that really addressed our problem was going
back to the 4.4 dirpref logic. We added a sysctl OID to
support this on a system-wide basis. That's also in the
diff patch.
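For readers unfamiliar with the older policy: assuming "4.4 dirpref logic" refers to the classic 4.4BSD-style placement, it spreads new directories out across cylinder groups rather than clustering them near the parent. Below is a user-space sketch of that rule as commonly described: among cylinder groups with at least the average number of free inodes, pick the one holding the fewest directories. The struct and function names are hypothetical stand-ins for the per-group summary counts, and the kernel version returns an inode number in the chosen group rather than a group index.

```c
#include <assert.h>

/* Hypothetical per-cylinder-group summary (directory and free-inode counts). */
struct cg_summary {
    long ndir;    /* directories in this cylinder group */
    long nifree;  /* free inodes in this cylinder group */
};

/*
 * Classic-style directory placement: among groups whose free-inode count
 * is at least the average, choose the one with the fewest directories.
 * Returns the chosen group index (0 if no group qualifies).
 */
static int dirpref_44(const struct cg_summary *cg, int ncg, long ipg)
{
    long total_ifree = 0, avgifree, minndir = ipg;
    int i, mincg = 0;

    assert(ncg > 0);
    for (i = 0; i < ncg; i++)
        total_ifree += cg[i].nifree;
    avgifree = total_ifree / ncg;

    for (i = 0; i < ncg; i++) {
        if (cg[i].ndir < minndir && cg[i].nifree >= avgifree) {
            mincg = i;
            minndir = cg[i].ndir;
        }
    }
    return mincg;
}
```

For example, with groups holding (ndir, nifree) = (10, 100), (2, 50), (5, 120), (3, 200), the average free-inode count is 117, so groups 0 and 1 are ineligible and group 3 is chosen, even though group 1 has the fewest directories.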

It would be nice if we could do this on a per-filesystem basis
via fs.h's fs_flags or some such, but perhaps this is too
messy for future support.

We can live with system-wide 4.4 semantics if necessary,
as Doug White mentioned.

If any of this does get addressed in 4.8 code, please
let us (er, julian at vicor.com) know so we can clean up
our kernel tree.

Of course, any comments, suggestions, flames totally welcome.

Thanks again for everyone's patience and assistance.

regards,
k
-- 
Ken Marx, kmarx at vicor-nb.com
Ramp up the solution space!!
		- http://www.bigshed.com/cgi-bin/speak.cgi



More information about the freebsd-fs mailing list