4.8 ffs_dirpref problem

Wed Oct 22 18:58:50 PDT 2003

Thanks very much for the reply. (Sorry about the cc list - I'm
working with others on this, but will forward things on from now on.)

It seems to me the old source differs just slightly from the
new 'backstop'. Old dirpref has an additional check for

	if (fs->fs_cs(fs, cg).cs_ndir < minndir 

I don't know how relevant this is. I could hack things to
run only the backstop and see empirically what happens.
Open to suggestions.
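
To make the difference concrete, here is a rough user-space sketch of the
two selection rules as I currently read them. It is not the kernel code:
struct cg_summary, pick_old() and pick_backstop() are made-up names that
stand in for the per-cylinder-group summary and the two loops, and the
details of the real backstop may differ.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-cylinder-group summary info. */
struct cg_summary {
	int ndir;	/* directories in this cylinder group */
	int nifree;	/* free inodes in this cylinder group */
};

/* Average free inodes per cylinder group (integer division). */
static int
avg_ifree(const struct cg_summary *cs, int ncg)
{
	long total = 0;
	int cg;

	assert(ncg > 0);
	for (cg = 0; cg < ncg; cg++)
		total += cs[cg].nifree;
	return (int)(total / ncg);
}

/*
 * Old-style rule: among groups with at least the average number of
 * free inodes, pick the one with the fewest directories.
 * ipg (inodes per group) is only the initial upper bound for ndir.
 */
static int
pick_old(const struct cg_summary *cs, int ncg, int ipg)
{
	int avgifree = avg_ifree(cs, ncg);
	int cg, mincg = 0, minndir = ipg;

	for (cg = 0; cg < ncg; cg++)
		if (cs[cg].ndir < minndir && cs[cg].nifree >= avgifree) {
			mincg = cg;
			minndir = cs[cg].ndir;
		}
	return mincg;
}

/*
 * Backstop-style rule (my reading): starting at prefcg and wrapping
 * around, take the first group with at least the average number of
 * free inodes. There is no cs_ndir check here.
 */
static int
pick_backstop(const struct cg_summary *cs, int ncg, int prefcg)
{
	int avgifree = avg_ifree(cs, ncg);
	int i, cg;

	for (i = 0; i < ncg; i++) {
		cg = (prefcg + i) % ncg;
		if (cs[cg].nifree >= avgifree)
			return cg;
	}
	return prefcg;	/* nothing qualified; fall back to the preference */
}
```

With four groups holding {ndir, nifree} of {5,10}, {1,2}, {2,20}, {9,30},
the old rule picks group 2 (fewest directories among the groups with enough
free inodes), while the backstop starting at prefcg 3 simply stays in
group 3. So the two can place a directory differently even though both
insist on above-average free inodes.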

To recap what we did/found today:

1. Enabling soft updates with the non-hacked ffs_dirpref() does seem to
   improve things significantly, but not quite as much as using the 4.4 code.

2. Talking with others here: We're a bit scared to enable soft updates
   in production. These are 7x24 sites. If there's a crash, we're
   concerned that at the very least, downtime could increase due to fsck.
   And at worst, more data could be lost.

3. Strangely, I can't find ffs_dirpref in the kgmon call graph. I
   even made it a non-static function. Hm...  Don't quite grok this.

4. I haven't looked into where tunefs (et al) store their info (superblock(s)?).
   Presuming for the sake of argument that making the backstop
   code configurable is a reasonable approach, might we have room
   to store this as a file system attribute?

5. We discussed why others might not be seeing this. The guess
   is that it *does* happen, but that typical file systems aren't 550GB.
   Hence the cost of the linear fallback cylinder group searches isn't as
   noticeable. (I can try to give the rationale for our large fs, but anyway,
   it's what we have.)

   I suppose one approach (uh, hack) is to do the backstop code first
   under certain extreme conditions (such as a huge number of cylinder
   groups)?

6. Open to ideas of where to instrument ffs a bit more. E.g., counters
   in ffs_alloc() for which strategies it uses, or some such? And
   conditional on what?

7. Are there other tunefs settings that might help? We tried changing
   avg files/dir (-s) from 64 to 4096 since we often have >> 64.
   Results were varied: Sometimes things went more quickly, but we
   still often saw the very sluggish behavior.
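
For items 5 and 6, something along these lines is what I have in mind.
This is only a sketch: every name here (the strategy enum, dirpref_hits,
dirpref_backstop_ncg) is hypothetical, nothing like it exists in the
tree as far as I know, and in the kernel the counters and the threshold
would presumably need to be exported somehow, which is not shown.

```c
#include <assert.h>

/* Hypothetical labels for the ways dirpref might arrive at a group. */
enum dirpref_strategy {
	DP_PREFERRED,		/* preferred group was acceptable */
	DP_LINEAR_SCAN,		/* had to scan other groups linearly */
	DP_BACKSTOP,		/* fell through to the backstop code */
	DP_NSTRATEGY
};

/* One counter per strategy, bumped wherever that path returns. */
static unsigned long dirpref_hits[DP_NSTRATEGY];

/*
 * Hypothetical tunable: if nonzero, file systems with at least this
 * many cylinder groups go straight to the backstop. 0 disables it.
 */
static int dirpref_backstop_ncg = 0;

/* Record that a given strategy produced the result. */
static void
dirpref_count(enum dirpref_strategy s)
{
	assert((int)s >= 0 && (int)s < DP_NSTRATEGY);
	dirpref_hits[s]++;
}

/* Should this file system skip directly to the backstop? */
static int
dirpref_backstop_first(int ncg)
{
	return (dirpref_backstop_ncg > 0 && ncg >= dirpref_backstop_ncg);
}
```

The counters would tell us how often the expensive scan is actually
taken on the big raids, and the threshold keeps the old layout behavior
off for ordinary-sized file systems unless someone asks for it.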

Again thanks. And thanks in advance for any further guidance.

regards,
k

Doug White wrote:
> Purge extensive cc:.
> 
> On Tue, 21 Oct 2003, Ken Marx wrote:
> 
> 
>>We have 560GB raids that were sometimes bogging down heavily
>>in our production systems. Under 4.8-RELEASE (recently upgraded from 4.4)
>>we find that when:
>>
>>	o the raid file system grows to over 85% capacity (with only
>>	  30% inode usage)
>>	o we create ~1500 or so 2-6kb files in a given dir
>>	o (note: soft updates NOT enabled)
> 
> 
> Interesting problems and analysis.
> 
> If I'm reading the diffs and source right, the old allocation algorithm
> exists at the end of the dirpref function, below the comment about the
> backstop.  It would be interesting to wrap the rest of the function in a
> tunable so you could easily short-circuit to the backstop.
> 
> I don't know if it could be done on a per-filesystem basis. You might just
> have to eat the old layout semantics for the entire system if we want to
> keep the cost of the tunable low.
> 

-- 
Ken Marx, kmarx at vicor-nb.com
Speaking candidly, I say that we intend to do the right thing and maintain our 
commitment to the impedance match.
		- http://www.bigshed.com/cgi-bin/speak.cgi