cvs commit: src/sys/kern vfs_subr.c src/sys/sys buf.h bufobj.h vnode.h

Wed Oct 27 02:36:55 PDT 2004

In message <200410270915.i9R9FRIa019936 at gw.catspoiler.org>, Don Lewis writes:

>> The syncer's job is to push dirty buffers onto disk.  In the process it
>> will need to call back into whoever owns the buffer so they can do their
>> private housekeeping as necessary.
>> 
>> So the syncer doesn't deal with timestamps, the filesystems do.
>
>One of the things that is on my list of things to do is to handle
>timestamp updates by putting them on the syncer worklist.

Ok, let's just make sure we understand each other here:  I talked
about "the syncer" as a concept here, not our particular implementation
which I think sucks.

We probably need to start talking about "the generic syncer", which
is the code which sits at the bufobj level and does the overall
system-wide pushing of things, and the "syncer method" which is
what the generic syncer calls to push stuff on a particular bufobj
out to storage.

In the future the syncer method will not always end up in the vnode
code; it may end up in GBDE to handle the keysector cache or in
geom_raid3 to handle the parity sector cache.

My plan was to give bufobjs a method (bop_sync ?) which the generic
syncer would call when it decided to push this particular bufobj around.
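
A minimal userland sketch of that split, not the actual sys/bufobj.h
layout: only the name bop_sync comes from the text above, while the
struct fields, the two example owners and generic_syncer_pass() are
hypothetical and exist only to show the dispatch.

```c
#include <assert.h>
#include <stddef.h>

struct bufobj;

/* Hypothetical method table; only bop_sync is named in the discussion. */
struct buf_ops {
	const char *bop_name;
	int (*bop_sync)(struct bufobj *bo);	/* push this bufobj to storage */
};

/* Hypothetical, stripped-down bufobj: a dirty count stands in for the
 * real dirty buffer list. */
struct bufobj {
	const struct buf_ops *bo_ops;
	int bo_ndirty;
	struct bufobj *bo_next;
};

/* Example owner 1: vnode-style method, "writes" every dirty buffer. */
static int
vnode_bop_sync(struct bufobj *bo)
{
	int pushed = bo->bo_ndirty;

	bo->bo_ndirty = 0;
	return (pushed);
}

/* Example owner 2: a parity-cache-style owner doing its own private
 * housekeeping (here: counting flushes) before pushing. */
static int parity_flushes;

static int
parity_bop_sync(struct bufobj *bo)
{
	int pushed = bo->bo_ndirty;

	parity_flushes++;
	bo->bo_ndirty = 0;
	return (pushed);
}

static const struct buf_ops vnode_buf_ops = { "vnode", vnode_bop_sync };
static const struct buf_ops parity_buf_ops = { "parity", parity_bop_sync };

/* Generic syncer: decides which bufobjs to push, but knows nothing about
 * who owns them.  Returns the number of buffers pushed in this pass. */
static int
generic_syncer_pass(struct bufobj *head)
{
	int total = 0;

	for (struct bufobj *bo = head; bo != NULL; bo = bo->bo_next) {
		assert(bo->bo_ops != NULL && bo->bo_ops->bop_sync != NULL);
		if (bo->bo_ndirty > 0)
			total += bo->bo_ops->bop_sync(bo);
	}
	return (total);
}
```

The generic code only ever sees bo_ops, so timestamps, key sectors or
parity sectors stay the private business of whoever owns the bufobj.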

Now, to our vnode syncer method: The things you mention are some of
my beefs with the "lemming-syncer", but I'm not sure there aren't
even deeper problems.  So far I haven't gotten around to doing
anything about it; for now I'm just trying to get the code sorted
into various structured piles.

Anything you can do to improve the syncer is very welcome with me,
but please do not feel constrained by the current organization of
it, I think we can find better ways to do it.

I am still not convinced that even having the generic syncer is a
good idea.  I think it would make a lot more sense to park a kthread
on each mountpoint to act as syncer for that mountpoint.  And then
_maybe_ have a bufobj method that says "reduce your footprint" in
some sort of way to keep the global balance.
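
As a rough illustration of that alternative, here is a userland sketch
under loud assumptions: no real kthreads are created (the tick function
stands in for the body of a per-mountpoint thread loop), and every name
in it (mount_syncer, mount_syncer_reduce, global_balance, the "equal
share" policy) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-mountpoint syncer state. */
struct mount_syncer {
	const char *ms_name;
	int ms_ndirty;		/* dirty buffers owned by this mountpoint */
	int ms_rate;		/* buffers pushed per tick by its own thread */
};

/* One iteration of the per-mountpoint thread loop: push at most ms_rate
 * buffers.  Returns the number pushed. */
static int
mount_syncer_tick(struct mount_syncer *ms)
{
	int n = ms->ms_ndirty < ms->ms_rate ? ms->ms_ndirty : ms->ms_rate;

	ms->ms_ndirty -= n;
	return (n);
}

/* "Reduce your footprint": push until at most `target` dirty buffers
 * remain.  Returns the number pushed. */
static int
mount_syncer_reduce(struct mount_syncer *ms, int target)
{
	int n;

	assert(target >= 0);
	n = ms->ms_ndirty > target ? ms->ms_ndirty - target : 0;
	ms->ms_ndirty -= n;
	return (n);
}

/* Global balance (assumed policy): if the system-wide dirty total exceeds
 * `limit`, ask every mountpoint to shrink to an equal share of it. */
static int
global_balance(struct mount_syncer *mounts, size_t nmounts, int limit)
{
	int total = 0, pushed = 0, share;

	for (size_t i = 0; i < nmounts; i++)
		total += mounts[i].ms_ndirty;
	if (nmounts == 0 || total <= limit)
		return (0);
	share = limit / (int)nmounts;
	for (size_t i = 0; i < nmounts; i++)
		pushed += mount_syncer_reduce(&mounts[i], share);
	return (pushed);
}
```

With two mountpoints holding 10 and 100 dirty buffers and a limit of
40, only the second one is asked to do any work; the first is already
under its share of 20.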

Poul-Henning

>I stumbled across a bug a while back that causes files being written to
>be synced twice as often as they should.  The file gets synced once when
>the syncer comes across it on the worklist, and it gets synced again
>when the syncer encounters the file system syncer vnode, which results
>in a call to ffs_sync(), which syncs all the vnodes that have pending
>inode timestamp changes.
>
>This second sync can result in a large burst of activity if there are a
>lot of pending timestamp changes.  This is quite noticeable when
>unpacking a large tarball or doing something similar that writes to a
>lot of files.  There are large bursts of disk activity every 30 seconds
>and the machine gets noticeably sluggish.  If you monitor the length of
>the syncer worklist, it will vary in a sawtooth manner.
>
>I discussed this privately with mckusick a while back and he told me
>that the original intent was to not walk the vnode list in ffs_sync() in
>the MNT_LAZY case.  The problem was that timestamp updates could end up
>being deferred indefinitely if no buffers were dirtied.
>
>Skipping the vnode list traversal in ffs_sync() in the MNT_LAZY case
>would also be a nice optimization just in terms of CPU time because this
>list can be quite long.
>
>It also makes sense to sync the timestamps stored in the inode and the
>file data blocks at the same time because the block pointers stored in
>the inode may need to be updated.
>
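
The double sync described in the quoted text can be reproduced with a
toy model.  This is a sketch under stated assumptions, not the kernel's
logic: one file is dirtied every second, its own worklist slot and the
filesystem syncer vnode each come up once per 30 second period, and all
names here (simulate, sim_result, lazy_walks_vnodes) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SYNC_PERIOD 30	/* seconds between visits, as in the quoted text */

struct sim_result {
	int file_syncs;		/* times the busy file was synced */
	int vnodes_walked;	/* vnode list entries visited by lazy syncs */
};

/*
 * file_slot is the second within each period at which the file's own
 * worklist entry comes up; the filesystem syncer vnode comes up at the
 * end of each period.  lazy_walks_vnodes models an ffs_sync()-style
 * MNT_LAZY pass that walks the whole vnode list.
 */
static struct sim_result
simulate(int seconds, int file_slot, int nvnodes, bool lazy_walks_vnodes)
{
	struct sim_result r = { 0, 0 };
	bool dirty = false;

	assert(file_slot > 0 && file_slot < SYNC_PERIOD);
	for (int t = 1; t <= seconds; t++) {
		dirty = true;			/* the file is written again */
		if (t % SYNC_PERIOD == file_slot) {
			r.file_syncs++;		/* sync via its worklist entry */
			dirty = false;
		}
		if (t % SYNC_PERIOD == 0 && lazy_walks_vnodes) {
			r.vnodes_walked += nvnodes;
			if (dirty) {
				r.file_syncs++;	/* second sync via the walk */
				dirty = false;
			}
		}
	}
	return (r);
}
```

Over 60 seconds the model syncs the file four times with the vnode walk
and twice without it.  It deliberately ignores the caveat raised above:
without the walk, something else has to make sure timestamp-only
changes still reach the disk.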

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.