svn commit: r222466 - head/sbin/umount

Bruce Evans brde at optusnet.com.au
Wed Jun 1 12:14:09 UTC 2011


On Tue, 31 May 2011, Rick Macklem wrote:

>> It is only an optimization. Any number of syncs are useless for
>> actually syncing the system, since sync(2) only does an async sync (it
>> returns without waiting for most writes to complete). As you pointed
>> out later in this thread, unmount(2) does a sync that works -- a sync
>> sync -- before doing the unmount proper. It does this irrespective of
>> the force flag:
>>
>> % if (((mp->mnt_flag & MNT_RDONLY) ||
>> % (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
>> % error = VFS_UNMOUNT(mp, flags);
>>
>> The force flag doesn't mean that i/o's are forced to complete. It
>> only means that open files are forced to be closed (and a few related
>> things). This can be seen here. We wait (potentially forever) for
>> any existing writes to complete. Only then do we look at the force
>> flag and bale out if the sync failed and the force flag is _not_ set.
>> Most i/o problems will cause hangs in this sync, and the force flag
>> won't help. But if we somehow get past this sync, and have i/o

The sync should actually return an error (after syncing everything
that doesn't hang).  Perhaps it actually does.  Then we have got past
the sync.  But for nfs, it will tend to hang just detecting the i/o
errors (that needs large timeouts, which are not very different from hangs).

>> problems, then honoring the force flag and continuing gives even more
>> fragility, since we have almost passed the point of being able to give
>> up after an error (this point is typically very early in
>> VFS_UNMOUNT()).
>>
> Well, in vgonel(), which is called from vflush() and seems to be what
> gets rid of the vnode even when open for forced dismounts, there is:
>
>    /*
>     * Clear out any buffers associated with the vnode.
>     * If the flush fails, just toss the buffers.
>     */
>    - followed shortly by
>    if (vinvalbuf(vp, V_SAVE, 0, 0) != 0)
>             vinvalbuf(vp, 0, 0, 0);
>
> It seems to me that this code is willing to give up when it can't write
> dirty buffers, but???

That looks like a hack to me (collateral from vfs_bio.c 1.196).  In
4.4BSD-Lite, the corresponding code is in vclean() and it is just:

% 	/*
% 	 * Clean out any buffers associated with the vnode.
% 	 */
% 	if (flags & DOCLOSE)
% 		vinvalbuf(vp, V_SAVE, NOCRED, p, 0, 0);

In 4.4BSD, the buffers would have been tossed at the bwrite() level
before we got here.  Now bwrite() is more careful, and we are a bit
more careful here -- we actually check if vinvalbuf() succeeded and
clean up if it didn't.  There must be a place that tosses buffers
with unrecoverable write errors, and a forced unmount is a good place.
But vgonel() is a bad place, since it has a much wider scope than
forced unmounts.  I think vgonel() is normally called when a vnode
is recycled.  So we can have a vnode unhappily holding buffers with
unrecoverable write errors for several minutes, with syncs trying
to flush them every ~30 seconds, but then when the vnode is recycled
for some reason the above forcibly tosses its buffers with no further
error messages.  Similarly for recoverable write errors (maybe an
nfs server that is down for longer than the recycling time).
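
To make the scope problem concrete, here is a sketch (untested, not a
patch) of how the fallback could at least distinguish the forced-unmount
case, using the MNTK_UNMOUNTF flag that dounmount() sets for MNT_FORCE,
so that tossing during an ordinary vnode recycle is no longer silent:

	int error;

	/*
	 * Sketch only.  Still toss the buffers so the vnode can be
	 * reclaimed, but complain unless a forced unmount is in
	 * progress (MNTK_UNMOUNTF, from <sys/mount.h>, is set by
	 * dounmount() when MNT_FORCE was given).
	 */
	error = vinvalbuf(vp, V_SAVE, 0, 0);
	if (error != 0) {
		if (vp->v_mount == NULL ||
		    (vp->v_mount->mnt_kern_flag & MNTK_UNMOUNTF) == 0)
			printf("vgonel: tossing dirty buffers, error %d\n",
			    error);
		vinvalbuf(vp, 0, 0, 0);
	}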

> To me, a forced dismount of an NFS mount that just wedges on an
> unresponsive server isn't very useful. The only case I can think

I agree.  My point is that no one tried hard to make forced unmounts
very forceful, so making it work is even harder than might first appear.
The problem is related to making removed removable devices work properly.

> of where a forced dismount that isn't willing to throw away dirty
> buffers might be useful would be when an app. (userland process)
> is for some reason unkillable and is sitting with an open NFS file.
> Is this a problem in practice? (I do know that a mount stuck on an
> unresponsive server is a real problem. There have been several PRs,
> the most recent being kern/157365.)

The server can easily crash while the client has dirty buffers.

> I brought up the concept of a "hard forced dismount" at the developer
> summit a year ago, but no one seemed to see a difference between that
> and what they believed a forced dismount already did. After that, I
> thought about it a little and couldn't see much use for a "soft forced
> dismount that fails instead of throwing away dirty buffers, but does
> close down open files".

It's useful to be able to forcibly close all open files, even ones open
for writing or with dirty buffers (there can also be dirty buffers for
closed files).  I think the umount -f flag gives this, and I wouldn't want
it to also discard dirty buffers on devices where the buffers are
hopefully-transiently unwritable.  Discarding of unwritable buffers
really has little to do with unmount -- if there is one with an i/o
error that I know is unrecoverable, then I would like to discard it
immediately instead of letting the system sit there and try to flush
it quite often.

> - For example, suppose a userland process is writing a file through
>  the stdio library:
>  A - fopen()s a file for writing
>  B - does fwrite()s of small amounts of data
>  C - does an fflush()
>
> If the open is closed by a forced dismount before C, the written data
> will be lost, since it isn't even in the kernel yet. If the file is on
> a responsive NFS server, it will be written to the server shortly after C.
> So, there seems to be a narrow window around C where the data will be lost
> for a forced dismount that throws away dirty buffers vs one that doesn't
> "but only if the server is responsive".
>
> In other words, "Is there much difference between ripping the open files
> out from under a app. before it does a write syscall vs throwing the data
> away when it's in the kernel buffer but not yet written to an NFS server?"

Not much, since a careful application should do an fsync() as well as the
above and not depend on the data reaching the file system until fsync()
returns successfully, which it must not do if the kernel discarded the
data.  [But the tossing in vgonel() seems to break this: most write()s
are async so they can't return an error.  Errors live in the buffers.
But when buffers are tossed, don't we lose all trace of the errors?
Then a subsequent fsync won't see any errors and will return success.]
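
For comparison, the careful version of the A-B-C sequence above looks
something like this (a sketch with a hypothetical path and error
handling reduced to the essentials):

	#include <err.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		FILE *fp;

		/* A: open the file for writing (hypothetical path). */
		fp = fopen("/mnt/nfs/output.dat", "w");
		if (fp == NULL)
			err(1, "fopen");

		/* B: buffered writes; the data may still be in stdio. */
		if (fwrite("data\n", 1, 5, fp) != 5)
			errx(1, "fwrite");

		/* C: push stdio's buffer into the kernel. */
		if (fflush(fp) != 0)
			err(1, "fflush");

		/*
		 * Only trust the data once fsync() succeeds, i.e. the
		 * kernel has actually written it out (to the server,
		 * for an nfs mount).  If the kernel discarded the
		 * buffers, this is where the error must show up.
		 */
		if (fsync(fileno(fp)) != 0)
			err(1, "fsync");

		if (fclose(fp) != 0)
			err(1, "fclose");
		return (0);
	}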

> Anyhow, the way I now have the NFS clients, they do a "hard forced dismount"
> where they can throw away dirty buffers not yet written to the server.
> Recent patches to nfs_sync() make it return without attempting writes for
> the forced dismount case and the vgonel() { see above code snippet } throws
> away the data when the first vinvalbuf(.., V_SAVE,.) fails.

Did you have to change VFS_SYNC() to pass the forced-dismount flag?  I
don't like that, but I see the problem: VFS_SYNC(XXX_WAIT) should normally
just try to sync everything and return an error if that is impossible,
but for an nfs server it would take too long to determine that it is
impossible.
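
For what it is worth, the forced-dismount case could probably be
detected inside nfs_sync() itself, without changing the VFS_SYNC()
interface, since dounmount() sets MNTK_UNMOUNTF on the mount before it
calls VFS_SYNC() when MNT_FORCE was given.  A sketch (not necessarily
what the recent patches do):

	/*
	 * Sketch: near the top of nfs_sync(), bail out instead of
	 * trying (and possibly hanging) to write dirty buffers to an
	 * unresponsive server when a forced dismount is in progress.
	 */
	MNT_ILOCK(mp);
	if ((mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0) {
		MNT_IUNLOCK(mp);
		return (EBADF);
	}
	MNT_IUNLOCK(mp);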

Bruce

