umount -f implementation
Bruce Evans
brde at optusnet.com.au
Mon Jun 29 19:06:09 UTC 2009
On Mon, 29 Jun 2009, Rick Macklem wrote:
> On Mon, 29 Jun 2009, Attilio Rao wrote:
>>
>> While that should be real in principle (immediate shutdown of the fs
>> operation and unmounting of the partition) it is totally impossible to
>> have it completely unsleeping, so it can happen that also umount -f
>> sleeps / delays for some times (example: vflush).
>> Currently, umount -f is one of the most complicated thing to handle in
>> our VFS because it puts as requirement that vnodes can be reclaimed in
>> any moment, adding complexity and possibility for races.
>>
> Yes, agreed. And I like to leave that stuff to more clever chaps than I:-)
>
>> What's the fix for your problem?
>>
> Well, when I tested it I found that it got stuck in two places, both
> calls to VFS_SYNC(). The first was a
> sync();
> right at the beginning of umount.c.
> - All I did for that one is move it to after the code that handles
> option processing and change it to
> if ((fflag & MNT_FORCE) == 0)
> sync();
> so that it isn't done for the "-f" case. (I believe the sync(); call
> at the beginning of umount is only a performance optimization, so I
> don't think not doing it for "-f" should break anything.)
OK. This sync() is probably actually a performance pessimization, since
it syncs all file systems while the internal sync in umount(2) only
syncs the one being unmounted.
> - the second happened just before the VFS_UNMOUNT() call in the
> umount(2) system call. The code looks like:
> if (((mp->mnt_flag & MNT_RDONLY) ||
> (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) !=
> 0)
> - Although it was tempting to reverse the order of VFS_SYNC() and the
> test for MNT_FORCE, I thought that might have a negative impact on
> other file systems, since it avoided doing the VFS_SYNC(), so...
>
> - Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
> nfs_sync(), so that it returns EBUSY for this case instead of getting
> stuck trying to flush().
OK. This sync is probably an optimization for correctness, since it
arranges to do as much as possible without forcing.
I checked ffs_mount() and found 2 large bugs, one related:
- in the only case that tends to cause problems, namely the non-readonly
case, ffs_unmount() does a suspend which calls VOP_SYNC(..., MNT_SUSPEND),
but after errors from this sync it checks neither MNT_FORCE nor
error == ENXIO. I think the usual effect is the same as if the top-level
unmount() didn't check MNT_FORCE after suspend failure: in problematic
cases, we have an unrecoverable write, due to the device going away or
just an i/o error, and this error has probably already occured (only
in rare cases will it be triggered by unmount). Then MNT_FORCE is
essentially unused, and the ENXIO hack is not reached either, and
unmount usually fails.
- the UFS_EXTATTR case destroys infrastructure before committing to
succeeding. It used to be just broken on failure. Now it uses a
hack to recover (call a constructor) on failure, but the recovery
code is not reached in the usual case of failure -- when the
suspension fails.
ffs_unmount() still seems to have no support for handling unrecoverable
write errors (short of you converting them to ENXIO by removing the
media). MNT_FORCE only meant FORCECLOSE for it. I see that old nfs
was similar, and you are now making MNT_FORCE stronger. I thought
that umount(8)'s man page documented -f being strongly forceful, but
checking it shows that it only documents a weak force like that of
FORCECLOSE (but not precisely enough). Perhaps a different flag should
be used for strong forcefulness. Weak forcefulness is still useful and
used for mount -f -u -- for remount we would never want errors in the
file system itself ignored. This use also shows that the generic
FORCECLOSE code must not ignore errors.
> Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c,
> it simply ensures that the umount command thread makes it as far as
> VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds. It
> kills RPCs in progress before doing the vflush() and, since no new RPCs
> can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of
> a request), the vflush() won't actually flush anything to the server.
>
> As such, "umount -f" is pretty well guaranteed to throw away the dirty
> buffers. I believe this is correct behaviour,
This is how I think ffs_mount() should work too -- It should be
responsible for throwing away the dirty buffers, while nothing else
should discard them. Now the discarding seems to be done by falling
through to g_vfs_done(), except g_vfs_done() is not reached in most
cases (see above). I don't like this -- at best we lose the opportunity
to print ffs-specific details about what was discarded. Falling through
only works for ENXIO anyway -- on other errors we should discard the
unwritable buffers in an fs-specific manner so as to write as many of
the writable buffers as possible.
> but it would mean that a
> user/sysadmin that uses "umount -f" for cases where the server is still
> functioning, but slow, will lose data when they probably don't expect to.
A new flag would help for this.
Bruce
More information about the freebsd-current
mailing list