Need to force sync(2) before umounting UFS1 filesystems?

Attilio Rao attilio at freebsd.org
Fri Sep 30 13:31:59 UTC 2011


2011/9/30 Kirk McKusick <mckusick at mckusick.com>:
>> Date: Thu, 29 Sep 2011 18:13:34 +0200
>> Subject: Re: Need to force sync(2) before umounting UFS1 filesystems?
>> From: Attilio Rao <attilio at freebsd.org>
>> To: Kirk McKusick <mckusick at mckusick.com>,
>>         Konstantin Belousov <kib at freebsd.org>
>> Cc: Garrett Cooper <yanegomi at gmail.com>, freebsd-fs at freebsd.org,
>>         Xin LI <delphij at freebsd.org>
>>
>> 2011/9/29 Kirk McKusick <mckusick at mckusick.com>:
>>
>> > I am definitely not in a rush on this, so by all means take some time
>> > to look it over. The EBUSY unmount has been in its current state
>> > for several years, so I am fine with taking a few weeks to sort out
>> > the correct solution. Indeed, I am glad that Garrett has volunteered
>> > to do some more serious testing.
>> >
>> > If this general approach is not correct, I can put a hook in for just
>> > UFS so that it can have its historic behavior. As you have noted, the
>> > SU code has a lot of activity that gets done under the protection of
>> > vfs_busy. So it may be the only filesystem for which draining the
>> > vfs_busy lock during unmount is needed.
>>
>> Honestly, my first thought was exactly that -- an option forcing the
>> unmount to sleep if SU is compiled into the kernel.
>
> The above would solve the problem at hand (NanoBSD builds). But I
> believe that this issue can arise with any filesystem that has
> background behavior such as ZFS and NFS. It is just most evident
> with UFS using SU.
>
>> When you mention 'historical behaviour', do you mean the behaviour UFS
>> had even prior to the introduction of the 'complete' VFS layer, or the
>> behaviour unmount(2) was expected to implement from the beginning?
>
> Synchronous unmount has existed since at least the UNIX versions
> that I used in the 1970s.
>
>> My guess is that the recent SU improvements by you and Jeff may have
>> led to higher vfs_busy() contention, thus making this behaviour just
>> more visible.
>
> You are correct. SU has a lot of background activity. And as memories
> have grown bigger, the backlog has been able to get bigger and hence
> more noticeable. On a busy system I have measured over forty calls to
> "sync; sleep 1; umount" before the filesystem would finally unmount
> successfully.
>
>> BTW, I'm afraid the forced unmount case may be subject to a deadlock.
>>
>> Suppose thread1 is running some arbitrary code path and thread2 is
>> doing the unmount (forced, in this case):
>>
>> thread1::vfs_busy()
>> thread2::lock coveredvnode
>> thread1::contests coveredvnode
>> thread2::sleep because of thread1 vfs_busy
>
> I agree that the above deadlock is possible. See suggested solution below.
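
To make that interleaving concrete, here is a minimal userland analogue
(my illustration only, not kernel code), with pthread mutexes standing
in for the vfs_busy() reference and the covered vnode lock. Built with
"cc deadlock.c -lpthread", it hangs once both threads reach their
second acquisition:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/*
 * "busy" stands in for the vfs_busy() reference, "covered" for the
 * lock on the covered vnode.
 */
static pthread_mutex_t busy = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t covered = PTHREAD_MUTEX_INITIALIZER;

static void *
thread1(void *arg)
{

	pthread_mutex_lock(&busy);	/* thread1::vfs_busy() */
	sleep(1);			/* let thread2 take "covered" */
	printf("thread1: contesting coveredvnode\n");
	pthread_mutex_lock(&covered);	/* blocks: thread2 holds it */
	pthread_mutex_unlock(&covered);
	pthread_mutex_unlock(&busy);
	return (NULL);
}

static void *
thread2(void *arg)
{

	pthread_mutex_lock(&covered);	/* thread2::lock coveredvnode */
	sleep(2);			/* let thread1 block on it */
	printf("thread2: waiting for vfs_busy to drain\n");
	pthread_mutex_lock(&busy);	/* blocks: thread1 holds it */
	pthread_mutex_unlock(&busy);
	pthread_mutex_unlock(&covered);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, thread1, NULL);
	pthread_create(&t2, NULL, thread2, NULL);
	pthread_join(t1, NULL);		/* never returns: circular wait */
	pthread_join(t2, NULL);
	return (0);
}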
>
>> I think this deadlock was actually possible even with the old code; it
>> was just a LOR between a vnode lock and the mount lock.
>
> Yes, this is a long-standing issue.
>
>> I'm not sure if there is some invariant I discussed with Kostik in the
>> past that prevents this in one way or another and that I'm forgetting
>> about, but it looks like a possible deadlock to me.
>>
>> If you see this issue I'll make a patch for it.
>>
>> Attilio
>
> Here is my proposed fix. It does the unroll originally found in the
> non-FORCE case before sleeping to wait for the vfs_busy reference to
> clear. Is it acceptable to hold the mount mutex while calling
> VOP_UNLOCK? If not, then it needs to be released before the unlock,
> reacquired afterwards, and the check to see whether the sleep is
> needed redone.

I thought about this approach when sending the e-mail, but there is a
problem: you need to handle the MNTK_UNMOUNT flag checking and the
subsequent setting after coveredvnode is held, otherwise on the first
loop iteration you will just return EBUSY.
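
In other words, something along these lines -- just a sketch of the
shape I mean, written against my recollection of the dounmount()
structure, not a tested patch:

top:
	if ((coveredvp = mp->mnt_vnodecovered) != NULL) {
		/* ... take the covered vnode lock, as the code does now ... */
	}
	MNT_ILOCK(mp);
	/*
	 * Check and set MNTK_UNMOUNT only here, with coveredvp held, so
	 * that jumping back to "top" after the drain sleep re-evaluates
	 * the flag under the right locks instead of returning EBUSY
	 * outright on the first pass.
	 */
	if (mp->mnt_kern_flag & MNTK_UNMOUNT) {
		MNT_IUNLOCK(mp);
		if (coveredvp != NULL)
			VOP_UNLOCK(coveredvp, 0);
		return (EBUSY);
	}
	mp->mnt_kern_flag |= MNTK_UNMOUNT;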

> Index: vfs_mount.c
> ===================================================================
> --- vfs_mount.c (revision 225881)
> +++ vfs_mount.c (working copy)
> @@ -1187,6 +1187,7 @@
>
>        mtx_assert(&Giant, MA_OWNED);
>
> +top:
>        if ((coveredvp = mp->mnt_vnodecovered) != NULL) {
>                mnt_gen_r = mp->mnt_gen;
>                VI_LOCK(coveredvp);
> @@ -1227,21 +1228,19 @@
>                mp->mnt_kern_flag |= MNTK_UNMOUNTF;
>        error = 0;
>        if (mp->mnt_lockref) {
> -               if ((flags & MNT_FORCE) == 0) {
> -                       mp->mnt_kern_flag &= ~(MNTK_UNMOUNT | MNTK_NOINSMNTQ |
> -                           MNTK_UNMOUNTF);
> -                       if (mp->mnt_kern_flag & MNTK_MWAIT) {
> -                               mp->mnt_kern_flag &= ~MNTK_MWAIT;
> -                               wakeup(mp);
> -                       }
> -                       MNT_IUNLOCK(mp);
> -                       if (coveredvp)
> -                               VOP_UNLOCK(coveredvp, 0);
> -                       return (EBUSY);
> +               mp->mnt_kern_flag &= ~(MNTK_UNMOUNT | MNTK_NOINSMNTQ |
> +                   MNTK_UNMOUNTF);
> +               if (mp->mnt_kern_flag & MNTK_MWAIT) {
> +                       mp->mnt_kern_flag &= ~MNTK_MWAIT;
> +                       wakeup(mp);
>                }
> +               if (coveredvp)
> +                       VOP_UNLOCK(coveredvp, 0);
>                mp->mnt_kern_flag |= MNTK_DRAINING;
>                error = msleep(&mp->mnt_lockref, MNT_MTX(mp), PVFS,
>                    "mount drain", 0);
> +               MNT_IUNLOCK(mp);
> +               goto top;

You can avoid the explicit MNT_IUNLOCK() by passing PVFS | PDROP to the
msleep() call.
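
That is, roughly (same hunk, untested):

		mp->mnt_kern_flag |= MNTK_DRAINING;
		/*
		 * PDROP makes msleep(9) return without reacquiring
		 * MNT_MTX(mp), so the explicit MNT_IUNLOCK() goes away.
		 */
		error = msleep(&mp->mnt_lockref, MNT_MTX(mp), PVFS | PDROP,
		    "mount drain", 0);
		goto top;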

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
