Re: git: b19740f4ce7a - main - swap_pager: lock vnode in swapdev_strategy()
Date: Mon, 29 Nov 2021 01:43:39 UTC
Hi,
I wonder creating /etc/rc.d/nfsswap that depends on nfsclient can give a smoother shutdown like swaplate depends on mountlate.
Hiro
On Sun, 28 Nov 2021 04:00:56 +0200
Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Sun, Nov 28, 2021 at 12:22:46PM +1100, Peter Jeremy wrote:
> > On 2021-Nov-27 01:26:17 +0200, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >commit 9c62295373f728459c19138f5aa03d9cb8422554
> > >Author: Konstantin Belousov <kib@FreeBSD.org>
> > >Date: Sat Nov 27 01:22:27 2021 +0200
> > >
> > > swapoff_one(): only check free pages count manually turning swap off
> >
> > That didn't work but I don't think the underlying bug is related to
> > your recent work on swap_pager - digging back through my logs, I've
> > found another similar panic in August last year.
> It is definitely not caused by, it is just yet another issue.
>
> >
> > Nov 28 09:40:17 rock64 syslogd: exiting on signal 15
> > Waiting (max 60 seconds) for system process `vnlru' to stop... done
> > Waiting (max 60 seconds) for system process `syncer' to stop...
> > Syncing disks, vnodes remaining... 0 0 done
> > Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
> > Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
> > All buffers synced.
> > No strategy for buffer at 0xffff0000bf8dc000
> > vnode 0xffffa00009024a80: type VBAD
> > usecount 2, writecount 0, refcount 33263 seqc users 1
> > hold count flags ()
> > flags (VIRF_DOOMED|VV_VMSIZEVNLOCK)
> > lock type nfs: SHARED (count 1)
> > swap_pager: I/O error - pagein failed; blkno 241400,size 4096, error 45
> > panic: VOP_STRATEGY failed bp=0xffff0000bf8dc000 vp=0
> > cpuid = 0
> > time = 1638052821
> > KDB: stack backtrace:
> > db_trace_self() at db_trace_self
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> > vpanic() at vpanic+0x178
> > panic() at panic+0x44
> > bufstrategy() at bufstrategy+0x80
> > swapdev_strategy() at swapdev_strategy+0xcc
> > swap_pager_getpages_locked() at swap_pager_getpages_locked+0x460
> > swapoff_one() at swapoff_one+0x3e4
> > swapoff_all() at swapoff_all+0x9c
> > bufshutdown() at bufshutdown+0x2ac
> > kern_reboot() at kern_reboot+0x240
> > sys_reboot() at sys_reboot+0x358
> > do_el0_sync() at do_el0_sync+0x4a4
> > handle_el0_sync() at handle_el0_sync+0x9c
> > --- exception, esr 0x56000000
> > KDB: enter: panic
> > [ thread pid 1 tid 100002 ]
> > Stopped at kdb_enter+0x48: undefined f900c11f
> > db>
> >
> > This is the same traceback as my previous mail. Looking at the code
> > path, the test whether there's enough RAM to swap in all the data
> > passes in both cases: If swapoff_one() returned ENOMEM then
> > swapoff_all() would report a "Cannot remove swap device" error and
> > keep going (not bother to actually remove the swap device) - and
> > that's not happening.
> >
> > I think the important message is "No strategy for buffer at 0x..."
> > which comes from vop_nostrategy() and causes bufstrategy() to panic:
> >
> > swapdev_strategy()
> > => bstrategy()
> > => BO_STRATEGY()
> > => bufstrategy()
> > => VOP_STRATEGY()
> > => VOP_STRATEGY_APV()
> > => vop_nostrategy()
> > => bufdone() => swp_pager_async_iodone()
> >
> > Presumably, stopping the network means there's no longer any way for
> > swap operations to complete so the swap device has become associated
> > with default_vnodeops, (though I haven't dug into the actual code
> > path that does that).
> >
> > Moving up a level, does it really matter if swapoff_one() is skipped?
> > If it actually returned an error (eg if the free memory test failed),
> > then that's what would happen. By this point in the shutdown, there's
> > no userland left (which makes me wonder why there's anything left in
> > swap in any case) and only the final cleanups remain before the kernel
> > shuts down. What's really needed is a way to detect that the relevant
> > swap I/O provider has gone away and return to swapoff_all() without
> > panicing.
>
> I think the intent there is to ensure that the system is in as much steady
> state as possible. I can easily imagine some HBA doing weird things if
> some io is in flight when the reset comes in. So we want to ensure that
> no io occurs.
>
> The cause for your panic is not the network interface down state (in fact,
> I think that interface state up), but as you correctly analyzed, the
> call to vop_nostrategy(). It is there in stack because underlying filesystem
> was unmounted, and the swap vnode was reclaimed, replacing NFS vop vector
> with deadfs vop vector, which inherits from the default vop.
>
> The bit that I do not understand from your report, is why swapoff_all() did
> not occured earlier. Your earlier report, where spurious ENOMEM came out
> from swapoff(2) syscall, and which I fixed by the previous patch, means
> that there was an attempt to swapoff.
>
> Still, the solution is, IMO, to swapoff before unmount. We do not need
> swapin for flushing buffers.
>
> Please try this combined patch.
>
> diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
> index 4b746a269171..201afeec9311 100644
> --- a/sys/kern/vfs_bio.c
> +++ b/sys/kern/vfs_bio.c
> @@ -1452,16 +1452,21 @@ bufshutdown(int show_busybufs)
> */
> printf("Giving up on %d buffers\n", nbusy);
> DELAY(5000000); /* 5 seconds */
> + swapoff_all();
> } else {
> if (!first_buf_printf)
> printf("Final sync complete\n");
> +
> /*
> - * Unmount filesystems
> + * Unmount filesystems. Swapoff before unmount,
> + * because swap on file is unoperational after unmount
> + * of underlying filesystem.
> */
> - if (!KERNEL_PANICKED())
> + if (!KERNEL_PANICKED()) {
> + swapoff_all();
> vfs_unmountall();
> + }
> }
> - swapoff_all();
> DELAY(100000); /* wait for console output to finish */
> }
>
> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
> index 4cfdb3fd2cc8..981a71b2c4b1 100644
> --- a/sys/vm/swap_pager.c
> +++ b/sys/vm/swap_pager.c
> @@ -469,7 +469,8 @@ static bool swp_pager_swblk_empty(struct swblk *sb, int start, int
> limit); static void swp_pager_free_empty_swblk(vm_object_t, struct swblk *sb);
> static int swapongeom(struct vnode *);
> static int swaponvp(struct thread *, struct vnode *, u_long);
> -static int swapoff_one(struct swdevt *sp, struct ucred *cred);
> +static int swapoff_one(struct swdevt *sp, struct ucred *cred,
> + bool swapoff_syscall);
>
> /*
> * Swap bitmap functions
> @@ -2523,14 +2524,14 @@ sys_swapoff(struct thread *td, struct swapoff_args *uap)
> error = EINVAL;
> goto done;
> }
> - error = swapoff_one(sp, td->td_ucred);
> + error = swapoff_one(sp, td->td_ucred, true);
> done:
> sx_xunlock(&swdev_syscall_lock);
> return (error);
> }
>
> static int
> -swapoff_one(struct swdevt *sp, struct ucred *cred)
> +swapoff_one(struct swdevt *sp, struct ucred *cred, bool swapoff_syscall)
> {
> u_long nblks;
> #ifdef MAC
> @@ -2552,8 +2553,16 @@ swapoff_one(struct swdevt *sp, struct ucred *cred)
> * available virtual memory in the system will fit the amount
> * of data we will have to page back in, plus an epsilon so
> * the system doesn't become critically low on swap space.
> + * The vm_free_count() part does not account e.g. for clean
> + * pages that can be immediately reclaimed without paging, so
> + * this is very rough estimation.
> + *
> + * On the other hand, not turning swap off on swapoff_all()
> + * means that we loose swap data when filesystems go away,
> + * which is arguably worse.
> */
> - if (vm_free_count() + swap_pager_avail < nblks + nswap_lowat)
> + if (swapoff_syscall &&
> + vm_free_count() + swap_pager_avail < nblks + nswap_lowat)
> return (ENOMEM);
>
> /*
> @@ -2603,7 +2612,7 @@ swapoff_all(void)
> devname = devtoname(sp->sw_vp->v_rdev);
> else
> devname = "[file]";
> - error = swapoff_one(sp, thread0.td_ucred);
> + error = swapoff_one(sp, thread0.td_ucred, false);
> if (error != 0) {
> printf("Cannot remove swap device %s (error=%d), "
> "skipping.\n", devname, error);
>