From nobody Mon Nov 29 01:43:39 2021 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 6D89018AD72B; Mon, 29 Nov 2021 01:46:15 +0000 (UTC) (envelope-from ota@j.email.ne.jp) Received: from mail09.asahi-net.or.jp (mail09.asahi-net.or.jp [202.224.55.49]) by mx1.freebsd.org (Postfix) with ESMTP id 4J2Sqq1Yljz4RtG; Mon, 29 Nov 2021 01:46:14 +0000 (UTC) (envelope-from ota@j.email.ne.jp) Received: from vmware12.advok.com (pool-96-225-64-148.nwrknj.fios.verizon.net [96.225.64.148]) (Authenticated sender: NR2Y-OOT) by mail09.asahi-net.or.jp (Postfix) with ESMTPSA id 666174B81F; Mon, 29 Nov 2021 10:46:05 +0900 (JST) Date: Sun, 28 Nov 2021 20:43:39 -0500 From: Yoshihiro Ota To: Konstantin Belousov , Peter Jeremy Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: b19740f4ce7a - main - swap_pager: lock vnode in swapdev_strategy() Message-Id: <20211128204339.58f06247fd2946ddc030d1b0@j.email.ne.jp> In-Reply-To: References: <202111251935.1APJZA1e094731@gitrepo.freebsd.org> X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; i386-portbld-freebsd13.0) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4J2Sqq1Yljz4RtG X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N Hi, I wonder creating /etc/rc.d/nfsswap that depends on nfsclient can give a smoother shutdown like swaplate depends on mountlate. Hiro On Sun, 28 Nov 2021 04:00:56 +0200 Konstantin Belousov wrote: > On Sun, Nov 28, 2021 at 12:22:46PM +1100, Peter Jeremy wrote: > > On 2021-Nov-27 01:26:17 +0200, Konstantin Belousov wrote: > > >commit 9c62295373f728459c19138f5aa03d9cb8422554 > > >Author: Konstantin Belousov > > >Date: Sat Nov 27 01:22:27 2021 +0200 > > > > > > swapoff_one(): only check free pages count manually turning swap off > > > > That didn't work but I don't think the underlying bug is related to > > your recent work on swap_pager - digging back through my logs, I've > > found another similar panic in August last year. > It is definitely not caused by, it is just yet another issue. > > > > > Nov 28 09:40:17 rock64 syslogd: exiting on signal 15 > > Waiting (max 60 seconds) for system process `vnlru' to stop... done > > Waiting (max 60 seconds) for system process `syncer' to stop... > > Syncing disks, vnodes remaining... 0 0 done > > Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done > > Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done > > All buffers synced. > > No strategy for buffer at 0xffff0000bf8dc000 > > vnode 0xffffa00009024a80: type VBAD > > usecount 2, writecount 0, refcount 33263 seqc users 1 > > hold count flags () > > flags (VIRF_DOOMED|VV_VMSIZEVNLOCK) > > lock type nfs: SHARED (count 1) > > swap_pager: I/O error - pagein failed; blkno 241400,size 4096, error 45 > > panic: VOP_STRATEGY failed bp=0xffff0000bf8dc000 vp=0 > > cpuid = 0 > > time = 1638052821 > > KDB: stack backtrace: > > db_trace_self() at db_trace_self > > db_trace_self_wrapper() at db_trace_self_wrapper+0x30 > > vpanic() at vpanic+0x178 > > panic() at panic+0x44 > > bufstrategy() at bufstrategy+0x80 > > swapdev_strategy() at swapdev_strategy+0xcc > > swap_pager_getpages_locked() at swap_pager_getpages_locked+0x460 > > swapoff_one() at swapoff_one+0x3e4 > > swapoff_all() at swapoff_all+0x9c > > bufshutdown() at bufshutdown+0x2ac > > kern_reboot() at kern_reboot+0x240 > > sys_reboot() at sys_reboot+0x358 > > do_el0_sync() at do_el0_sync+0x4a4 > > handle_el0_sync() at handle_el0_sync+0x9c > > --- exception, esr 0x56000000 > > KDB: enter: panic > > [ thread pid 1 tid 100002 ] > > Stopped at kdb_enter+0x48: undefined f900c11f > > db> > > > > This is the same traceback as my previous mail. Looking at the code > > path, the test whether there's enough RAM to swap in all the data > > passes in both cases: If swapoff_one() returned ENOMEM then > > swapoff_all() would report a "Cannot remove swap device" error and > > keep going (not bother to actually remove the swap device) - and > > that's not happening. > > > > I think the important message is "No strategy for buffer at 0x..." > > which comes from vop_nostrategy() and causes bufstrategy() to panic: > > > > swapdev_strategy() > > => bstrategy() > > => BO_STRATEGY() > > => bufstrategy() > > => VOP_STRATEGY() > > => VOP_STRATEGY_APV() > > => vop_nostrategy() > > => bufdone() => swp_pager_async_iodone() > > > > Presumably, stopping the network means there's no longer any way for > > swap operations to complete so the swap device has become associated > > with default_vnodeops, (though I haven't dug into the actual code > > path that does that). > > > > Moving up a level, does it really matter if swapoff_one() is skipped? > > If it actually returned an error (eg if the free memory test failed), > > then that's what would happen. By this point in the shutdown, there's > > no userland left (which makes me wonder why there's anything left in > > swap in any case) and only the final cleanups remain before the kernel > > shuts down. What's really needed is a way to detect that the relevant > > swap I/O provider has gone away and return to swapoff_all() without > > panicing. > > I think the intent there is to ensure that the system is in as much steady > state as possible. I can easily imagine some HBA doing weird things if > some io is in flight when the reset comes in. So we want to ensure that > no io occurs. > > The cause for your panic is not the network interface down state (in fact, > I think that interface state up), but as you correctly analyzed, the > call to vop_nostrategy(). It is there in stack because underlying filesystem > was unmounted, and the swap vnode was reclaimed, replacing NFS vop vector > with deadfs vop vector, which inherits from the default vop. > > The bit that I do not understand from your report, is why swapoff_all() did > not occured earlier. Your earlier report, where spurious ENOMEM came out > from swapoff(2) syscall, and which I fixed by the previous patch, means > that there was an attempt to swapoff. > > Still, the solution is, IMO, to swapoff before unmount. We do not need > swapin for flushing buffers. > > Please try this combined patch. > > diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c > index 4b746a269171..201afeec9311 100644 > --- a/sys/kern/vfs_bio.c > +++ b/sys/kern/vfs_bio.c > @@ -1452,16 +1452,21 @@ bufshutdown(int show_busybufs) > */ > printf("Giving up on %d buffers\n", nbusy); > DELAY(5000000); /* 5 seconds */ > + swapoff_all(); > } else { > if (!first_buf_printf) > printf("Final sync complete\n"); > + > /* > - * Unmount filesystems > + * Unmount filesystems. Swapoff before unmount, > + * because swap on file is unoperational after unmount > + * of underlying filesystem. > */ > - if (!KERNEL_PANICKED()) > + if (!KERNEL_PANICKED()) { > + swapoff_all(); > vfs_unmountall(); > + } > } > - swapoff_all(); > DELAY(100000); /* wait for console output to finish */ > } > > diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c > index 4cfdb3fd2cc8..981a71b2c4b1 100644 > --- a/sys/vm/swap_pager.c > +++ b/sys/vm/swap_pager.c > @@ -469,7 +469,8 @@ static bool swp_pager_swblk_empty(struct swblk *sb, int start, int > limit); static void swp_pager_free_empty_swblk(vm_object_t, struct swblk *sb); > static int swapongeom(struct vnode *); > static int swaponvp(struct thread *, struct vnode *, u_long); > -static int swapoff_one(struct swdevt *sp, struct ucred *cred); > +static int swapoff_one(struct swdevt *sp, struct ucred *cred, > + bool swapoff_syscall); > > /* > * Swap bitmap functions > @@ -2523,14 +2524,14 @@ sys_swapoff(struct thread *td, struct swapoff_args *uap) > error = EINVAL; > goto done; > } > - error = swapoff_one(sp, td->td_ucred); > + error = swapoff_one(sp, td->td_ucred, true); > done: > sx_xunlock(&swdev_syscall_lock); > return (error); > } > > static int > -swapoff_one(struct swdevt *sp, struct ucred *cred) > +swapoff_one(struct swdevt *sp, struct ucred *cred, bool swapoff_syscall) > { > u_long nblks; > #ifdef MAC > @@ -2552,8 +2553,16 @@ swapoff_one(struct swdevt *sp, struct ucred *cred) > * available virtual memory in the system will fit the amount > * of data we will have to page back in, plus an epsilon so > * the system doesn't become critically low on swap space. > + * The vm_free_count() part does not account e.g. for clean > + * pages that can be immediately reclaimed without paging, so > + * this is very rough estimation. > + * > + * On the other hand, not turning swap off on swapoff_all() > + * means that we loose swap data when filesystems go away, > + * which is arguably worse. > */ > - if (vm_free_count() + swap_pager_avail < nblks + nswap_lowat) > + if (swapoff_syscall && > + vm_free_count() + swap_pager_avail < nblks + nswap_lowat) > return (ENOMEM); > > /* > @@ -2603,7 +2612,7 @@ swapoff_all(void) > devname = devtoname(sp->sw_vp->v_rdev); > else > devname = "[file]"; > - error = swapoff_one(sp, thread0.td_ucred); > + error = swapoff_one(sp, thread0.td_ucred, false); > if (error != 0) { > printf("Cannot remove swap device %s (error=%d), " > "skipping.\n", devname, error); >