Re: git: b19740f4ce7a - main - swap_pager: lock vnode in swapdev_strategy()

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Fri, 26 Nov 2021 23:26:17 UTC

On Fri, Nov 26, 2021 at 09:53:03PM +1100, Peter Jeremy wrote:
> On 2021-Nov-25 19:35:10 +0000, Konstantin Belousov <kib@FreeBSD.org> wrote:
> >    swap_pager: lock vnode in swapdev_strategy()
> >    
> >    VOP_STRATEGY() requires locked vnode.  Note that we lock the swap vnode
> >    while pages are busy, but this would only cause real LoR if pages belong
> >    to the swap vnode, which must not be the case for correct use.
> >    
> >    Reported and tested by: peterj
> 
> Thanks for those fixes.  Unfortunately, I've bumped into another edge
> case:  The system can panic during shutdown because it tries to swap
> in data after the network is shutdown.  For reasons I haven't tracked
> down, a "swapoff" can fail even though there should be more than
> enough RAM.  As an example:
> 
> Stopping cron.
> Waiting for PIDS: 1024.
> swapoff: /usr/obj/swapfile: Cannot allocate memory
> Stopping ntpd.
> Waiting for PIDS: 1012.
> Stopping tincd for: vpn
> Waiting for PIDS: 758.
> Stopping rtsold.
> Waiting for PIDS: 351.
> Stopping devd.
> Waiting for PIDS: 754.
> Writing entropy file: .
> Writing early boot entropy file: .
> .
> Terminated
> Nov 26 03:18:44 rock64 syslogd: exiting on signal 15
> Waiting (max 60 seconds) for system process `vnlru' to stop... done
> Waiting (max 60 seconds) for system process `syncer' to stop... 
> Syncing disks, vnodes remaining... 0 0 0 done
> Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
> Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
> All buffers synced.
> No strategy for buffer at 0xffff0000c0cd3000
> vnode 0xffffa00006475e00: type VBAD
>     usecount 3, writecount 0, refcount 974016 seqc users 1
>     hold count flags ()
>     flags (VIRF_DOOMED|VV_VMSIZEVNLOCK)
>     lock type nfs: SHARED (count 1)
> swap_pager: I/O error - pagein failed; blkno 184,size 4096, error 45
> panic: VOP_STRATEGY failed bp=0xffff0000c0cd3000 vp=0
> cpuid = 0
> time = 1637857131
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x178
> panic() at panic+0x44
> bufstrategy() at bufstrategy+0x80
> swapdev_strategy() at swapdev_strategy+0xcc
> swap_pager_getpages_locked() at swap_pager_getpages_locked+0x460
> swapoff_one() at swapoff_one+0x3dc
> swapoff_all() at swapoff_all+0x98
> bufshutdown() at bufshutdown+0x2ac
> kern_reboot() at kern_reboot+0x240
> sys_reboot() at sys_reboot+0x358
> do_el0_sync() at do_el0_sync+0x4a4
> handle_el0_sync() at handle_el0_sync+0x90
> --- exception, esr 0x56000000
> KDB: enter: panic
> [ thread pid 1 tid 100002 ]
> Stopped at      kdb_enter+0x48: undefined       f900c11f
> db>

Try this.

commit 9c62295373f728459c19138f5aa03d9cb8422554
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat Nov 27 01:22:27 2021 +0200

    swapoff_one(): only check free pages count when manually turning swap off
    
    When swap is turned off due to system shutdown or reboot, ignore the
    check.  The problem is that the check is not accurate by any means: the
    free page count can legitimately be low while the system is still able
    to page in everything from swap.  Then, we turn swap off if swapping
    on a real file or some non-standard geom provider, and typically panic
    when the system turns out to actually need an unavailable page.
    
    For the syscall, it is better to be safe than sorry.
    
    Reported by:    peterj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
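
For illustration only, the effect of the new flag on the admission check can be modelled in userland roughly as follows. This is a sketch, not the kernel code: swapoff_check_model() and its plain integer parameters are hypothetical stand-ins for vm_free_count(), swap_pager_avail, nblks and nswap_lowat, and the numbers in the note after it are made up.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userland model of the admission check in swapoff_one().
 * Hypothetical helper: the parameters stand in for the kernel's
 * vm_free_count(), swap_pager_avail, nblks and nswap_lowat.
 */
static int
swapoff_check_model(unsigned long free_pages, unsigned long swap_avail,
    unsigned long nblks, unsigned long lowat, bool swapoff_syscall)
{
	/* Only the swapoff(2) path is subject to the rough estimate. */
	if (swapoff_syscall && free_pages + swap_avail < nblks + lowat)
		return (ENOMEM);

	/* The shutdown path (swapoff_all()) always proceeds. */
	return (0);
}
```

With free_pages=100, swap_avail=50, nblks=200 and lowat=10, the model returns ENOMEM when swapoff_syscall is true and 0 when it is false, which is the difference between the swapoff(2) path and the shutdown path in the patch below.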

diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
index 4cfdb3fd2cc8..981a71b2c4b1 100644
--- a/sys/vm/swap_pager.c
+++ b/sys/vm/swap_pager.c
@@ -469,7 +469,8 @@ static bool	swp_pager_swblk_empty(struct swblk *sb, int start, int limit);
 static void	swp_pager_free_empty_swblk(vm_object_t, struct swblk *sb);
 static int	swapongeom(struct vnode *);
 static int	swaponvp(struct thread *, struct vnode *, u_long);
-static int	swapoff_one(struct swdevt *sp, struct ucred *cred);
+static int	swapoff_one(struct swdevt *sp, struct ucred *cred,
+		    bool swapoff_syscall);
 
 /*
  * Swap bitmap functions
@@ -2523,14 +2524,14 @@ sys_swapoff(struct thread *td, struct swapoff_args *uap)
 		error = EINVAL;
 		goto done;
 	}
-	error = swapoff_one(sp, td->td_ucred);
+	error = swapoff_one(sp, td->td_ucred, true);
 done:
 	sx_xunlock(&swdev_syscall_lock);
 	return (error);
 }
 
 static int
-swapoff_one(struct swdevt *sp, struct ucred *cred)
+swapoff_one(struct swdevt *sp, struct ucred *cred, bool swapoff_syscall)
 {
 	u_long nblks;
 #ifdef MAC
@@ -2552,8 +2553,16 @@ swapoff_one(struct swdevt *sp, struct ucred *cred)
 	 * available virtual memory in the system will fit the amount
 	 * of data we will have to page back in, plus an epsilon so
 	 * the system doesn't become critically low on swap space.
+	 * The vm_free_count() part does not account, e.g., for clean
+	 * pages that can be immediately reclaimed without paging, so
+	 * this is a very rough estimate.
+	 *
+	 * On the other hand, not turning swap off on swapoff_all()
+	 * means that we lose swap data when filesystems go away,
+	 * which is arguably worse.
 	 */
-	if (vm_free_count() + swap_pager_avail < nblks + nswap_lowat)
+	if (swapoff_syscall &&
+	    vm_free_count() + swap_pager_avail < nblks + nswap_lowat)
 		return (ENOMEM);
 
 	/*
@@ -2603,7 +2612,7 @@ swapoff_all(void)
 			devname = devtoname(sp->sw_vp->v_rdev);
 		else
 			devname = "[file]";
-		error = swapoff_one(sp, thread0.td_ucred);
+		error = swapoff_one(sp, thread0.td_ucred, false);
 		if (error != 0) {
 			printf("Cannot remove swap device %s (error=%d), "
 			    "skipping.\n", devname, error);