smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Mon, 13 Dec 2021 15:45:07 UTC
This weekend I upgraded my FreeBSD laptop and kicked off a poudriere build of
the packages I use.  My laptop kept "freezing" during the package builds however.
Initially, based on messages in /var/log/messages, I thought it was running out
of swap and killing the display server.  After poking at it off and on over the
weekend, I finally narrowed it down to building the devel/apr1 port, built it
on the console (rather than in X), and was greeted with the following panic:

panic: malloc(M_WAITOK) with sleeping prohibited
cpuid = 7
time = 1639374072
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001e5b55b0
vpanic() at vpanic+0x17f/frame 0xfffffe001e5b5600
panic() at panic+0x43/frame 0xfffffe001e5b5660
malloc_dbg() at malloc_dbg+0xd4/frame 0xfffffe001e5b5680
malloc() at malloc+0x2d/frame 0xfffffe001e5b56c0
intel_atomic_state_alloc() at intel_atomic_state_alloc+0x20/frame 0xfffffe001e5b56e0
drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x30/frame 0xfffffe001e5b5750
drm_client_modeset_commit_force() at drm_client_modeset_commit_force+0x6f/frame 0xfffffe001e5b5790
drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x82/frame 0xfffffe001e5b57c0
vt_kms_postswitch() at vt_kms_postswitch+0x18b/frame 0xfffffe001e5b57f0
vt_window_switch() at vt_window_switch+0x261/frame 0xfffffe001e5b5830
vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe001e5b5850
cngrab() at cngrab+0x26/frame 0xfffffe001e5b5870
vpanic() at vpanic+0xee/frame 0xfffffe001e5b58c0
panic() at panic+0x43/frame 0xfffffe001e5b5920
witness_checkorder() at witness_checkorder+0xd1c/frame 0xfffffe001e5b5ae0
__mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe001e5b5b30
prison_check_ip4() at prison_check_ip4+0x51/frame 0xfffffe001e5b5b60
in_pcblookup_hash_locked() at in_pcblookup_hash_locked+0x2b6/frame 0xfffffe001e5b5bc0
in_pcblookup_mbuf() at in_pcblookup_mbuf+0x84/frame 0xfffffe001e5b5c00
tcp_input_with_port() at tcp_input_with_port+0x635/frame 0xfffffe001e5b5d50
tcp_input() at tcp_input+0xb/frame 0xfffffe001e5b5d60
ip_input() at ip_input+0x25e/frame 0xfffffe001e5b5de0
swi_net() at swi_net+0x1a1/frame 0xfffffe001e5b5e60
ithread_loop() at ithread_loop+0x279/frame 0xfffffe001e5b5ef0
fork_exit() at fork_exit+0x80/frame 0xfffffe001e5b5f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001e5b5f30
--- trap 0x61e8cb8b, rip = 0x8b48000000f890ff, rsp = 0x52ff38244c8d4800, rbp = 0x245c8948ccc35f20 ---

So there are two problems here.  The root issue is that the devel/apr1 port
runs a configure test that checks whether TCP_NODELAY is inherited by accepted
sockets.  This test panics because prison_check_ip4() tries to lock the prison
mutex to walk the IPs assigned to a jail, but the caller, in_pcblookup_hash(),
has already done an smr_enter(), which enters a critical section via
critical_enter():

(kgdb) p panicstr
$1 = 0xffffffff81ea90b0 <vpanic.buf> "acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) jail mutex @ /usr/src/sys/netinet/in_jail.c:418"
(kgdb) frame 39
#39 0xffffffff80dbcf71 in prison_check_ip4 (cred=<optimized out>,
     ia=ia@entry=0xfffffe001e5b5b90) at /usr/src/sys/netinet/in_jail.c:418
418		mtx_lock(&pr->pr_mtx);
(kgdb) l
413		KASSERT(ia != NULL, ("%s: ia is NULL", __func__));
414	
415		pr = cred->cr_prison;
416		if (!(pr->pr_flags & PR_IP4))
417			return (0);
418		mtx_lock(&pr->pr_mtx);
419		if (!(pr->pr_flags & PR_IP4)) {
420			mtx_unlock(&pr->pr_mtx);
421			return (0);
422		}
(kgdb) up
#41 0xffffffff80dc5cb4 in in_pcblookup_hash (pcbinfo=0xfffffe0022db7748,
     faddr=..., fport=2166892021, laddr=..., lport=0,
     lookupflags=<optimized out>, numa_domain=56 '8', ifp=<optimized out>)
     at /usr/src/sys/netinet/in_pcb.c:2387
2387		inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
(kgdb) l
2382	    struct ifnet *ifp, uint8_t numa_domain)
2383	{
2384		struct inpcb *inp;
2385	
2386		smr_enter(pcbinfo->ipi_smr);
2387		inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
2388		    lookupflags & INPLOOKUP_WILDCARD, ifp, numa_domain);
2389		if (inp != NULL) {
2390			if (__predict_false(inp_smr_lock(inp,
2391			    (lookupflags & INPLOOKUP_LOCKMASK)) == false))

However, this was harder to see at first because the i915kms driver tries
to do a malloc(M_WAITOK) from cn_grab() when entering DDB, which recursively
panics (even a malloc(M_NOWAIT) from cn_grab() is probably a bad idea).
When it panicked in X, the screen just froze on whatever it had most
recently drawn and the machine looked hung.  (The fact that sysbeep is off,
so I couldn't tell whether the commands I typed were doing anything or just
emitting errors, probably didn't help in diagnosing the hang as "sitting in
DDB" initially, though I don't know whether DDB itself emits a beep for
invalid commands, etc.)

-- 
John Baldwin