Re: ZFS + FreeBSD XEN dom0 panic

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Fri, 25 Mar 2022 08:17:09 UTC
On Thu, Mar 24, 2022 at 06:38:41PM +0200, Ze Dupsys wrote:
> On 2022.03.24. 18:26, Roger Pau Monné wrote:
> > 
> > This seems to be a fairly common trace for your panics:
> > 
> > #0 0xffffffff80c74605 at kdb_backtrace+0x65
> > #1 0xffffffff80c26611 at vpanic+0x181
> > #2 0xffffffff80c26483 at panic+0x43
> > #3 0xffffffff810c1b97 at trap+0xba7
> > #4 0xffffffff810c1bef at trap+0xbff
> > #5 0xffffffff810c1243 at trap+0x253
> > #6 0xffffffff81098c58 at calltrap+0x8
> > #7 0xffffffff80c7f251 at rman_is_region_manager+0x241
> > #8 0xffffffff80c36e71 at sbuf_new_for_sysctl+0x101
> > #9 0xffffffff80c362bc at kernel_sysctl+0x3ec
> > #10 0xffffffff80c36933 at userland_sysctl+0x173
> > #11 0xffffffff80c3677f at sys___sysctl+0x5f
> > #12 0xffffffff810c249c at amd64_syscall+0x10c
> > #13 0xffffffff8109956b at Xfast_syscall+0xfb
> > 
> > Could you give me the output of executing the following on dom0:
> > 
> > $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c7f251
> > $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c36e71
> > $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c362bc
> > $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c36933
> > $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c3677f
> 
> Yes, i'd say that with current stress test the panic message always contains
> rman_is_region_manager in mid.

That's great. In fact I think I was misled by the kdb-resolved
symbols not being very accurate.

>  addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c7f251
> /usr/src/sys/kern/subr_rman.c:0

It's a shame this one hasn't been resolved properly. I think this
would point to sysctl_rman, but without proper debug information
that's just a guess. Could you install GNU binutils and try to
resolve the address using GNU addr2line?

$ pkg install binutils
$ /usr/local/bin/addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c7f251

If you could post the output of that I think it would be helpful.

After attempting to resolve the address, can you give the patch
below a try? It may not make a difference: without that symbol
resolved this is just a hunch.
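
For context, the idea behind the patch is to keep the list lock held
for as long as the looked-up entry is dereferenced, and to drop it only
once the needed fields have been copied out. Below is a minimal userland
sketch of that pattern, using pthreads rather than the kernel mtx(9)
API. All names in it (struct region, list_lock, region_add,
region_remove, region_size) are hypothetical and only stand in for the
rman structures; it is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a region manager: one node in a global list. */
struct region {
	struct region *next;
	unsigned long start;
	unsigned long end;
};

/* Protects list_head and the lifetime of every linked node. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct region *list_head;

/* Link a caller-owned region at the head of the list. */
void
region_add(struct region *r)
{
	pthread_mutex_lock(&list_lock);
	r->next = list_head;
	list_head = r;
	pthread_mutex_unlock(&list_lock);
}

/* Unlink a region; after this returns the caller may release it. */
void
region_remove(struct region *r)
{
	struct region **pp;

	pthread_mutex_lock(&list_lock);
	for (pp = &list_head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
}

/*
 * Copy out the size of the idx-th region.  The list lock stays held
 * until the last dereference of r, so a concurrent region_remove()
 * cannot take the node away between the lookup and the use.
 */
int
region_size(int idx, unsigned long *size)
{
	struct region *r;

	assert(size != NULL);
	pthread_mutex_lock(&list_lock);
	for (r = list_head; r != NULL; r = r->next) {
		if (idx-- == 0)
			break;
	}
	if (r == NULL) {
		pthread_mutex_unlock(&list_lock);
		return (ENOENT);
	}
	*size = r->end - r->start + 1;
	pthread_mutex_unlock(&list_lock);	/* r is not touched after this */
	return (0);
}
```

Dropping list_lock right after the loop, as the unpatched code does
with rman_mtx, would leave r pointing at a node that another thread is
free to unlink and release before it is read.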

---8<---
diff --git a/sys/kern/subr_rman.c b/sys/kern/subr_rman.c
index 1bbaff8264ef..f73151c27bbe 100644
--- a/sys/kern/subr_rman.c
+++ b/sys/kern/subr_rman.c
@@ -1000,9 +1000,10 @@ sysctl_rman(SYSCTL_HANDLER_ARGS)
 		if (rman_idx-- == 0)
 			break;
 	}
-	mtx_unlock(&rman_mtx);
-	if (rm == NULL)
+	if (rm == NULL) {
+		mtx_unlock(&rman_mtx);
 		return (ENOENT);
+	}
 
 	/*
 	 * If the resource index is -1, we want details on the
@@ -1016,6 +1017,7 @@ sysctl_rman(SYSCTL_HANDLER_ARGS)
 		urm.rm_start = rm->rm_start;
 		urm.rm_size = rm->rm_end - rm->rm_start + 1;
 		urm.rm_type = rm->rm_type;
+		mtx_unlock(&rman_mtx);
 
 		error = SYSCTL_OUT(req, &urm, sizeof(urm));
 		return (error);
@@ -1037,6 +1039,7 @@ sysctl_rman(SYSCTL_HANDLER_ARGS)
 				goto found;
 	}
 	mtx_unlock(rm->rm_mtx);
+	mtx_unlock(&rman_mtx);
 	return (ENOENT);
 
 found:
@@ -1062,6 +1065,7 @@ sysctl_rman(SYSCTL_HANDLER_ARGS)
 	ures.r_flags = res->r_flags;
 
 	mtx_unlock(rm->rm_mtx);
+	mtx_unlock(&rman_mtx);
 	error = SYSCTL_OUT(req, &ures, sizeof(ures));
 	return (error);
 }