Re: ZFS + FreeBSD XEN dom0 panic

From: Ze Dupsys <zedupsys_at_gmail.com>
Date: Sat, 26 Mar 2022 07:09:27 UTC
On 2022.03.25. 10:17, Roger Pau Monné wrote:
> On Thu, Mar 24, 2022 at 06:38:41PM +0200, Ze Dupsys wrote:
>> On 2022.03.24. 18:26, Roger Pau Monné wrote:
>>>
>>> This seems to be a fairly common trace for your panics:
>>>
>>> #0 0xffffffff80c74605 at kdb_backtrace+0x65
>>> #1 0xffffffff80c26611 at vpanic+0x181
>>> #2 0xffffffff80c26483 at panic+0x43
>>> #3 0xffffffff810c1b97 at trap+0xba7
>>> #4 0xffffffff810c1bef at trap+0xbff
>>> #5 0xffffffff810c1243 at trap+0x253
>>> #6 0xffffffff81098c58 at calltrap+0x8
>>> #7 0xffffffff80c7f251 at rman_is_region_manager+0x241
>>> #8 0xffffffff80c36e71 at sbuf_new_for_sysctl+0x101
>>> #9 0xffffffff80c362bc at kernel_sysctl+0x3ec
>>> #10 0xffffffff80c36933 at userland_sysctl+0x173
>>> #11 0xffffffff80c3677f at sys___sysctl+0x5f
>>> #12 0xffffffff810c249c at amd64_syscall+0x10c
>>> #13 0xffffffff8109956b at Xfast_syscall+0xfb
>>>
>>> Could you give me the output of executing the following on dom0:
>>>
>>> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c7f251
>>> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c36e71
>>> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c362bc
>>> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c36933
>>> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c3677f
>>
>> Yes, I'd say that with the current stress test the panic message always
>> contains rman_is_region_manager in the middle of the trace.
> 
> That's great. In fact I think I was misled by the kdb resolved
> symbols not being very accurate.
> 
>>   addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c7f251
>> /usr/src/sys/kern/subr_rman.c:0
> 
> It's a shame this one hasn't been resolved properly. I think this
> would point to sysctl_rman, but without proper debug that's just a
> guess. Could you install GNU binutils and try to resolve using GNU
> addr2line?
> 
> $ pkg install binutils
> $ /usr/local/bin/addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80c7f251

/usr/src/sys/kern/subr_rman.c:0

The previous line is not a question of mine; it is the output from GNU
addr2line. I don't know why it does not resolve there either.
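
For reference, a minimal sketch of resolving every frame of a trace in one
go instead of one address at a time. The helper names extract_addrs and
resolve_frames and the file name panic.txt are hypothetical; the -f flag
asks addr2line to also print the enclosing function name, which may or may
not help when the line number comes back as 0.

```sh
#!/bin/sh
# Print the return address of every frame in a kdb backtrace read from stdin.
# Matches lines such as "#7 0xffffffff80c7f251 at rman_is_region_manager+0x241",
# including ones that carry a leading ">>> " quote prefix.
extract_addrs() {
    sed -n 's/^.*#[0-9][0-9]* \(0x[0-9a-fA-F][0-9a-fA-F]*\) at .*$/\1/p'
}

# Resolve all extracted addresses against a debug kernel.
# $1 is the path to the debug kernel (assumed here to be the one used above,
# /usr/lib/debug/boot/kernel/kernel.debug).
resolve_frames() {
    extract_addrs | xargs addr2line -f -e "$1"
}
```

Usage, assuming the panic text was saved to panic.txt:

$ . ./resolve.sh
$ resolve_frames /usr/lib/debug/boot/kernel/kernel.debug < panic.txt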


> After attempting to resolve the address, can you give the attached
> patch a try? (maybe it's not going to make a difference, as without
> that symbol resolved this is just a hunch).

Nice find, but it did not resolve the panic problem. Aside from that, I
think it fixes some possible problems, since the current locking logic does
not seem to be correct.

Next week I will not be able to run tests until Thursday, but I will run
some today and tomorrow.