Re: Debugging a (potentially?) ZFS-related panic, and discussion about large patchsets

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Mon, 10 Jan 2022 23:43:06 UTC

On 1/11/22, Mark Johnston <markj@freebsd.org> wrote:
> On Mon, Jan 10, 2022 at 05:11:16PM -0500, Shawn Webb wrote:
>> Hey all,
>>
>> So I'm getting an interesting ZFS-related kernel panic. I've uploaded
>> the core.txt at [0]. I suspect it's related to FreeBSD commit
>> 681ce946f33e75c590e97c53076e86dff1fe8f4a (zfs: merge
>> openzfs/zfs@f291fa658 (master) into main).
>>
>> I'm able to reproduce it on a single system with some level of
>> determinism: I'm building the security appliance firmware at ${DAYJOB}
>> in a bhyve VM that's backed by a zvol. The host is a Dell Precision
>> 7540 laptop with a single NVMe drive in it. The VM is configured with
>> a single zvol, booting with UEFI.
>>
>> Looking at the commit email sent to dev-commits-src-all@, I see this:
>> 146 files changed, 4933 insertions(+), 1572 deletions(-)
>>
>> Strangely, when I run `git show
>> 681ce946f33e75c590e97c53076e86dff1fe8f4a`, I only see a small subset
>> of those changes.
>
> That is a merge commit.  You need to specify that you want a diff
> against the first parent (the preceding FreeBSD commit), so something
> equivalent to "git diff --stat 681ce946f^ 681ce946f".  Use
> "git log 681ce946f^2" to see the merged OpenZFS commits.
>
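
A throwaway repository makes the difference between those views easy to
see. This is only a sketch: the branch name "vendor", the file names, and
the commit messages are made up for illustration, and the "M^1..M^2" range
is just one way to narrow "git log" to the commits a merge brought in. In a
clean merge the default "git show" output lists no files at all, while the
first-parent diff lists everything that arrived with the merge.

```shell
#!/bin/sh
# Build a scratch repository containing one true merge, then compare
# the different ways of looking at that merge commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.org"   # placeholder identity
git config user.name "Demo"

echo base > base.txt
git add base.txt
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)    # default branch name varies

# "Upstream" side: two commits on a separate (hypothetical) vendor branch.
git checkout -q -b vendor
echo a > a.txt; git add a.txt; git commit -q -m "upstream change A"
echo b > b.txt; git add b.txt; git commit -q -m "upstream change B"

# Local side: one unrelated commit, so the merge cannot fast-forward.
git checkout -q "$main"
echo local > local.txt; git add local.txt; git commit -q -m "local change"
git merge -q --no-ff -m "merge vendor into main" vendor
merge=$(git rev-parse HEAD)

# Default view of a merge: a combined diff, which only lists files that
# differ from *both* parents (none here, since nothing conflicted).
show_files=$(git show --format= --name-only "$merge")

# Diff against the first parent: everything the merge brought in.
first_parent_files=$(git diff --name-only "$merge^" "$merge")

# Commits reachable from the second parent but not the first.
merged_subjects=$(git log --format=%s "$merge^1..$merge^2")

echo "git show lists:          [$show_files]"
echo "first-parent diff lists: [$first_parent_files]"
echo "merged commits:          [$merged_subjects]"
```

When a merge does involve hand-resolved conflicts, the combined diff
is no longer empty, but it still covers only the files touched during the
resolution, which would match seeing "a small subset" of the changes.
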
>> As a downstream consumer of 14-CURRENT, how am I supposed to even
>> start debugging such a large patchset in any manner that respects my
>> time?
>>
>> It seems to me that breaking up commits into smaller, bite-size chunks
>> would make life easier for those experiencing bugs, especially ones
>> that result in kernel panics.
>
> That's up to the upstream project, in this case OpenZFS.
>
>> ZFS in and of itself is a beast, and I've yet to study any of its
>> code, so when there's a commit that large, even thinking about
>> debugging it is a daunting task.
>>
>> Needless to say, I'm going to need some hand holding here for
>> debugging this. Anyone have any idea what's going on?
>
> To start, you'll need to look at the stack trace for the thread with tid
> 100061.
>

IMO the kernel should be patched to obtain that trace on its own. Since
the target has interrupts disabled, the trace would have to be captured
with an NMI, but support for that was removed in:

commit 1c29da02798d968eb874b86221333a56393a94c3
Author: Mark Johnston <markj@FreeBSD.org>
Date:   Fri Jan 31 15:43:33 2020 +0000

    Reimplement stack capture of running threads on i386 and amd64.

>> I guess this email is to serve three purposes:
>>
>> 1. Report that a bug was introduced recently.
>> 2. Ask for help in squashing the bug. I'm more than happy to test any
>>    patches.
>> 3. Start a dialogue on making life just a little easier for
>>    downstreams.
>>
>> [0]: https://hardenedbsd.org/~shawn/2022-01-10_zfs_core-r01.txt
>

-- 
Mateusz Guzik <mjguzik gmail.com>