[Bug 265695] Kernel panic on ZFS service

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 07 Aug 2022 20:00:11 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265695

            Bug ID: 265695
           Summary: Kernel panic on ZFS service
           Product: Base System
           Version: 13.1-STABLE
          Hardware: powerpc
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: bugreporter@firemail.cc

Hello awesome FreeBSD people.

I can reliably panic 13.1-STABLE at commitid
3fbe3365df59f9b973c7b5bc8e82e13199ab5057 (the last commit that had llvm-14.0.4)
by starting the ZFS service on a fresh install. The system was cross-built on
an amd64 host from an earlier commitid on stable/13.

Hardware is a Raptor Computing Talos II, 128GB RAM, 36-core, etc.

To trigger the crash:

/etc/rc.d/zfs onestart

Results on console are:

FreeBSD/powerpc (machinename) (ttyu0)

login: ZFS filesystem version: 5
ZFS storage pool version: features support (5000)

fatal kernel trap:

   exception       = 0x400 (instruction storage interrupt)
   virtual address = 0x3abd29ae1c9b0f88
   srr0            = 0x3abd29ae1c9b0f88 (0x7abd29ae1a6c0f88)
   srr1            = 0x9000000040009032
   current msr     = 0x9000000000009032
   lr              = 0xc0080001ef181758 (0x80001ece91758)
   frame           = 0xc0080001f24c9eb0
   curthread       = 0xc00800000bf68b00
          pid = 1599, comm = zfs

panic: instruction storage interrupt trap
cpuid = 96
time = 1659901415
KDB: stack backtrace:
#0 0xc000000002bda030 at kdb_backtrace+0x90
#1 0xc000000002b6af4c at vpanic+0x1f0
#2 0xc000000002b6ad48 at panic+0x44
#3 0xc000000003035900 at trap+0x304
#4 0xc000000003029b54 at powerpc_interrupt+0x1b4
Uptime: 1h41m52s

Dump failed. Partition too small.
aacraid0: shutting down controller...done
[84664.102624126,5] OPAL: Reboot request...
[84664.103037173,5] RESET: Initiating fast reboot 12...
[rest of OBMC-mediated reboot process]
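
As a reading aid for the trap dump above, here is a minimal Python sketch that
compares the reported addresses. The values are copied from the console output;
the helper names and the "top nibble is 0xc" test are assumptions made for
illustration, based only on the addresses that appear in this report, not on
any kernel header. It shows that srr0 equals the faulting virtual address and
falls outside the range every other kernel address here shares, which would be
consistent with (but does not prove) a branch through a bad pointer from code
near lr.

```python
# Values copied verbatim from the trap dump in this report.
TRAP = {
    "virtual address": 0x3abd29ae1c9b0f88,
    "srr0": 0x3abd29ae1c9b0f88,
    "lr": 0xc0080001ef181758,
}

# Return addresses copied from the KDB stack backtrace in this report.
BACKTRACE = [
    0xc000000002bda030,  # kdb_backtrace+0x90
    0xc000000002b6af4c,  # vpanic+0x1f0
    0xc000000002b6ad48,  # panic+0x44
    0xc000000003035900,  # trap+0x304
    0xc000000003029b54,  # powerpc_interrupt+0x1b4
]

# Assumption (hypothetical heuristic): every kernel address printed in this
# report has 0xc as its top 4 bits, so that nibble is used as a rough filter.
KERNEL_TOP_NIBBLE = 0xC


def top_nibble(addr: int) -> int:
    """Return the most significant 4 bits of a 64-bit address."""
    return (addr >> 60) & 0xF


def looks_like_kernel(addr: int) -> bool:
    """True if the address shares the top nibble seen in the backtrace."""
    return top_nibble(addr) == KERNEL_TOP_NIBBLE


# For an instruction storage interrupt, the faulting address is the address
# of the instruction fetch itself, so srr0 should match "virtual address".
srr0_matches_fault = TRAP["srr0"] == TRAP["virtual address"]

if __name__ == "__main__":
    print("srr0 == faulting address:", srr0_matches_fault)
    for name, addr in TRAP.items():
        print(f"{name:>16} = {addr:#018x} kernel-like: {looks_like_kernel(addr)}")
    print("all backtrace frames kernel-like:",
          all(looks_like_kernel(a) for a in BACKTRACE))
```

Only lr and the backtrace frames pass the check; srr0 does not, so the
instructions leading up to lr are probably the most useful place to start
looking.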

I don't see anything at the tip of stable/13 (in my tree that's
f95569fafcba5ed3cd119f8d177622fe0e64bbf6) that suggests this is fixed in a
later commit. I can't test that, though: due to another bug, which I'll file
shortly, this machine currently panics immediately when booting anything after
the llvm-14.0.5 import on stable/13.

However! I would be glad to spend time helping anyone chase this down, and I'm
good at following instructions. If you (whoever you might be) would like me to
add debugging instrumentation or explore data structures, feel free to tell me
what to do and I'll get on it.
