Re: zfs related panic

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sat, 16 Aug 2025 00:06:49 UTC

On Fri, Aug 15, 2025 at 04:51:00PM -0700, Bakul Shah wrote:
> On Aug 15, 2025, at 3:51 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Fri, Aug 15, 2025 at 11:19:55AM -0700, Bakul Shah wrote:
> >> Is this a known bug or maybe something specific on my machine?
> >> If the latter, any way to "fsck" it? FYI, the zpool is a mirror
> >> (two files on the host via nvme). built from c992ac621327 commit hash
> >> (which has other issues but they seem to be separate from this).
> >> I saw the same panic when I booted from a day old snapshot.
> >> 
> >> Note that "ls /.zfs" panics but "ls /.zfs/snapshot" doesn't!
> >> 
> >> This is on a -current VM:
> >> 
> >> root@:/ # ls .zfs
> >> VNASSERT failed: oresid == 0 || nresid != oresid || *(a)->a_eofflag == 1 not true at vnode_if.c:1824 (VOP_READDIR_APV)
> > 
> > Try this, untested.
> 
> Thanks for the quick patch! But I am afraid it didn't help. Let me know if you
> want me to check things via gdb. [I have filed
> 	https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288889
> so we can continue debugging there]
> 
> On the console (single user, RO root):
> # ls /.zfs
> VNASSERT failed: oresid == 0 || nresid != oresid || *(a)->a_eofflag == 1 not true at vnode_if.c:1824 (VOP_READDIR_APV)
> 0xfffff800059546e0: type VDIR state VSTATE_CONSTRUCTED op 0xffffffff8272cfd0
>     usecount 1, writecount 0, refcount 1 seqc users 0 mountedhere 0
>     hold count flags ()
>     flags ()
>     lock type zfs: SHARED (count 1)
>         name = .zfs
>         parent_id = 0
>         id = 1
> panic: VOP_READDIR: eofflag not set
> cpuid = 0
> time = 1755276357
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0053f83af0
> vpanic() at vpanic+0x136/frame 0xfffffe0053f83c20
> panic() at panic+0x43/frame 0xfffffe0053f83c80
> VOP_READDIR_APV() at VOP_READDIR_APV+0x205/frame 0xfffffe0053f83cd0
> kern_getdirentries() at kern_getdirentries+0x228/frame 0xfffffe0053f83dd0
> sys_getdirentries() at sys_getdirentries+0x29/frame 0xfffffe0053f83e00
> amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe0053f83f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0053f83f30
> --- syscall (554, FreeBSD ELF64, getdirentries), rip = 0x331339f976aa, rsp = 0x33133631ade8, rbp = 0x33133631ae20 ---
> KDB: enter: panic
> [ thread pid 23 tid 100211 ]
> Stopped at      kdb_enter+0x33: movq    $0,0x12313e2(%rip)
> db>
> 
> Running gdb on the host (attached to tcp port):
> #16 0xffffffff80b7992b in vpanic (
>     fmt=0xffffffff812ddf30 "VOP_READDIR: eofflag not set",
>     ap=ap@entry=0xfffffe0053f83c60)
>     at /home/FreeBSD/current/sys/kern/kern_shutdown.c:962
> #17 0xffffffff80b79793 in panic (
>     fmt=0xffffffff81d9eab0 <cnputs_mtx> "\304\372\032\201\377\377\377\377")
>     at /home/FreeBSD/current/sys/kern/kern_shutdown.c:887
> #18 0xffffffff81195fd5 in VOP_READDIR_APV (vop=<optimized out>,
>     a=a@entry=0xfffffe0053f83d30) at vnode_if.c:1824
> #19 0xffffffff80c95e58 in VOP_READDIR (vp=0xfffff800059546e0,
>     uio=0xfffffe0053f83d00, cred=<optimized out>, eofflag=0xfffffe0053f83d6c,
>     ncookies=0x0, cookies=0x0) at ./vnode_if.h:972
From this frame, do
p *vp
and
p *(vp->v_op)
I am mostly interested in what the .vop_readdir function pointer points to.
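
For reference, the condition in the VNASSERT message quoted above can be modelled in plain C. This is only a sketch built from the assertion text itself, not the kernel source: the helper name readdir_postcond_ok is hypothetical, and reading oresid/nresid as the uio residual before and after the call is an assumption.

```c
#include <stdbool.h>

/*
 * Userland model of the check printed in the panic message:
 *     oresid == 0 || nresid != oresid || *(a)->a_eofflag == 1
 *
 * Assumptions (not taken from the kernel source):
 *   oresid  - bytes left in the caller's buffer before VOP_READDIR
 *   nresid  - bytes left in the caller's buffer after VOP_READDIR
 *   eofflag - value the filesystem stored through a_eofflag
 *
 * The check passes if the buffer was empty to begin with, if the
 * call produced some output, or if end-of-directory was reported.
 */
static bool
readdir_postcond_ok(long oresid, long nresid, int eofflag)
{
	return (oresid == 0 || nresid != oresid || eofflag == 1);
}
```

Under this reading, the panic corresponds to a readdir implementation that returned without writing any entries and without setting the EOF flag, e.g. readdir_postcond_ok(4096, 4096, 0) is false.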