Re: zfs related panic
- In reply to: Rick Macklem : "Re: zfs related panic"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sat, 16 Aug 2025 01:43:17 UTC
On Fri, Aug 15, 2025 at 06:14:39PM -0700, Rick Macklem wrote:
> On Fri, Aug 15, 2025 at 5:58 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> >
> > On Fri, Aug 15, 2025 at 05:41:26PM -0700, Rick Macklem wrote:
> > > On Fri, Aug 15, 2025 at 5:35 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 15, 2025 at 05:26:21PM -0700, Rick Macklem wrote:
> > > > > On Fri, Aug 15, 2025 at 5:07 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 15, 2025 at 04:51:00PM -0700, Bakul Shah wrote:
> > > > > > > On Aug 15, 2025, at 3:51 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 15, 2025 at 11:19:55AM -0700, Bakul Shah wrote:
> > > > > > > >> Is this a known bug or may be something specific on my machine?
> > > > > > > >> If the latter, any way to "fsck" it? FYI, the zpool is a mirror
> > > > > > > >> (two files on the host via nvme). built from c992ac621327 commit hash
> > > > > > > >> (which has other issues but they seem to be separate from this).
> > > > > > > >> I saw the same panic when I booted from a day old snapshot.
> > > > > > > >>
> > > > > > > >> Note that "ls /.zfs" panics but "ls /.zfs/snapshot" doesn't!
> > > > > > > >>
> > > > > > > >> This is on a -current VM:
> > > > > > > >>
> > > > > > > >> root@:/ # ls .zfs
> > > > > > > >> VNASSERT failed: oresid == 0 || nresid != oresid || *(a)->a_eofflag == 1 not true at vnode_if.c:1824 (VOP_READDIR_APV)
> > > > > > > >
> > > > > > > > Try this, untested.
> > > > > > >
> > > > > > > Thanks for the quick patch! But I am afraid it didn't help. Let me know if you
> > > > > > > want me to check things via gdb. [I have filed
> > > > > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288889
> > > > > > > so we can continue debugging there]
> > > > > > >
> > > > > > > On the console (single user, RO root):
> > > > > > > # ls /.zfs
> > > > > > > VNASSERT failed: oresid == 0 || nresid != oresid || *(a)->a_eofflag == 1 not true at vnode_if.c:1824 (VOP_READDIR_APV)
> > > > > > > 0xfffff800059546e0: type VDIR state VSTATE_CONSTRUCTED op 0xffffffff8272cfd0
> > > > > > > usecount 1, writecount 0, refcount 1 seqc users 0 mountedhere 0
> > > > > > > hold count flags ()
> > > > > > > flags ()
> > > > > > > lock type zfs: SHARED (count 1)
> > > > > > > name = .zfs
> > > > > > > parent_id = 0
> > > > > > > id = 1
> > > > > > > panic: VOP_READDIR: eofflag not set
> > > > > > > cpuid = 0
> > > > > > > time = 1755276357
> > > > > > > KDB: stack backtrace:
> > > > > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0053f83af0
> > > > > > > vpanic() at vpanic+0x136/frame 0xfffffe0053f83c20
> > > > > > > panic() at panic+0x43/frame 0xfffffe0053f83c80
> > > > > > > VOP_READDIR_APV() at VOP_READDIR_APV+0x205/frame 0xfffffe0053f83cd0
> > > > > > > kern_getdirentries() at kern_getdirentries+0x228/frame 0xfffffe0053f83dd0
> > > > > > > sys_getdirentries() at sys_getdirentries+0x29/frame 0xfffffe0053f83e00
> > > > > > > amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe0053f83f30
> > > > > > > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0053f83f30
> > > > > > > --- syscall (554, FreeBSD ELF64, getdirentries), rip = 0x331339f976aa, rsp = 0x33133631ade8, rbp = 0x33133631ae20 ---
> > > > > > > KDB: enter: panic
> > > > > > > [ thread pid 23 tid 100211 ]
> > > > > > > Stopped at kdb_enter+0x33: movq $0,0x12313e2(%rip)
> > > > > > > db>
> > > > > > >
> > > > > > > Running gdb on the host (attached to tcp port):
> > > > > > > #16 0xffffffff80b7992b in vpanic (
> > > > > > > fmt=0xffffffff812ddf30 "VOP_READDIR: eofflag not set",
> > > > > > > ap=ap@entry=0xfffffe0053f83c60)
> > > > > > > at /home/FreeBSD/current/sys/kern/kern_shutdown.c:962
> > > > > > > #17 0xffffffff80b79793 in panic (
> > > > > > > fmt=0xffffffff81d9eab0 <cnputs_mtx> "\304\372\032\201\377\377\377\377")
> > > > > > > at /home/FreeBSD/current/sys/kern/kern_shutdown.c:887
> > > > > > > #18 0xffffffff81195fd5 in VOP_READDIR_APV (vop=<optimized out>,
> > > > > > > a=a@entry=0xfffffe0053f83d30) at vnode_if.c:1824
> > > > > > > #19 0xffffffff80c95e58 in VOP_READDIR (vp=0xfffff800059546e0,
> > > > > > > uio=0xfffffe0053f83d00, cred=<optimized out>, eofflag=0xfffffe0053f83d6c,
> > > > > > > ncookies=0x0, cookies=0x0) at ./vnode_if.h:972
> > > > > > From this frame, do
> > > > > > p *vp
> > > > > > and
> > > > > > p *(vp->v_op)
> > > > > > I am mostly interested what is the .vop_readdir fp points to.
> > > > > I think the problem is that, for this case, ZFS replies with eofflag
> > > > > == -1 instead
> > > > > of 1. (I don't know if you want to change the ASSERT or try to fix ZFS
> > > > > to not do this?)
> > > >
> > > > Where do you see it? I mean the '-1' set to *eofp.
> > > I saw it in a printf() after VOP_READDIR(). However, a subsequent
> > > test showed 0.
> > > --> The first time I was printing out for non-ZFS, so there was a fair amount
> > > of other printf()s being logged. Then I limited it to ZFS.
> > >
> > > Anyhow, ZFS seems to get eof wrong when it is already at eof.
> > >
> > > I'll take a look at the ZFS code, rick
> > >
> >
> > Below is what I gathered so far.
> >
> > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c
> > index 61d0bb26d1e5..c191b2abf6cc 100644
> > --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c
> > +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c
> > @@ -674,6 +674,7 @@ zfsctl_root_readdir(struct vop_readdir_args *ap)
> > zfs_uio_t uio;
> > int *eofp = ap->a_eofflag;
> > off_t dots_offset;
> > + ssize_t orig_resid;
> > int error;
> >
> > zfs_uio_init(&uio, ap->a_uio);
> > @@ -688,14 +689,20 @@ zfsctl_root_readdir(struct vop_readdir_args *ap)
> > * count to return is 0.
> > */
> > if (zfs_uio_offset(&uio) == 3 * sizeof (entry)) {
> > + if (eofp != NULL)
> > + *eofp = 1;
> Yea, I was just going to try the same thing. I think it probably fixes
> the assert.
> (Of course, it might take a while to get this in ZFS, so you might have to drop
> the assert until it downstreams from OpenZFS?)
>
> > return (0);
> > }
> >
> > + orig_resid = zfs_uio_resid(&uio);
> > error = sfs_readdir_common(zfsvfs->z_root, ZFSCTL_INO_ROOT, ap, &uio,
> > &dots_offset);
> > if (error != 0) {
> > - if (error == ENAMETOOLONG) /* ran out of destination space */
> > + if (error == ENAMETOOLONG) { /* ran out of destination space */
> > error = 0;
> > + if (orig_resid == zfs_uio_resid(&uio) && eofp != NULL)
> > + *eofp = 1;
> If it hasn't filled in an entry, shouldn't it be an error return and not
> setting "error = 0;"?
This is how this code is structured: if there is no space, consider it the
end of the run. UFS does the same, but it is careful to only reset error
to zero if there were some entries already copied out.
>
> > + }
> > return (error);
> > }
> > if (zfs_uio_offset(&uio) != dots_offset)
> > @@ -710,8 +717,11 @@ zfsctl_root_readdir(struct vop_readdir_args *ap)
> > entry.d_reclen = sizeof (entry);
> > error = vfs_read_dirent(ap, &entry, zfs_uio_offset(&uio));
> > if (error != 0) {
> > - if (error == ENAMETOOLONG)
> > + if (error == ENAMETOOLONG) {
> > error = 0;
> > + if (orig_resid == zfs_uio_resid(&uio) && eofp != NULL)
> > + *eofp = 1;
> Ditto above.
> > + }
> > return (SET_ERROR(error));
> > }
> > if (eofp != NULL)
> > @@ -1056,17 +1066,22 @@ zfsctl_snapdir_readdir(struct vop_readdir_args *ap)
> > zfs_uio_t uio;
> > int *eofp = ap->a_eofflag;
> > off_t dots_offset;
> > + ssize_t orig_resid;
> > int error;
> >
> > zfs_uio_init(&uio, ap->a_uio);
> > + orig_resid = zfs_uio_resid(&uio);
> >
> > ASSERT3S(vp->v_type, ==, VDIR);
> >
> > error = sfs_readdir_common(ZFSCTL_INO_ROOT, ZFSCTL_INO_SNAPDIR, ap,
> > &uio, &dots_offset);
> > if (error != 0) {
> > - if (error == ENAMETOOLONG) /* ran out of destination space */
> > + if (error == ENAMETOOLONG) { /* ran out of destination space */
> > error = 0;
> > + if (orig_resid == zfs_uio_resid(&uio) && eofp != NULL)
> > + *eofp = 1;
> > + }
> Ditto again.
> > return (error);
> > }
> >
> > @@ -1084,7 +1099,8 @@ zfsctl_snapdir_readdir(struct vop_readdir_args *ap)
> > dsl_pool_config_exit(dmu_objset_pool(zfsvfs->z_os), FTAG);
> > if (error != 0) {
> > if (error == ENOENT) {
> > - if (eofp != NULL)
> > + if (orig_resid == zfs_uio_resid(&uio) &&
> > + eofp != NULL)
> > *eofp = 1;
> > error = 0;
> > }
> > @@ -1099,8 +1115,12 @@ zfsctl_snapdir_readdir(struct vop_readdir_args *ap)
> > entry.d_reclen = sizeof (entry);
> > error = vfs_read_dirent(ap, &entry, zfs_uio_offset(&uio));
> > if (error != 0) {
> > - if (error == ENAMETOOLONG)
> > + if (error == ENAMETOOLONG) {
> > error = 0;
> > + if (orig_resid == zfs_uio_resid(&uio) &&
> > + eofp != NULL)
> > + *eofp = 1;
> Ditto again.
> > + }
> > zfs_exit(zfsvfs, FTAG);
> > return (SET_ERROR(error));
> > }
> > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > index 1813c411b013..79e808a3ab3d 100644
> > --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > @@ -1695,6 +1695,7 @@ zfs_readdir(vnode_t *vp, zfs_uio_t *uio, cred_t *cr, int *eofp,
> > objset_t *os;
> > caddr_t outbuf;
> > size_t bufsize;
> > + ssize_t orig_resid;
> > zap_cursor_t zc;
> > zap_attribute_t *zap;
> > uint_t bytes_wanted;
> > @@ -1735,7 +1736,7 @@ zfs_readdir(vnode_t *vp, zfs_uio_t *uio, cred_t *cr, int *eofp,
> > /*
> > * Quit if directory has been removed (posix)
> > */
> > - if ((*eofp = zp->z_unlinked) != 0) {
> > + if ((*eofp = (zp->z_unlinked != 0)) != 0) {
> > zfs_exit(zfsvfs, FTAG);
> > return (0);
> > }
> > @@ -1743,6 +1744,7 @@ zfs_readdir(vnode_t *vp, zfs_uio_t *uio, cred_t *cr, int *eofp,
> > error = 0;
> > os = zfsvfs->z_os;
> > offset = zfs_uio_offset(uio);
> > + orig_resid = zfs_uio_resid(uio);
> > prefetch = zp->z_zn_prefetch;
> > zap = zap_attribute_long_alloc();
> >
> > @@ -1933,6 +1935,8 @@ zfs_readdir(vnode_t *vp, zfs_uio_t *uio, cred_t *cr, int *eofp,
> > *cookies = NULL;
> > *ncookies = 0;
> > }
> > + if (error == 0 && zfs_uio_resid(uio) != orig_resid)
> > + *eofp = 1;
> I looked at this earlier and I don't this is needed. When it breaks out
> of the loop, it has set eof if it got ENOENT from zap_cursor_retrieve(),
> which I think is the only case that will return 0 and not an error?
> (But it looks harmless to me.)
Ok, I reverted this chunk, but also changed the case of no dirents to be
same as UFS.
The patches are moved to https://reviews.freebsd.org/D51930
>
> > return (error);
> > }
> >
> >