ZFS: Panic when attempting to delete certain data

Josh Beard josh at signalboxes.net
Tue Nov 27 20:32:30 UTC 2012


On Tue, Nov 27, 2012 at 11:48 AM, Andriy Gapon <avg at freebsd.org> wrote:

> on 27/11/2012 20:25 Josh Beard said the following:
> > Hello,
> >
> > I have a system on which I can consistently reproduce a panic when trying
> > to delete certain data.  It is data that was rsynced from another system,
> > nothing terribly unique.  This has been ongoing for several months,
> > starting with 9.0-RELEASE and now running 9.1-RC3.
> >
> > I can't find anything in common among the files that trigger the panics.
> > One is a simple gzipped archive, while others are plain text.
> > Strangely, I can only reproduce it with data that was rsynced from that
> > particular system (which is a Mac).
>
> Josh,
>
> I am collecting these cases; thank you for another one.
> I had an interesting investigation of
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/173747
> Unfortunately, for some reason the whole conversation stayed private.
> I see that you also opened a PR earlier:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/170238
>
> Could you please provide the following info?
> From kgdb:
> - list in frame 7 (zfs_freebsd_remove), so that I can see the code line
> - local variables from frame 7 (info local)
>
>
>
Andriy,

Thanks for your quick response.  I've never used kgdb, so forgive my
ignorance here.  Is this what you're looking for?  If not, could you
elaborate on those?

#7  0xffffffff80ebd45a in zfs_freebsd_remove (ap=Variable "ap" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855
1855                    dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE);


(kgdb) list zfs_freebsd_remove
5796            struct vop_remove_args /* {
5797                    struct vnode *a_dvp;
5798                    struct vnode *a_vp;
5799                    struct componentname *a_cnp;
5800            } */ *ap;
5801    {
5802
5803            ASSERT(ap->a_cnp->cn_flags & SAVENAME);
5804
5805            return (zfs_remove(ap->a_dvp, ap->a_cnp->cn_nameptr,

(kgdb) info frame 7
Stack frame at 0xffffff8466a6a920:
 rip = 0xffffffff80ebd45a in zfs_freebsd_remove (/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855); saved rip 0xffffffff8081cf13
 called by frame at 0xffffff8466a6a940, caller of frame at 0xffffff8466a6a7a0
 source language c.
 Arglist at 0xffffff8466a6a910, args: ap=Variable "ap" is not available.

> Also, for one of the files that trigger the problem:
> - ls -i to obtain its inode number
> - zdb -ddddd <dataset name> <inode number>
>

# ls -i kyofilter\ v2.2.pax.gz
247126 kyofilter v2.2.pax.gz

(This is a symlink.  The file that it's linked to does *not* panic the
system if I try to delete it.)

# zdb -ddddd store/tdxs1 247126
Dataset store/tdxs1 [ZPL], ID 109, cr_txg 35014, 1.33T, 1106389 objects,
rootbp DVA[0]=<0:80001a2400:400> DVA[1]=<0:30800610000:400> [L0 DMU objset]
fletcher4 lzjb LE contiguous unique double size=800L/200P
birth=1166838L/1166838P fill=1106389
cksum=19391f0f67:78eb24a9cca:1439005549d01:275015332d1bdf

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
    247126    1    16K    512      0    512    0.00  ZFS plain file
                                        201   bonus  System attributes
        dnode flags: USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /tech/2012-09-14-01-00/Drivers/Kyocera/.old/C2126.old/Kyocera OS X 10.5+ Web build 2011.01.27.mpkg/Contents/Packages/Kyocera OS X subinstaller.mpkg/Contents/Packages/kyofilter v2.2.pkg/Contents/Resources/kyofilter v2.2.pax.gz
        uid     1001
        gid     80
        atime   Tue Nov 27 13:27:57 2012
        mtime   Tue Jul 12 14:17:16 2011
        ctime   Fri Sep 14 01:05:23 2012
        crtime  Fri Sep 14 01:04:11 2012
        gen     81338
        mode    120755
        size    17
        parent  247122
        links   1
        pflags  40800000104
        xattr   155
Indirect blocks:


Thank you.




>
> > I seriously doubt it's hardware at this point, as virtually every piece of
> > hardware in that system has been replaced (including motherboard and
> > drives).  In addition, the zpools were rebuilt from scratch when the drives
> > were replaced, and the issue persists.
> >
> > I can't seem to trigger it with other actions, such as chmod, chown, or
> > even mv.  Simply attempting to unlink the files seems to do it.
> >
> > # uname -a (I can reproduce on a GENERIC kernel, too).
> > FreeBSD bksys1 9.1-RC3 FreeBSD 9.1-RC3 #0 r242591: Sun Nov  4 19:17:25 MST 2012     root at bksys1:/usr/obj/usr/src/sys/BKSYS191  amd64
> >
> > zpool version is 28; zfs version is 5.
> >
> > /boot/loader.conf doesn't have anything related in it, and an empty one
> > produces the same results.
> >
> > zpool scrubs are done weekly and have returned no errors (the most recent
> > was 3 days ago).
> >
> > Any insight is very appreciated!
> >
> > Josh
> >
> >
> > The message:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 3; apic id = 05
> > fault virtual address = 0x160
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff80ebd45a
> > stack pointer        = 0x28:0xffffff8466534850
> > frame pointer        = 0x28:0xffffff8466534910
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 3245 (rm)
> > trap number = 12
> > panic: page fault
> > cpuid = 3
> > KDB: stack backtrace:
> > #0 0xffffffff80585c28 at kdb_backtrace+0x68
> > #1 0xffffffff805502cb at panic+0x21b
> > #2 0xffffffff807a9fad at trap_fatal+0x39d
> > #3 0xffffffff807aa0f0 at trap_pfault+0x120
> > #4 0xffffffff807aa7e9 at trap+0x3d9
> > #5 0xffffffff80794f4f at calltrap+0x8
> > #6 0xffffffff8081cf13 at VOP_REMOVE_APV+0x53
> > #7 0xffffffff805ed355 at kern_unlinkat+0x265
> > #8 0xffffffff805ed419 at kern_unlink+0x19
> > #9 0xffffffff805ed431 at sys_unlink+0x11
> > #10 0xffffffff807a95bd at amd64_syscall+0x2fd
> > #11 0xffffffff80795237 at Xfast_syscall+0xf7
> > Uptime: 14m42s
> > Dumping 2432 out of 16361 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> >
> > Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from
> > /boot/kernel/coretemp.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/coretemp.ko
> > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> > /boot/kernel/zfs.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/zfs.ko
> > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> > /boot/kernel/opensolaris.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/opensolaris.ko
> > Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from
> > /boot/kernel/if_lagg.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/if_lagg.ko
> > Reading symbols from /boot/kernel/ng_ubt.ko...Reading symbols from
> > /boot/kernel/ng_ubt.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_ubt.ko
> > Reading symbols from /boot/kernel/ng_hci.ko...Reading symbols from
> > /boot/kernel/ng_hci.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_hci.ko
> > Reading symbols from /boot/kernel/ng_bluetooth.ko...Reading symbols from
> > /boot/kernel/ng_bluetooth.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_bluetooth.ko
> > Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from
> > /boot/kernel/netgraph.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/netgraph.ko
> > Reading symbols from /boot/kernel/ng_l2cap.ko...Reading symbols from
> > /boot/kernel/ng_l2cap.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_l2cap.ko
> > Reading symbols from /boot/kernel/ng_btsocket.ko...Reading symbols from
> > /boot/kernel/ng_btsocket.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_btsocket.ko
> > Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from
> > /boot/kernel/ng_socket.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_socket.ko
> > Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from
> > /boot/kernel/blank_saver.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/blank_saver.ko
> > #0  doadump (textdump=Variable "textdump" is not available.
> > ) at pcpu.h:224
> > 224 pcpu.h: No such file or directory.
> > in pcpu.h
> > (kgdb) #0  doadump (textdump=Variable "textdump" is not available.
> > ) at pcpu.h:224
> > #1  0xffffffff8054ff87 in kern_reboot (howto=260)
> >     at /usr/src/sys/kern/kern_shutdown.c:448
> > #2  0xffffffff8055030f in panic (fmt=Variable "fmt" is not available.
> > )
> >     at /usr/src/sys/kern/kern_shutdown.c:636
> > #3  0xffffffff807a9fad in trap_fatal (frame=0xffffff84665347a0, eva=352)
> >     at /usr/src/sys/amd64/amd64/trap.c:857
> > #4  0xffffffff807aa0f0 in trap_pfault (frame=0xffffff84665347a0, usermode=0)
> >     at /usr/src/sys/amd64/amd64/trap.c:714
> > #5  0xffffffff807aa7e9 in trap (frame=0xffffff84665347a0)
> >     at /usr/src/sys/amd64/amd64/trap.c:456
> > #6  0xffffffff80794f4f in calltrap ()
> >     at /usr/src/sys/amd64/amd64/exception.S:228
> > #7  0xffffffff80ebd45a in zfs_freebsd_remove (ap=Variable "ap" is not available.
> > )
> >     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855
> > #8  0xffffffff8081cf13 in VOP_REMOVE_APV (vop=Variable "vop" is not available.
> > ) at vnode_if.c:1333
> > #9  0xffffffff805ed355 in kern_unlinkat (td=0xfffffe000c4b1000, fd=-100,
> >     path=0x7fffffffdb2e <Address 0x7fffffffdb2e out of bounds>,
> >     pathseg=UIO_USERSPACE, oldinum=0) at vnode_if.h:575
> > #10 0xffffffff805ed419 in kern_unlink (td=Variable "td" is not available.
> > )
> >     at /usr/src/sys/kern/vfs_syscalls.c:1897
> > #11 0xffffffff805ed431 in sys_unlink (td=Variable "td" is not available.
> > )
> >     at /usr/src/sys/kern/vfs_syscalls.c:1867
> > #12 0xffffffff807a95bd in amd64_syscall (td=0xfffffe000c4b1000, traced=0)
> >     at subr_syscall.c:135
> > #13 0xffffffff80795237 in Xfast_syscall ()
> >     at /usr/src/sys/amd64/amd64/exception.S:387
> > #14 0x00000008009100bc in ?? ()
> > Previous frame inner to this frame (corrupt stack?)
> > (kgdb)
> [snip]
>
> --
> Andriy Gapon
>


More information about the freebsd-fs mailing list