ZFS: Panic when attempting to delete certain data
Josh Beard
josh at signalboxes.net
Tue Nov 27 20:32:30 UTC 2012
On Tue, Nov 27, 2012 at 11:48 AM, Andriy Gapon <avg at freebsd.org> wrote:
> on 27/11/2012 20:25 Josh Beard said the following:
> > Hello,
> >
> > I have a system on which I can consistently reproduce a panic when trying
> > to delete certain data. The data was rsynced from another system
> > - nothing terribly unique. This has been ongoing for several months,
> > starting with 9.0-RELEASE and now running 9.1-RC3.
> >
> > I can't find anything in common among the files that trigger the
> > panics. One is a simple gzipped archive, whereas some are plain text.
> > Strangely, I can only reproduce it with data that was rsynced from that
> > particular system (which is a Mac).
>
> Josh,
>
> I am collecting these cases, thank you for another one.
> I had an interesting investigation of
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/173747
> Unfortunately, for some reason the whole conversation stayed private.
> I see that you also opened a PR earlier:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/170238
>
> Could you please provide the following info?
> From kgdb:
> - list in frame 7 (zfs_freebsd_remove), so that I can see the code line
> - local variables from frame 7 (info local)
>
>
>
Andriy,
Thanks for your quick response. I've never used kgdb, so forgive my
ignorance here. Is this what you're looking for? If not, could you
elaborate on those?
#7 0xffffffff80ebd45a in zfs_freebsd_remove (ap=Variable "ap" is not available.
)
at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855
1855 dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE);
(kgdb) list zfs_freebsd_remove
5796 struct vop_remove_args /* {
5797 struct vnode *a_dvp;
5798 struct vnode *a_vp;
5799 struct componentname *a_cnp;
5800 } */ *ap;
5801 {
5802
5803 ASSERT(ap->a_cnp->cn_flags & SAVENAME);
5804
5805 return (zfs_remove(ap->a_dvp, ap->a_cnp->cn_nameptr,
(kgdb) info frame 7
Stack frame at 0xffffff8466a6a920:
rip = 0xffffffff80ebd45a in zfs_freebsd_remove
(/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855);
saved rip 0xffffffff8081cf13
called by frame at 0xffffff8466a6a940, caller of frame at 0xffffff8466a6a7a0
source language c.
Arglist at 0xffffff8466a6a910, args: ap=Variable "ap" is not available.
> Also, for one of the files that trigger the problem:
> - ls -i to obtain its inode number
> - zdb -ddddd <dataset name> <inode number>
>
# ls -i kyofilter\ v2.2.pax.gz
247126 kyofilter v2.2.pax.gz
(This is a symlink. The file that it links to does *not* panic the system
if I try to delete it.)
# zdb -ddddd store/tdxs1 247126
Dataset store/tdxs1 [ZPL], ID 109, cr_txg 35014, 1.33T, 1106389 objects,
rootbp DVA[0]=<0:80001a2400:400> DVA[1]=<0:30800610000:400> [L0 DMU objset]
fletcher4 lzjb LE contiguous unique double size=800L/200P
birth=1166838L/1166838P fill=1106389
cksum=19391f0f67:78eb24a9cca:1439005549d01:275015332d1bdf
Object lvl iblk dblk dsize lsize %full type
247126 1 16K 512 0 512 0.00 ZFS plain file
201 bonus System attributes
dnode flags: USERUSED_ACCOUNTED
dnode maxblkid: 0
path /tech/2012-09-14-01-00/Drivers/Kyocera/.old/C2126.old/Kyocera OS X 10.5+ Web build 2011.01.27.mpkg/Contents/Packages/Kyocera OS X subinstaller.mpkg/Contents/Packages/kyofilter v2.2.pkg/Contents/Resources/kyofilter v2.2.pax.gz
uid 1001
gid 80
atime Tue Nov 27 13:27:57 2012
mtime Tue Jul 12 14:17:16 2011
ctime Fri Sep 14 01:05:23 2012
crtime Fri Sep 14 01:04:11 2012
gen 81338
mode 120755
size 17
parent 247122
links 1
pflags 40800000104
xattr 155
Indirect blocks:
Thank you.
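
A side note on the zdb output above: the mode field (120755) looks like an
octal st_mode value, and its file-type bits match the symlink mentioned
earlier. A minimal Python sketch to decode it, assuming zdb prints the mode
in octal (the variable names are only illustrative):

```python
import stat

# Assumption: the "mode" value printed by zdb is octal, like st_mode.
mode = 0o120755

is_symlink = stat.S_ISLNK(mode)    # True when the type bits equal S_IFLNK (0o120000)
permissions = stat.S_IMODE(mode)   # permission bits only, type bits stripped

print(is_symlink, oct(permissions), stat.filemode(mode))
```

This only confirms the object type recorded in the znode; it says nothing
about why unlinking it panics.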
>
> > I seriously doubt it's hardware at this point, as virtually every piece of
> > hardware in that system has been replaced (including motherboard and
> > drives). The zpools were also rebuilt from scratch when the drives
> > were replaced, and the issue persists.
> >
> > I can't seem to trigger it with other actions, such as chmod, chown, or
> > even mv. Simply attempting to unlink the files seems to do it.
> >
> > # uname -a (I can reproduce on a GENERIC kernel, too).
> > FreeBSD bksys1 9.1-RC3 FreeBSD 9.1-RC3 #0 r242591: Sun Nov 4 19:17:25 MST
> > 2012 root at bksys1:/usr/obj/usr/src/sys/BKSYS191 amd64
> >
> > zpool version is 28; zfs version is 5.
> >
> > /boot/loader.conf doesn't have anything related in it, and an empty one
> > produces the same results.
> >
> > zpool scrubs are done weekly and have returned no errors (most recent was 3
> > days ago).
> >
> > Any insight is very appreciated!
> >
> > Josh
> >
> >
> > The message:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 3; apic id = 05
> > fault virtual address = 0x160
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff80ebd45a
> > stack pointer = 0x28:0xffffff8466534850
> > frame pointer = 0x28:0xffffff8466534910
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 3245 (rm)
> > trap number = 12
> > panic: page fault
> > cpuid = 3
> > KDB: stack backtrace:
> > #0 0xffffffff80585c28 at kdb_backtrace+0x68
> > #1 0xffffffff805502cb at panic+0x21b
> > #2 0xffffffff807a9fad at trap_fatal+0x39d
> > #3 0xffffffff807aa0f0 at trap_pfault+0x120
> > #4 0xffffffff807aa7e9 at trap+0x3d9
> > #5 0xffffffff80794f4f at calltrap+0x8
> > #6 0xffffffff8081cf13 at VOP_REMOVE_APV+0x53
> > #7 0xffffffff805ed355 at kern_unlinkat+0x265
> > #8 0xffffffff805ed419 at kern_unlink+0x19
> > #9 0xffffffff805ed431 at sys_unlink+0x11
> > #10 0xffffffff807a95bd at amd64_syscall+0x2fd
> > #11 0xffffffff80795237 at Xfast_syscall+0xf7
> > Uptime: 14m42s
> > Dumping 2432 out of 16361
> > MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> >
> > Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from
> > /boot/kernel/coretemp.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/coretemp.ko
> > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> > /boot/kernel/zfs.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/zfs.ko
> > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> > /boot/kernel/opensolaris.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/opensolaris.ko
> > Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from
> > /boot/kernel/if_lagg.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/if_lagg.ko
> > Reading symbols from /boot/kernel/ng_ubt.ko...Reading symbols from
> > /boot/kernel/ng_ubt.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_ubt.ko
> > Reading symbols from /boot/kernel/ng_hci.ko...Reading symbols from
> > /boot/kernel/ng_hci.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_hci.ko
> > Reading symbols from /boot/kernel/ng_bluetooth.ko...Reading symbols from
> > /boot/kernel/ng_bluetooth.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_bluetooth.ko
> > Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from
> > /boot/kernel/netgraph.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/netgraph.ko
> > Reading symbols from /boot/kernel/ng_l2cap.ko...Reading symbols from
> > /boot/kernel/ng_l2cap.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_l2cap.ko
> > Reading symbols from /boot/kernel/ng_btsocket.ko...Reading symbols from
> > /boot/kernel/ng_btsocket.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_btsocket.ko
> > Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from
> > /boot/kernel/ng_socket.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ng_socket.ko
> > Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from
> > /boot/kernel/blank_saver.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/blank_saver.ko
> > #0 doadump (textdump=Variable "textdump" is not available.
> > ) at pcpu.h:224
> > 224 pcpu.h: No such file or directory.
> > in pcpu.h
> > (kgdb) #0 doadump (textdump=Variable "textdump" is not available.
> > ) at pcpu.h:224
> > #1 0xffffffff8054ff87 in kern_reboot (howto=260)
> > at /usr/src/sys/kern/kern_shutdown.c:448
> > #2 0xffffffff8055030f in panic (fmt=Variable "fmt" is not available.
> > )
> > at /usr/src/sys/kern/kern_shutdown.c:636
> > #3 0xffffffff807a9fad in trap_fatal (frame=0xffffff84665347a0, eva=352)
> > at /usr/src/sys/amd64/amd64/trap.c:857
> > #4 0xffffffff807aa0f0 in trap_pfault (frame=0xffffff84665347a0, usermode=0)
> > at /usr/src/sys/amd64/amd64/trap.c:714
> > #5 0xffffffff807aa7e9 in trap (frame=0xffffff84665347a0)
> > at /usr/src/sys/amd64/amd64/trap.c:456
> > #6 0xffffffff80794f4f in calltrap ()
> > at /usr/src/sys/amd64/amd64/exception.S:228
> > #7 0xffffffff80ebd45a in zfs_freebsd_remove (ap=Variable "ap" is not available.
> > )
> > at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855
> > #8 0xffffffff8081cf13 in VOP_REMOVE_APV (vop=Variable "vop" is not available.
> > ) at vnode_if.c:1333
> > #9 0xffffffff805ed355 in kern_unlinkat (td=0xfffffe000c4b1000, fd=-100,
> > path=0x7fffffffdb2e <Address 0x7fffffffdb2e out of bounds>,
> > pathseg=UIO_USERSPACE, oldinum=0) at vnode_if.h:575
> > #10 0xffffffff805ed419 in kern_unlink (td=Variable "td" is not available.
> > )
> > at /usr/src/sys/kern/vfs_syscalls.c:1897
> > #11 0xffffffff805ed431 in sys_unlink (td=Variable "td" is not available.
> > )
> > at /usr/src/sys/kern/vfs_syscalls.c:1867
> > #12 0xffffffff807a95bd in amd64_syscall (td=0xfffffe000c4b1000, traced=0)
> > at subr_syscall.c:135
> > #13 0xffffffff80795237 in Xfast_syscall ()
> > at /usr/src/sys/amd64/amd64/exception.S:387
> > #14 0x00000008009100bc in ?? ()
> > Previous frame inner to this frame (corrupt stack?)
> > (kgdb)
> [snip]
>
> --
> Andriy Gapon
>
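
For reference, the fault virtual address in the trap message (0x160) and the
eva argument to trap_fatal in the kgdb backtrace (352) are the same number in
hex and decimal. An address that small is commonly what a field access
through a NULL structure pointer looks like, though that reading is an
assumption here, not something the dump confirms. A minimal Python sketch of
the check, assuming a 4 KiB base page (the helper name is hypothetical):

```python
# Values copied from the panic message and the kgdb backtrace.
fault_va = 0x160   # "fault virtual address = 0x160"
eva = 352          # trap_fatal (..., eva=352)

# Assumption: 4 KiB base page size on amd64.
PAGE_SIZE = 4096


def in_first_page(addr: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if addr lies in the first page, where a small offset
    from a NULL base pointer would land."""
    return 0 <= addr < page_size


print(fault_va == eva, in_first_page(fault_va))
```

Both values agree and fall inside the first page, which fits a small offset
from a NULL base, but only the local variables in the faulting frame can show
which pointer was involved.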