ZFS: Panic when attempting to delete certain data

Andriy Gapon avg at FreeBSD.org
Tue Nov 27 18:48:41 UTC 2012


on 27/11/2012 20:25 Josh Beard said the following:
> Hello,
> 
> I have a system that I can consistently reproduce a panic on when trying to
> delete certain data.  The data is data that was rsynced from another system
> - nothing terribly unique.  This has been ongoing for several months,
> starting with 9.0-RELEASE and now running 9.1-RC3.
> 
> I can't find anything in common among the files that trigger the
> panics.  One is a simple gzipped archive, while others are plain text.
> Strangely, I can only reproduce it with data that was rsynced from that
> particular system (which is a Mac).

Josh,

I am collecting these cases; thank you for another one.
I had an interesting investigation of
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/173747
Unfortunately, for some reason the whole conversation stayed private.
I see that you also opened a PR earlier:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/170238

Could you please provide the following info?
From kgdb:
- list in frame 7 (zfs_freebsd_remove), so that I can see the code line
- local variables from frame 7 (info local)

Also, for one of the files that trigger the problem:
- ls -i to obtain its inode number
- zdb -ddddd <dataset name> <inode number>
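
For the kgdb part, the requests above map to "frame 7", then "list", then "info locals" at the (kgdb) prompt. For the zdb part, a minimal sketch of the two steps follows. It assumes that the first field of the ls -i output is the inode number to pass to zdb, as the steps above imply. The function name zdb_cmd_for and the dataset name in the usage line are hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Print (do not run) the zdb command for one file.
# usage: zdb_cmd_for <dataset name> <path to file>
# zdb_cmd_for is a hypothetical helper name, not an existing tool.
zdb_cmd_for() {
    dataset=$1
    file=$2
    # The first field of `ls -di` output is the inode number;
    # -d keeps ls from descending if a directory is passed by mistake.
    inode=$(ls -di "$file" | awk '{ print $1 }')
    echo "zdb -ddddd $dataset $inode"
}
```

Usage, with a made-up dataset name: zdb_cmd_for tank/backup /path/to/problem-file, then review the printed command and run it as root.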

Thank you.

> I seriously doubt it's hardware at this point, as virtually every piece of
> hardware in that system has been replaced (including motherboard and
> drives).  That said, the zpools were rebuilt from scratch when the drives
> were replaced and the issue persists.
> 
> I can't seem to trigger it with other actions, such as chmod, chown, or
> even mv.  Simply attempting to unlink the files seems to do it.
> 
> # uname -a (I can reproduce on a GENERIC kernel, too).
> FreeBSD bksys1 9.1-RC3 FreeBSD 9.1-RC3 #0 r242591: Sun Nov  4 19:17:25 MST
> 2012     root@bksys1:/usr/obj/usr/src/sys/BKSYS191  amd64
> 
> zpool version is 28; zfs version is 5.
> 
> /boot/loader.conf  doesn't have anything related in it, and an empty one
> produces the same results.
> 
> zpool scrubs are done weekly and have returned no errors (most recent was 3
> days ago).
> 
> Any insight is very appreciated!
> 
> Josh
> 
> 
> The message:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 05
> fault virtual address = 0x160
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80ebd45a
> stack pointer        = 0x28:0xffffff8466534850
> frame pointer        = 0x28:0xffffff8466534910
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 3245 (rm)
> trap number = 12
> panic: page fault
> cpuid = 3
> KDB: stack backtrace:
> #0 0xffffffff80585c28 at kdb_backtrace+0x68
> #1 0xffffffff805502cb at panic+0x21b
> #2 0xffffffff807a9fad at trap_fatal+0x39d
> #3 0xffffffff807aa0f0 at trap_pfault+0x120
> #4 0xffffffff807aa7e9 at trap+0x3d9
> #5 0xffffffff80794f4f at calltrap+0x8
> #6 0xffffffff8081cf13 at VOP_REMOVE_APV+0x53
> #7 0xffffffff805ed355 at kern_unlinkat+0x265
> #8 0xffffffff805ed419 at kern_unlink+0x19
> #9 0xffffffff805ed431 at sys_unlink+0x11
> #10 0xffffffff807a95bd at amd64_syscall+0x2fd
> #11 0xffffffff80795237 at Xfast_syscall+0xf7
> Uptime: 14m42s
> Dumping 2432 out of 16361
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from
> /boot/kernel/coretemp.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/coretemp.ko
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from
> /boot/kernel/if_lagg.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/if_lagg.ko
> Reading symbols from /boot/kernel/ng_ubt.ko...Reading symbols from
> /boot/kernel/ng_ubt.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_ubt.ko
> Reading symbols from /boot/kernel/ng_hci.ko...Reading symbols from
> /boot/kernel/ng_hci.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_hci.ko
> Reading symbols from /boot/kernel/ng_bluetooth.ko...Reading symbols from
> /boot/kernel/ng_bluetooth.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_bluetooth.ko
> Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from
> /boot/kernel/netgraph.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/netgraph.ko
> Reading symbols from /boot/kernel/ng_l2cap.ko...Reading symbols from
> /boot/kernel/ng_l2cap.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_l2cap.ko
> Reading symbols from /boot/kernel/ng_btsocket.ko...Reading symbols from
> /boot/kernel/ng_btsocket.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_btsocket.ko
> Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from
> /boot/kernel/ng_socket.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_socket.ko
> Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from
> /boot/kernel/blank_saver.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/blank_saver.ko
> #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1  0xffffffff8054ff87 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> #2  0xffffffff8055030f in panic (fmt=Variable "fmt" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:636
> #3  0xffffffff807a9fad in trap_fatal (frame=0xffffff84665347a0, eva=352)
>     at /usr/src/sys/amd64/amd64/trap.c:857
> #4  0xffffffff807aa0f0 in trap_pfault (frame=0xffffff84665347a0, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:714
> #5  0xffffffff807aa7e9 in trap (frame=0xffffff84665347a0)
>     at /usr/src/sys/amd64/amd64/trap.c:456
> #6  0xffffffff80794f4f in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:228
> #7  0xffffffff80ebd45a in zfs_freebsd_remove (ap=Variable "ap" is not
> available.
> )
>     at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855
> #8  0xffffffff8081cf13 in VOP_REMOVE_APV (vop=Variable "vop" is not
> available.
> ) at vnode_if.c:1333
> #9  0xffffffff805ed355 in kern_unlinkat (td=0xfffffe000c4b1000, fd=-100,
>     path=0x7fffffffdb2e <Address 0x7fffffffdb2e out of bounds>,
>     pathseg=UIO_USERSPACE, oldinum=0) at vnode_if.h:575
> #10 0xffffffff805ed419 in kern_unlink (td=Variable "td" is not available.
> )
>     at /usr/src/sys/kern/vfs_syscalls.c:1897
> #11 0xffffffff805ed431 in sys_unlink (td=Variable "td" is not available.
> )
>     at /usr/src/sys/kern/vfs_syscalls.c:1867
> #12 0xffffffff807a95bd in amd64_syscall (td=0xfffffe000c4b1000, traced=0)
>     at subr_syscall.c:135
> #13 0xffffffff80795237 in Xfast_syscall ()
>     at /usr/src/sys/amd64/amd64/exception.S:387
> #14 0x00000008009100bc in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
[snip]

-- 
Andriy Gapon

