panic while zfs scrubbing

Roger Hammerstein cheeky.m at live.com
Tue Aug 21 21:09:35 UTC 2012



I have a zpool where scrub seems to cause panics.

I do not enable ZFS in rc.conf; instead I import the pool manually
after each boot.

When I start a scrub on the zpool, the machine panics and reboots
partway through.
After the panic and reboot, re-importing the pool and letting
the scrub resume on its own causes another panic.
So for now I import the pool and immediately stop the scrub.

ls -la *.{9,8,10}
-rw-------  1 root  wheel      150744 Aug 21 16:46 core.txt.10
-rw-------  1 root  wheel      147280 Aug 21 11:04 core.txt.8
-rw-------  1 root  wheel      148572 Aug 21 14:53 core.txt.9
-rw-------  1 root  wheel         457 Aug 21 16:45 info.10
-rw-------  1 root  wheel         456 Aug 21 11:04 info.8
-rw-------  1 root  wheel         458 Aug 21 14:52 info.9
-rw-------  1 root  wheel   643919872 Aug 21 16:46 vmcore.10
-rw-------  1 root  wheel   767168512 Aug 21 11:04 vmcore.8
-rw-------  1 root  wheel  1097850880 Aug 21 14:53 vmcore.9


 9.1-BETA1 FreeBSD 9.1-BETA1 #34: Thu Jul 12 05:57:44 EDT 2012
amd64
4 GB of RAM, 4 GB of swap.


panic: integer divide fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 18: integer divide fault while in kernel mode
cpuid = 5; apic id = 05
instruction pointer     = 0x20:0xffffffff81674a14
stack pointer           = 0x28:0xffffff810c3d4520
frame pointer           = 0x28:0xffffff810c3d4540
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9480 (txg_thread_enter)
trap number             = 18
panic: integer divide fault
cpuid = 5

KDB: stack backtrace:
#0 0xffffffff80920346 at kdb_backtrace+0x66
#1 0xffffffff808ea35e at panic+0x1ce
#2 0xffffffff80bd7a30 at trap_fatal+0x290
#3 0xffffffff80bd80c5 at trap+0x105
#4 0xffffffff80bc295f at calltrap+0x8
#5 0xffffffff816818cf at vdev_mirror_io_start+0x2bf
#6 0xffffffff81699542 at zio_vdev_io_start+0x232
#7 0xffffffff81698fe3 at zio_execute+0xc3
#8 0xffffffff8165ea1c at dsl_scan_scrub_cb+0x3ec
#9 0xffffffff8165fe14 at dsl_scan_visitbp+0x534
#10 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#11 0xffffffff81660c84 at dsl_scan_visitdnode+0x84
#12 0xffffffff81660070 at dsl_scan_visitbp+0x790
#13 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#14 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#15 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#16 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#17 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
Uptime: 1h51m55s
Dumping 614 out of 3818 MB:..3%..11%..21%..32%..42%..53%..63%..71%..81%..92%


Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
#0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1  0xffffffff808e9e41 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
#2  0xffffffff808ea337 in panic (fmt=0x1 <Address 0x1 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:636
#3  0xffffffff80bd7a30 in trap_fatal (frame=0x12, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:857
#4  0xffffffff80bd80c5 in trap (frame=0xffffff810c3d4470)
    at /usr/src/sys/amd64/amd64/trap.c:599
#5  0xffffffff80bc295f in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#6  0xffffffff81674a14 in spa_get_random (range=0)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:1165
#7  0xffffffff816818cf in vdev_mirror_io_start (zio=0xfffffe0037e5e000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:89
#8  0xffffffff81699542 in zio_vdev_io_start (zio=0xfffffe0037e5e000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2305
#9  0xffffffff81698fe3 in zio_execute (zio=0xfffffe0037e5e000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1196
#10 0xffffffff8165ea1c in dsl_scan_scrub_cb (dp=0xffffff810c3d4538, 
    bp=0xffffff8003c53480, zb=0xffffff810c3d4970)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1737
#11 0xffffffff8165fe14 in dsl_scan_visitbp (bp=0xffffff8003c53480, 
    zb=0xffffff810c3d4970, dnp=0xffffff8003642200, pbuf=Variable "pbuf" is not available.
)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:858
#12 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff8003642240, 
    zb=0xffffff810c3d4a00, dnp=0xffffff8003642200, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#13 0xffffffff81660c84 in dsl_scan_visitdnode (scn=0xfffffe001523dc00, 
    ds=0xfffffe0037abf400, ostype=DMU_OST_ZFS, dnp=0xffffff8003642200, 
    buf=0xfffffe00befda9c0, object=291417, tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:770
#14 0xffffffff81660070 in dsl_scan_visitbp (bp=0xffffff800359b900, 
    zb=0xffffff810c3d4cb0, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:718
#15 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033e5380, 
    zb=0xffffff810c3d4e10, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#16 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033df000, 
    zb=0xffffff810c3d4f70, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#17 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033db000, 
    zb=0xffffff810c3d50d0, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#18 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff8003451000, 
    zb=0xffffff810c3d5230, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#19 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033d7000, 
    zb=0xffffff810c3d5390, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#20 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xfffffe0008076040, 
    zb=0xffffff810c3d5420, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#21 0xffffffff81660c84 in dsl_scan_visitdnode (scn=0xfffffe001523dc00, 
    ds=0xfffffe0037abf400, ostype=DMU_OST_ZFS, dnp=0xfffffe0008076000, 
    buf=0xfffffe00375996e8, object=0, tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:770
#22 0xffffffff8165ff9a in dsl_scan_visitbp (bp=0xfffffe003729e280, 
    zb=0xffffff810c3d55f0, dnp=0x0, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:736
#23 0xffffffff816600d7 in dsl_scan_visit_rootbp (scn=Variable "scn" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:872
#24 0xffffffff81660172 in dsl_scan_visitds (scn=0xfffffe001523dc00, dsobj=21, 
    tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1099
#25 0xffffffff81660695 in dsl_scan_sync (dp=0xfffffe0037335000, 
    tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1355
#26 0xffffffff81667e30 in spa_sync (spa=0xfffffe0008161000, txg=97010)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:5711
#27 0xffffffff81678749 in txg_sync_thread (arg=Variable "arg" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:423
#28 0xffffffff808bb4cf in fork_exit (
    callout=0xffffffff81678610 <txg_sync_thread>, arg=0xfffffe0037335000, 
    frame=0xffffff810c3d5c40) at /usr/src/sys/kern/kern_fork.c:992
#29 0xffffffff80bc2e8e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000001 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000005 in ?? ()
#55 0xffffffff81242b00 in tdq_cpu ()
#56 0xfffffe0015e9d470 in ?? ()
#57 0x0000000000000000 in ?? ()
#58 0xffffff810c3d4580 in ?? ()
#59 0xffffff810c3d4528 in ?? ()
#60 0xfffffe00028848e0 in ?? ()
#61 0xffffffff80912fce in sched_switch (td=0xfffffe00370b1470, 
    newtd=0xfffffe0037335000, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1921
Previous frame inner to this frame (corrupt stack?)
(kgdb) 
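
A note on frame #6 of the kgdb trace: spa_get_random() was entered with range=0, and the trap is an integer divide fault, which is what a division or modulo by that argument would raise on amd64. The sketch below is not the FreeBSD source. pick_guarded() and its parameters are hypothetical names, and it assumes the routine reduces a random value modulo range. It only illustrates how a zero range could be rejected instead of trapping. The trace does not show why vdev_mirror_io_start passed 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the routine in frame #6 (not kernel code).
 * Assumption: the real function computes (random value % range), so a
 * zero range divides by zero and raises trap 18 on amd64.
 *
 * r     - an already-generated random 64-bit value
 * range - number of candidates to choose from
 * out   - receives a value in [0, range) on success
 *
 * Returns 0 on success, -1 if range is 0 (nothing to choose from).
 */
int
pick_guarded(uint64_t r, uint64_t range, uint64_t *out)
{
	assert(out != NULL);

	if (range == 0)
		return (-1);	/* refuse instead of dividing by zero */

	*out = r % range;
	return (0);
}
```

Called as pick_guarded(7, 0, &out), this returns -1 and leaves out untouched; with range 3 and r = 7 it stores 1 in out.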



  pool: zzzz
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub canceled on Tue Aug 21 16:53:03 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zzzz        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada5    ONLINE       0     0     0

errors: 4 data errors, use '-v' for a list

The data errors go away once a scrub completes; that has happened before.

And indeed, after 'zpool clear zzzz':

  pool: zzzz
 state: ONLINE
  scan: scrub canceled on Tue Aug 21 17:02:53 2012
config:

    NAME        STATE     READ WRITE CKSUM
    zzzz        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        ada3    ONLINE       0     0     0
        ada7    ONLINE       0     0     0
        ada6    ONLINE       0     0     0
        ada9    ONLINE       0     0     0
        ada4    ONLINE       0     0     0
        ada2    ONLINE       0     0     0
        ada5    ONLINE       0     0     0

errors: No known data errors



The machine passes a 'memtest' memory check of over 12 hours.
Could it be a bad disk? One of the disks has logged command errors, but
smartctl shows no pending sectors to reallocate, and there are no disk
errors in /var/log/messages.

The disks sit behind two SATA port multipliers:
pmp0 at siisch0 bus 0 scbus6 target 15 lun 0
pmp0: <Port Multiplier 37261095 1706> ATA-0 device
pmp0: 300.000MB/s transfers (SATA 2.x, NONE, PIO 8192bytes)
pmp0: 5 fan-out ports

pmp1 at siisch4 bus 0 scbus10 target 15 lun 0
pmp1: <Port Multiplier 37261095 1706> ATA-0 device
pmp1: 300.000MB/s transfers (SATA 2.x, NONE, PIO 8192bytes)
pmp1: 5 fan-out ports



More information about the freebsd-fs mailing list