panic: solaris assert: bpobj_iterate(&spa->spa_deferred_bpobj, spa_free_sync_cb, zio, tx) == 0 (0x6 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, line: 6156

Sun Sep 21 14:10:36 UTC 2014

Two days ago a power outage took out a zpool but not the laptop
it was attached to. This resulted in:

Sep 19 22:50:58 r500 kernel: [41317] ugen1.2: <Intenso> at usbus1 (disconnected)
Sep 19 22:50:58 r500 kernel: [41317] umass0: at uhub1, port 2, addr 2 (disconnected)
Sep 19 22:50:58 r500 kernel: [41317] da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
Sep 19 22:50:58 r500 kernel: [41317] da0: <  > detached
Sep 19 22:50:58 r500 kernel: [41317] pass2 at umass-sim0 bus 0 scbus2 target 0 lun 0
Sep 19 22:50:58 r500 kernel: [41317] pass2: <  > detached
Sep 19 22:50:58 r500 kernel: [41317] (pass2:umass-sim0:0:0:0): Periph destroyed
Sep 19 22:50:58 r500 kernel: [41317] GEOM_ELI: Device label/intenso1.eli destroyed.
Sep 19 22:50:58 r500 kernel: [41317] GEOM_ELI: Detached label/intenso1.eli on last close.
Sep 19 22:50:58 r500 kernel: [41317] (da0:umass-sim0:0:0:0): Periph destroyed
Sep 19 22:50:58 r500 ZFS: vdev is removed, pool_guid=13312956307733420090 vdev_guid=11021414854688829035
[...]
Sep 19 22:50:58 r500 kernel: [41318] system power profile changed to 'economy'
Sep 19 22:50:58 r500 kernel: [41318] acpi_acad0: Off Line
Sep 19 22:50:59 r500 power_profile: changed to 'economy'

Followed by a panic:

(kgdb) where
#0  doadump (textdump=0) at pcpu.h:219
#1  0xffffffff8030eeae in db_dump (dummy=<value optimized out>, dummy2=0, dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:543
#2  0xffffffff8030e98d in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:449
#3  0xffffffff8030e704 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
#4  0xffffffff80311160 in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:231
#5  0xffffffff805d7bc1 in kdb_trap (type=3, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff8085ab67 in trap (frame=0xfffffe00955ed850) at /usr/src/sys/amd64/amd64/trap.c:542
#7  0xffffffff8083eef2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff805d72be in kdb_enter (why=0xffffffff8095b0cd "panic", msg=<value optimized out>) at cpufunc.h:63
#9  0xffffffff80597d01 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:739
#10 0xffffffff8133d22f in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>, f=<value optimized out>, l=<value optimized out>)
    at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#11 0xffffffff811477f8 in spa_sync (spa=0xfffff8005b727000, txg=69362) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6155
#12 0xffffffff81150ed6 in txg_sync_thread (arg=0xfffff8000291a000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:517
#13 0xffffffff8055e4fa in fork_exit (callout=0xffffffff81150b30 <txg_sync_thread>, arg=0xfffff8000291a000, frame=0xfffffe00955edc00) at /usr/src/sys/kern/kern_fork.c:977
#14 0xffffffff8083f42e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605
#15 0x0000000000000000 in ?? ()
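
For reference, the 0x6 on the left-hand side of the failed assertion is the value bpobj_iterate() returned. Read as an errno, 6 is ENXIO on FreeBSD, which would fit the vdev vanishing underneath the sync thread. A minimal sketch for decoding such values follows, assuming the machine it runs on uses the same errno numbering as the kernel that panicked (true for 6 on FreeBSD and Linux); decode_errno is a hypothetical helper name, not something from the ZFS sources:

```python
import errno
import os


def decode_errno(value):
    """Return (symbolic name, message) for a numeric errno value."""
    # errno.errorcode maps numbers to names for the platform running this script.
    name = errno.errorcode.get(value, "UNKNOWN")
    return name, os.strerror(value)


if __name__ == "__main__":
    # 0x6 is taken from the panic message: (0x6 == 0x0).
    name, message = decode_errno(0x6)
    print(f"{name}: {message}")
```

The message text differs between operating systems, so only the symbolic name should be relied on.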

The kernel is based on FreeBSD 11.0-CURRENT r271788.

Later on, another power outage took out the pool again,
but this time the pool was simply faulted, as expected, without a panic.

The pool is:

fk@r500 ~ $zpool status intenso1
  pool: intenso1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Fri Sep 19 23:07:49 2014
        400G scanned out of 941G at 29.3M/s, 5h15m to go
        0 repaired, 42.48% done
config:

	NAME                  STATE     READ WRITE CKSUM
	intenso1              ONLINE       0     0     0
	  label/intenso1.eli  ONLINE       0     0     0

errors: 8 data errors, use '-v' for a list

Once the scrub is complete, I expect the "data errors" to be gone,
as they are merely the result of temporary read errors after the
second outage. Apparently those aren't handled properly with the
given (single-vdev) pool layout, but that's another issue.

Fabian