ZFS inresponsive.

Fabian Keil freebsd-listen at fabiankeil.de
Sun Oct 31 09:56:33 UTC 2010


Peter Ankerstål <peter at pean.org> wrote:

> I accidentally aborted a zfs send mid-tranfer.
> 
> Now the receiveing side cant do anything with the affected filesystem. both zfs list and zfs destroy hangs.
> How can I fix this? I can list other filesystems without a problem but a general list or any command concerning
> the one the got broken in transer hangs the zfs command.

I occasionally see this, too, when sending snapshots to zpools on
USB disks using geli. If the device gets lost in transfer, general
zfs and zpool commands hang:

Oct 18 17:09:28 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/geli attach -k /home/fk/geli-keys/poppstar.key /dev/label/poppstar
Oct 18 17:09:34 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool import poppstar
Oct 18 17:09:34 r500 kernel: GEOM_ELI: Device label/poppstar.eli created.
Oct 18 17:09:34 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
Oct 18 17:09:34 r500 kernel: GEOM_ELI:     Crypto: software
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status Error
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc:3a,1 (Medium not present - tray closed)
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Oct 18 17:09:58 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs create poppstar/dvds/sliders
Oct 18 17:10:32 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send wde2/dvds/sliders/season-1-2 at 2009-12-05
Oct 18 17:10:32 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v poppstar/dvds/sliders/season-1-2
Oct 18 17:38:06 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v poppstar/dvds/sliders/season-3
Oct 18 17:38:06 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send wde2/dvds/sliders/season-3 at 2009-03-28
Oct 18 17:39:47 r500 kernel: acpi_lid0: Lid closed
Oct 18 18:13:19 r500 kernel: acpi_lid0: Lid opened
Oct 18 18:22:56 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send wde2/dvds/sliders/season-4 at 2009-12-05
Oct 18 18:22:56 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v poppstar/dvds/sliders/season-4
Oct 18 18:33:05 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected)
Oct 18 18:33:05 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected)
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): SCSI status error
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Requesting SCSI sense data
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): AutoSense failed
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Error 5, Unretryable error
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645020672, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645151744, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645282816, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645413888, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645544960, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645676032, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645807104, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645938176, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362646069248, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362644889600, length=131072)]
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): lost device
Oct 18 18:33:05 r500 kernel: (pass4:umass-sim2:2:0:0): lost device
Oct 18 18:33:05 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Error 6, Unretryable error
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0):G EOM_ELIInvalidating pack: 
Oct 18 18:33:05 r500 kernel: g_eli_read_done() failed label/poppstar.eli[READ(offset=2000398327808, length=8192)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/poppstar.eli[READ(offset=270336, length=8192)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/poppstar.eli[READ(offset=2000398589952, length=8192)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646200320, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646331392, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646593536, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646724608, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646462464, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646855680, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646986752, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362647117824, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362647248896, length=131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362647379968, length=131072)]
Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=poppstar path=/dev/label/poppstar.eli offset=2000398327808 size=8192 error=6
Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=poppstar path=/dev/label/poppstar.eli offset=270336 size=8192 error=6
Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=poppstar path=/dev/label/poppstar.eli offset=2000398589952 size=8192 error=6
Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=6
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry
Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=6
Oct 18 18:33:05 r500 last message repeated 151 times
Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=28
Oct 18 18:33:06 r500 last message repeated 161 times
Oct 18 18:33:06 r500 root: ZFS: vdev I/O failure, zpool=poppstar path= offset= size= error=
Oct 18 18:33:06 r500 root: Unknown USB device: vendor 0x13fd product 0x1240 bus uhub3
Oct 18 18:33:06 r500 kernel: ugen3.3: <Generic> at usbus3
Oct 18 18:33:06 r500 kernel: umass2: <Generic External, class 0/0, rev 2.00/2.10, addr 3> on usbus3
Oct 18 18:33:06 r500 kernel: umass2:  SCSI over Bulk-Only; quirks = 0x0000
Oct 18 18:33:08 r500 kernel: umass2:4:2:-1: Attached to scbus4
Oct 18 18:33:08 r500 kernel: pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:33:08 r500 kernel: pass4: <Generic External 2.10> Fixed Direct Access SCSI-4 device 
Oct 18 18:33:08 r500 kernel: pass4: Serial Number 395857304652454620202020
Oct 18 18:33:08 r500 kernel: pass4: 40.000MB/s transfers
Oct 18 18:33:08 r500 kernel: GEOM: new disk da2
Oct 18 18:33:08 r500 kernel: da2 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:33:08 r500 kernel: da2: <Generic External 2.10> Fixed Direct Access SCSI-4 device 
Oct 18 18:33:08 r500 kernel: da2: Serial Number 395857304652454620202020
Oct 18 18:33:08 r500 kernel: da2: 40.000MB/s transfers
Oct 18 18:33:08 r500 kernel: da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
Oct 18 18:34:02 r500 sudo:       fk : TTY=pts/14 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool list
Oct 18 18:34:02 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=6
Oct 18 18:34:02 r500 root: ZFS: vdev I/O failure, zpool=poppstar path= offset= size= error=
Oct 18 18:36:34 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected)
Oct 18 18:36:34 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected)
Oct 18 18:36:34 r500 kernel: (da2:umass-sim2:2:0:0): lost device
Oct 18 18:36:34 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:34 r500 kernel: (pass4:umass-sim2:2:0:0): lost device
Oct 18 18:36:34 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:44 r500 root: Unknown USB device: vendor 0x13fd product 0x1240 bus uhub3
Oct 18 18:36:44 r500 kernel: ugen3.3: <Generic> at usbus3
Oct 18 18:36:44 r500 kernel: umass2: <Generic External, class 0/0, rev 2.00/2.10, addr 3> on usbus3
Oct 18 18:36:44 r500 kernel: umass2:  SCSI over Bulk-Only; quirks = 0x0000
Oct 18 18:36:46 r500 kernel: umass2:4:2:-1: Attached to scbus4
Oct 18 18:36:46 r500 kernel: pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:36:46 r500 kernel: pass4: <Generic External 2.10> Fixed Direct Access SCSI-4 device 
Oct 18 18:36:46 r500 kernel: pass4: Serial Number 395857304652454620202020
Oct 18 18:36:46 r500 kernel: pass4: 40.000MB/s transfers
Oct 18 18:36:46 r500 kernel: GEOM: new disk da2
Oct 18 18:36:46 r500 kernel: da2 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:36:46 r500 kernel: da2: <Generic External 2.10> Fixed Direct Access SCSI-4 device 
Oct 18 18:36:46 r500 kernel: da2: Serial Number 395857304652454620202020
Oct 18 18:36:46 r500 kernel: da2: 40.000MB/s transfers
Oct 18 18:36:46 r500 kernel: da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
Oct 18 18:36:46 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected)
Oct 18 18:36:46 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected)
Oct 18 18:36:46 r500 kernel: ugen7.3: <Western Digital> at usbus7 (disconnected)
Oct 18 18:36:46 r500 kernel: umass1: at uhub7, port 1, addr 3 (disconnected)
Oct 18 18:36:46 r500 kernel: (da2:umass-sim2:2:0:0): lost device
Oct 18 18:36:46 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (pass4:umass-sim2:2:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass3:umass-sim1:1:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: ugen7.2: <Western Digital> at usbus7 (disconnected)
Oct 18 18:36:46 r500 kernel: umass0: at uhub7, port 2, addr 2 (disconnected)
Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass2:umass-sim0:0:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass2:umass-sim0:0:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): removing device entry
Oct 18 18:36:54 r500 root: ZFS: zpool I/O failure, zpool=wde2 error=28
Oct 18 18:36:54 r500 last message repeated 3 times
Oct 18 18:36:54 r500 root: ZFS: vdev I/O failure, zpool=wde2 path= offset= size= error=
Oct 18 18:36:54 r500 root: ZFS: zpool I/O failure, zpool=wde3 error=28
Oct 18 18:36:54 r500 last message repeated 3 times
Oct 18 18:36:54 r500 root: ZFS: vdev I/O failure, zpool=wde3 path= offset= size= error=

fk at r500 ~ $ps waux | grep \ [z]
root     5501   0.0  0.1  13364   1288   9  I+    6:22PM   0:00.01 sudo zfs receive -v poppstar/dvds/sliders/season-4
root     5502   0.0  0.1  18944   1312   9  D+    6:22PM   0:21.73 zfs receive -v poppstar/dvds/sliders/season-4
fk       4711   0.0  0.1  19980   1364  11  D+    5:10PM   0:01.03 zpool iostat poppstar 1
root     5947   0.0  0.1  13364   1728  14  I+    6:34PM   0:00.01 sudo zpool list
root     5948   0.0  0.1  19980   1760  14  D+    6:34PM   0:00.01 zpool list
fk       5990   0.0  0.1  19980   1744  16  D+    6:36PM   0:00.01 zpool list -H -o name

fk at r500 ~ $sudo procstat -kk `pgrep zfs`
  PID    TID COMM             TDNAME           KSTACK                       
 5502 100519 zfs              -                mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x7c dmu_tx_assign+0x16c dmu_recv_stream+0x125f zfs_ioc_recv+0x282 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 

fk at r500 ~ $sudo procstat -kk `pgrep zpool`
  PID    TID COMM             TDNAME           KSTACK                       
 5990 100507 zpool            -                mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x3c2 _sx_xlock+0x6e spa_all_configs+0xbf zfs_ioc_pool_configs+0x29 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 
 5948 100292 zpool            -                mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dnode_hold_impl+0xf0 dmu_buf_hold+0x34 zap_lockdir+0x52 zap_cursor_retrieve+0x194 spa_prop_get+0x26f zfs_ioc_pool_get_props+0xb2 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 
 4711 100269 zpool            -                mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x3c2 _sx_xlock+0x6e spa_open_common+0x76 spa_get_stats+0x42 zfs_ioc_pool_stats+0x2c zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 

At the time I've been using:
FreeBSD r500.local 9.0-CURRENT FreeBSD 9.0-CURRENT #193 r+3be3a96: Sun Oct 17 12:51:52 CEST 2010     fk at r500.local:/usr/obj/usr/src/sys/ZOEY  amd64
with ZFS pool version 15, but I saw the problem with earlier
FreeBSD and ZFS versions, too.

Fabian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20101031/8c23c8e0/signature.pgp


More information about the freebsd-fs mailing list