ZFS v28: mount dataset hangs
Sergey Lobanov
wmn at siberianet.ru
Tue Dec 6 12:20:00 UTC 2011
On Thursday 06 October 2011 at 16:04:30 Sergey Gavrilov wrote:
> Hello, all.
>
> After "chown -R" hanging and reboot I cannot mount one of the
> datasets, while other pools work well.
> CTRL-T shows state [tx->tx_sync_done_cv)]
>
> Hardware: 3ware 9690SA-8I, 8 SEAGATE ST32000445SS in raidz2, 48 Gb
> RAM, 2 E5520 CPU
> Software: FreeBSD 8.2-STABLE #3: Wed Oct 5 18:04:50 MSD 2011
> /usr/obj/usr/src/sys/GENERIC amd64, ZFS V28 compression=on
>
> I can clone and mount any snapshot of the dataset, but not itself.
> No errors in zpool status or system messages. Scrub has finished clear.
>
> Let me know if I could provide any additional information to solve the
> issue.
Hello.
Same problem here.
raidz2 on 4 disks (created with a fake 4k sector size via gnop, to allow a
possible future replacement by native 4k disks) with dedup, lzjb compression,
and NFSv4 ACLs, shared over Samba 3.6 to Windows clients. The system froze
(even local log-in via IPMI did not work) while a Windows station was emptying
the Recycle Bin over SMB on one of the datasets (a large number of files were
being removed). After a hard reset via IPMI, zfs hangs on mounting that dataset
under both 8.2 and 9.0-RC2 ("zfs set mountpoint" works fine, though). Scrubs
under 8.2 and 9.0-RC2 did not help and found nothing; the other datasets mount
without problems.
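For reference, the gnop trick mentioned above is usually done along these
lines (a sketch only; the device names ada0..ada3 and the pool name are
assumptions, not my actual layout):

```shell
# Create transparent gnop providers that report a 4k sector size
# (device and pool names here are hypothetical)
gnop create -S 4096 /dev/ada0
gnop create -S 4096 /dev/ada1
gnop create -S 4096 /dev/ada2
gnop create -S 4096 /dev/ada3

# Build the raidz2 pool on the .nop providers so ZFS chooses ashift=12
zpool create data raidz2 /dev/ada0.nop /dev/ada1.nop /dev/ada2.nop /dev/ada3.nop

# The .nop layer can be dropped after export; the pool keeps ashift=12
zpool export data
gnop destroy /dev/ada0.nop /dev/ada1.nop /dev/ada2.nop /dev/ada3.nop
zpool import data
```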
Storage consists of 2 Seagate disks (ST3160318AS CC38) with UFS for the
system, 4 WD disks (WD10EALX-009BA0 15.01H15) for the ZFS share, and a QLogic
2432 FC adapter connected through an FC switch to an HP tape library for
Bacula backups.
All WD disks were checked with a SMART long self-test, which completed without
errors; there are also no reallocated sectors (or events) in the SMART output.
Here is an example from one of the disks:
ID#  ATTRIBUTE_NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
  1  Raw_Read_Error_Rate      0x002f  200    200    051     Pre-fail  Always   -            0
  3  Spin_Up_Time             0x0027  177    175    021     Pre-fail  Always   -            4108
  4  Start_Stop_Count         0x0032  100    100    000     Old_age   Always   -            60
  5  Reallocated_Sector_Ct    0x0033  200    200    140     Pre-fail  Always   -            0
  7  Seek_Error_Rate          0x002e  200    200    000     Old_age   Always   -            0
  9  Power_On_Hours           0x0032  092    092    000     Old_age   Always   -            6183
 10  Spin_Retry_Count         0x0032  100    253    000     Old_age   Always   -            0
 11  Calibration_Retry_Count  0x0032  100    253    000     Old_age   Always   -            0
 12  Power_Cycle_Count        0x0032  100    100    000     Old_age   Always   -            58
192  Power-Off_Retract_Count  0x0032  200    200    000     Old_age   Always   -            47
193  Load_Cycle_Count         0x0032  200    200    000     Old_age   Always   -            12
194  Temperature_Celsius      0x0022  122    110    000     Old_age   Always   -            25
196  Reallocated_Event_Count  0x0032  200    200    000     Old_age   Always   -            0
197  Current_Pending_Sector   0x0032  200    200    000     Old_age   Always   -            0
198  Offline_Uncorrectable    0x0030  200    200    000     Old_age   Offline  -            0
199  UDMA_CRC_Error_Count     0x0032  200    200    000     Old_age   Always   -            0
200  Multi_Zone_Error_Rate    0x0008  200    200    000     Old_age   Offline  -            85
Here is some other output:
---
# zfs mount data/storage/public
load: 0.15 cmd: zfs 1352 [tx->tx_sync_done_cv)] 348.58r 0.00u 0.92s 0% 2296k
# procstat -kk 1352
PID TID COMM TDNAME KSTACK
1352 100081 zfs initial thread mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85 zil_replay+0x10a
zfsvfs_setup+0x117 zfs_mount+0x52f vfs_donmount+0xdc5 nmount+0x63
amd64_syscall+0x1f4 Xfast_syscall+0xfc
---
# zfs snapshot data/storage/public@test
load: 0.02 cmd: zfs 1382 [tx->tx_sync_done_cv)] 4.34r 0.00u 0.00s 0% 2280k
load: 0.09 cmd: zfs 1382 [tx->tx_sync_done_cv)] 19.13r 0.00u 0.00s 0% 2280k
load: 0.01 cmd: zfs 1352 [tx->tx_sync_done_cv)] 611.74r 0.00u 0.92s 0% 2296k
# procstat -kk 1382
PID TID COMM TDNAME KSTACK
1382 100295 zfs initial thread mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85
dsl_sync_task_group_wait+0x128 dmu_objset_snapshot+0x302
zfs_ioc_snapshot+0x1a8 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x102
ioctl+0xfd amd64_syscall+0x1f4 Xfast_syscall+0xfc
---
# procstat -kk 6
PID TID COMM TDNAME KSTACK
6 100058 zfskern arc_reclaim_thre mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x29d
fork_exit+0x11f fork_trampoline+0xe
6 100059 zfskern l2arc_feed_threa mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1a8
fork_exit+0x11f fork_trampoline+0xe
6 100279 zfskern txg_thread_enter mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread+0xb5
fork_exit+0x11f fork_trampoline+0xe
6 100280 zfskern txg_thread_enter mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5
dmu_buf_hold+0xe0 zap_idx_to_blk+0xa3 zap_deref_leaf+0x4f fzap_remove+0x33
zap_remove_uint64+0x85 ddt_sync+0x267 spa_sync+0x383 txg_sync_thread+0x139
fork_exit+0x11f fork_trampoline+0xe
---
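The txg_sync backtrace above ends in ddt_sync, i.e. the sync thread is stuck
working through the dedup table. One way to see how large the DDT has grown
(a sketch; "data" is the pool name from the outputs above):

```shell
# Print dedup table (DDT) statistics and histogram for the pool
zdb -DD data

# Simulated dedup statistics, useful to estimate what dedup costs
# even if it has since been turned off on the datasets
zdb -S data
```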
And here is an interesting screenshot of output from a 9.0-RC2 livecd shell,
which leads me to believe that swap was exhausted while the zfs mount was
frozen: http://imageshack.us/photo/my-images/64/90rc2zfsmounthangswap.gif/
(I didn't see this on 8.2 or 9.0-RC1, though I didn't deliberately leave the
system frozen for a long time.)
Hardware is a Supermicro X8SIA-F, Core i3, 4 GB of memory.
releng8 r226341 amd64 (with e1000 from 8.2-RELEASE for an unrelated reason),
custom kernel with this config:
include GENERIC
ident IPFW_FWD
options IPFIREWALL
options IPFIREWALL_DEFAULT_TO_ACCEPT
options IPFIREWALL_FORWARD
The only ZFS-related tuning in loader.conf is the following line:
vfs.zfs.arc_max="1073741824"
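The effective value can be verified at runtime (sysctl names as they exist in
the FreeBSD 8/9 ZFS port):

```shell
# Confirm the loader.conf setting took effect (1 GiB = 1073741824 bytes)
sysctl vfs.zfs.arc_max

# Current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.size
```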
I have a couple of days for debugging (I plan to try releng9 today or
tomorrow, since I can see some ZFS commits there after RC2), after which I'll
have to move this machine into production, with ZFS or without it.
If you need anything else, let me know. I've subscribed to the list, so there
is no need to CC me.
--
ISP SiberiaNet
System and Network Administrator
More information about the freebsd-fs mailing list