ZFS v28: mount dataset hangs

Sergey Lobanov wmn at siberianet.ru
Fri Dec 16 19:49:31 UTC 2011


On Tuesday 06 December 2011 at 20:04:03 Sergey Lobanov wrote:
> On Thursday 06 October 2011 at 16:04:30 Sergey Gavrilov wrote:
> > Hello, all.
> > 
> > After "chown -R" hanging and reboot I cannot mount one of the
> > datasets, while other pools work well.
> > CTRL-T shows state [tx->tx_sync_done_cv)]
> > 
> > Hardware: 3ware 9690SA-8I, 8x SEAGATE ST32000445SS in raidz2, 48 GB
> > RAM, 2x E5520 CPUs
> > Software: FreeBSD 8.2-STABLE #3: Wed Oct  5 18:04:50 MSD 2011
> > /usr/obj/usr/src/sys/GENERIC amd64, ZFS v28, compression=on
> > 
> > I can clone and mount any snapshot of the dataset, but not the dataset
> > itself. There are no errors in zpool status or in the system messages,
> > and a scrub finished clean.
> > 
> > Let me know if I can provide any additional information to help solve
> > the issue.
> 
> Hello.
> 
> Same problem here.
> 
> raidz2 using 4 disks (created with a fake 4k sector size via gnop, to allow a
> possible future replacement with native 4k disks) with dedup, lzjb compression
> and NFSv4 ACLs, shared to Windows clients over Samba 3.6. The system froze
> (even local log-in via IPMI did not work) while the Recycle Bin (trash) of one
> of the datasets was being emptied over SMB from a Windows workstation (a lot
> of files were being removed). After a hard reset via IPMI, zfs hangs on
> mounting that dataset under both 8.2 and 9.0-RC2 (although "zfs set
> mountpoint" works fine). Scrubs under 8.2 and 9.0-RC2 did not help and found
> nothing; the other datasets can be mounted without problems.
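
For reference, the gnop trick mentioned above is roughly the following
sequence (device and pool names are examples only, not the actual layout here):

---
# create temporary providers that advertise 4k sectors, then build raidz2 on them
gnop create -S 4096 ada1 ada2 ada3 ada4
zpool create data raidz2 ada1.nop ada2.nop ada3.nop ada4.nop
# drop the gnop layer again; the pool keeps ashift=12 on disk
zpool export data
gnop destroy ada1.nop ada2.nop ada3.nop ada4.nop
zpool import data
---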
> 
> The storage consists of 2 Seagate disks (ST3160318AS CC38) with UFS for the
> system, 4 WD disks (WD10EALX-009BA0 15.01H15) for the ZFS share, and a QLogic
> 2432 FC adapter connected to an HP tape library via an FC switch for Bacula
> backups.
> 
> All WD disks were checked with a SMART long self-test, which completed without
> errors; there are also no reallocated sectors (or reallocation events) in the
> SMART output. Here is an example for one of the disks:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   177   175   021    Pre-fail  Always       -       4108
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       60
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6183
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       47
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
> 194 Temperature_Celsius     0x0022   122   110   000    Old_age   Always       -       25
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       85
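
(The attribute dump above is from smartmontools; roughly, it was collected
along these lines, the device name being an example:)

---
smartctl -t long /dev/ada1       # start the long self-test
smartctl -l selftest /dev/ada1   # check the self-test log once it has finished
smartctl -A /dev/ada1            # print the vendor attribute table shown above
---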
> 
> Here is some other output:
> ---
> # zfs mount data/storage/public
> load: 0.15  cmd: zfs 1352 [tx->tx_sync_done_cv)] 348.58r 0.00u 0.92s 0% 2296k
> 
> # procstat -kk 1352
>   PID    TID COMM             TDNAME           KSTACK
>  1352 100081 zfs              initial thread   mi_switch+0x176
> sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85 zil_replay+0x10a
> zfsvfs_setup+0x117 zfs_mount+0x52f vfs_donmount+0xdc5 nmount+0x63
> amd64_syscall+0x1f4 Xfast_syscall+0xfc
> ---
> # zfs snapshot data/storage/public at test
> load: 0.02  cmd: zfs 1382 [tx->tx_sync_done_cv)] 4.34r 0.00u 0.00s 0% 2280k
> load: 0.09  cmd: zfs 1382 [tx->tx_sync_done_cv)] 19.13r 0.00u 0.00s 0% 2280k
> load: 0.01  cmd: zfs 1352 [tx->tx_sync_done_cv)] 611.74r 0.00u 0.92s 0% 2296k
> 
> # procstat -kk 1382
>   PID    TID COMM             TDNAME           KSTACK
>  1382 100295 zfs              initial thread   mi_switch+0x176
> sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85
> dsl_sync_task_group_wait+0x128 dmu_objset_snapshot+0x302
> zfs_ioc_snapshot+0x1a8 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b
> kern_ioctl+0x102 ioctl+0xfd amd64_syscall+0x1f4 Xfast_syscall+0xfc
> ---
> # procstat -kk 6
>   PID    TID COMM             TDNAME           KSTACK
>     6 100058 zfskern          arc_reclaim_thre mi_switch+0x176
> sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x29d
> fork_exit+0x11f fork_trampoline+0xe
>     6 100059 zfskern          l2arc_feed_threa mi_switch+0x176
> sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1a8
> fork_exit+0x11f fork_trampoline+0xe
>     6 100279 zfskern          txg_thread_enter mi_switch+0x176
> sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79
> txg_quiesce_thread+0xb5 fork_exit+0x11f fork_trampoline+0xe
>     6 100280 zfskern          txg_thread_enter mi_switch+0x176
> sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5
> dmu_buf_hold+0xe0 zap_idx_to_blk+0xa3 zap_deref_leaf+0x4f fzap_remove+0x33
> zap_remove_uint64+0x85 ddt_sync+0x267 spa_sync+0x383 txg_sync_thread+0x139
> fork_exit+0x11f fork_trampoline+0xe
> ---
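
The last stack shows the txg sync thread busy removing entries from the dedup
table (ddt_sync), which is presumably why the ZIL replay during mount takes so
long. A rough way to see how large the DDT has grown, and therefore how much
RAM it wants, is something like this (the pool name is just an example):

---
zpool status -D data    # DDT entry counts plus in-core and on-disk sizes
zdb -DD data            # more detailed dedup table histogram
zpool list data         # the DEDUP column shows the current dedup ratio
---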
> 
> And here is an interesting screenshot of output from the 9.0-RC2 livecd
> shell, which leads me to believe that swap was being exhausted while the zfs
> mount was hung:
> http://imageshack.us/photo/my-images/64/90rc2zfsmounthangswap.gif/ (I did not
> see this on 8.2 or 9.0-RC1, though I did not specifically try leaving the
> system frozen for a long time).
> 
> The hardware is a Supermicro X8SIA-F, Core i3, 4 GB of memory.
> 
> releng8 r226341 amd64 (with e1000 from 8.2-RELEASE for an unrelated reason),
> custom kernel with this config:
> include         GENERIC
> ident             IPFW_FWD
> options         IPFIREWALL
> options         IPFIREWALL_DEFAULT_TO_ACCEPT
> options         IPFIREWALL_FORWARD
> 
> The only ZFS-related tuning in loader.conf is the following line:
> vfs.zfs.arc_max="1073741824"
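
(For context, that caps the ARC at 1 GB. Since the dedup table is cached as
ARC metadata, the related knob is vfs.zfs.arc_meta_limit; a minimal
loader.conf sketch, values are examples only, not a recommendation:)

---
# /boot/loader.conf -- example values only
vfs.zfs.arc_max="1073741824"          # 1 GB ARC cap (bytes)
vfs.zfs.arc_meta_limit="805306368"    # let most of the ARC hold metadata such as the DDT
---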
> 
> I have a couple of days for debugging (I plan to try releng9 today or
> tomorrow because I can see some ZFS commits there after RC2),

releng9 r228308 successfully mounted the filesystem after some background work.
I typed "zfs mount" and then waited a few minutes while some work was being
done (top showed the zfs threads using a small amount of CPU time and gstat
showed some reads and writes on all of the ZFS disks); I then stopped watching
for a while, and after 30 minutes or so the job was already done.
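
For anyone hitting the same thing: while the mount appears hung, the replay
can be watched from another terminal with standard tools, roughly like this
(a sketch, <pid> being the zfs mount process):

---
top -SH              # system threads; the zfskern/txg threads should show some CPU time
gstat -I 1s          # per-disk I/O; steady reads and writes on all pool members means progress
procstat -kk <pid>   # the zfs mount process should sit in zil_replay -> txg_wait_synced
---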

> after which I'll have to move this machine to production, with ZFS or
> without it.

The system has now been in production on releng9 for a week without any issues.
But I had to give up on dedup and recreate the pool from scratch, because the
system locked up during a simultaneous zfs destroy of the dedup filesystem and
a zfs destroy of a snapshot of another, non-dedup filesystem, and after a hard
reset the pool did not come back up on any of releng9, mfsbsd 8.2+zfsv28 or
oi151a, eating all memory even on zpool status or zfs list.
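
For what it's worth, simply setting dedup=off would not have been enough here,
since existing DDT entries stay around until every deduplicated block is freed;
the only clean way out I know of is to rewrite the data, e.g. by sending it to
a fresh pool (a sketch, dataset and pool names are examples):

---
zfs set dedup=off data/storage/public               # stops new DDT entries only
zfs snapshot -r data@migrate
zfs send -R data@migrate | zfs receive -F newpool   # rewrite everything without dedup
---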

> 
> If you need anything else, let me know; I've subscribed to the list, so
> there is no need to CC me.

-- 
ISP SiberiaNet
System and Network Administrator

