ZFS livelock / deadlock on pure SSD pool

Grant Gray grant at grantgray.id.au
Tue Sep 3 08:22:42 UTC 2013


I forgot to mention the device list:
<TEAC DW-224SL-R 1.0B>             at scbus0 target 0 lun 0 (pass0,cd0)
<DELL MD1000 A.04>                 at scbus3 target 39 lun 0 (pass1,ses0)
<ATA Crucial_CT960M50 MU02>        at scbus3 target 40 lun 0 (pass2,da0)
<ATA Crucial_CT960M50 MU02>        at scbus3 target 41 lun 0 (pass3,da1)
<ATA Crucial_CT960M50 MU02>        at scbus3 target 42 lun 0 (pass4,da2)
<ATA Crucial_CT960M50 MU02>        at scbus3 target 43 lun 0 (pass5,da3)
<ATA WDC WD2002FAEX-0 1D05>        at scbus3 target 44 lun 0 (pass6,da4)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 45 lun 0 (pass7,da5)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 46 lun 0 (pass8,da6)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 47 lun 0 (pass9,da7)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 48 lun 0 (pass10,da8)
<ATA WDC WD2002FAEX-0 1D05>        at scbus3 target 49 lun 0 (pass11,da9)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 50 lun 0 (pass12,da10)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 51 lun 0 (pass13,da11)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 52 lun 0 (pass14,da12)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 53 lun 0 (pass15,da13)
<ATA WDC WD20EARX-00P AB51>        at scbus3 target 54 lun 0 (pass16,da14)
<FUJITSU MBB2147RCSUN146G 0505>    at scbus4 target 0 lun 0 (pass17,da15)
<FUJITSU MBB2147RCSUN146G 0505>    at scbus4 target 1 lun 0 (pass18,da16)
<ATA INTEL SSDSC2CW12 400i>        at scbus4 target 2 lun 0 (pass19,da17)
<ATA INTEL SSDSC2CW12 400i>        at scbus4 target 3 lun 0 (pass20,da18)
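The list above appears to be `camcontrol devlist` output. When cross-checking it against the pool configuration, it can help to group the da(4) disks by model. Below is a minimal Python sketch. It assumes every line has the `<vendor product revision> at scbusN target N lun N (periph,...)` shape shown above. The function name `disks_by_model` is hypothetical and is not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the device list above:
#   <VENDOR PRODUCT REV>   at scbusN target N lun N (passN,daN)
DEVLIST_RE = re.compile(
    r"^<(?P<ident>[^>]+)>\s+at\s+scbus\d+\s+target\s+\d+"
    r"\s+lun\s+\d+\s+\((?P<periphs>[^)]*)\)"
)


def disks_by_model(devlist_text):
    """Group da(4) disk names by the vendor/product part of the <...> field."""
    groups = defaultdict(list)
    for line in devlist_text.splitlines():
        m = DEVLIST_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        # The identification field ends with a firmware revision; drop it.
        model = m.group("ident").rsplit(" ", 1)[0]
        for periph in m.group("periphs").split(","):
            if periph.startswith("da"):  # ignore pass, cd and ses devices
                groups[model].append(periph)
    return dict(groups)
```

Fed the list above, this should report the four Crucial drives as da0 to da3 under the key `ATA Crucial_CT960M50`, and the two Intel SSDs as da17 and da18.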

On 09/03/2013 06:11 PM, Grant Gray wrote:
> Hello All,
>
> I have been experiencing a ZFS livelock on a FreeBSD 9.1 system since 
> introducing pools containing only SSDs. The livelock occurs typically 
> every 1-2 days, sometimes as much as twice a day.
>
> ZFS filesystems:
> http://pastebin.com/raw.php?i=svTZRd7m
>
> The pool configuration is as follows:
> http://pastebin.com/raw.php?i=KAdSGWu4
>
> /boot/loader.conf:
> http://pastebin.com/raw.php?i=J1cZNPjS
>
> There were a couple of livelock issues associated with 9.1 (one in 
> ZFS, one in CAM) that prompted an upgrade to 9.2-RC2 and then to 
> 9.2-RC3; however, the problem persists. When the system has locked, it 
> can still be pinged and accepts socket connections (SSH, for example, 
> begins the handshake but never gets as far as the password prompt).
>
> Some details:
> * Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
> * Regular (hourly) cron jobs that traverse at least one filesystem of 
> tens of thousands of files,
> * NFS exports of some ZFS filesystems,
> * iSCSI exports via istgt of zvols,
> * Host controller is LSI 3801E (IT) with latest firmware,
> * Storage array is Dell MD1000 with latest firmware,
> * Host system is Sun X4200 M2 w/32GB RAM, 2 x dual-core Opterons,
> * SSDs are 4 x Crucial M500 960GB, in two mirrored pools (san1 & san2).
>
>
> I haven't yet enabled the kernel debugger to get stack traces and lock 
> status, but the output of procstat -kk -a is here:
> http://pastebin.com/raw.php?i=SYhmyhGj
>
> Once the livelock occurs, any ZFS command hangs, and it appears that 
> any command not already in cache may also hang.
>
> Any suggestions are warmly welcomed!
>
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
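
The `procstat -kk -a` output linked in the message prints one row per kernel thread, ending with that thread's kernel stack (innermost frame first). When many threads are stuck, one quick way to see where they are piling up is to count identical stacks. Below is a minimal Python sketch; the helper names are hypothetical. It assumes frames appear as `symbol+0xoffset` tokens, which is how `procstat -kk` prints them. It picks frames out by that shape rather than by column position, because thread names may contain spaces.

```python
import re
from collections import Counter

# One kernel stack frame as printed by `procstat -kk`: symbol+0xOFFSET.
FRAME_RE = re.compile(r"^([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+$")


def stack_signature(line):
    """Return the function names from one `procstat -kk` row, innermost first.

    Frames are recognised by their symbol+0xoffset shape rather than by
    column position, because thread names (TDNAME) may contain spaces.
    """
    names = []
    for token in line.split():
        m = FRAME_RE.match(token)
        if m:
            names.append(m.group(1))  # keep the symbol, drop the +0x offset
    return tuple(names)


def summarize(procstat_text, top=10):
    """Count threads sharing an identical stack, most common first."""
    counts = Counter()
    for line in procstat_text.splitlines():
        sig = stack_signature(line)
        if sig:  # skips the header and rows without symbolised frames
            counts[sig] += 1
    return counts.most_common(top)


def threads_in(procstat_text, function):
    """Count threads whose stack passes through `function`."""
    return sum(
        1
        for line in procstat_text.splitlines()
        if function in stack_signature(line)
    )


def report(procstat_text, top=10):
    """Format summarize() output as 'count  frame frame ...' lines."""
    return "\n".join(
        f"{n:5d}  {' '.join(sig)}" for sig, n in summarize(procstat_text, top)
    )
```

For example, saving the dump with `procstat -kk -a > stacks.txt` and calling `print(report(open("stacks.txt").read()))` lists the most common stacks with their thread counts. `threads_in(text, "zio_wait")` counts threads whose stack includes a given function. `zio_wait` is only an illustrative choice here, not a diagnosis.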
