ZFS livelock / deadlock on pure SSD pool

Grant Gray grant at grantgray.id.au
Tue Sep 3 08:20:44 UTC 2013


Hello All,

I have been experiencing a ZFS livelock on a 9.1 system since 
introducing pools containing only SSDs. The livelock typically occurs 
every 1-2 days, and sometimes as often as twice a day.

ZFS filesystems:
http://pastebin.com/raw.php?i=svTZRd7m

The pool configuration is as follows:
http://pastebin.com/raw.php?i=KAdSGWu4

/boot/loader.conf:
http://pastebin.com/raw.php?i=J1cZNPjS
There were a couple of livelock issues associated with 9.1 (one in ZFS, 
one in CAM) that prompted an upgrade to 9.2-RC2 and then to 9.2-RC3, 
but the problem persists. When the system has locked up, it can still 
be pinged and socket connections can still be made (SSH, for example, 
begins its handshake but never gets as far as prompting for a password).

Some details:
* Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
* Regular (hourly) cron jobs that traverse at least one filesystem of 
tens of thousands of files,
* NFS exports of some ZFS filesystems,
* iSCSI exports via istgt of zvols,
* Host controller is LSI 3801E (IT) with latest firmware,
* Storage array is Dell MD1000 with latest firmware,
* Host system is Sun X4200 M2 w/32GB RAM, 2 x dual-core Opterons,
* SSDs are 4 x Crucial M500 960GB, in two mirrored pools (san1 & san2).


I haven't yet enabled the kernel debugger to get a stack trace/lock 
status, but procstat -kk -a is here:
http://pastebin.com/raw.php?i=SYhmyhGj
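
Since a hung system makes it hard to run procstat after the fact, one 
option is to log its output at intervals so that a recent capture 
exists from shortly before the hang. The following is only a rough 
sketch of that idea, not something I am running: the helper names 
(capture, watch), the log path and the interval are all hypothetical, 
and the log would need to live on a non-ZFS filesystem to stay 
writable during a lockup.

```python
import subprocess
import time
from datetime import datetime, timezone


def capture(cmd, log_path, timeout=30):
    """Run cmd once and append its timestamped output to log_path.

    Returns True if cmd finished within `timeout` seconds, else False.
    Caveat: a process stuck uninterruptibly in the kernel cannot be
    killed, so in that case this call may itself block.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
        body = result.stdout.decode(errors="replace")
        finished = True
    except subprocess.TimeoutExpired as exc:
        # Keep whatever partial output was collected before the timeout.
        body = (exc.stdout or b"").decode(errors="replace") + "\n[timed out]\n"
        finished = False
    with open(log_path, "a") as log:
        log.write("=== %s %s\n%s\n" % (stamp, " ".join(cmd), body))
    return finished


def watch(cmd, log_path, interval=300, rounds=None):
    """Call capture() every `interval` seconds; forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        capture(cmd, log_path)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return done
```

A call such as watch(["procstat", "-kk", "-a"], "/var/tmp/procstat-kk.log") 
would then append one capture every five minutes; the path here is an 
assumption, and the log grows without bound unless it is rotated.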

Once the livelock occurs, any ZFS command hangs, and it appears that any 
command that doesn't happen to be in cache may also hang.

Any suggestions are warmly welcomed!



More information about the freebsd-fs mailing list