ZFS livelock / deadlock on pure SSD pool
Grant Gray
grant at grantgray.id.au
Tue Sep 3 08:20:44 UTC 2013
Hello All,
I have been experiencing a ZFS livelock on a FreeBSD 9.1 system since
introducing pools containing only SSDs. The livelock typically occurs
every 1-2 days, sometimes as often as twice a day.
ZFS filesystems:
http://pastebin.com/raw.php?i=svTZRd7m
The pool configuration is as follows:
http://pastebin.com/raw.php?i=KAdSGWu4
/boot/loader.conf:
http://pastebin.com/raw.php?i=J1cZNPjS
There were a couple of livelock issues associated with 9.1 (one in ZFS,
one in CAM) that prompted an upgrade to 9.2RC2 and then to 9.2RC3;
however, the problem persists. When the system has locked up, it can
still be pinged and socket connections can still be made (for example,
SSH begins its handshake but never gets as far as prompting for a
password).
Some details:
* Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
* Regular (hourly) cron jobs that traverse at least one filesystem of
tens of thousands of files,
* NFS exports of some ZFS filesystems,
* iSCSI exports via istgt of zvols,
* Host controller is LSI 3801E (IT) with latest firmware,
* Storage array is Dell MD1000 with latest firmware,
* Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons,
* SSDs are 4 x Crucial M500 960GB in two mirrored pools (san1 & san2).
I haven't yet enabled the kernel debugger to get a stack trace or lock
status, but the output of procstat -kk -a is here:
http://pastebin.com/raw.php?i=SYhmyhGj
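For skimming a dump like this, a minimal Python sketch follows. It
assumes the usual procstat -kk layout, where each kernel stack frame is
printed as function+0xoffset, and it counts how many threads are blocked
at each first non-scheduler frame. The helper names (frames,
blocking_sites) and the list of generic sleep frames to skip are
illustrative assumptions, not part of procstat or of this report.

```python
import re
from collections import Counter

# A stack frame token is assumed to look like "txg_wait_synced+0x85".
FRAME = re.compile(r"^(\S+)\+0x[0-9a-f]+$")

# Assumed list of generic scheduler/sleep frames that say nothing about
# *why* a thread is waiting; extend as needed for a real dump.
GENERIC = {
    "mi_switch", "sleepq_wait", "sleepq_timedwait", "sleepq_wait_sig",
    "sleepq_timedwait_sig", "sleepq_catch_signals", "_sleep",
    "_cv_wait", "_cv_wait_sig", "_cv_timedwait",
    "_sx_xlock_hard", "_sx_slock_hard",
}


def frames(line):
    """Return the function names found in one line of procstat -kk output."""
    names = []
    for token in line.split():
        match = FRAME.match(token)
        if match:
            names.append(match.group(1))
    return names


def blocking_sites(text, skip=GENERIC):
    """Count threads by the first stack frame that is not in `skip`."""
    counts = Counter()
    for line in text.splitlines():
        for name in frames(line):
            if name not in skip:
                counts[name] += 1
                break  # only the innermost interesting frame per thread
    return counts
```

With the dump saved to a file (the name procstat.txt is hypothetical),
blocking_sites(open("procstat.txt").read()).most_common(10) lists the
ten most common blocking sites. A large count at a single ZFS function
would suggest where the threads are piling up, though it does not by
itself identify which thread holds the contended resource.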
Once the livelock occurs, any ZFS command hangs, and it appears that any
command that doesn't happen to be in cache may also hang.
Any suggestions are warmly welcomed!