HDD Lockups on 9.1-RC3 (MPS, LSI 2008, and ZFS)

Reed A. Cartwright cartwright at asu.edu
Tue Nov 27 19:11:22 UTC 2012


I also posted about this on freebsd-stable earlier.

I recently upgraded my server from 9.0 to 9.1-RC3 and have since started
experiencing HDD lockups.  They don't happen immediately, but they do
appear to happen during heavy read-write usage.  (The only other
change I made was to disable atime on one of the pools.)  The system
itself has not crashed, because I can sometimes log in and execute a few
commands (if the right files are cached in memory).  The first time
this happened, I was able to see that many processes were stuck in
the tx->tx state.  I can't figure out what this means.
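For anyone trying to catch the same symptom, here is a minimal sketch of one way to spot the stuck processes: it runs "ps -axo pid,wchan,comm", tallies processes by wait channel, and lists those sleeping on "tx->tx". The helper names are hypothetical, the script only does anything on FreeBSD, and it matches the wait-channel string exactly as ps prints it. It also needs ps itself to be runnable, so it only helps when the binaries are still cached, as described above.

```python
import subprocess
import sys
from collections import Counter


def _rows(ps_output):
    """Yield (pid, wchan, command) from `ps -axo pid,wchan,comm` output."""
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) == 3:
            yield int(fields[0]), fields[1], fields[2]


def count_wait_channels(ps_output):
    """Count processes per wait channel ("-" when ps shows none)."""
    return Counter(wchan for _, wchan, _ in _rows(ps_output))


def stuck_in(ps_output, wchan):
    """Return (pid, command) pairs for processes sleeping on `wchan`."""
    return [(pid, comm) for pid, w, comm in _rows(ps_output) if w == wchan]


# Hypothetical helper: only query the live system when running on FreeBSD.
if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    out = subprocess.run(["ps", "-axo", "pid,wchan,comm"],
                         capture_output=True, text=True, check=True).stdout
    for chan, n in count_wait_channels(out).most_common(10):
        print(f"{n:6d}  {chan}")
    for pid, comm in stuck_in(out, "tx->tx"):
        print(f"stuck in tx->tx: {pid} {comm}")
```

A large count on a single ZFS wait channel during a lockup would be worth including in a follow-up report alongside the dmesg output.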

The lockups have occurred when I was reading from the storage pool and
writing back to either the storage pool or a UFS scratch drive.

The system reboots fine; no HDD corruption is apparent.  I have yet to
find an error message associated with the lockups.

I upgraded my controller cards' firmware to match the new MPS driver
in 9.1, and the problem is still happening.  It looks like my cache
drive might have out-of-date firmware, but according to OCZ, upgrading
it requires Windows or Linux.
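One way to double-check the controller side is to compare the firmware and driver versions the kernel reports. This sketch assumes the mps(4) driver exposes dev.mps.N.firmware_version and dev.mps.N.driver_version sysctls (run "sysctl dev.mps" to confirm on your system). It also assumes that "matching" means the leading phase number of the two version strings is equal. Neither assumption comes from this report, and the helper names are hypothetical.

```python
import subprocess
import sys


def phase(version):
    """Leading numeric component of an LSI-style version string.

    For example "14.00.00.00" -> 14 and "14.00.00.01-fbsd" -> 14.
    """
    return int(version.strip().split(".", 1)[0])


def read_sysctl(name):
    """Return the value of a sysctl; sysctl -n prints the value only."""
    return subprocess.run(["sysctl", "-n", name], capture_output=True,
                          text=True, check=True).stdout.strip()


def check_controller(unit):
    """Compare firmware and driver phases for mps<unit>; True if equal."""
    # Assumed sysctl names; verify with `sysctl dev.mps` first.
    fw = read_sysctl(f"dev.mps.{unit}.firmware_version")
    drv = read_sysctl(f"dev.mps.{unit}.driver_version")
    ok = phase(fw) == phase(drv)
    print(f"mps{unit}: firmware {fw}, driver {drv}: "
          f"{'phase match' if ok else 'PHASE MISMATCH'}")
    return ok


def main():
    unit, all_ok = 0, True
    while True:
        try:
            all_ok = check_controller(unit) and all_ok
        except subprocess.CalledProcessError:
            break  # sysctl fails once dev.mps.<unit> does not exist
        unit += 1
    return 0 if all_ok else 1


if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    sys.exit(main())
```

Run it on the server itself. A non-zero exit status flags at least one controller whose firmware phase differs from the driver's.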


pciconf and dmesg are attached.

SYSTEM INFO

64-core machine with 512 GB of memory, running the 9.1-RC3 kernel.

uname -a:
FreeBSD herschel.biodesign.asu.edu 9.1-RC3 FreeBSD 9.1-RC3 #0 r242324:
Tue Oct 30 00:58:57 UTC 2012
root at farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

camcontrol devlist:
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 2 lun 0 (pass2,da2)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 3 lun 0 (pass3,da3)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 4 lun 0 (pass4,da4)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 5 lun 0 (pass5,da5)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 6 lun 0 (pass6,da6)
<ATA Hitachi HUA72202 A3EA>        at scbus0 target 7 lun 0 (pass7,da7)
<ATA D2CSTK251M11-048 2.15>        at scbus7 target 0 lun 0 (pass8,da8)
<ATA WDC WD1003FBYX-0 1V02>        at scbus7 target 1 lun 0 (pass9,da9)
<ATA WDC WD2503ABYX-0 1S02>        at scbus7 target 2 lun 0 (pass10,da10)
<ATA WDC WD2503ABYX-0 1S02>        at scbus7 target 3 lun 0 (pass11,da11)
<ATA INTEL SSDSA2CW30 0362>        at scbus7 target 4 lun 0 (pass12,da12)
<KVM vmDisk-CD 0.01>               at scbus9 target 0 lun 0 (cd0,pass13)

df -kh:
Filesystem            Size    Used   Avail Capacity  Mounted on
zroot                 199G     10G    189G     5%    /
devfs                 1.0k    1.0k      0B   100%    /dev
/dev/label/scratch    275G     15G    237G     6%    /scratch
fdescfs               1.0k    1.0k      0B   100%    /dev/fd
procfs                4.0k    4.0k      0B   100%    /proc
storage/home          8.1T    521G    7.6T     6%    /home
storage/jails         7.6T     63k    7.6T     0%    /jails
storage/storage       8.7T    1.1T    7.6T    13%    /storage
storage/storage/tt    8.1T    478G    7.6T     6%    /storage/tt
devfs                 1.0k    1.0k      0B   100%    /compat/linux/dev
linsysfs              4.0k    4.0k      0B   100%    /compat/linux/sys
linprocfs             4.0k    4.0k      0B   100%    /compat/linux/proc

zpool status:
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 9h21m with 0 errors on Sat Nov 17 12:23:44 2012
config:

    NAME        STATE     READ WRITE CKSUM
    storage     ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        da0     ONLINE       0     0     0
        da1     ONLINE       0     0     0
        da2     ONLINE       0     0     0
        da3     ONLINE       0     0     0
        da4     ONLINE       0     0     0
        da5     ONLINE       0     0     0
        da6     ONLINE       0     0     0
        da7     ONLINE       0     0     0
    cache
      da8       ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h14m with 0 errors on Sat Nov 17 03:16:09 2012
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        da11p2  ONLINE       0     0     0
        da10p2  ONLINE       0     0     0

cat /boot/loader.conf:
zfs_load="YES"
geom_eli_load="YES"
ahci_load="YES"

vfs.root.mountfrom="zfs:zroot"
debug.acpi.max_tasks="128"
#vboxdrv_load="YES"
kern.maxfiles="65536"




--
Reed A. Cartwright, PhD
Assistant Professor of Genomics, Evolution, and Bioinformatics
School of Life Sciences
Center for Evolutionary Medicine and Informatics
The Biodesign Institute
Arizona State University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.log
Type: application/octet-stream
Size: 18137 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-scsi/attachments/20121127/ff747777/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pciconf.log
Type: application/octet-stream
Size: 17091 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-scsi/attachments/20121127/ff747777/attachment-0001.obj>


More information about the freebsd-scsi mailing list