ZFS hanging
Dennis Glatting
dg at pki2.com
Tue Jul 10 18:08:21 UTC 2012
On Tue, 10 Jul 2012, George Kontostanos wrote:
> On Mon, Jul 9, 2012 at 11:13 PM, Dennis Glatting <freebsd at pki2.com> wrote:
>> I have a ZFS array of disks where the system simply stops as if forever
>> blocked by some IO mutex. This happens often and the following is the
>> output of top:
>>
>> last pid:  6075;  load averages:  0.00,  0.00,  0.00    up 0+16:54:41  13:04:10
>> 135 processes: 1 running, 134 sleeping
>> CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> Mem: 47M Active, 24M Inact, 18G Wired, 120M Buf, 44G Free
>> Swap: 32G Total, 32G Free
>>
>>   PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>  2410 root       1  33    0 11992K  2820K zio->i  7 331:25  0.00% bzip2
>>  2621 root       1  52    4 28640K  5544K tx->tx 24 245:33  0.00% john
>>  2624 root       1  48    4 28640K  5544K tx->tx  4 239:08  0.00% john
>>  2623 root       1  49    4 28640K  5544K tx->tx  7 238:44  0.00% john
>>  2640 root       1  42    4 28640K  5420K tx->tx 23 206:51  0.00% john
>>  2638 root       1  42    4 28640K  5420K tx->tx 28 206:34  0.00% john
>>  2639 root       1  42    4 28640K  5420K tx->tx  9 206:30  0.00% john
>>  2637 root       1  42    4 28640K  5420K tx->tx 18 206:24  0.00% john
>>
>>
>> This system is presently resilvering a disk but these stops have
>> happened before.
>>
>>
>> iirc# zpool status disk-1
>>   pool: disk-1
>>  state: DEGRADED
>> status: One or more devices is currently being resilvered. The pool will
>>         continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>   scan: resilver in progress since Sun Jul 8 13:07:46 2012
>>         104G scanned out of 12.4T at 1.73M/s, (scan is slow, no estimated time)
>>         10.3G resilvered, 0.82% done
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         disk-1                      DEGRADED     0     0     0
>>           raidz2-0                  DEGRADED     0     0     0
>>             da1                     ONLINE       0     0     0
>>             da2                     ONLINE       0     0     0
>>             da10                    ONLINE       0     0     0
>>             da9                     ONLINE       0     0     0
>>             da5                     ONLINE       0     0     0
>>             da6                     ONLINE       0     0     0
>>             da7                     ONLINE       0     0     0
>>             replacing-7             DEGRADED     0     0     0
>>               17938531774236227186  UNAVAIL      0     0     0  was /dev/da8
>>               da3                   ONLINE       0     0     0  (resilvering)
>>             da8                     ONLINE       0     0     0
>>             da4                     ONLINE       0     0     0
>>         logs
>>           ada2p1                    ONLINE       0     0     0
>>         cache
>>           ada1                      ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>
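For scale, the scan figures in the status output (104G scanned out of 12.4T at 1.73M/s) can be turned into a rough time estimate. This is only a back-of-the-envelope sketch in Python. It assumes the K/M/G/T suffixes are binary units and that the scan rate stays constant, neither of which is guaranteed; the function names are made up for illustration.

```python
# Rough resilver ETA from the zpool status figures quoted above.
# Assumptions: size suffixes are binary units (KiB, MiB, GiB, TiB)
# and the reported scan rate does not change over time.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a zpool-style size such as '12.4T' to a byte count."""
    return float(text[:-1]) * UNITS[text[-1]]


def resilver_eta_days(scanned, total, rate_per_sec):
    """Days left if the remaining data is scanned at a constant rate."""
    remaining = to_bytes(total) - to_bytes(scanned)
    return remaining / to_bytes(rate_per_sec) / 86400


if __name__ == "__main__":
    days = resilver_eta_days("104G", "12.4T", "1.73M")
    print(f"about {days:.0f} days remaining at the current rate")
```

Under those assumptions the remaining 12.3T or so works out to roughly 86 days at 1.73M/s, which fits with zpool declining to print an estimate.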
>> This system has dissimilar disks, which I understand should not be a
>> problem but the stopping also happened before I started the slow disk
>> upgrade process.
>>
>> The disks are served by:
>>
>> * An LSI 9211 flashed to IT, and
>> * An LSI 2008 controller on the motherboard also flashed to IT.
>>
>> The 2008 BIOS and firmware are the most recent from LSI. The motherboard
>> is a Supermicro H8DG6-F.
>>
>>
>> My question is what should I be looking at and how should I look at it?
>> There is nothing in the logs or on the console; rather, the system is
>> forever paused and entering commands results in no response (it's as if
>> everything is deadlocked).
>
> Can you post your 'dmesg | grep mps' and the FreeBSD version you run?
> Also, is there any chance that those disks are 4K?
>
I sent that in another post but have included it below.
Yes, the disks are a mix. I'm presently migrating 2TB crappy disks, and
some 2TB not-so-crappy disks, to 3TB crappy-unknown disks. However:
1) Why would a mix of 512/4k disks in a ZFS volume lock out a hardware
RAID1 volume on another controller?
2) Is there a known problem, other than performance, with mixing 512/4k?
3) Related: How does an SSD array of block size foo impact an array of
sector size bar?
Thanks.
iirc> dmesg | grep mps
mps0: <LSI SAS2008> port 0xd000-0xd0ff mem 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 19 at device 0.0 on pci4
mps0: Firmware: 13.00.57.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: attempting to allocate 1 MSI-X vectors (15 supported)
mps0: using IRQ 256 for MSI-X
mps1: <LSI SAS2008> port 0xc000-0xc0ff mem 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 16 at device 0.0 on pci3
mps1: Firmware: 13.00.57.00, Driver: 14.00.00.01-fbsd
mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: attempting to allocate 1 MSI-X vectors (15 supported)
mps1: using IRQ 257 for MSI-X
da1 at mps0 bus 0 scbus1 target 0 lun 0
da5 at mps1 bus 0 scbus2 target 1 lun 0
da4 at mps0 bus 0 scbus1 target 6 lun 0
da2 at mps0 bus 0 scbus1 target 1 lun 0
da6 at mps1 bus 0 scbus2 target 2 lun 0
da8 at mps1 bus 0 scbus2 target 5 lun 0
da7 at mps1 bus 0 scbus2 target 3 lun 0
da10 at mps1 bus 0 scbus2 target 8 lun 0
pass2 at mps0 bus 0 scbus1 target 0 lun 0
pass3 at mps0 bus 0 scbus1 target 1 lun 0
pass4 at mps0 bus 0 scbus1 target 5 lun 0
pass5 at mps0 bus 0 scbus1 target 6 lun 0
pass6 at mps1 bus 0 scbus2 target 1 lun 0
pass7 at mps1 bus 0 scbus2 target 2 lun 0
pass8 at mps1 bus 0 scbus2 target 3 lun 0
pass9 at mps1 bus 0 scbus2 target 5 lun 0
pass10 at mps1 bus 0 scbus2 target 7 lun 0
pass11 at mps1 bus 0 scbus2 target 8 lun 0
da3 at mps0 bus 0 scbus1 target 5 lun 0
da9 at mps1 bus 0 scbus2 target 7 lun 0
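The attach lines above are in probe order, which makes it hard to see which pool member hangs off which controller. A small Python sketch along these lines could group them. The `da` lines are copied from the listing above, while the regular expression and the function name are illustrative assumptions, not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# The "daN at mpsN" lines from the dmesg output above. In practice the
# full output of `dmesg | grep mps` could be passed in instead.
SAMPLE = """\
da1 at mps0 bus 0 scbus1 target 0 lun 0
da5 at mps1 bus 0 scbus2 target 1 lun 0
da4 at mps0 bus 0 scbus1 target 6 lun 0
da2 at mps0 bus 0 scbus1 target 1 lun 0
da6 at mps1 bus 0 scbus2 target 2 lun 0
da8 at mps1 bus 0 scbus2 target 5 lun 0
da7 at mps1 bus 0 scbus2 target 3 lun 0
da10 at mps1 bus 0 scbus2 target 8 lun 0
da3 at mps0 bus 0 scbus1 target 5 lun 0
da9 at mps1 bus 0 scbus2 target 7 lun 0
"""

# Matches disk attach lines only; passN and controller lines are skipped.
ATTACH = re.compile(r"^(da\d+) at (mps\d+) bus \d+ scbus\d+ target (\d+) lun \d+")


def disks_by_controller(text):
    """Map each controller name to its (disk, target) pairs, sorted by disk number."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = ATTACH.match(line.strip())
        if match:
            disk, controller, target = match.groups()
            groups[controller].append((disk, int(target)))
    return {
        controller: sorted(disks, key=lambda pair: int(pair[0][2:]))
        for controller, disks in groups.items()
    }


if __name__ == "__main__":
    for controller, disks in sorted(disks_by_controller(SAMPLE).items()):
        names = ", ".join(f"{disk} (target {target})" for disk, target in disks)
        print(f"{controller}: {names}")
```

For this listing that puts da1 through da4 on mps0 and da5 through da10 on mps1, so the raidz2 vdev spans both controllers.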
iirc> uname -a
FreeBSD iirc 9.0-STABLE FreeBSD 9.0-STABLE #14: Sun Jul 8 16:54:00 PDT 2012 root at iirc:/sys/amd64/compile/SMUNI amd64
> --
> George Kontostanos
> Aicom telecoms ltd
> http://www.aisecure.net
>