zfs hang in zio->io_cv) with dd read

John Hay jhay at meraka.org.za
Fri Oct 8 09:39:00 UTC 2010


On Thu, Oct 07, 2010 at 09:28:22PM +0300, Andriy Gapon wrote:
> on 07/10/2010 20:31 John Hay said the following:
> > Oct  7 17:11:49 thumper1 kernel: mvsch23: EMPTY CRPB 30 (->0) 0 4000
> 
> Can you rule out hardware (or driver-level) problems?
> E.g. by dd-ing to/from disk directly.
> Doing that in parallel on the same and/or different disks.
> Running any disk I/O benchmarks.

Well, it might not be conclusive, but here is what I have done/tried:

dd from a few select disks. They all do about 64MB/s and 900 interrupts
per second. No kernel messages in dmesg or /var/log/messages. Typical
command is:
dd if=/dev/ada17 of=/dev/null bs=64k count=80000

8 simultaneous dds from the 8 disks on a controller. I still get 64MB/s
and 7000+ interrupts per second. No kernel messages.
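
A run like this could be scripted roughly as follows. This is only a sketch,
not the commands that were actually typed: the helper name parallel_dd is
made up for illustration, and the bs/count values are simply copied from the
single-disk command above.

```shell
#!/bin/sh
# Sketch: read several devices (or files) at once with dd and wait for all.
# parallel_dd is a hypothetical helper name, not an existing tool.
parallel_dd() {
    pids=""
    for dev in "$@"; do
        # Same block size and count as the single-disk test.
        dd if="$dev" of=/dev/null bs=64k count=80000 &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        # Remember a failure, but keep waiting for the remaining readers.
        wait "$pid" || rc=1
    done
    return $rc
}

# Example call, assuming ada0..ada7 are the 8 disks on one controller:
# parallel_dd /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3 \
#             /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
```

Each dd prints its own throughput summary when it finishes, so the per-disk
rates can be compared directly.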

6 simultaneous dds from a disk on each of the 6 controllers. I still get
64MB/s and 900+ interrupts per second per controller. No kernel messages.

I made a small zfs raidz2 with 6 disks, one from each controller. dd to
and from it with no problem.

I made a small zfs raidz2 with 8 disks, all from one controller. dd to
and from it at 190MB/s and 270MB/s, no problem. Bonnie++ finished
without a problem.

Next I made a zpool with 2 X raidz2 with 8 disks each. Each raidz2 on
its own controller:

zpool create -m none tst \
raidz2  ada0p1 ada1p1 ada2p1 ada3p1 ada4p1 ada5p1 ada6p1 ada7p1 \
raidz2  ada8p1 ada9p1 ada10p1 ada11p1 ada12p1 ada13p1 ada14p1 ada15p1

Creating a file with dd finished without a problem, about 245MB/s.
# dd if=/dev/zero of=/export/tst.dd bs=64k count=160000
160000+0 records in
160000+0 records out
10485760000 bytes transferred in 42.732294 secs (245382567 bytes/sec)
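
As a quick sanity check on those figures, the byte count and the rate can be
recomputed from the dd parameters. A minimal sketch, using only the numbers
printed above (the variable names are arbitrary):

```shell
#!/bin/sh
# Figures copied from the dd output above.
bytes=$((65536 * 160000))          # bs=64k times count=160000
secs=42.732294
# awk does the floating-point division; the result is in decimal MB/s.
mbps=$(awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.1f", b / s / 1000000 }')
echo "$bytes bytes, $mbps MB/s"
```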

Reading from the file caused a hang again:

# dd of=/dev/null if=/export/tst.dd bs=64k

This message arrived in dmesg:

mvsch15: EMPTY CRPB 13 (->14) 0 0000

And a little later there was a lot more:

mvsch15: Timeout on slot 1
mvsch15: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001100 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
mvsch2: EMPTY CRPB 16 (->0) 2 4000
mvsch2: EMPTY CRPB 18 (->0) 1 4000
mvsch2: EMPTY CRPB 19 (->0) 2 4000
mvsch2: EMPTY CRPB 20 (->0) 3 4000
mvsch2: EMPTY CRPB 21 (->0) 0 4000
mvsch2: EMPTY CRPB 22 (->0) 1 4000
mvsch2: EMPTY CRPB 23 (->0) 2 4000
...
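
To see at a glance which channels are affected, the messages can be tallied
per channel. The following is a rough sketch under the assumption that every
relevant line contains an "mvschN:" token, as in the excerpts above; the
helper name count_mvs_events is made up for illustration.

```shell
#!/bin/sh
# Sketch: tally mvs messages per channel from dmesg-style text on stdin.
# count_mvs_events is a hypothetical helper name, not an existing tool.
count_mvs_events() {
    awk '
        # Find the "mvschN:" token wherever it sits on the line, so both raw
        # dmesg lines and syslog-prefixed lines are handled.
        match($0, /mvsch[0-9]+:/) {
            ch = substr($0, RSTART, RLENGTH - 1)
            if ($0 ~ /EMPTY CRPB/)      empty[ch]++
            else if ($0 ~ /Timeout on/) timeout[ch]++
            seen[ch] = 1
        }
        END {
            for (ch in seen)
                printf "%s empty_crpb=%d timeouts=%d\n", ch, empty[ch], timeout[ch]
        }
    ' | sort
}

# Example: dmesg | count_mvs_events
```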

While this was happening, a dd from ada7p1 ran at normal speed, but one from
ada15p1 (which is on mvsch15) hung for a while until there was a burst
of mvsX interrupts, and then finished without a further hiccup. The original
dd from tst.dd still has not finished.

So it might be a driver problem that only occurs when the driver is pushed
in a different way than I could with my simultaneous dds to the raw partitions.

If there are more tests that I can do, just say what. If someone wants a
login to debug this, I can do it.

John
-- 
John Hay -- jhay at meraka.csir.co.za / jhay at FreeBSD.org


More information about the freebsd-stable mailing list