zfs hang in zio->io_cv) with dd read

Jeremy Chadwick freebsd at jdc.parodius.com
Thu Oct 7 17:39:02 UTC 2010


On Thu, Oct 07, 2010 at 07:31:02PM +0200, John Hay wrote:
> On Thu, Oct 07, 2010 at 06:19:48PM +0200, Goran Lowkrantz wrote:
> > --On October 7, 2010 17:50:42 +0200 John Hay <jhay at meraka.org.za> wrote:
> > 
> > >On Thu, Oct 07, 2010 at 02:35:31PM +0200, Ivan Voras wrote:
> > >>On 10/07/10 14:15, John Hay wrote:
> > >>> Hi,
> > >>>
> > >>> I got hold of a SunFire X4500 with 48 X 500G disks and thought to try
> > >>> FreeBSD 8-stable with zfs on it.
> > >>>
> > >>> I have setup the two boot disks in a zfs mirror and then the rest in
> > >>> a pool of 6 X raidz2 of 7 disks each.
> > >>>
> > >>> I have created a 10G file with dd in the second pool, but if I try to
> > >>> read it with dd, dd will hang in "zio->io_cv)" according to ^T. This
> > >>> happens everytime. The first time I saw messages about an interrupt
> > >>> storm, so I have put "hw.intr_storm_threshold=10000" in
> > >>> /etc/sysctl.conf. According to "systat -vm 1" there is atapci for 2-3
> > >>> seconds and then it is quiet.
> > >>
> > >>There are two things you could try: 1) use the AHCI driver
> > >>(ahci_load="YES" in /boot/loader.conf) and 2) disable superpages, they
> > >>don't get along on a few models of Opterons (vm.pmap.pg_ps_enabled=0 in
> > >>/boot/loader.conf).
> > >
> > >ahci does not grab them. According to the ahci man page, it can handle
> > >Marvell 88SX61xx, while these are MV88SX6081 according to pciconf -lcv:
> > >
> > >atapci0 at pci0:1:1:0:     class=0x010000 card=0x11ab11ab chip=0x608111ab
> > >rev=0x09 hdr=0x00     vendor     = 'Marvell Semiconductor (Was: Galileo
> > >Technology Ltd)'     device     = 'MV88SX6081 8-port SATA II PCI-X
> > >Controller'
> > >    class      = mass storage
> > >    subclass   = SCSI
> > >    cap 01[40] = powerspec 2  supports D0 D3  current D0
> > >    cap 05[50] = MSI supports 1 message, 64 bit
> > >    cap 07[60] = PCI-X 64-bit supports 133MHz, 512 burst read, 4 split
> > >transactions
> > 
> > Then try mvs_load="YES"
> > 
> > mvs0 at pci0:6:2:0:	class=0x010000 card=0x11ab11ab chip=0x608111ab 
> > rev=0x09 hdr=0x00
> >    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
> >    device     = 'MV88SX6081 8-port SATA II PCI-X Controller'
> >    class      = mass storage
> >    subclass   = SCSI
> > mvs1 at pci0:5:1:0:	class=0x010000 card=0x11ab11ab chip=0x608111ab 
> > rev=0x09 hdr=0x00
> >    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
> >    device     = 'MV88SX6081 8-port SATA II PCI-X Controller'
> >    class      = mass storage
> >    subclass   = SCSI
> 
> That helped, thanks. Now the disks are detected as adaXX devices.
> 
> The problem still happens though. I think it takes a little longer after
> I have started dd before it hangs, but it still hangs.
> 
> One thing seems a little different though. Occasionaly a short burst of
> interrupts on the mvsX devices do come through. It also seems that a few
> seconds after I press ^T in the dd window, I see a burst of mvsX 
> interrupts happen and dd will report in/out records and bytes. This did
> not happen with the ata driver. It is still hanging in "zio->io_cv)"
> though.
> 
> I also see these messages in /var/log/messages
> 
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 16 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 17 (->0) 1 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 18 (->0) 2 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 20 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 21 (->0) 1 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 22 (->0) 2 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 23 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 24 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 25 (->0) 1 4000
> Oct  7 17:08:04 thumper1 kernel: 
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 26 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 27 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 28 (->0) 1 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 29 (->0) 2 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 30 (->0) 3 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 31 (->0) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 2 (->18) 1 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 5 (->18) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 6 (->18) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 7 (->18) 0 4000
> Oct  7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 8 (->18) 1 4000
> Oct  7 17:08:05 thumper1 kernel: 
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 9 (->18) 2 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 10 (->18) 0 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 11 (->18) 1 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 12 (->18) 0 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 13 (->18) 0 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 14 (->18) 1 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 15 (->18) 2 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 16 (->18) 3 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 17 (->18) 0 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 18 (->22) 1 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 19 (->22) 2 4000
> Oct  7 17:08:05 thumper1 kernel: 
> Oct  7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 20 (->22) 3 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 30 (->31) 0 0000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 7 (->0) 0 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 8 (->0) 1 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 10 (->0) 3 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 11 (->0) 0 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 12 (->0) 1 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 13 (->0) 2 4000
> Oct  7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 14 (->0) 3 4000
> ...
> Oct  7 17:08:09 thumper1 kernel: mvsch19: EMPTY CRPB 9 (->10) 0 0000
> Oct  7 17:08:39 thumper1 kernel: mvsch19: Timeout on slot 1
> Oct  7 17:08:39 thumper1 kernel: mvsch19: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:08:39 thumper1 kernel: mvsch22: EMPTY CRPB 19 (->0) 1 4000
> Oct  7 17:08:39 thumper1 kernel: mvsch22: EMPTY CRPB 20 (->0) 3 4000
> Oct  7 17:08:39 thumper1 kernel: mvsch22: EMPTY CRPB 21 (->0) 2 4000
> ...
> Oct  7 17:08:43 thumper1 kernel: mvsch18: EMPTY CRPB 14 (->15) 0 4000
> Oct  7 17:08:43 thumper1 kernel: mvsch29: EMPTY CRPB 29 (->0) 0 4000
> Oct  7 17:09:13 thumper1 kernel: mvsch29: Timeout on slot 1
> Oct  7 17:09:13 thumper1 kernel: mvsch29: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001025 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 27 (->0) 1 4000
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 29 (->0) 3 4000
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 30 (->0) 0 4000
> ...
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 26 (->29) 1 4000
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 27 (->29) 2 4000
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 28 (->29) 0 4000
> Oct  7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 8 (->9) 0 0000
> Oct  7 17:09:44 thumper1 kernel: mvsch20: Timeout on slot 1
> Oct  7 17:09:44 thumper1 kernel: mvsch20: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001123 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:09:45 thumper1 kernel: mvsch12: EMPTY CRPB 26 (->27) 0 002a
> Oct  7 17:10:15 thumper1 kernel: mvsch12: Timeout on slot 1
> Oct  7 17:10:15 thumper1 kernel: mvsch12: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001022 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:10:15 thumper1 kernel: mvsch4: EMPTY CRPB 14 (->15) 0 0000
> Oct  7 17:10:45 thumper1 kernel: mvsch4: Timeout on slot 1
> Oct  7 17:10:45 thumper1 kernel: mvsch4: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001020 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:10:45 thumper1 kernel: mvsch22: EMPTY CRPB 7 (->0) 1 4000
> Oct  7 17:10:45 thumper1 kernel: mvsch22: EMPTY CRPB 8 (->0) 0 4000
> Oct  7 17:10:45 thumper1 kernel: mvsch22: EMPTY CRPB 9 (->0) 1 4000
> ...
> Oct  7 17:10:48 thumper1 kernel: mvsch19: EMPTY CRPB 21 (->26) 3 4000
> Oct  7 17:10:48 thumper1 kernel: mvsch21: EMPTY CRPB 19 (->20) 0 0000
> Oct  7 17:10:48 thumper1 kernel: mvsch21: EMPTY CRPB 28 (->29) 0 0000
> Oct  7 17:11:18 thumper1 kernel: mvsch21: Timeout on slot 2
> Oct  7 17:11:18 thumper1 kernel: mvsch21: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001026 dma_c 00000000 dma_s 00000000 rs 00000014 status 50
> Oct  7 17:11:18 thumper1 kernel: mvsch21:  ... waiting for slots 00000010
> Oct  7 17:11:18 thumper1 kernel: mvsch21: Timeout on slot 4
> Oct  7 17:11:18 thumper1 kernel: mvsch21: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001026 dma_c 00000000 dma_s 00000000 rs 00000014 status 50
> Oct  7 17:11:19 thumper1 kernel: mvsch27: EMPTY CRPB 22 (->23) 0 0000
> Oct  7 17:11:19 thumper1 kernel: mvsch21: EMPTY CRPB 12 (->13) 0 4000
> Oct  7 17:11:19 thumper1 kernel: mvsch29: EMPTY CRPB 6 (->7) 0 0000
> ...
> Oct  7 17:11:19 thumper1 kernel: mvsch1: EMPTY CRPB 13 (->16) 0 4000
> Oct  7 17:11:19 thumper1 kernel: mvsch1: EMPTY CRPB 14 (->16) 1 4000
> Oct  7 17:11:19 thumper1 kernel: mvsch1: EMPTY CRPB 15 (->16) 0 4000
> Oct  7 17:11:49 thumper1 kernel: mvsch27: Timeout on slot 1
> Oct  7 17:11:49 thumper1 kernel: mvsch27: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001023 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:11:49 thumper1 kernel: mvsch21: Timeout on slot 1
> Oct  7 17:11:49 thumper1 kernel: mvsch21: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 00000002 status 50
> Oct  7 17:11:49 thumper1 kernel: mvsch29: Timeout on slot 1
> Oct  7 17:11:49 thumper1 kernel: mvsch29: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 0000000a status 50
> Oct  7 17:11:49 thumper1 kernel: mvsch29:  ... waiting for slots 00000008
> Oct  7 17:11:49 thumper1 kernel: mvsch3: Timeout on slot 2
> Oct  7 17:11:49 thumper1 kernel: mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001023 dma_c 00000000 dma_s 00000000 rs 00000004 status 50
> Oct  7 17:11:49 thumper1 kernel: mvsch1: EMPTY CRPB 18 (->20) 1 4000
> Oct  7 17:11:49 thumper1 kernel: mvsch29: Timeout on slot 3
> Oct  7 17:11:49 thumper1 kernel: mvsch29: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 0000000a status 50
> Oct  7 17:11:49 thumper1 kernel: mvsch2: EMPTY CRPB 14 (->15) 0 0000
> Oct  7 17:11:49 thumper1 kernel: mvsch30: EMPTY CRPB 21 (->22) 0 0000
> Oct  7 17:11:49 thumper1 kernel: mvsch12: EMPTY CRPB 22 (->23) 0 0000
> Oct  7 17:11:49 thumper1 kernel: mvsch23: EMPTY CRPB 30 (->0) 0 4000
> ...
> 
> Ahh and dd did finish after a long time:
> 
> > dd if=tst.dd of=/dev/null bs=64k
> load: 0.87  cmd: dd 1399 [zio->io_cv)] 35.50r 0.01u 10.57s 0% 868k
> 107740+0 records in
> 107740+0 records out
> 7060848640 bytes transferred in 48.836175 secs (144582344 bytes/sec)
> 
> load: 0.44  cmd: dd 1399 [zio->io_cv)] 76.87r 0.01u 11.99s 0% 880k
> 114476+0 records in
> 114476+0 records out
> 7502299136 bytes transferred in 83.218699 secs (90151603 bytes/sec)
> load: 0.79  cmd: dd 1399 [zio->io_cv)] 104.27r 0.01u 12.37s 0% 880k
> 118032+0 records in
> 118032+0 records out
> 7735345152 bytes transferred in 114.212610 secs (67727593 bytes/sec)
> load: 0.29  cmd: dd 1399 [zio->io_cv)] 165.37r 0.01u 12.77s 0% 880k
> 121816+0 records in
> 121816+0 records out
> 7983333376 bytes transferred in 174.702303 secs (45696784 bytes/sec)
> load: 0.01  cmd: dd 1399 [zio->io_cv)] 454.84r 0.01u 15.57s 0% 880k
> 134566+0 records in
> 134566+0 records out
> 8818917376 bytes transferred in 455.140146 secs (19376268 bytes/sec)
> load: 0.00  cmd: dd 1399 [zio->io_cv)] 671.34r 0.02u 16.71s 0% 880k
> 143446+0 records in
> 143446+0 records out
> 9400877056 bytes transferred in 699.953712 secs (13430712 bytes/sec)
> 160000+0 records in
> 160000+0 records out
> 10485760000 bytes transferred in 1077.266770 secs (9733671 bytes/sec)

Isn't slow I/O one of the results of ARC starvation?

Please also provide vmstat -i output.

Also adding Andriy Gapon to the CC list.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-stable mailing list