ahcich timeouts, only with ahci, not with ataahci

Harald Schmalzbauer h.schmalzbauer at omnilan.de
Sat Mar 13 21:28:14 UTC 2010


Am 03.03.2010 12:06, schrieb Jeremy Chadwick:
> On Wed, Mar 03, 2010 at 09:28:25AM +0100, Harald Schmalzbauer wrote:
>> Alexander Motin schrieb am 03.03.2010 09:18 (localtime):
>>> Harald Schmalzbauer wrote:
>>>> Alexander Motin schrieb am 23.02.2010 16:10 (localtime):
>>>>> Harald Schmalzbauer wrote:
>>>>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>>>>> ahcich2: Timeout on slot 0
>>>>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr
>>>>>> 00000000
>>>>>> ahcich2: Timeout on slot 8
>>>>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr
>>>>>> 00000000
>>>>>> ahcich2: Timeout on slot 8
>>>>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr
>>>>>> 00000000
>>>>>> ...
>>>>> Looking that is (Interrupt status) is zero and `rs == cs | ss` (running
>>>>> command bitmasks in driver and hardware), controller doesn't report
>>>>> command completion. Looking on TFD status 0xc0 with BUSY bit set, I
>>>>> would suppose that either disk stuck in command processing for some
>>>>> reason, or controller missed command completion status.
>>>>>
>>>>> Have you noticed 30 second (default ATA timeout) pause before timeout
>>>>> message printed? Just want to be sure that driver waited enough before
>>>>> give up.
>>>>>
>>>>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>>>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>>>>>> up almost immediately, but from it still happens.
>>>>>> When I don't use ahci but ataahci (the old driver if I understand things
>>>>>> correct) I also see the ZFS burst write congestion, but this doesn't
>>>>>> lead to controller timeouts, thus blocking the machine.
>>>>>>
>>>>>> Sometimes the machine recovers from the disk lock, but most often I have
>>>>>> to reboot.
>>>>> How it looks when it doesn't? Can you send me full log messages?
>>>> Hello, this morning I had a stall, but the machine recovered after about
>>>> one Minute. Here's what I got from the kernel:
>>>> ahcich2: Timeout on slot 29
>>>> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr
>>>> 00000000
>>>> em1: watchdog timeout -- resetting
>>>> em1: watchdog timeout -- resetting
>>>> ahcich2: Timeout on slot 10
>>>> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr
>>>> 00000000
>>>> ahcich2: Timeout on slot 18
>>>> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr
>>>> 00000000
>>>> ahcich2: Timeout on slot 2
>>>> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr
>>>> 00000000
>>>> ahcich2: Timeout on slot 2
>>>> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr
>>>> 00000000
>>>>
>>>> Does this tell you something useful?
>>>
>>> It doesn't. Looking on logged register content - commands are indeed
>>> still running and no interrupts requested. Interesting to see em1
>>> watchdog timeout there. Aren't they related somehow?
>>
>> 	dmesg | grep "irq 18":
>> uhci0: <Intel 82801I (ICH9) USB controller> port 0x20c0-0x20df irq
>> 18 at device 26.0 on pci0
>> uhci4: <Intel 82801I (ICH9) USB controller> port 0x2040-0x205f irq
>> 18 at device 29.2 on pci0
>> em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port
>> 0x1000-0x103f mem 0xe1920000-0xe193ffff,0xe1900000-0xe191ffff irq 18
>> at device 2.0 on pci3
>> ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x2000-0x201f
>> mem 0xe1a22000-0xe1a220ff irq 18 at device 31.3 on pci0
>>
>> The don't share the same IRQ at least.
>> dmesg | grep "irq 21"
>> uhci1: <Intel 82801I (ICH9) USB controller> port 0x20a0-0x20bf irq
>> 21 at device 26.1 on pci0
>> ahci0: <Intel ICH9 AHCI SATA controller> port
>> 0x2408-0x240f,0x2414-0x2417,0x2400-0x2407,0x2410-0x2413,0x2020-0x203f
>> mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0
>>
>> The em1 has no cable attached. I get many of these em watchdog
>> timeouts. Never thought they could be related to ahci. I'll see if
>> the em watchdog timeouts happens in any relation to disk usage.
> 
> Please provide output from the commands I provided.  dmesg|grep is not
> sufficient for helping track this down, specifically with regards to the
> em1 watchdog timeouts.

Sorry for the delay, here's the details:
hostb0 at pci0:0:0:0:      class=0x060000 card=0x34d08086 chip=0x29f08086
rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '3200 Chipset (Bearlake) Processor to I/O Controller'
    class      = bridge
    subclass   = HOST-PCI
em0 at pci0:0:25:0:        class=0x020000 card=0x34d08086 chip=0x10bd8086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class      = network
    subclass   = ethernet
uhci0 at pci0:0:26:0:      class=0x0c0300 card=0x34d08086 chip=0x29378086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uhci1 at pci0:0:26:1:      class=0x0c0300 card=0x34d08086 chip=0x29388086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
ehci0 at pci0:0:26:7:      class=0x0c0320 card=0x34d08086 chip=0x293c8086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB2 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pcib1 at pci0:0:28:0:      class=0x060400 card=0x29408086 chip=0x29408086
rev=0x02 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) PCIe Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
pcib2 at pci0:0:28:4:      class=0x060400 card=0x29488086 chip=0x29488086
rev=0x02 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) PCIe Root Port 5'
    class      = bridge
    subclass   = PCI-PCI
uhci2 at pci0:0:29:0:      class=0x0c0300 card=0x34d08086 chip=0x29348086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uhci3 at pci0:0:29:1:      class=0x0c0300 card=0x34d08086 chip=0x29358086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uhci4 at pci0:0:29:2:      class=0x0c0300 card=0x34d08086 chip=0x29368086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
ehci1 at pci0:0:29:7:      class=0x0c0320 card=0x34d08086 chip=0x293a8086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) USB2 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pcib3 at pci0:0:30:0:      class=0x060401 card=0x34d08086 chip=0x244e8086
rev=0x92 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 Family (ICH2/3/4/5/6/7/8/9,63xxESB) Hub
Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
isab0 at pci0:0:31:0:      class=0x060100 card=0x34d08086 chip=0x29168086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IR (ICH9R) LPC Interface Controller'
    class      = bridge
    subclass   = PCI-ISA
ahci0 at pci0:0:31:2:      class=0x010601 card=0x34d08086 chip=0x29228086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) 6 port SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
ichsmb0 at pci0:0:31:3:    class=0x0c0500 card=0x34d08086 chip=0x29308086
rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) SMBus Controller'
    class      = serial bus
    subclass   = SMBus
vgapci0 at pci0:2:0:0:     class=0x030000 card=0x01018086 chip=0x0522102b
rev=0x02 hdr=0x00
    vendor     = 'Matrox Electronic Systems Ltd.'
    device     = 'Matrox G200e (ServerEngines) - English (G200e)'
    class      = display
    subclass   = VGA
em1 at pci0:3:2:0: class=0x020000 card=0x34d08086 chip=0x10768086 rev=0x05
hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Gigabit Ethernet Controller (82541EI)'
    class      = network
    subclass   = ethernet
banana:~>20: vmstat -i
interrupt                          total       rate
irq4: uart0                        27392          0
irq18: em1 uhci0++                   178          0
irq19: uhci3                          22          0
irq23: uhci2 ehci1                     2          0
cpu0: timer                    231629257       1917
irq256: em0                      2559286         21
irq257: ahci0                    1602196         13
cpu1: timer                    231628629       1917
Total                          467446962       3869

Looks very similar to the problem here:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-03/msg00143.html

Thanks,

-Harry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20100313/d34f2cd8/signature.pgp


More information about the freebsd-stable mailing list