kern/157397: ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability

Matthias Andree mandree at FreeBSD.org
Sun May 29 13:50:10 UTC 2011


>Number:         157397
>Category:       kern
>Synopsis:       ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 29 13:50:09 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Matthias Andree
>Release:        FreeBSD 8.2-STABLE amd64
>Organization:
>Environment:
System: FreeBSD apollo.emma.line.org 8.2-STABLE FreeBSD 8.2-STABLE #67: Fri May 27 21:36:00 CEST 2011 toor at apollo.emma.line.org:/usr/obj/usr/src/sys/GENERIC amd64


	
>Description:
(please disregard "class" field above)

Greetings,

I seem to have difficulties with the interaction between ahci(4), ada(4)
and the CAM subsystem on RELENG_8. Perhaps my HDD is quirky (Samsung
HD103SI rev. 1AG01118), perhaps ahci or ada. (dmesg below)

I seem to have two problems:

1. the problem is that once in a while queued ATA commands time out, and
let the computer freeze for like half a minute or two. It then recovers.
This only ever affects the Samsung drive, I've never seen it on the
Western Digital. Either drive doesn't report any suspicious data to
S.M.A.R.T. - no pending blocks, no past reallocations, no parameters
even close to Thresh, no self-test failures.  Possibly the Samsung also
has firmware quirks, silent limits and so on, I know there were firmware
updates to avoid data loss for larger siblings in the same drive series,
but apparently not for the HD103SI.

2. Trying to reduce the number of tagged openings with camcontrol tags
ada1 -N 16 doesn't work because min == max;
trying to disable tagged negotiation through "camcontrol negotiate ada1
-T disable" does not have any visible effect.

This is my /boot/loader.conf:

ahci_load="YES"
atapicam_load="YES"
cuse4bsd_load="YES"
geom_journal_load="YES"
sem_load="YES"
snd_hda_load="YES"
vboxdrv_load="YES"
hw.ata.wc=0
kern.cam.ada.write_cache=0
kern.ipc.shmall=32768
kern.ipc.shmmax=67108864
kern.maxfiles="25000"
vfs.root.mountfrom="ufs:/dev/ada0s4a"

This is dmesg with sensitive and useless information omitted. At the
end, it shows two episodes with freezes, of a handful of timeouts each
time.

...
FreeBSD 8.2-STABLE #67: Fri May 27 21:36:00 CEST 2011
    toor at apollo.emma.line.org:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Phenom(tm) II X4 905e Processor (2508.56-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f42  Family = 10  Model = 4  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 3838828544 (3660 MB)
ACPI APIC Table: <052010 APIC1732>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20101013/tbfadt-655)
ioapic0 <Version 2.1> irqs 0-23 on motherboard
Cuse4BSD v0.1.14 @ /dev/cuse
kbd1 at kbdmux0
acpi0: <052010 XSDT1732> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of fee00000, 1000 (3) failed
acpi0: reservation of ffb80000, 80000 (3) failed
acpi0: reservation of fec10000, 20 (3) failed
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, cfe00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xc000-0xc0ff mem 0xd0000000-0xdfffffff,0xfbde0000-0xfbdeffff,0xfbc00000-0xfbcfffff irq 18 at device 5.0 on pci1
hdac0: <ATI RS780 High Definition Audio Controller> mem 0xfbdfc000-0xfbdfffff irq 19 at device 5.1 on pci1
hdac0: HDA Driver Revision: 20100226_0142
hdac0: [ITHREAD]
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ale0: <Atheros AR8121/AR8113/AR8114 PCIe Ethernet> port 0xdc00-0xdc7f mem 0xfbec0000-0xfbefffff irq 18 at device 0.0 on pci2
ale0: 960 Tx FIFO, 1024 Rx FIFO
ale0: Using 1 MSI messages.
ale0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
miibus0: <MII bus> on ale0
atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0
atphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
ale0: Ethernet address: ...
ale0: [FILTER]
pcib3: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
pci3: <ACPI PCI bus> on pcib3
fwohci0: <1394 Open Host Controller Interface> port 0xe800-0xe8ff mem 0xfbfff800-0xfbffffff irq 19 at device 0.0 on pci3
fwohci0: [ITHREAD]
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 ...
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: ...
fwe0: Ethernet address: ...
fwip0: <IP over FireWire> on firewire0
fwip0: Firewire address: ...
dcons_crom0: <dcons configuration ROM> on firewire0
dcons_crom0: bus_addr 0x1028000
fwohci0: Initiate bus reset
fwohci0: fwohci_intr_core: BUS reset
fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode
ahci0: <ATI IXP700 AHCI SATA controller> port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xfbbffc00-0xfbbfffff irq 22 at device 17.0 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich5: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xfbbfd000-0xfbbfdfff irq 16 at device 18.0 on pci0
ohci0: [ITHREAD]
usbus0: <OHCI (generic) USB controller> on ohci0
ohci1: <OHCI (generic) USB controller> mem 0xfbbfe000-0xfbbfefff irq 16 at device 18.1 on pci0
ohci1: [ITHREAD]
usbus1: <OHCI (generic) USB controller> on ohci1
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfbbff800-0xfbbff8ff irq 17 at device 18.2 on pci0
ehci0: [ITHREAD]
usbus2: EHCI version 1.0
usbus2: <EHCI (generic) USB 2.0 controller> on ehci0
ohci2: <OHCI (generic) USB controller> mem 0xfbbfb000-0xfbbfbfff irq 18 at device 19.0 on pci0
ohci2: [ITHREAD]
usbus3: <OHCI (generic) USB controller> on ohci2
ohci3: <OHCI (generic) USB controller> mem 0xfbbfc000-0xfbbfcfff irq 18 at device 19.1 on pci0
ohci3: [ITHREAD]
usbus4: <OHCI (generic) USB controller> on ohci3
ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfbbff400-0xfbbff4ff irq 19 at device 19.2 on pci0
ehci1: [ITHREAD]
usbus5: EHCI version 1.0
usbus5: <EHCI (generic) USB 2.0 controller> on ehci1
pci0: <serial bus, SMBus> at device 20.0 (no driver attached)
atapci0: <ATI IXP700/800 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
hdac1: <ATI SB600 High Definition Audio Controller> mem 0xfbbf4000-0xfbbf7fff irq 16 at device 20.2 on pci0
hdac1: HDA Driver Revision: 20100226_0142
hdac1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib4: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci4: <ACPI PCI bus> on pcib4
ohci4: <OHCI (generic) USB controller> mem 0xfbbfa000-0xfbbfafff irq 18 at device 20.5 on pci0
ohci4: [ITHREAD]
usbus6: <OHCI (generic) USB controller> on ohci4
acpi_button0: <Power Button> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
acpi_hpet1: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
device_attach: acpi_hpet1 attach returned 12
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: cannot reserve I/O port range
acpi_throttle0: <ACPI CPU Throttling> on cpu0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounters tick every 1.000 msec
vboxdrv: fAsync=0 offMin=0x15a offMax=0x29f
firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0)  (me) 
firewire0: bus manager 0 
hdac0: HDA Codec #0: ATI RS690/780 HDMI
pcm0: <HDA ATI RS690/780 HDMI PCM #0 HDMI> at cad 0 nid 1 on hdac0
hdac1: HDA Codec #0: VIA VT1708S_0
pcm1: <HDA VIA VT1708S_0 PCM #0 Analog> at cad 0 nid 1 on hdac1
pcm2: <HDA VIA VT1708S_0 PCM #1 Digital> at cad 0 nid 1 on hdac1
pcm3: <HDA VIA VT1708S_0 PCM #2 Digital> at cad 0 nid 1 on hdac1
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 480Mbps High Speed USB v2.0
usbus3: 12Mbps Full Speed USB v1.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 480Mbps High Speed USB v2.0
usbus6: 12Mbps Full Speed USB v1.0
ugen0.1: <ATI> at usbus0
uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <ATI> at usbus1
uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <ATI> at usbus2
uhub2: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
ugen3.1: <ATI> at usbus3
uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3
ugen4.1: <ATI> at usbus4
uhub4: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <ATI> at usbus5
uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
ugen6.1: <ATI> at usbus6
uhub6: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD5002ABYS-02B1B0 ...> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1: <SAMSUNG HD103SI 1AG01118> ATA-7 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
cd0 at ahcich3 bus 0 scbus3 target 0 lun 0
cd0: <...> Removable CD-ROM SCSI-0 device 
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
GEOM: ada0: partition 2 does not start on a track boundary.
GEOM: ada0: partition 2 does not end on a track boundary.
GEOM: ada0: partition 1 does not start on a track boundary.
GEOM: ada0: partition 1 does not end on a track boundary.
GEOM: ada1: partition 2 does not start on a track boundary.
GEOM: ada1: partition 2 does not end on a track boundary.
uhub6: 2 ports with 2 removable, self powered
uhub0: 3 ports with 3 removable, self powered
uhub1: 3 ports with 3 removable, self powered
uhub3: 3 ports with 3 removable, self powered
uhub4: 3 ports with 3 removable, self powered
GEOM_JOURNAL: Journal 3658629072: ada1s1d contains data.
GEOM_JOURNAL: Journal 3658629072: ada1s1e contains journal.
GEOM_JOURNAL: Journal ada1s1d clean.
Root mount waiting for: GJOURNAL usbus5 usbus2
GEOM: ada1s1d.journal: invalid disklabel.
GEOM: ufsid/4d540ab0e4fef3cb: invalid disklabel.
GEOM: ufs/usrgjournal: invalid disklabel.
Root mount waiting for: usbus5 usbus2
Root mount waiting for: usbus5 usbus2
uhub2: 6 ports with 6 removable, self powered
uhub5: 6 ports with 6 removable, self powered
ugen5.2: <vendor 0x05e3> at usbus5
umass0: <vendor 0x05e3 USB Storage, class 0/0, rev 2.00/97.32, addr 2> on usbus5
umass0:  SCSI over Bulk-Only; quirks = 0x0000
Root mount waiting for: usbus5 usbus2
ugen0.2: <Microsoft> at usbus0
ukbd0: <Microsoft Natural Ergonomic Keyboard 4000, class 0/0, rev 2.00/1.73, addr 2> on usbus0
ugen2.2: <vendor 0x05e3> at usbus2
uhub7: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/7.02, addr 2> on usbus2
kbd2 at ukbd0
uhid0: <Microsoft Natural Ergonomic Keyboard 4000, class 0/0, rev 2.00/1.73, addr 2> on usbus0
umass0:6:0:-1: Attached to scbus6
uhub7: 4 ports with 4 removable, self powered
(probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 
(probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI status: Check Condition
(probe0:umass-sim0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present)
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <Generic STORAGE DEVICE 9732> Removable Direct Access SCSI-0 device 
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
ugen2.3: <EPSON> at usbus2
ulpt0: <EPSON USB Printer, class 0/0, rev 1.10/1.00, addr 3> on usbus2
ulpt0: using bi-directional mode
Root mount waiting for: usbus2
(probe0:umass-sim0:0:0:1): TEST UNIT READY. CDB: 0 20 0 0 0 0 
(probe0:umass-sim0:0:0:1): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:1): SCSI status: Check Condition
(probe0:umass-sim0:0:0:1): SCSI sense: NOT READY asc:3a,0 (Medium not present)
da1 at umass-sim0 bus 0 scbus6 target 0 lun 1
da1: <Generic STORAGE DEVICE 9732> Removable Direct Access SCSI-0 device 
da1: 40.000MB/s transfers
da1: Attempt to query device size failed: NOT READY, Medium not present
ugen2.4: ...
ugen2.5: ...
ulpt1: ...
ulpt1: using bi-directional mode
Trying to mount root from ufs:/dev/ada0s4a
ugen0.3: ...
ums0: ...
ums0: 8 buttons and [XYZT] coordinates ID=0
cryptosoft0: <software crypto> on motherboard
GEOM_ELI: Device ada0s4b.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
vboxnet0: Ethernet address: ...
ale0: link state changed to UP
fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.8
drm0: <ATI Radeon 3300 Graphics> on vgapci0
info: [drm] MSI enabled 1 message(s)
info: [drm] Initialized radeon 1.31.0 20080613
info: [drm] Setting GART location based on new memory map
info: [drm] Loading RS780/RS880 Microcode
info: [drm] Resetting GPU
info: [drm] writeback test succeeded in 1 usecs
drm0: [ITHREAD]
info: [drm] Resetting GPU

First failure episode:

ahcich2: Timeout on slot 28
ahcich2: is 00000000 cs 8fffffff ss ffffffff rs ffffffff tfd c0 serr 00000000
ahcich2: Timeout on slot 26
ahcich2: is 00000000 cs e3ffffff ss ffffffff rs ffffffff tfd c0 serr 00000000
ahcich2: Timeout on slot 24
ahcich2: is 00000000 cs f8ffffff ss ffffffff rs ffffffff tfd c0 serr 00000000

Computer has recovered. Several minutes later a second episode:

ahcich2: Timeout on slot 24
ahcich2: is 00000000 cs f80000ff ss ff0000ff rs ff0000ff tfd c0 serr 00000000
ahcich2: Timeout on slot 23
ahcich2: is 00000000 cs fc00003f ss ff80003f rs ff80003f tfd c0 serr 00000000
ahcich2: Timeout on slot 19
ahcich2: is 00000000 cs ffc00007 ss fff80007 rs fff80007 tfd c0 serr 00000000
ahcich2: Timeout on slot 18
ahcich2: is 00000000 cs ffe00007 ss fffc0007 rs fffc0007 tfd c0 serr 00000000
ahcich2: Timeout on slot 18
ahcich2: is 00000000 cs ffe00007 ss fffc0007 rs fffc0007 tfd c0 serr 00000000
ahcich2: Timeout on slot 18
ahcich2: is 00000000 cs ffe00007 ss fffc0007 rs fffc0007 tfd c0 serr 00000000
ahcich2: Timeout on slot 18
ahcich2: is 00000000 cs ffe0000f ss fffc000f rs fffc000f tfd c0 serr 00000000
ahcich2: Timeout on slot 25
ahcich2: is 00000000 cs f1ffffff ss ffffffff rs ffffffff tfd c0 serr 00000000

	
>How-To-Repeat:
	
>Fix:

	


>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list