[Bug 229745] ahcich: CAM status: Command timeout

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Thu Jul 12 22:35:03 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745

            Bug ID: 229745
           Summary: ahcich: CAM status: Command timeout
           Product: Base System
           Version: 11.2-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: fbsd98816551 at avksrv.org

Hello!

We have several Supermicro servers based on the X11SSH-F board.
All servers were installed half a year ago and ran FreeBSD 11.1. Each server
has 4 HGST HUS722T1TALA604 HDDs.
All of them worked fine during that time, with about half a year of uptime.
Recently the servers were upgraded to FreeBSD 11.2 (self-built 11.2-STABLE
r335679 with default make.conf, src.conf and the GENERIC kernel).

After some time (it varies, from 2 hours to 7 days) one or more disks start
timing out:
Jul 13 00:56:24 mrr32 kernel: ahcich2: Timeout on slot 17 port 0
Jul 13 00:56:24 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00060000 rs
00060000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61
20 ca 22 23 40 06 00 00 00 00 00
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 00:58:16 srv32 kernel: ahcich2: Timeout on slot 26 port 0
Jul 13 00:58:16 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 04000000 rs
04000000 tfd 40 serr 00000000 cmd 0004da17
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61
e0 8a cc c6 40 18 00 00 00 00 00
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:01:46 srv32 kernel: ahcich2: Timeout on slot 18 port 0
Jul 13 01:01:46 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00040000 rs
00040000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61
20 2a 2b 23 40 06 00 00 00 00 00
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:07:12 srv32 kernel: ahcich0: Timeout on slot 23 port 0
Jul 13 01:07:12 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00800000 rs
00800000 tfd 40 serr 00000000 cmd 0004d717
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61
18 62 f5 c6 40 18 00 00 00 00 00
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command
Jul 13 01:07:43 srv32 kernel: ahcich0: Timeout on slot 2 port 0
Jul 13 01:07:43 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00000004 rs
00000004 tfd 40 serr 00000000 cmd 0004c217
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61
10 62 12 7b 40 06 00 00 00 00 00
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command
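
Each record above can be decoded mechanically. The following is a minimal
sketch in Python, not driver code. It assumes that "cs", "ss" and "rs" are
per-slot bitmasks (command issue, SATA active, and the driver's own map of
running slots), that "tfd" carries the ATA status byte in its low 8 bits, and
that the ACB bytes are printed in the order command, features, LBA
low/mid/high, device, LBA low/mid/high (exp), features (exp), sector count,
sector count (exp). The function names are hypothetical.

```python
import re

# First record from the log above, with the wrapped lines joined.
REG_LINE = ("ahcich2: is 00000000 cs 00000000 ss 00060000 rs 00060000 "
            "tfd 40 serr 00000000 cmd 0004d217")
ACB_LINE = ("(ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. "
            "ACB: 61 20 ca 22 23 40 06 00 00 00 00 00")


def parse_registers(line):
    """Return the name/value pairs of the register dump as integers."""
    pairs = re.findall(r"\b(is|cs|ss|rs|tfd|serr|cmd) ([0-9a-f]+)", line)
    return {name: int(value, 16) for name, value in pairs}


def set_bits(mask):
    """List the bit positions (slot numbers) set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_tfd(tfd):
    """Split the task file data value into ATA status flags and error byte."""
    status = tfd & 0xFF
    return {
        "BSY": bool(status & 0x80),
        "DRDY": bool(status & 0x40),
        "DRQ": bool(status & 0x08),
        "ERR": bool(status & 0x01),
        "error": (tfd >> 8) & 0xFF,
    }


def decode_acb(line):
    """Decode the 12 ACB bytes under the field order assumed above."""
    b = [int(x, 16) for x in line.split("ACB:")[1].split()]
    # 48-bit LBA: low/mid/high bytes first, then the three "exp" bytes.
    lba = (b[2] | b[3] << 8 | b[4] << 16 |
           b[6] << 24 | b[7] << 32 | b[8] << 40)
    # Assumption: for FPDMA (NCQ) commands the transfer length is carried
    # in the features fields rather than in sector_count.
    sectors = b[1] | b[9] << 8
    return {"command": b[0], "lba": lba, "sectors": sectors}


regs = parse_registers(REG_LINE)
print("active slots (ss): ", set_bits(regs["ss"]))
print("running slots (rs):", set_bits(regs["rs"]))
print("tfd:", decode_tfd(regs["tfd"]))
print("acb:", decode_acb(ACB_LINE))
```

For the first record this reports slots 17 and 18 as still active in both ss
and rs, a status of DRDY with neither BSY nor ERR set, and a 32-sector write
at LBA 0x062322ca. Under the assumptions above, that would mean the commands
were never completed, rather than completed with an error.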

A reboot (/sbin/shutdown -r or /sbin/reboot) does not solve the problem; the
disks still time out after boot. Only a full power off / power on solves the
problem for some time, and after a while the timeouts start again.

The servers were updated to the latest BIOS available from Supermicro. No change.

ahci0: <Intel Sunrise Point AHCI SATA controller> port
0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem
0xdf310000-0xdf311fff,0xdf31e000-0xdf31e0ff,0xdf31d000-0xdf31d7ff irq 16 at
device 23.0 on pci0
ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich6: <AHCI channel> at channel 6 on ahci0
ahcich7: <AHCI channel> at channel 7 on ahci0

ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <HGST HUS722T1TALA604 RAGNWA07> ACS-3 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)
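
As a quick sanity check, the capacity in the probe line above follows from the
sector count. This is a small sketch, assuming the "MB" figure is bytes
divided by 2^20 and truncated; the variable names are illustrative only.

```python
SECTORS = 1953525168      # from the ada0 probe line
SECTOR_SIZE = 512         # bytes per sector, from the same line

size_bytes = SECTORS * SECTOR_SIZE
size_mib = size_bytes // 2**20   # truncate to whole units of 2^20 bytes

print(size_bytes, "bytes")
print(size_mib, "MB as reported by the probe line")
```

Any LBA decoded from the timed-out commands should be below the sector count
used here.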


ahci0 at pci0:0:23:0:      class=0x010601 card=0x088415d9 chip=0xa1028086 rev=0x31
hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-H SATA controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
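
The numeric IDs in the pciconf output above can be split into their parts. A
minimal sketch, assuming the usual packing in which "chip" and "card" hold
the device ID in the upper 16 bits and the vendor ID in the lower 16 bits,
and "class" holds class, subclass and programming interface as three bytes.
The function name is hypothetical.

```python
def split_pci_ids(chip, card, pci_class):
    """Split packed pciconf values into individual ID fields."""
    return {
        "vendor": chip & 0xFFFF,
        "device": chip >> 16,
        "subvendor": card & 0xFFFF,
        "subdevice": card >> 16,
        "class": (pci_class >> 16) & 0xFF,
        "subclass": (pci_class >> 8) & 0xFF,
        "prog_if": pci_class & 0xFF,
    }


ids = split_pci_ids(chip=0xA1028086, card=0x088415D9, pci_class=0x010601)
for name, value in ids.items():
    print(f"{name:10s} 0x{value:04x}")
```

This yields vendor 0x8086 and device 0xa102, subvendor 0x15d9, and class 0x01
/ subclass 0x06 / programming interface 0x01, which agrees with the vendor,
class and subclass strings printed above.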


We use ZFS on all servers. Some servers use raidz1, some raid-10, with the
same results.

We normally run smartd on all servers. I tried disabling smartd; it seems to
make no difference.

We have already upgraded the zpools to the new features, so those features
would have to be removed before downgrading back to 11.1.
