siisch1: Error while READ LOG EXT
Jeremy Chadwick
freebsd at jdc.parodius.com
Wed Feb 8 21:27:25 UTC 2012
On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote:
> I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st.
>
> siis0 at pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00
> vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
> device = 'PCI-X to Serial ATA Controller (SiI 3124)'
> class = mass storage
> subclass = RAID
> bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled
> bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled
> bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled
> cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0
> cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
> cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message
>
> siis0: <SiI3124 SATA controller> port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5
> siis0: [ITHREAD]
> siisch0: <SIIS channel> at channel 0 on siis0
> siisch0: [ITHREAD]
> siisch1: <SIIS channel> at channel 1 on siis0
> siisch1: [ITHREAD]
> siisch2: <SIIS channel> at channel 2 on siis0
> siisch2: [ITHREAD]
> siisch3: <SIIS channel> at channel 3 on siis0
> siisch3: [ITHREAD]
>
> # camcontrol devlist
> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 0 lun 0 (pass0,ada0)
> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 1 lun 0 (pass1,ada1)
> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 2 lun 0 (pass2,ada2)
> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 3 lun 0 (pass3,ada3)
> <Port Multiplier 47261095 1f06> at scbus0 target 15 lun 0 (pass4,pmp1)
> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 0 lun 0 (pass5,ada4)
> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 1 lun 0 (pass6,ada5)
> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 2 lun 0 (pass7,ada6)
> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 3 lun 0 (pass8,ada7)
> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 4 lun 0 (pass9,ada8)
> <Port Multiplier 37261095 1706> at scbus1 target 15 lun 0 (pass10,pmp0)
> <Areca usrvar R001> at scbus4 target 0 lun 0 (pass11,da0)
> <Areca backup1 R001> at scbus4 target 0 lun 1 (pass12,da1)
> <Areca RAID controller R001> at scbus4 target 16 lun 0 (pass13)
> <AMCC 9650SE-2LP DISK 4.10> at scbus5 target 0 lun 0 (pass14,da2)
> <ST31000333AS SD35> at scbus6 target 0 lun 0 (pass15,ada9)
> <ST31000528AS CC35> at scbus7 target 0 lun 0 (pass16,ada10)
> <ST31000340AS SD1A> at scbus8 target 0 lun 0 (pass17,ada11)
> <WDC WD1002FAEX-00Z3A0 05.01D05> at scbus11 target 0 lun 0 (pass18,ada12)
>
>
> Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error.
>
>
> Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000
> Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26
> Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
> Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000
> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30
> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
> Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000
> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25
> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
> Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000
> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
This indicates the controller on channel 1 (siisch1) is "stalled"
waiting for underlying communication with the device attached to it.
> Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 01:33:52 backup3 last message repeated 2 times
> Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 01:50:31 backup3 last message repeated 2 times
> Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT
This indicates the underlying device was handed a READ LOG EXT ATA
command (command 0x2f) and the device did not respond promptly
(resulting in the timeout messages you see).
> smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new.
>
> {snipping SMART stats}
You're focused heavily on the READ LOG EXT command. READ LOG EXT is
intended for accessing the GP Log section of a drive. EXT stands for
"Extended". "GP Log" means "General Purpose Log", and is where all
sorts of logging information regarding drive performance is stored.
It's usually stored within a reserved section of the platters, or in the
HPA area. It's not within a "standard" user-accessible LBA/sector
region. This is a completely separate log from that of SMART logs.
You can review the different types of "logs" on a device by reviewing
the ATA8-ACS specification here. See Annex A, section A.1, page 362:
http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf
This is almost certainly a lower level problem with the disk that cannot
be addressed/solved via normal means. Thus, my recommendation is to
replace the disk.
If you would rather not replace the disk, I can try to step you through
looking at the GPLog sections of the disk to see if you can trigger the
problem -- and I have a feeling you'll be able to, but I won't
necessarily be able to tell you where the actual problem lies
hardware-wise, nor will I be able to solve the problem.
Regarding the repeated errors at semi-regular (but not entirely)
intervals: are you using smartd? Do you have a cronjob that issues
smartctl -a or smartctl -x commands at intervals? I imagine any of
these could be tickling something lower level.
Also, please upgrade your smartmontools to 5.42. It does provide some
further enhancements that are useful.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
More information about the freebsd-stable
mailing list