siisch1: Error while READ LOG EXT
Mike Tancsa
mike at sentex.net
Wed Feb 8 21:00:59 UTC 2012
I have a 4-port eSATA PCIe card with 3 external port multipliers attached, on an AMD64 box (8G of RAM) running RELENG_8 from Feb 1st.
siis0 at pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00
vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
device = 'PCI-X to Serial ATA Controller (SiI 3124)'
class = mass storage
subclass = RAID
bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled
bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled
bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled
cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0
cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message
siis0: <SiI3124 SATA controller> port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5
siis0: [ITHREAD]
siisch0: <SIIS channel> at channel 0 on siis0
siisch0: [ITHREAD]
siisch1: <SIIS channel> at channel 1 on siis0
siisch1: [ITHREAD]
siisch2: <SIIS channel> at channel 2 on siis0
siisch2: [ITHREAD]
siisch3: <SIIS channel> at channel 3 on siis0
siisch3: [ITHREAD]
# camcontrol devlist
<WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 1 lun 0 (pass1,ada1)
<WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 2 lun 0 (pass2,ada2)
<WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 3 lun 0 (pass3,ada3)
<Port Multiplier 47261095 1f06> at scbus0 target 15 lun 0 (pass4,pmp1)
<WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 0 lun 0 (pass5,ada4)
<WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 1 lun 0 (pass6,ada5)
<WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 2 lun 0 (pass7,ada6)
<WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 3 lun 0 (pass8,ada7)
<WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 4 lun 0 (pass9,ada8)
<Port Multiplier 37261095 1706> at scbus1 target 15 lun 0 (pass10,pmp0)
<Areca usrvar R001> at scbus4 target 0 lun 0 (pass11,da0)
<Areca backup1 R001> at scbus4 target 0 lun 1 (pass12,da1)
<Areca RAID controller R001> at scbus4 target 16 lun 0 (pass13)
<AMCC 9650SE-2LP DISK 4.10> at scbus5 target 0 lun 0 (pass14,da2)
<ST31000333AS SD35> at scbus6 target 0 lun 0 (pass15,ada9)
<ST31000528AS CC35> at scbus7 target 0 lun 0 (pass16,ada10)
<ST31000340AS SD1A> at scbus8 target 0 lun 0 (pass17,ada11)
<WDC WD1002FAEX-00Z3A0 05.01D05> at scbus11 target 0 lun 0 (pass18,ada12)
Ever since I added a new PM, I have been seeing a new error (READ LOG EXT), along with the odd slot timeout error.
Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000
Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26
Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000
Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30
Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000
Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25
Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000
Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 01:33:52 backup3 last message repeated 2 times
Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 01:50:31 backup3 last message repeated 2 times
Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT
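A note on reading the timeout lines above: the hex value after "waiting for slots" appears to be a bitmask of outstanding command slots, where bit n stands for slot n. That interpretation is an assumption based on the log itself, not something confirmed from the driver source, but it matches the sequence: each slot reported as timed out (26, 30, 25, 24) is missing from the next mask. A minimal Python sketch of that decoding, with a hypothetical helper name:

```python
def slots_in_mask(mask_hex):
    """Return the slot numbers whose bits are set in a hex mask string.

    Assumes bit n of the mask corresponds to command slot n
    (an interpretation of the log, not taken from the driver).
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


# Masks copied from the "waiting for slots" lines in the log above.
for mask in ("47000000", "43000000", "03000000", "01000000"):
    print(mask, slots_in_mask(mask))
```

Under that assumption, 47000000 decodes to slots 24, 25, 26 and 30, and the later masks shrink by exactly the slot named in the preceding "Timeout on slot" line.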
smartctl doesn't show any issues on the drives, other than one that has some historical errors from a while ago. What are these errors, and do I need to worry about them? The "READ LOG EXT" ones are new.
This is the only drive with anything in its logs, so I am not sure if this is what is causing the driver to complain.
# smartctl -a /dev/ada9
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Device Model: ST31000333AS
Serial Number: 9TE14SRV
LU WWN Device Id: 5 000c50 010a39664
Firmware Version: SD35
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed Feb 8 15:49:12 2012 EST
==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 617) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 203) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 41201023
3 Spin_Up_Time 0x0003 093 092 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 68
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
7 Seek_Error_Rate 0x000f 088 060 030 Pre-fail Always - 791743293
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 22755
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 2
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 68
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 095 095 000 Old_age Always - 5
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 961
190 Airflow_Temperature_Cel 0x0022 065 056 045 Old_age Always - 35 (Min/Max 33/37)
194 Temperature_Celsius 0x0022 035 044 000 Old_age Always - 35 (0 25 0 0)
195 Hardware_ECC_Recovered 0x001a 049 030 000 Old_age Always - 41201023
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 5
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 5 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 1a ff ff ff 4f 00 11d+02:29:18.542 READ FPDMA QUEUED
60 00 1a ff ff ff 4f 00 11d+02:29:18.542 READ FPDMA QUEUED
60 00 1b ff ff ff 4f 00 11d+02:29:18.541 READ FPDMA QUEUED
60 00 19 ff ff ff 4f 00 11d+02:29:18.541 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:18.541 READ FPDMA QUEUED
Error 4 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 1a ff ff ff 4f 00 11d+02:29:15.783 READ FPDMA QUEUED
60 00 1a ff ff ff 4f 00 11d+02:29:15.780 READ FPDMA QUEUED
60 00 1b ff ff ff 4f 00 11d+02:29:15.732 READ FPDMA QUEUED
60 00 19 ff ff ff 4f 00 11d+02:29:15.732 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:15.731 READ FPDMA QUEUED
Error 3 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 1b ff ff ff 4f 00 11d+02:29:12.889 READ FPDMA QUEUED
60 00 19 ff ff ff 4f 00 11d+02:29:12.889 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:12.888 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:12.888 READ FPDMA QUEUED
60 00 1a ff ff ff 4f 00 11d+02:29:12.888 READ FPDMA QUEUED
Error 2 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 1b ff ff ff 4f 00 11d+02:29:10.011 READ FPDMA QUEUED
60 00 19 ff ff ff 4f 00 11d+02:29:10.011 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:10.010 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:10.010 READ FPDMA QUEUED
60 00 1a ff ff ff 4f 00 11d+02:29:10.010 READ FPDMA QUEUED
Error 1 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 1b ff ff ff 4f 00 11d+02:29:07.148 READ FPDMA QUEUED
60 00 19 ff ff ff 4f 00 11d+02:29:07.140 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:07.131 READ FPDMA QUEUED
60 00 1c ff ff ff 4f 00 11d+02:29:07.117 READ FPDMA QUEUED
60 00 35 ff ff ff 4f 00 11d+02:29:07.111 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
---Mike
--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/