siis_timeout with port multiplier on 9.0R
Nenhum_de_Nos
matheus at eternamente.info
Sat May 26 12:53:16 UTC 2012
On Wed, May 23, 2012 17:07, Nenhum_de_Nos wrote:
>
> On Wed, May 23, 2012 12:54, Nenhum_de_Nos wrote:
>>
>> On Wed, May 23, 2012 11:22, Mike Tancsa wrote:
>>> On 5/21/2012 9:04 PM, Matthew Gamble wrote:
>>>> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5-port backplane port
>>>> multipliers (the "Backblaze storage pod"). Under intense IO (ZFS rebuild, presently) the
>>>> system will lock up all IO for 3-4 minutes and the following entries appear in the dmesg:
>>>>
>>>> siisch11: Timeout on slot 30
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>> siisch11: ... waiting for slots 25000000
>>>> siisch11: Timeout on slot 26
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>> siisch11: ... waiting for slots 21000000
>>>> siisch11: Timeout on slot 29
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>> siisch11: ... waiting for slots 01000000
>>>> siisch11: Timeout on slot 24
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>>
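The hex fields in the log above can be read as per-slot bitmasks. That reading is an assumption, but it is consistent with the log itself: `ss 65000000` has bits 24, 26, 29 and 30 set, and those are exactly the four slots that time out. A minimal Python sketch of the decoding (the helper name `slots_from_mask` is made up for illustration and is not part of any FreeBSD tool):

```python
from typing import List


def slots_from_mask(mask_hex: str) -> List[int]:
    """Return the slot numbers whose bits are set in a 32-bit hex mask.

    Assumption: bit N of the mask corresponds to command slot N.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


# Values copied from the quoted dmesg output.
for value in ("65000000", "25000000", "21000000", "01000000"):
    print(value, slots_from_mask(value))
```

Read this way, the "waiting for slots" mask shrinks by one slot after each reported timeout until only slot 24 is left.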
>>>> The errors are on different siisch devices so it's not likely to be a SATA cable issue unless
>>>> multiple cables all went bad at the same time. On the advice of some other posts to the
>>>> mailing list I've already tried locking the SATA rev to one with the following in
>>>> /boot/loader.conf which didn't
>>>
>>> If they are on different siisch devices then yes, it does not sound like
>>> a bad cable. However, I have had that issue with similar errors above
>>> that were fixed by using new cables. If you are using 9.0R, I would
>>> suggest upgrading to stable. There have been a few bug fixes /
>>> improvements to the drivers as well as various parts of the disk
>>> subsystem. I have RELENG8 right now and it's quite stable for me on a
>>> 25TB system which is for the most part similar to 9.x
>>>
>>> # zpool status
>>>   pool: zbackup1
>>>  state: ONLINE
>>>   scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         zbackup1    ONLINE       0     0     0
>>>           raidz1-0  ONLINE       0     0     0
>>>             ada14   ONLINE       0     0     0
>>>             ada16   ONLINE       0     0     0
>>>             ada13   ONLINE       0     0     0
>>>             ada15   ONLINE       0     0     0
>>>           raidz1-1  ONLINE       0     0     0
>>>             ada0    ONLINE       0     0     0
>>>             ada1    ONLINE       0     0     0
>>>             ada2    ONLINE       0     0     0
>>>             ada3    ONLINE       0     0     0
>>>           raidz1-2  ONLINE       0     0     0
>>>             ada4    ONLINE       0     0     0
>>>             ada5    ONLINE       0     0     0
>>>             ada6    ONLINE       0     0     0
>>>             ada7    ONLINE       0     0     0
>>>           raidz1-3  ONLINE       0     0     0
>>>             ada9    ONLINE       0     0     0
>>>             ada10   ONLINE       0     0     0
>>>             ada11   ONLINE       0     0     0
>>>             ada12   ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>> # zpool get all zbackup1
>>> NAME      PROPERTY       VALUE               SOURCE
>>> zbackup1  size           25.4T               -
>>> zbackup1  capacity       68%                 -
>>> zbackup1  altroot        -                   default
>>> zbackup1  health         ONLINE              -
>>> zbackup1  guid           917659042733882722  default
>>> zbackup1  version        28                  default
>>> zbackup1  bootfs         -                   default
>>> zbackup1  delegation     on                  default
>>> zbackup1  autoreplace    off                 default
>>> zbackup1  cachefile      -                   default
>>> zbackup1  failmode       wait                default
>>> zbackup1  listsnapshots  on                  local
>>> zbackup1  autoexpand     off                 default
>>> zbackup1  dedupditto     0                   default
>>> zbackup1  dedupratio     1.00x               -
>>> zbackup1  free           7.95T               -
>>> zbackup1  allocated      17.4T               -
>>> zbackup1  readonly       off                 -
>>> zbackup1  comment        -                   default
>>>
>>> This is on an Addonics adaptor.
>>
>> My adapter is an Addonics as well, but my luck has not been the same. Is your host card
>> also a SiI3124 PCI?
>>
>> I will upgrade to 9-STABLE and try.
>>
>> thanks,
>>
>> matheus
>
> Mike,
>
> I looked at the FreeBSD webcvs log for siis.c. The only change in 9-STABLE is this:
>
> Revision 1.43.2.2
> Sat Dec 31 15:31:34 2011 UTC (4 months, 3 weeks ago) by hselasky
> Branches: RELENG_9
> Changes since revision 1.43.2.1: +2 -7 lines
>
> SVN rev 229118 on 2011-12-31 15:31:34Z by hselasky
>
> MFC r227701, r227847 and r227849:
> Move the device_delete_all_children() function from usb_util.c
> to kern/subr_bus.c. Simplify this function so that it no longer
> depends on malloc() to execute. Rename device_delete_all_children()
> into device_delete_children(). Identify a few other places where
> it makes sense to use device_delete_children().
>
> 9.0R already has all the other changes. As I don't know this code, I can't tell how much this
> one would affect my issue (and that of the other Matheus/Matthew as well), but I imagine not
> much, since it only mentions something about USB :)
>
> As I'm not at home, I will try replacing the cables when I get back.
>
> thanks,
>
> matheus
Finished the upgrade to 9-STABLE;
unfortunately, the result is the same :(
I also replaced the cables with brand new ones, and the same thing happened :(
thanks,
matheus
>>> ---Mike
>>>>
>>>> hint.siisch.0.sata_rev=1
>>>> hint.siisch.1.sata_rev=1
>>>> hint.siisch.2.sata_rev=1
>>>> hint.siisch.3.sata_rev=1
>>>> hint.siisch.4.sata_rev=1
>>>> hint.siisch.5.sata_rev=1
>>>> hint.siisch.6.sata_rev=1
>>>> hint.siisch.7.sata_rev=1
>>>> hint.siisch.8.sata_rev=1
>>>> hint.siisch.9.sata_rev=1
>>>> hint.siisch.10.sata_rev=1
>>>> hint.siisch.11.sata_rev=1
>>>>
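For reference, the twelve hints quoted above follow one pattern (unit numbers 0 through 11, matching three 4-port SiI3124 controllers), so they can be generated rather than typed by hand. A small Python sketch, where the helper name `sata_rev_hints` is hypothetical and the output is meant to be pasted into /boot/loader.conf manually:

```python
from typing import List


def sata_rev_hints(channels: int, rev: int = 1) -> List[str]:
    """Build one hint.siisch.N.sata_rev line per channel, N = 0..channels-1."""
    return [f"hint.siisch.{unit}.sata_rev={rev}" for unit in range(channels)]


# 3 controllers x 4 ports = 12 channels, as in the quoted configuration.
print("\n".join(sata_rev_hints(12)))
```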
>>>> From time to time this is also causing one of the attached drives to go offline:
>>>>
>>>> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
>>>> (ada0:siisch0:0:0:0): lost device
>>>> (ada0:siisch0:0:0:0): removing device entry
>>>> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
>>>> ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
>>>> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
>>>> ada0: Command Queueing enabled
>>>> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
>>>> ada0: Previously was known as ad4
>>>> siisch11: Timeout on slot 30
>>>>
>>>> When the drive goes offline that causes the ZFS rebuild to restart, and so it's never
>>>> finishing the rebuild of the array. Does anyone have any insight into what could be causing
>>>> the timeouts and what we can do to resolve them? Right now my priority is to get the system
>>>> a bit more stable so the current ZFS rebuild can complete; right now it's been doing the
>>>> same rebuild for just over 6 days and the timeouts and drive drop offs are causing it to
>>>> restart constantly.
>>>>
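One way to check whether the timeouts cluster on a single channel (which would point at one cable or port multiplier) or are spread across channels is to tally the "Timeout on slot" lines per siisch device. A rough Python sketch under stated assumptions: the function name `timeouts_per_channel` is made up, and the sample text is illustrative, modelled on the message format above rather than taken from a real capture.

```python
import re
from collections import Counter

# Matches lines such as "siisch11: Timeout on slot 30".
TIMEOUT_RE = re.compile(r"(siisch\d+): Timeout on slot (\d+)")


def timeouts_per_channel(dmesg_text: str) -> Counter:
    """Count 'Timeout on slot' messages for each siisch channel."""
    return Counter(match.group(1) for match in TIMEOUT_RE.finditer(dmesg_text))


# Illustrative sample only; feed real dmesg output in practice.
sample = """\
siisch11: Timeout on slot 30
siisch11: ... waiting for slots 25000000
siisch11: Timeout on slot 26
siisch3: Timeout on slot 5
"""

for channel, count in timeouts_per_channel(sample).most_common():
    print(channel, count)
```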
>>>> _______________________________________________
>>>> freebsd-stable at freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>>>
>>>
>>> --
>>> -------------------
>>> Mike Tancsa, tel +1 519 651 3400
>>> Sentex Communications, mike at sentex.net
>>> Providing Internet services since 1994 www.sentex.net
>>> Cambridge, Ontario Canada http://www.tancsa.com/
>>>
>>
>>
>> --
>> We will call you Cygnus,
>> The God of balance you shall be
>>
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>>
>> http://en.wikipedia.org/wiki/Posting_style
--
We will call you Cygnus,
The God of balance you shall be
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
http://en.wikipedia.org/wiki/Posting_style