SIIS timeout with current r197392:
Pegasus Mc Cleaft
ken at mthelicon.com
Fri Sep 25 09:26:00 UTC 2009
----- Original Message -----
From: "James R. Van Artsdalen" <james-freebsd-current at jrv.org>
> Pegasus Mc Cleaft wrote:
>> Hello Current,
>>
>> Since my latest build of amd64-current kernel and world (r197392) I am
>> getting strange timeout errors in my dmesg and eventual system
>> instability.
>
> I believe mav has stated that error handling in SIIS isn't finished or
> is problematic in some way.
>
> I see similar problems: most of the time an error results in a hung
> device, requiring reboot. This usually happens within a TB or two of
> intense I/O: I have not yet seen a 6 TB ZFS pool complete a "scrub" due
> to this.
Hi James,
I believe I found the problem with my machine, and it _was_ my machine.
The device that was hanging is an Asus CD-ROM drive. The error messages
displayed were correct: I had a faulty SATA cable between the controller and
the drive (funny how a SATA cable can go bad spontaneously). Reboots of the
system did not clear the fault, but a full power down and power up would
mask the fault for about an hour, and then it would start throwing the
messages into the log every few seconds. It was this behaviour that led me
to believe it was a problem with the SIIS driver. It wasn't until I noticed
the system hanging for a little while during POST, while interrogating the
drive, that I suspected the hardware. After a cable change and a lot of
swearing, the computer booted fine and the error has never reappeared.
Some lessons learned:
1) Debug messages _MAY_ actually be telling the truth! :>
2) Reboots and software resets won't be heard by a SATA device whose
port has been scrambled by bad cabling.
3) SATA cables may spontaneously disintegrate.
4) Modern computers respond less to threats than my older machines :>
This being said, I have seen the other fault where a device hangs during
high load / activity. Mine will, if it is going to do it, hang somewhere
around midnight to 3am when I am running maintenance on the machine (find
/ -name "*.core" -exec rm {} \; ). It does exactly as you said where a
drive hangs, usually with the activity LED still lit. Sometimes the machine
will continue on and ZFS will carry on in a degraded state. The odd thing
about this is, it only started to do this when I was having problems with
the CD-ROM on the SIIS card. The ZFS drives are on a completely different
controller (JMicron). When the SIIS controller was waiting for the scrambled
port to say hello, all sorts of weird things would happen. The mouse would
lock up for 2 seconds, the keyboard would lock, and if a key was pressed
when it happened, it would trigger the key-repeat (much to my amusement
while reading email: hit the delete key, have the keyboard hang, and watch
it quickly run through deleting everything in my in-box :> ).
My query is: when the SIIS controller is hung, waiting for the SATA port to
respond, could the JMicron ports miss an event, or get a double IRQ, and
cause the device to lock?
Best wishes,
Peg