ahcich Timeouts SATA SSD

Peter Jeremy peter at rulingia.com
Mon Oct 15 09:59:10 UTC 2012


On 2012-Oct-14 16:03:39 -0700, nate keegan <nate.keegan at gmail.com> wrote:
>Based on what I'm seeing for post types on freebsd-questions this
>might be the best forum for this issue as it looks like some sort of a
>strange issue or bug between FreeBSD 8.2/9.0 and SATA SSD drives.
>
>This system was commissioned in February of 2012 and ran without issue
>as a ZFS backup system on our network until about 3 weeks ago.
>
>At that time I started getting kernel panics due to timeouts to the
>on-board SATA devices. The only change to the system since it was
>built was to add an SSD for swap (32 Gb swap device) and this issue
>did not happen until several months after this was added.

This _does_ sound more like hardware than software - it's difficult
to envisage a software bug that does nothing for 6 months and then
makes the system hang regularly.

Has there been any significant change to the system load, how much
data is being transferred, clients, how full the data zpool is, etc
that might correlate with the onset of hangs?

>I then moved to systematically replacing items such as SATA cables,
>memory, motherboard, etc and the problem continued. For example, I
>swapped out the 4 SATA cables with brand new SATA cables and waited to
>see if the problem happened again. Once it did I moved on to replacing
>the motherboard with an identical motherboard, waited, etc.

Have you tried replacing RAM & PSU?

>The system logs do not show anything prior to event happening and the
>OS will respond to ping requests after the issue and if you have an
>active SSH session you will remain connected to the system until you
>attempt to do something like 'ls', 'ps', etc.

This implies that the kernel is still active but the filesystem is
deadlocked.  Are you able to drop into DDB?  Is anything displayed
on the console?

>New SSH requests to the system get 'connection refused'.

This implies that sshd has died - a filesystem deadlock should result
in connection attempts either timing out or just hanging.

>I'm open to suggestions, direction, etc to see if I can nail down what
>is going on and put this issue to bed for not only myself but for
>anyone else who might run into it in the future.

Are you running a GENERIC kernel?  If not, what changes have you made?
Have you set any loader tunables or sysctls?
Have you scrubbed the pools?
If you run "gstat -a", do any devices have anomalous readings?
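
For reference, most of the answers to the questions above can be gathered
in one pass. This is only a rough sketch, assuming a stock install with ZFS:
the report path and the collect() helper are names I made up for
illustration, and any command that isn't present is simply noted and skipped.

```sh
#!/bin/sh
# Sketch only: gather answers to the questions above into one report.
# OUT and collect() are illustrative names, not standard tools.
OUT="${OUT:-/tmp/ahci-report.txt}"

collect() {
    # Record the command line, then its output; note it if the tool is absent.
    echo "### $*" >> "$OUT"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$OUT" 2>&1
    else
        echo "(not available: $1)" >> "$OUT"
    fi
}

: > "$OUT"                       # start with an empty report
collect uname -a                 # kernel ident shows GENERIC or a custom config
collect cat /boot/loader.conf    # loader tunables
collect cat /etc/sysctl.conf     # sysctls set at boot
collect zpool status -v          # last scrub result and per-device error counts
collect zpool list               # how full each pool is
echo "report written to $OUT"
```

"gstat -a" is left out on purpose: it is an interactive display, so it is
better watched in a separate terminal while the system is under load.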

I can't offer any definite fixes but can suggest a few more things to
try:
1) Try FreeBSD 9.1-RC2 and see if the problem persists.
2) Try a new kernel with
     options WITNESS
     options WITNESS_SKIPSPIN
   This may make a software bug more obvious (but will somewhat increase
   kernel overhead).
3) If you can afford it, remove the L2ARC device from the pool ("zpool
   remove"), which eliminates one potential issue.
4) If you haven't already, build a kernel with
     makeoptions DEBUG=-g
     options KDB
     options KDB_TRACE
     options KDB_UNATTENDED
     options DDB
   this won't have any impact on normal operation but will simplify debugging.
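
Suggestions 2 and 4 can go into a single custom kernel config. A sketch,
assuming an amd64 system with sources in /usr/src; the file name and ident
"DEBUG" are arbitrary choices of mine. If GENERIC on your release already
sets one of these, config(8) may complain about a duplicate, in which case
just drop the repeated line.

```
# /usr/src/sys/amd64/conf/DEBUG  (file name and ident are arbitrary)
include         GENERIC
ident           DEBUG

makeoptions     DEBUG=-g        # build with debug symbols
options         KDB
options         KDB_TRACE
options         KDB_UNATTENDED
options         DDB
options         WITNESS
options         WITNESS_SKIPSPIN

# Build and install, then reboot:
#   cd /usr/src
#   make buildkernel KERNCONF=DEBUG
#   make installkernel KERNCONF=DEBUG
```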

-- 
Peter Jeremy


More information about the freebsd-hardware mailing list