System freeze: Adaptec (aac) timeouts (releng 8)

Dennis Koegel dk at neveragain.de
Wed Sep 14 08:08:32 UTC 2011


Cheers,

we have a reproducible system freeze due to Adaptec driver (aac) timeouts:

Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
<dozens more of these...>

Once this happens, userland still appears to be alive, but the
controller is completely dead. Any process that touches the disk
subsystem hangs forever (e.g. the SSH key exchange still completes, but
a shell never starts).
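
For anyone who wants to see how many commands time out in a single burst,
here is a minimal sketch that tallies the aac timeout messages quoted
above. It assumes the messages have exactly the format shown in this
mail. The names TIMEOUT_RE and summarize_timeouts are made up for
illustration and are not part of the aac driver or any FreeBSD tool.

```python
import re
from collections import Counter

# Pattern for lines in the format quoted in this mail, e.g.
# "Sep  3 05:26:44 foo kernel: aac0: COMMAND 0x... (TYPE 502) TIMEOUT AFTER 129 SECONDS"
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ctrl>aac\d+): COMMAND (?P<cmd>0x[0-9a-f]+) "
    r"\(TYPE (?P<type>\d+)\) TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def summarize_timeouts(lines):
    """Return (timeouts per controller and timestamp, distinct commands, longest timeout)."""
    per_second = Counter()
    commands = set()
    max_secs = 0
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m is None:
            continue  # not an aac timeout message
        per_second[(m["ctrl"], m["stamp"])] += 1
        commands.add(m["cmd"])
        max_secs = max(max_secs, int(m["secs"]))
    return per_second, commands, max_secs


# The three log lines quoted above, used as sample input.
SAMPLE = [
    "Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS",
    "Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS",
    "Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS",
]

if __name__ == "__main__":
    per_second, commands, max_secs = summarize_timeouts(SAMPLE)
    for (ctrl, stamp), count in sorted(per_second.items()):
        print(f"{stamp} {ctrl}: {count} timed-out commands")
    print(f"distinct commands: {len(commands)}, longest timeout: {max_secs}s")
```

To run it against a real log, pass the lines of the syslog file instead
of SAMPLE (on a default install that is probably /var/log/messages, but
that path is an assumption about the local syslog configuration).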

We observe the same issue on two systems of (mostly) identical spec, so
a fault in a single piece of hardware seems unlikely.

Apparently this only happens under heavy disk I/O combined with high CPU
load. In particular, high write throughput plus a 'zpool scrub' on a
large GELI-backed zpool usually triggers the problem after a few hours.
Without high activity, the systems run smoothly for weeks.

Both systems are amd64 with an Adaptec 5805 controller and 16 disks
each: two form a RAID-1 system volume (UFS), and the remaining 14 serve
as JBOD for a large zpool, for a total of 15 "aacd" devices.

Both were running 8.2-RELEASE originally. I've since taken them to
8-STABLE and also applied svn r222951 (which, it seems, was never
MFCed), but the problem remains.

Any help is greatly appreciated.

Thanks,
- D.

