iir + Tyan S2460 + SMP problems

Douglas K. Rand rand at meridian-enviro.com
Fri Apr 21 21:59:59 UTC 2006


We're having problems with FreeBSD 5.4, 6.0, and 6.1 and an ICP Vortex
GDT8546RZ 4 port SATA RAID card in a Tyan S2460 system with dual AMD
Athlon MP 1600+ CPUs. We do not have any problems with this
configuration under FreeBSD 4.11, and the same ICP cards in our
Tyan-based Opteron systems (S2882 and S4882) run without problems
under FreeBSD 5.4 and 6.1.

We can reproduce the problem on two different S2460 based systems, and
have tried 2 separate ICP GDT8546RZ cards, so we don't believe it is a
hardware problem. (Our success with FreeBSD 4.11 also provides some
evidence that our hardware is OK.)

The problem is that the system seems to stop doing any disk IO through
the ICP card. Processes that don't need to page in work fine. (You can
hit return in a shell, get another login: prompt on other consoles,
and the like.) The system continues to respond to pings, but anything
that attempts disk IO simply stops. Sometimes the kernel emits
messages like this:

  swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096

The test we are using to produce this "hang" is a fairly trivial
expansion of a tar ball being fed via nc from another system. We run
on the source system:

   tar cf - radar | nc -w 3 10.10.10.229 12345

And on the system being tested we run: 

   nc -l 12345 | tar xvf -

One iteration of this test is the extraction of a 1.2 GB directory of
2,274 files.
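
For illustration only, here is a minimal sketch of how the receiving
side could be looped so that the iteration count at the time of a hang
is visible on the console. It is not the script actually used for the
results in this message; the function names run_iterations and
receive_once are hypothetical, and only the nc/tar pipeline and port
12345 come from the test described above.

```shell
#!/bin/sh
# Sketch: repeat a test command and report the iteration count.
# Usage: run_iterations MAX CMD [ARGS...]   (MAX=0 means run until failure)
run_iterations() {
    max=$1
    shift
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        echo "iteration $i: start"
        # Run the supplied command; stop at the first failure.
        if ! "$@"; then
            echo "iteration $i: FAILED"
            return 1
        fi
        echo "iteration $i: ok"
    done
    return 0
}

# One receive-side iteration, using the same pipeline as the test above.
# (Hypothetical helper; assumes it is run from the target directory.)
receive_once() {
    nc -l 12345 | tar xvf -
}

# Example invocation (not run automatically):
#   run_iterations 0 receive_once
```

If the system hangs in the middle of an iteration, the loop simply
stalls, so the last "start" line printed shows which iteration was in
progress. The sending side would need a matching loop that reruns the
tar | nc command for each iteration.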

The problem only exists with SMP kernels. With SMP, our tests almost
always failed in the first iteration or two, and the longest run
before a failure was 5 iterations. Without SMP the test ran without
problems for 570 iterations over 18 hours.

We've tried a number of different tests.  These tests are with a stock
6.1-RC1 kernel from the RC CDs. Unless otherwise specified, all tests
are on a UFS2 filesystem with softupdates enabled and an SMP-enabled
GENERIC kernel.

  * !SMP: Ran 570 iterations in 18 hours without a problem, test
    terminated by hand.

  * Large (190 GB) UFS2 filesystem with soft updates enabled and SMP
    kernel: Fails during the first iteration. 

  * Medium (12 GB) UFS2 filesystem with soft updates enabled and SMP
    kernel: Fails during the first iteration. 

  * !softupdates: fails during first iteration. 

  * !ACPI: fails during the first iteration. 

  * UFS1: fails during the first iteration. 

  * UFS1 + !ADAPTIVE_GIANT: failed during the first iteration. 

  * !ADAPTIVE_GIANT: failed during the first iteration. 

  * Cleared motherboard CMOS: failed at the end of the second
    iteration. 

  * FULL_PREEMPTION: failed during the first iteration. 

  * !PREEMPTION: failed during the first iteration. 

  * WITNESS + WITNESS_KDB: failed during the second iteration with no
    WITNESS-related kernel messages and without entering the kernel
    debugger.

  * WITNESS + INVARIANTS: failed during the fifth iteration, again w/o
    kernel messages. 

  * Motherboard BIOS "Use PCI Interrupt Entries in MP Table" set to
    OFF: failed during first iteration. 

  * Motherboard BIOS "Multiprocessor Specification" set from 1.4 to
    1.1: failed during first iteration. 

  * MUTEX_WAKE_ALL: failed during first iteration.

We have a serial console and a kernel debugger enabled, so if anybody
has suggestions for probes to run once the system is hung, let us know.

Any advice is welcome. Well, except for "dump the Tyan S2460
motherboards" maybe.

Oh, and we're at current BIOS and firmware revs for both the ICP card
and the motherboard.


More information about the freebsd-stable mailing list