SMP system not running SMP

Arno J. Klaassen arno at heho.snv.jussieu.fr
Thu Jun 29 20:43:18 UTC 2006


"UEMURA (fka. MAENAKA) Tetsuya" <maenaka at pluto.dti.ne.jp> writes:

> Posted on Tue, 27 Jun 2006 15:06:51 +0100
> By default, FreeBSD couldn't start: it dumped the ahd state when probing
> the da device and simply stopped. So I set the SCSI BIOS to restrict the
> device speed to 80MB/s and the problem went away. Since then, the machine
> has run flawlessly for 8 months.

I have a Tyan S2882 which I cannot keep up for more than a couple of
days under moderate load, and the symptoms seem related:

config:

 - tracking -stable 
 - 8G RAM 
 - 3ware 9500S-12 (latest BIOS) with 1.1 TB of data
 - RAID-1 MAXTOR ATLAS10K5_73WLS as system disk on ahd0
 - doing nothing but running some test scripts that generate fairly
   moderate NFS traffic (i.e. scripts run via NFS, (rarely needed) data
   either on NFS or on the RAID; the scripts themselves are CPU-intensive)


symptoms:

 - system cold-boots fine (SMP, dual Opteron 248)
 - runs OK for a couple of minutes/hours/days
 - then a total freeze; *never* a panic in 9 months
 - a warm reset either does not detect da0 or indeed dumps the ahd state
   when probing it
 - even a cold reboot sometimes has to be repeated once or twice before
   da0 is detected correctly again


tried:

 - changed SCSI cables and termination three times: no luck
 - decreased the device speed to 80MHz: this seems to eliminate the
   "minutes" part of "runs OK for a couple of minutes/hours/days" ...


observations:

 - this week I downloaded the latest manual from Tyan and came across
   the following jumper setting (not sure whether it was in the original
   version or whether I overlooked it; the printed manual is at the
   customer's site):

    "Set PCI-X Bridge A (PCI 3 & PCI 4 & SCSI7902 & BCM5704) to operate at
     a maximum 66MHz;
     Note: Due to the PCI-X specifications it will be necessary to set
     this bus to 66MHz if a 133/100MHz PCI-X card is
     added to this bus."

   Since I do have a 100MHz PCI-X card (the 3ware), I set this jumper;
   the system has been up for three days now. I cannot confirm yet that
   this was the culprit, but other AMD811X-based systems might have the
   same issue.

 - this board has dual ahd and dual bge:

   vmstat -i (I just rebooted after upgrading -stable and linux_base):

    irq24: bge0 ahd0                   16826          2
    irq25: bge1 ahd1                 1305665        157

   The network is attached to bge1 and the disk is on ahd0. Interestingly,
   when I provoke insane swapping, it is the "irq25:" process that
   consumes 50-90% (!) of CPU time, but when I stop the program provoking
   the swapping and rerun vmstat -i, it does report slightly increased
   irq24 activity but no noticeable change in irq25 activity ...
   (I put hint.ahd.1.disabled="1" in /boot/loader.conf since I do not
   need ahd1, but that does not seem to have any effect.)
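Since the vmstat -i totals are cumulative since boot, comparing two
snapshots taken before and after a test run shows how many interrupts
each line actually took during that run. Note that bge0/ahd0 and
bge1/ahd1 each share one interrupt line here, so the counts cannot be
attributed to a single driver. A minimal sketch in Python, assuming the
"irqN: devices total rate" line layout shown in the excerpt above; the
helper names parse_vmstat_i and interrupt_deltas are hypothetical:

```python
import re

# Assumed layout (from the excerpt above): "irqN: <devices> <total> <rate>"
LINE_RE = re.compile(r"^\s*(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return {irq: (devices, total)} for every irqN line of vmstat -i output."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # header and "Total" lines do not start with irqN and are skipped
            irq, devices, total, _rate = m.groups()
            counts[irq] = (devices, int(total))
    return counts


def interrupt_deltas(before, after):
    """Interrupts taken per IRQ line between two snapshots (lines present in both)."""
    b, a = parse_vmstat_i(before), parse_vmstat_i(after)
    return {irq: a[irq][1] - b[irq][1] for irq in a if irq in b}
```

Feed it the output of two vmstat -i runs (e.g. captured before and after
the swapping test); a line whose delta stays near zero while its handler
is busy would match the irq25 behaviour described above.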

FYI.

I can test on this box for a couple more weeks; feel free to
contact me for more information.

Thanx, regards, Arno

-- 

  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com


More information about the freebsd-amd64 mailing list