How to solve mysterious system lockups?

N. Harrington drumslayer2 at yahoo.com
Tue Jun 5 17:45:55 UTC 2007


--- Garrett Cooper <youshi10 at u.washington.edu> wrote:

> N. Harrington wrote:
> > --- Garrett Cooper <youshi10 at u.washington.edu> wrote:
> >
> >> N. Harrington wrote:
> >>> Hello
> >>>   I have several systems that are used as squid
> >>> caching servers. I have some systems that use SCSI
> >>> disks and some that use SATA disks. They are
> >>> identical in every way except for the SATA vs SCSI
> >>> drives.
> >>>
> >>>  At random times, the SATA based systems seem to be
> >>> freezing. You can ping them and they respond, but you
> >>> cannot log in. Nor are any logs processed during that
> >>> time.
> >>>
> >>>  I figure it must be something to do with the disks,
> >>> but I am not sure how to solve it. There seems to be
> >>> little rhyme or reason. It does not happen necessarily
> >>> during busy times. It can happen in the middle of the
> >>> night.
> >>>
> >>>  Any pointers in how to track down the cause would be
> >>> much appreciated.
> >>>
> >>>  Tyan S2881 Motherboard - 4gigs mem
> >>>  Using 4 SATA (or scsi) drives
> >>>  FreeBSD amd64 6.2-STABLE.
> >>>
> >>>  Thanks!
> >>>
> >>>   Nicole
> >>
> >> Nicole,
> >>     What's the driver in use for the SATA and the
> >> SCSI drives?
> >> -Garrett
> >
> >  Hi Garrett
> >  Here is the driver info.
> >
> > -- SATA
> >
> > atapci0: <SiI 3114 SATA150 controller> port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
> > ata2: <ATA channel 0> on atapci0
> > ata3: <ATA channel 1> on atapci0
> > ata4: <ATA channel 2> on atapci0
> > ata5: <ATA channel 3> on atapci0
> > pci3: <display, VGA> at device 6.0 (no driver attached)
> > isab0: <PCI-ISA bridge> at device 7.0 on pci0
> > isa0: <ISA bus> on isab0
> > atapci1: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
> > ata0: <ATA channel 0> on atapci1
> > ata1: <ATA channel 1> on atapci1
> > pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
> > pci0: <bridge> at device 7.3 (no driver attached)
> > pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
> > pci2: <ACPI PCI bus> on pcib2
> >
> > -- SCSI
> >
> > ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8000-0x80ff,0x7800-0x78ff mem 0xfc89c000-0xfc89dfff irq 24 at device 10.0 on pci2
> > ahd0: [GIANT-LOCKED]
> > aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
> > ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8800-0x88ff,0x8400-0x84ff mem 0xfc89e000-0xfc89ffff irq 25 at device 10.1 on pci2
> > ahd1: [GIANT-LOCKED]
> > aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
> > pci0: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
> > pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
> > pci1: <ACPI PCI bus> on pcib3
> > pci0: <base peripheral, interrupt controller> at device 11.1 (no driver attached)
> >
> >
> >
> >  Thanks!
> >
> >   Nicole
> Ok, so it's an AMD 8111 northbridge versus an Adaptec onboard SCSI
> controller.
>
> 1. What release / version of FreeBSD are you using? You should upgrade
> to 6.2 STABLE because there have been a variety of issues worked out
> in previous releases.

 I have a range of versions, from 6.1-Pre to 6.2-STABLE
as of a few months ago.

> 2. Do you have any logs for activity during the hours when it locks up
> (in particular anything interesting / fishy popping up)?

 Nope. That would make it too easy :)
 They commit suicide without a note.
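
 A minimal sketch of one way to get at least a rough timestamp out of a box that freezes silently, assuming the hang really is tied to disk I/O: a small loop appends the current time to a file and forces it to disk with fsync. After the next reboot, the last timestamp that made it to disk (and any gap between beats) narrows down when writes stopped. The path /var/tmp/heartbeat.log, the 10 second interval, and the function names are illustrative assumptions, not anything taken from these systems.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp line, force it to disk, return seconds the write took."""
    now = time.time() if now is None else now
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, ("%.3f\n" % now).encode("ascii"))
        os.fsync(fd)  # block until the kernel reports the data as written out
    finally:
        os.close(fd)
    return time.monotonic() - start


def find_gaps(path, interval, slack=2.0):
    """Return (last_seen, next_seen) pairs more than interval * slack seconds apart."""
    with open(path) as f:
        stamps = [float(line) for line in f if line.strip()]
    limit = interval * slack
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


def run(path="/var/tmp/heartbeat.log", interval=10.0):
    """Write a heartbeat every `interval` seconds until killed (hypothetical defaults)."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

 Calling run() from a long-lived process on each SATA box, then running find_gaps() on the file after a lockup, should show whether the freeze lines up with anything else (cron jobs, squid log rotation, and so on). It only proves that this one process stopped getting writes through, so it is a starting point rather than a diagnosis.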

> 3. What scheduler are you using? 4BSD, ULE?

4BSD

> 4. Does your machine (using the SATA controllers) lock up under heavy
> load? If so, you may have a northbridge cooling issue that you need to
> put a fan on. For instance, the motherboard that I was using for a while
> (ASUS P5N-E SLI) was really close to my CPU heatsink, and there was a
> lot of heat transfer between my northbridge and CPU heatsink, which was
> raising the onboard temperatures 5~10 degrees C. The new motherboard
> (ASUS P5B DLX) doesn't do that though.

 The lockups seem rather random. I have healthd
running and the systems never seem to show very warm. The
room is cold and the servers have great fans. Although
healthd can seem wonky, as the CPU temp has actually
gone below the minimum. Also the -2Volt line seems
very low. But some servers run forever that way.

 At least with SCSI, since it seems to manage itself
as another layer away from the system, you get some
error messages. Sort of like Windows 3.1 dropping to
DOS. Versus SATA issues, where it's just a blue screen of
death but without even some debugging code.

 I am going to try the patch Chuck Swiger sent me and
see how that affects things. I will also try a few
replacement SATA cards, although that is always fun,
especially in 1U servers, as well as seeing whether
SAS drives may help, if I can find some cheap enough.
 Do you think that using the ULE scheduler could
really help?

 Thanks

  Nicole


> Cheers,
> -Garrett
> 



More information about the freebsd-questions mailing list