How to solve mysterious system lockups?

youshi10 at u.washington.edu
Tue Jun 5 22:38:53 UTC 2007


On Tue, 5 Jun 2007, N. Harrington wrote:

>
> --- Garrett Cooper <youshi10 at u.washington.edu> wrote:
>
>> N. Harrington wrote:
>>> --- Garrett Cooper <youshi10 at u.washington.edu> wrote:
>>>
>>>
>>>> N. Harrington wrote:
>>>>
>>>>> Hello
>>>>>   I have several systems that are used as Squid
>>>>> caching servers. Some use SCSI disks and some use
>>>>> SATA disks. They are identical in every way except
>>>>> for the SATA vs. SCSI drives.
>>>>>
>>>>>  At random times, the SATA-based systems seem to be
>>>>> freezing. You can ping them and they respond, but
>>>>> you cannot log in. Nor are any logs processed during
>>>>> that time.
>>>>>
>>>>>  I figure it must have something to do with the
>>>>> disks, but I am not sure how to solve it. There seems
>>>>> to be little rhyme or reason. It does not necessarily
>>>>> happen during busy times. It can happen in the middle
>>>>> of the night.
>>>>>
>>>>>  Any pointers on how to track down the cause would be
>>>>> much appreciated.
>>>>>
>>>>>  Tyan S2881 Motherboard - 4 GB memory
>>>>>  Using 4 SATA (or SCSI) drives
>>>>>  FreeBSD amd64 6.2-STABLE.
>>>>>
>>>>>  Thanks!
>>>>>
>>>>>   Nicole
>>>>>
>>>>>
>>>> Nicole,
>>>>     What's the driver in use for the SATA and the
>>>> SCSI drives?
>>>> -Garrett
>>>>
>>>
>>>  Hi Garrett
>>>  Here is the driver info.
>>>
>>> -- SATA
>>>
>>> atapci0: <SiI 3114 SATA150 controller> port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
>>> ata2: <ATA channel 0> on atapci0
>>> ata3: <ATA channel 1> on atapci0
>>> ata4: <ATA channel 2> on atapci0
>>> ata5: <ATA channel 3> on atapci0
>>> pci3: <display, VGA> at device 6.0 (no driver attached)
>>> isab0: <PCI-ISA bridge> at device 7.0 on pci0
>>> isa0: <ISA bus> on isab0
>>> atapci1: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
>>> ata0: <ATA channel 0> on atapci1
>>> ata1: <ATA channel 1> on atapci1
>>> pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
>>> pci0: <bridge> at device 7.3 (no driver attached)
>>> pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
>>> pci2: <ACPI PCI bus> on pcib2
>>>
>>> -- SCSI
>>>
>>> ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8000-0x80ff,0x7800-0x78ff mem 0xfc89c000-0xfc89dfff irq 24 at device 10.0 on pci2
>>> ahd0: [GIANT-LOCKED]
>>> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
>>> ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8800-0x88ff,0x8400-0x84ff mem 0xfc89e000-0xfc89ffff irq 25 at device 10.1 on pci2
>>> ahd1: [GIANT-LOCKED]
>>> aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
>>> pci0: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
>>> pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
>>> pci1: <ACPI PCI bus> on pcib3
>>> pci0: <base peripheral, interrupt controller> at device 11.1 (no driver attached)
>>>
>>>
>>>
>>>  Thanks!
>>>
>>>   Nicole
>> OK, so it's a Silicon Image SiI 3114 SATA controller
>> (the AMD 8111 only drives the PATA channels) versus
>> an Adaptec onboard SCSI controller.
>>
>> 1. What release / version of FreeBSD are you using?
>> You should upgrade to 6.2-STABLE because a variety of
>> issues have been worked out since previous releases.
>
> I have a range of versions, from 6.1-Pre to 6.2-STABLE
> as of a few months ago.
>
>> 2. Do you have any logs of activity during the hours
>> when it locks up (in particular, anything interesting
>> or fishy popping up)?
>
> Nope. That would make it too easy :)
> They commit suicide without a note.
>
>> 3. What scheduler are you using? 4BSD, ULE?
>
> 4BSD
>
>> 4. Does your machine (using the SATA controllers) lock
>> up under heavy load? If so, you may have a northbridge
>> cooling issue that needs a fan. For instance, on the
>> motherboard I was using for a while (ASUS P5N-E SLI),
>> the northbridge sat really close to my CPU heatsink,
>> and the heat transfer between them raised the onboard
>> temperatures by 5-10 degrees C. The new motherboard
>> (ASUS P5B DLX) doesn't have that problem, though.
>
> The lockups seem rather random. I have healthd
> running, and the machines never seem to run very warm.
> The room is cold and the servers have great fans.
> That said, healthd can seem wonky: the CPU temperature
> has actually gone below the minimum, and the -2 V line
> reads very low. But some servers run forever that way.
>
> At least with SCSI, since it seems to manage itself
> as another layer away from the system, you get some
> error messages, sort of like Windows 3.1 dropping to
> DOS. SATA issues, by contrast, are just a blue screen
> of death without even any debugging output.
>
> I am going to try the patch Chuck Swiger sent me and
> see how that affects things. I'll also try a few
> replacement SATA cards, although that is always fun,
> especially in 1U servers, and see whether SAS drives
> help if I can find some cheap enough.
> Do you think that using the ULE scheduler could
> really help?

Don't try it in 6.x. It's not stable by any means.

7-CURRENT is getting a lot closer, though, especially as of late (the past week).

-Garrett



More information about the freebsd-questions mailing list