Problems with SMP on 6.1-STABLE-200608

Don O'Neil lists at lizardhill.com
Thu Apr 5 16:45:24 UTC 2007


More info on my problem.....

I swapped out the MB, CPUs, RAM, and power supply, and I still have the
problem with the kernel panicking when running on SMP.

When I re-build the kernel for NO SMP, the machine is rock solid, even under
VERY high loads.

I set up the old MB, CPUs, RAM & power supply on the bench, with new
6.1-STABLE-200608 AND 6.2-RELEASE installs, and ran dozens of copies of the
stress port. Even with it pushing load averages above 250 and eating up all
available RAM and swap, I could not get the kernel to panic.

The ONLY difference between the bench setup and the production setup is a
3ware Escalade RAID card. I am going to set up another array on the bench
with a spare card I have and see if I can get it to panic under that setup
(which will be identical hardware-wise to the production box). The only
explanations I can think of right now are the following:

1) Bad RAID card or cables <- unlikely since it should show up even in
uniprocessor mode
2) Problem with the TWE driver in SMP mode <- more likely

I'm leaning towards #2, especially given the other recent report of someone
else getting kernel panics with 3ware products.

Anyone else have any thoughts as to what scenarios/tools I should try to
isolate the problem?
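
One thing I could try on the bench once the spare card is in: a load that
hits the array from several threads at once, instead of just CPU and RAM.
Below is a minimal sketch of that idea in Python, not a known reproducer.
The names (run_stress, _worker, the iostress_N.bin files) and the default
sizes are made up for illustration. It writes random data from several
threads, forces it out with fsync, reads it back, and compares checksums.
Note that the read-back may be served from the buffer cache rather than the
card unless the total data written is larger than RAM.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _worker(job):
    """Write, fsync, read back, and verify one file; return mismatch count."""
    directory, index, size_bytes, passes = job
    path = os.path.join(directory, "iostress_%d.bin" % index)  # hypothetical name
    mismatches = 0
    try:
        for _ in range(passes):
            data = os.urandom(size_bytes)
            expected = hashlib.sha256(data).hexdigest()
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # push the write down to the disk driver
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            if actual != expected:
                mismatches += 1
    finally:
        if os.path.exists(path):
            os.remove(path)
    return mismatches


def run_stress(directory, workers=4, size_bytes=1 << 20, passes=3):
    """Run `workers` concurrent write/verify loops in `directory`.

    Returns the total number of checksum mismatches seen.
    """
    jobs = [(directory, i, size_bytes, passes) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(_worker, jobs))
```

To use it, call run_stress() with a directory that lives on the array under
test and much larger values for size_bytes and passes, once with the SMP
kernel and once with the uniprocessor kernel. A panic or a nonzero mismatch
count only under SMP would point at the driver rather than the card or
cables.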

-----Original Message-----
From: owner-freebsd-questions at freebsd.org
[mailto:owner-freebsd-questions at freebsd.org] On Behalf Of
youshi10 at u.washington.edu
Sent: Wednesday, March 28, 2007 8:48 AM
To: freebsd-questions at freebsd.org
Subject: Re: Problems with SMP on 6.1-STABLE-200608

On Wed, 28 Mar 2007, Lowell Gilbert wrote:

> "Don O'Neil" <lists at lizardhill.com> writes:
>
>> I've been having problems with my server freezing up, having the #2 
>> CPU 'shut down', kernel panics, and all sorts of nastiness....
>>
>> Originally I thought it was exim, or possibly bind, or bad hardware 
>> (MB, CPU or memory)... I've swapped in the motherboard, CPUs & 
>> memory from an old server that was running 4.11 ROCK SOLID for years...
>>
>> At first I thought the problem was solved, but now it's popping up 
>> again... The 2nd CPU gets 'shut down', or the kernel panics, 
>> essentially taking the system offline.
>
> There are lots of things this could be, and I certainly wouldn't rule 
> out hardware problems (power supply?).  Figuring out the problems 
> directly would certainly involve looking at more details than you're 
> listing here.
>
>> If I install a single CPU (non-smp) kernel, then the system works 
>> fine... (I did this on the old motherboard before I swapped it out, 
>> and it worked fine too).. So I'm wondering if there is an SMP bug or 
>> problem I'm running into.
>>
>> I'm running 6.1-STABLE-200608, an ISO image I downloaded from the 
>> archives when I built the box (NOT 6.1-RELEASE).
>
> The whole point of making releases is that it's much easier to support 
> a small number of known reference software configurations.
>
>> I'm running an Intel ServerWorks motherboard with two 1.4 GHz 
>> PIIIs... The problem only seems to show up under high load.
>
> I don't think I've heard of anything similar.  I think there are a 
> bunch of these boards out there.
>
>> I'm wondering what I should do here...
>>
>> I'm concerned that doing a binary upgrade to 6.2 won't fix the 
>> problem, and I've tried using freebsd-update, but it complains about 
>> the version not being compatible.
>>
>> If I do a binary upgrade from CD, will it also update the kernel 
>> sources so I can build a new one? Will it complain about it not being 
>> compatible?
>
> It can give you the sources; that's a menu option during the install.
> That should work fine.
>
>> Is there a way to 'force' the ID of the system to be 6.1-RELEASE so 
>> that freebsd-update will work?
>
> Well, yes, but there's a reason for the check, you know...
>
>> Will doing the 6.1-6.2 binary upgrade as posted by Colin also update 
>> the kernel sources?
>
> I don't know what procedure he described, so I don't know.  But if you 
> update to 6.2-RELEASE, then it will be easy to get the right sources 
> afterwards.  Again, that is the advantage of having releases.
>
>> Would my best option really be to start over with a fresh install 
>> rather than upgrade? (this would be painful)
>
> If it's that painful, you'd probably be well served to have a spare 
> system to stage changes on.  In addition to being good risk 
> management, it saves you time, which is worth something too.
>
>> I'm going to try to test out 6.2 on the old MB/CPU combo to see if I 
>> can re-create it under 6.2 as well before I do anything. I'll also 
>> try doing an upgrade on the bench from CD from 6.1-STABLE-200608 to 
>> 6.2-RELEASE... Since this is a production server (and for months it 
>> was burned in with no apparent issues) I only have 1 shot at this to 
>> do it right.
>>
>> Any help/recommendations would be appreciated.
>
> Good luck.

Honestly, I would probe around your motherboard a bit, checking voltages
(power supply) and/or heat dissipation, because those are the most likely
causes if it _only_ fails under high load. The next thing to check would be
RAM integrity.

-Garrett
