Big problems with 7.1 locking up :-(

Tomas Randa freebsd at max.af.czu.cz
Mon Jan 12 13:04:17 PST 2009


Hello,

I have similar problems. The last "good" kernel I have is from the stable
branch, from October 8. After the next upgrade, I saw big performance
problems.
I tried ULE, 4BSD, etc., but nothing helps except downgrading the system.

Now I am trying 7.1-p1 and the problems are back. MySQL spends a
lot of time in the status "waiting for opening table" or "waiting for
close tables".

I have 32-bit FreeBSD with PAE, 1x Xeon 5420, a Supermicro motherboard, and
an Areca SATA controller. Could the problem be in the "da" device, for example?

Thanks, Tomas Randa

Garance A Drosihn wrote:
> At 2:55 PM +0000 1/12/09, Robert Watson wrote:
>> On Fri, 9 Jan 2009, Garance A Drosihn wrote:
>>
>>> At 2:39 PM -0500 1/9/09, Robert Blayzor wrote:
>>>> On Jan 8, 2009, at 8:58 PM, Pete French wrote:
>>>>> I have a number of HP 1U servers, all of which were running 7.0 
>>>>> perfectly happily. I have been testing 7.1 in its various 
>>>>> incarnations for the last couple of months on our test server and 
>>>>> it has performed perfectly.
>>>>
>>>> I noticed a problem with 7.0 on a couple of Dell servers.  [...] 
>>>> We've since then compiled the kernel under the BSD scheduler to 
>>>> rule that out, and so far so good.
>>>>
>>>> Since ULE is now default in 7.1 and not in 7.0, perhaps you can try 
>>>> that?
>>>
>>> FWIW, the other guy I know who is having this problem had already 
>>> switched to using ULE under 7.0-release, and did not have any 
>>> problems with it.  So *his* problem was probably not related to 
>>> SCHED_ULE, unless something has recently changed there.
>>>
>>> Turns out he hasn't reverted back to 7.0-release just yet, so he's 
>>> going to try SCHED_4BSD and see if that helps his situation.
>>
>> Scheduler changes always come with some risk of exposing bugs that 
>> have existed in the code for a long time but never really manifested 
>> themselves. ULE is well shaken-out, having been under development for 
>> at least five years, but it is possible that some problems will 
>> become visible as a result of the switch.  I would encourage people 
>> to stick with ULE, but if you're having a stability problem then 
>> experimenting with scheduler as a variable that could be triggering 
>> the problem may well be useful to help track down the bug.
>
> Just to follow up on this:  My friend did switch back to a 7.1 kernel with
> SCHED_4BSD, and he still ran into problems.  The error messages weren't
> the same, but errors did happen in the same high disk-I/O situations as
> the lockup happened with SCHED_ULE.  At this point he's fallen back to
> the 7.0-kernel that he had been running (which also has SCHED_ULE), and
> all the problems have gone away.  So at the moment he's running with a
> 7.0-ish kernel and the 7.1-release userland, without the hanging 
> problems.
> So the problem is something in the kernel, but it is *NOT* the scheduler
> (at least, not in his case).
>
> He is not eager to do a whole lot of experiments to track down the
> problem, since this is happening on busy production machines and he
> can't afford to have a lot of downtime on them (especially now that the
> semester at RPI has started up).  The systems have some large (2 TB)
> filesystems on them, and the lockups occur in high disk-I/O situations.
> He's seeing the problem on one system which is a dual CPU quad-core
> xeon, and another which is a 64 bit P4 with hyperthreading.  The one
> thing in common between the two setups is that the boot drives + a
> 3ware controller (with its array of RAID disks) is moved from one
> machine to the other one:
>
>   "its a 3ware 9500 12 port model, the boot drive is connected to
>    an ICH6 in IDE mode, and yes, I've run it in single, single with
>    hyper threading, and 8 way mode.  All 64 bit."
>
> We still have no idea where the problem really is.  For all we know,
> someone spilled a Pepsi on it when he wasn't looking...
>

