FreeBSD 8.0-BETA2/amd64 crashes on SMP under load

Anton Shterenlikht mexas at bristol.ac.uk
Wed Jul 29 18:24:23 UTC 2009


On Tue, Jul 28, 2009 at 09:48:39AM -0700, Marcel Moolenaar wrote:
> 
> On Jul 28, 2009, at 9:12 AM, Anton Shterenlikht wrote:
> 
> >On Tue, Jul 28, 2009 at 08:34:52AM -0700, Marcel Moolenaar wrote:
> >>
> >>On Jul 28, 2009, at 7:45 AM, Anton Shterenlikht wrote:
> >>
> >>>On Tue, Jul 28, 2009 at 02:22:50PM +0000, O. Hartmann wrote:
> >>>>Anton Shterenlikht wrote:
> >>>>>On Mon, Jul 27, 2009 at 10:04:28PM +0100, Anton
> >>>>>Shterenlikht wrote:
> >>>>>>On Mon, Jul 27, 2009 at 09:55:12PM +0200, O. Hartmann wrote:
> >>>>>>>Kamigishi Rei wrote:
> >>>>>>>>O. Hartmann wrote:
> >>>>>>>>>I have the problem of crashing FreeBSD 8.0-BETA2/amd64 under
> >>>>>>>>>load on all of our SMP boxes. Is there an issue known at the
> >>>>>>>>>moment? If not, I will prepare the kernel for witnessing and
> >>>>>>>>>provide more information, if you wish.
> >>>>>>>>A quick question: what is in the crash message, i.e. the
> >>>>>>>>backtrace?
> >>>>>>>>And what kind of crash is it - a panic() or a fatal trap?
> >>>>>>>On the 8-core server box, I sometimes see :
> >>>>>>>
> >>>>>>>Fatal trap 12: page fault while in kernel mode
> >>>>>>>fault code              = supervisor read, page not present
> >>>>>>Not sure if it's related, but on ia64 SMP (2 cpus) with
> >>>>>>8.0-current and later with 8.0-beta1 (I haven't built beta2
> >>>>>>yet) I'm getting crashes
> >>>>>>under load every so often. E.g. buildworld -j8 is likely to crash
> >>>>>>the
> >>>>>>box. No messages, just a sudden freeze, no backtrace or panic,
> >>>>>>and then reboot.
> >>>>>>
> >>>>>>If load is less heavy, e.g. fewer processes and some
> >>>>>>idle time, the
> >>>>>>problem doesn't seem to appear.
> >>>>>>
> >>>>>>I'm happy to do any further testing, if suggested.
> >>>>>
> >>>>>my ia64 8.0-beta1 SMP box died again on
> >>>>>make -j8 buildworld
> >>>>>with no panic or log entries.
> >>>>>
> >>>>>Is it possible that some kernel variable needs to
> >>>>>be increased? E.g. kern.maxproc, kern.maxfiles, etc.
> >>>>>Or perhaps I'm talking complete rubbish..
> >>>>>
> >>>>
> >>>>I suggest you try again with a UP kernel - a suggestion from a
> >>>>kernel-noob, sorry. My SMP boxes work now with UP-kernel, but they
> >>>>are
> >>>>really slowish although they have modern Intel C2D/Penryn cores.
> >>>
> >>>I need SMP for OpenMP codes. It's a shame if SMP is buggy, but
> >>>I guess all is down to small user base..
> >>
> >>I have no problems with SMP. If you don't have a panic, then
> >>you may have a hardware problem.
> >
> >yes.. I thought of this myself. I guess I ought to check
> >the Event Logs available from MP on rx2600. But those messages
> >are so cryptic..
> 
> The event logs are also mostly useless IMO. They fill
> up the log buffer and then cause warnings during the
> boot. I typically use the errdump command in EFI to
> see if there's anything fishy.

I had a look at errdump, but the output is cryptic to me as well.
Both errdump mca and errdump init showed some lines which appeared to be
errors, but nothing about memory, at least nothing obvious to me.

> >
> >>Check for MCA records.
> >
> ># mca
> >mca: no error records found
> >
> ># sysctl hw.mca
> >hw.mca.last: 0
> >hw.mca.first: 0
> >hw.mca.count: 0
> 
> Hmmmm... I don't know what to make of this yet. I've
> always had MCAs when there were spontaneous reboots.
> The lack of MCAs can point to something else than
> hardware...

I installed 4GB of new memory and got to beta2 with make -j8 buildworld,
so perhaps it was bad memory.

Many thanks

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 928 8233 
Fax: +44 (0)117 929 4423

