debugging frequent kernel panics on 8.2-RELEASE

Attilio Rao attilio at freebsd.org
Thu Aug 11 09:43:27 UTC 2011


2011/8/11 Jeremy Chadwick <freebsd at jdc.parodius.com>:
> On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote:
>> That's not the issue, as it's happening across the board on over 130 machines :(
>
> Agreed, bad hardware sounds unlikely here.  I could believe some strange
> incompatibility (e.g. BIOS quirk or the like[1]) that might cause problems
> en masse across many servers, but hardware issues are unlikely in this
> situation.
>
> [1]: I mention this because we had something similar happen at my
> workplace.  For months we used a specific model of system from our
> vendor which worked reliably, zero issues.  Then we got a new shipment
> of boxes (same model as prior) which started acting very odd (often AHCI
> timeout issues or MCEs which when decoded would usually turn out to be
> nonsensical).  It took weeks to determine the cause given how slow the
> vendor was to respond: root cause turned out to be that the vendor
> decided, on a whim, to start shipping a newer BIOS version which wasn't
> "as compatible" with Solaris as previous BIOSes.  Downgrading all the
> systems to the older BIOS fixed the problem.

That falls in the "hw problem" category for me.

Anyway, we would really need much more information in order to take
any action.

Would it be possible to get access to one of the panicking machines? Is
it always the same panic, or does it vary (e.g. a page fault one time, a
fatal double fault another, a fatal trap another, etc.)?

Any information you can provide may be valuable here.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

