Advice on kernel panics
Ian Smith
smithi at nimnet.asn.au
Thu Jun 1 14:35:04 UTC 2017
In freebsd-questions Digest, Vol 678, Issue 4, Message: 4
On Thu, 1 Jun 2017 10:27:49 +0200 Raimo Niskanen <raimo+freebsd at erix.ericsson.se> wrote:
> On Thu, Jun 01, 2017 at 12:10:30AM -0500, Doug McIntyre wrote:
> > On Mon, May 29, 2017 at 11:20:43AM +0200, Raimo Niskanen wrote:
> > > I have a server that panics about every 3 days and need some advice on how
> > > to handle that.
> >
> > I'd expect it is some sort of hardware failure, as I would expect
> > kernel panics more on the order of once a decade with FreeBSD. I.e.,
> > I've seen one or two on my hundred or so servers, but it's pretty rare.
> >
> > Check and recheck your hardware items.
>
> I have removed one of four memory modules - it panicked again. Will rotate
> through all of them...
>
> >
> > Run memtest86+. Check your drive hardware, turn on SMART checking.
>
> I have run memtest86+ overnight - no errors found.
>
> I have installed smartmontools - no errors found; short and long self-tests
> on both disks ran fine. zpool scrub repaired 0 errors and has no known data
> errors.

Everyone's suggesting hardware problems, and it's certainly worth
eliminating that possibility, but this could be a software/OS issue.
If it were me and the hardware all checked out, I'd try posting the
original report, plus the other details about the box and setup you've
since mentioned, to freebsd-stable@, or maybe freebsd-fs@, since those
fstat reports seem (at a wild guess) to point to possible FS/ZFS issues.

One other hardware tester you might try is sysutils/stress, which can
pound CPU, I/O, VM and disk as hard and for as long as you like, without
having to bring the box down. I've used it a lot to generate heavy
loads. Keep a close eye on system temperatures during longer tests.
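A minimal sketch of such a test, run from a script so that temperatures are sampled while the load is on. It assumes FreeBSD with coretemp(4) (or amdtemp(4)) loaded, so that `sysctl dev.cpu` prints lines like `dev.cpu.0.temperature: 45.0C`, and that the stress binary from sysutils/stress is on the PATH. The worker counts, the 256M VM size and the 80 C warning limit are illustrative guesses, not recommendations, and the helper names are hypothetical.

```python
# Sketch only: run sysutils/stress while polling CPU temperatures on FreeBSD.
# Assumes coretemp(4) or amdtemp(4) is loaded so dev.cpu.N.temperature exists.
import re
import subprocess
import time

# Matches sysctl lines such as "dev.cpu.0.temperature: 45.0C".
TEMP_RE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C$")


def parse_temperatures(sysctl_output):
    """Return {cpu_index: degrees_celsius} parsed from `sysctl dev.cpu` output."""
    temps = {}
    for line in sysctl_output.splitlines():
        match = TEMP_RE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def hot_cpus(temps, limit_c):
    """Return the sorted CPU indices whose temperature is at or above limit_c."""
    return sorted(cpu for cpu, temp in temps.items() if temp >= limit_c)


def stress_and_watch(seconds=3600, interval=30, limit_c=80.0):
    """Run stress for `seconds`, printing a warning whenever a CPU runs hot.

    Worker counts and the default limit are hypothetical; tune them to the box.
    """
    cmd = ["stress", "--cpu", "4", "--io", "2", "--vm", "2",
           "--vm-bytes", "256M", "--hdd", "1", "--timeout", f"{seconds}s"]
    proc = subprocess.Popen(cmd)
    try:
        while proc.poll() is None:
            out = subprocess.run(["sysctl", "dev.cpu"],
                                 capture_output=True, text=True).stdout
            temps = parse_temperatures(out)
            for cpu in hot_cpus(temps, limit_c):
                print(f"warning: cpu{cpu} at {temps[cpu]:.1f}C")
            time.sleep(interval)
    finally:
        if proc.poll() is None:
            proc.terminate()  # stop stress if we are interrupted early
            proc.wait()
    return proc.returncode
```

On the machine under test, something like `stress_and_watch(seconds=4 * 3600)` would load it for four hours; the parsing is kept separate from the subprocess calls so it can be checked against captured sysctl output.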

Ah, just before posting, I see your latest message with dmesg. On a
quick scan, I wonder whether these lines are a bad sign? Maybe it's just
a side issue, but powerd might not work, so again heat might be
something to watch:
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
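One way to check whether powerd has anything to work with is to look at the frequency levels cpufreq exposes. A minimal sketch, assuming `sysctl -n dev.cpu.0.freq_levels` prints space-separated MHz/milliwatt pairs such as `2400/35000 2000/28000`, and that the sysctl is absent when no cpufreq driver attached to cpu0. Another cpufreq driver may still supply levels even when est fails to attach, so this is only a hint. The "at least two distinct frequencies" test is a heuristic, and the helper names are hypothetical.

```python
# Sketch only: does powerd(8) have any CPU frequency levels to step through?
# Assumes cpufreq(4) publishes them as `sysctl -n dev.cpu.0.freq_levels`,
# e.g. "2400/35000 2000/28000", i.e. space-separated MHz/milliwatt pairs.
import subprocess


def parse_freq_levels(text):
    """Return a list of (mhz, milliwatts) tuples; milliwatts is -1 if absent."""
    levels = []
    for item in text.split():
        mhz, _, mw = item.partition("/")
        levels.append((int(mhz), int(mw) if mw else -1))
    return levels


def powerd_can_scale(levels):
    """Heuristic: frequency scaling needs at least two distinct frequencies."""
    return len({mhz for mhz, _ in levels}) >= 2


def read_freq_levels():
    """Query the live system; an empty list means the sysctl does not exist."""
    result = subprocess.run(["sysctl", "-n", "dev.cpu.0.freq_levels"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return []  # no cpufreq driver attached to cpu0
    return parse_freq_levels(result.stdout)
```

If `powerd_can_scale(read_freq_levels())` is False on that box, it would be consistent with the est message above: the CPU would stay at whatever speed it booted at.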
cheers, Ian