12.1 RELEASE General Protection Fault (Trap 9)
matthew at FreeBSD.org
Wed Jan 22 16:36:56 UTC 2020
On 22/01/2020 15:12, Jason Van Patten wrote:
> Since sometime before Christmas (as far as I know), my NAS has started
> randomly crashing, reloading, and saving cores in /var/crash. It was
> doing this with 12.0 and now with 12.1. My gut tells me it's hardware
> related, but I'm not quite sure. The various bits and pieces are:
Given the crashes do not appear to be associated with any particular
activity, I think you're on the money with your diagnosis that it is
hardware related.
Did you change any of the hardware on this system recently? If you've
added more disks or such, then you may have overloaded the PSU. If the
PSU can't produce voltages in spec, then you will see random crashes,
although I doubt in that case you'd always see 'General Protection
Fault'. Unless this is a new machine, or you've changed some of the
hardware, this is unlikely to be the diagnosis.
Otherwise, suspect hardware problems. In rough order of expense, least
to most:
  * Bad heatsink, failed case fan, CPU thermal paste not up to snuff,
    or some other cause that may lead to your system overheating
  * Bad memory
  * Bad CPU
The first of these is relatively cheap and easy to handle: make sure
you're getting unimpeded airflow through the chassis -- clean any
filters, make sure fans are spinning correctly and that heatsinks have
good thermal contact, if necessary by renewing any thermal paste.
Monitoring the CPU temperature will help here -- if you see the CPU
temperature increasing just before everything goes kaput, that's a
fairly solid diagnostic. For an i7, you should be able to use the
coretemp(4) kernel module and read off the temperature from the
dev.cpu.%d.temperature sysctls.
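
As a rough sketch of that monitoring, something like the following
could log the per-core temperatures at intervals so the last readings
before a crash are on disk. It assumes coretemp is already loaded
(kldload coretemp) and that sysctl prints lines of the form
"dev.cpu.0.temperature: 45.0C". The function names and the 10 second
default interval are made up for illustration.

```python
import re
import subprocess
import time

# Matches lines such as "dev.cpu.0.temperature: 45.0C" (assumed format).
_TEMP_LINE = re.compile(
    r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C$"
)


def parse_cpu_temps(sysctl_output):
    """Return {core_number: degrees_celsius} parsed from sysctl text."""
    temps = {}
    for line in sysctl_output.splitlines():
        match = _TEMP_LINE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def read_cpu_temps():
    """Run `sysctl dev.cpu` and parse it; needs coretemp(4) loaded."""
    result = subprocess.run(
        ["sysctl", "dev.cpu"],
        capture_output=True, text=True, check=True,
    )
    return parse_cpu_temps(result.stdout)


def log_temps(interval=10, count=None):
    """Print a timestamped reading every `interval` seconds.

    Runs forever when `count` is None (hypothetical helper).
    """
    done = 0
    while count is None or done < count:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # flush so the last line is not lost in a buffer on a crash
        print(stamp, read_cpu_temps(), flush=True)
        done += 1
        time.sleep(interval)
```

Calling log_temps() with the output redirected to a file on local disk
would leave a record to check against the timestamps of the cores in
/var/crash.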
Memory problems can frequently be diagnosed by use of a memory checker
like sysutils/memtest86+ -- if this says you have a problem, then you do
have a problem. However, it may not catch every possible memory problem
so it can wrongly give you an 'all clear'. It's pretty accurate in
practice though. A more definitive test is to swap out any suspect RAM
modules and see if the problem goes away.
The worst case is a bad CPU. memtest86+ will diagnose some CPU faults,
but it is less effective on CPU problems. If there is a CPU problem, it
will be a pretty subtle one, as the typical symptoms of a bad CPU are
that the system won't boot at all and the BIOS makes horrible beeping
noises when you try.
Even so, this isn't a definitive list. I've heard tales about trying to
diagnose this sort of problem where someone had bit by bit swapped out
all of the components of a system except for the case, and the problem
still occurred. Turned out the case was slightly bent and that put
enough stress on the motherboard to cause some intermittent electrical
connectivity.
Cheers,
Matthew