4.9-STABLE fdisk GPF/Page Fault
Daniel
danielc at green-orb.com
Wed Nov 5 20:14:31 PST 2003
Hello all... I recently (Oct 31, c. 12pm EST) cvsupped to 4.9-STABLE (from a 4.9-RC2 world/kernel). After that, the box started panic()ing under seemingly random circumstances (a page fault, "Supervisory something", trap 12, IIRC). I ran a full surface scan on each hard disk and did another buildworld/buildkernel as a lightweight memory test. No errors were reported, and the box had been perfectly stable under 4.9-PRERELEASE. I figured it was a transient instability and planned to re-cvsup today to see if there were any fixes. However, today the box was wedged so hard that I had to physically power it down and reboot it. On rebooting in either single- or multi-user mode, fsck panic()s the box. Once it was a General Protection Fault (very Windows-ish... ick), but more commonly it's a supervisory page fault. I've run both fsck and fsck -p to see whether either was more or less successful. The results were pretty much the same, and consistent. I think the GPF occurred when I booted an older kernel (see below).

I wish I could provide the wonderful memory dump it took, but I can't get the damn thing to boot far enough to actually read it. I will try to boot off the recovery CD far enough to dump the disk to tape, catch the core dump, and get a backtrace for you. I doubt it will show the real cause of the original problem, though. After all, the disk is probably corrupted by now after so many panics and bad fscks. Does anyone have any suggestions? And what is the chance this is the result of getting hacked? I was able to check the dates on the root partition in single-user mode and they were all consistent with when I last rebuilt world, but that doesn't indicate much... :-/

Apologies for the lack of detailed info. I'd love to provide whatever you might need to track down the cause. Please just tell me how to get it! :) For some reason, the serial port on this unit has never wanted to talk to the serial port on my monitoring box, so a serial console is basically out of the question at the moment. I can only catch what I can copy from the screen.

FWIW: the original problem that took down the box was a supervisory page fault (again, trap 12?) while running a cvsup process, although it also occurred at other random times without cvsup running.

I have tried booting the previous kernels (even back to 4.9-PRERELEASE, which ran for over a month solid on the box), and fsck just panicked at different places (consistent with the fact that it was built under the later kernel). Oh, and the last few times I booted, fsck actually left me with a screen full of flashing, colorful garbage instead of panicking. :-/

The unit is a homebrew PII-450 with 384MB of PC133 SDRAM and a hodgepodge of SCSI hard disks on AHA-2940U controllers. The mobo is a PCChips M747 that has behaved surprisingly well throughout its life. The box is NOT on a UPS and has occasionally been rebooted by power outages, although this has not caused problems in the past.

Most likely I will go ahead and try 5.1-RELEASE on it (it is running happily on all my other boxes), but I'd like to contribute to the stability of the 4.x series if any information we can glean from this proves useful. I'd also personally like to know how I can tell whether I was hacked. :)

Thanks very much for furthering the cause of such a great OS!
Peace,
-- D
<><