4.9-STABLE fdisk GPF/Page Fault

Daniel danielc at green-orb.com
Wed Nov 5 20:14:31 PST 2003


Hello all... I recently (Oct 31, c. 12pm EST) cvsupped to 4.9-STABLE (from a 
4.9-RC2 world/kernel).  The box suddenly started panic()ing (page fault, supervisor 
something, trap 12, IIRC) under seemingly random circumstances, so I ran a full 
disk surface scan on each hard disk and did a buildworld and buildkernel again as a 
lightweight memory test.  There were no errors reported (and the box had been 
perfectly stable under 4.9-PRERELEASE), so I figured it was a transient instability and 
planned to re-cvsup today to see if there were any fixes.  However, today the box was 
wedged so hard I had to physically power it down and reboot it.  On rebooting in 
either single- or multi-user mode, fsck panic()s the box: once it was a General 
Protection Fault (very Windows-ish... ick) and more commonly it's a supervisor page 
fault.  I've run both fsck and fsck -p to see if either was more successful; the 
results were pretty much the same, and consistent.  I think the GPF occurred when I 
booted an older kernel (see below).

I wish I could provide you the wonderful memory dump it took, but I can't get the 
damn thing to boot far enough to actually read it.  I will try to boot off the recovery 
CD enough to dump the disk to tape, catch the core dump and get a backtrace for 
you, but I doubt (after all, the disk is probably corrupted by now after so many 
panics and bad fscks) that it will show the real cause of the original problem.  Does 
anyone have any suggestions?  And what is the chance this is the result of being hacked?  I 
was able to check the dates on the root partition under single user and they were all 
consistent with when I last rebuilt the world, but that doesn't indicate much... :-/
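
A rough sketch of that kind of date check, in Python.  This is an assumption about 
how one might do it, not the commands actually used above; the function name 
files_modified_after and the cutoff value are hypothetical.  It walks a directory tree 
and lists files whose modification time is later than a given cutoff (for example, 
the time of the last installworld).  Modification times are easy to forge, so a clean 
result proves little on its own.

```python
import os


def files_modified_after(root, cutoff_epoch):
    """Return a sorted list of files under root with mtime later than cutoff_epoch.

    cutoff_epoch is seconds since the Unix epoch, e.g. the time of the last
    world rebuild (a value the caller has to supply; nothing here guesses it).
    """
    hits = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so a symlink is judged by its own timestamp.
                st = os.lstat(path)
            except OSError:
                # Unreadable or vanished entry: skip it rather than abort.
                continue
            if st.st_mtime > cutoff_epoch:
                hits.append(path)
    return sorted(hits)
```

Something like files_modified_after("/bin", cutoff) would then be run once per 
directory of interest, with cutoff set to the rebuild time.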

Apologies for the lack of detailed info, and I'd love to help provide any info you 
might need to track down the cause of it.  Please just let me know how to get it!  :)  
My serial port on this unit for some reason has never wanted to talk with the serial 
port on my monitoring box so serial consoles are basically out of the question at the 
moment.  I can only catch what I can copy from the screen.

FWIW: the original problem that took down the box was a supervisory page fault 
(again, trap 12?) while running a cvsup process (although it also occurred at other 
random times when cvsup was not running).  

I have tried booting under the previous kernels (even back to 4.9-PRERELEASE, 
which ran for over a month solid on the box) and fsck just panicked at different 
places (consistent with the fact that it was built under the later kernel).  

Oh, the last few times I booted, fsck actually left me with a screen full of flashing 
colorful garbage instead of panicking.  :-/

The unit is a homebrew PII-450 with 384MB of PC133 SDRAM and a hodgepodge 
of SCSI hard disks running on AHA-2940U controllers.  The mobo is a PCChips 
M747 that has behaved surprisingly well throughout its life.  The box is NOT on a 
UPS and has occasionally been rebooted by power outages, although this has not in 
the past caused problems.

Most likely I will go ahead and try 5.1-RELEASE on it (which is running happily on 
all my other boxes) but I'd like to contribute to the stability of the 4.x series if any of 
the information we can glean from this proves useful.  Further, I'd personally like to 
know how I can tell if I was hacked.  :)

Thanks very much for furthering the cause of such a great OS!

Peace,

 -- D
 <><

 


More information about the freebsd-stable mailing list