ALPHA4 panic in VM

Mark Johnston markj at freebsd.org
Wed Sep 19 21:20:40 UTC 2018


On Wed, Sep 19, 2018 at 02:11:56PM -0700, Steve Kargl wrote:
> On Wed, Sep 19, 2018 at 05:02:11PM -0400, Mark Johnston wrote:
> > On Wed, Sep 19, 2018 at 01:01:52PM -0700, Steve Kargl wrote:
> > > I have the kernel and core file if more information is needed.
> > > 
> > > % cat info.2
> > > Dump header from device: /dev/ada0p3
> > >   Architecture: amd64
> > >   Architecture Version: 2
> > >   Dump Length: 2348281856
> > >   Blocksize: 512
> > >   Compression: none
> > >   Dumptime: Wed Sep 19 12:29:59 2018
> > >   Hostname: troutmask.apl.washington.edu
> > >   Magic: FreeBSD Kernel Dump
> > >   Version String: FreeBSD 12.0-ALPHA4 #0 r338505: Thu Sep  6 13:45:34 PDT 2018
> > >     kargl at troutmask.apl.washington.edu:/usr/obj/usr/src/amd64.amd64/sys/SPEW
> > >   Panic String: page fault
> > >   Dump Parity: 2676008548
> > >   Bounds: 2
> > >   Dump Status: good
> > > 
> > > % more core.txt.2
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 11
> > > fault virtual address   = 0xffffb8000719a428
> > 
> > This seems to be the result of a bit-flip.  cred is 0xffffb8000719a400,
> > which is almost but not quite in the direct map.  In particular we have:
> > 
> > (kgdb) frame 10
> > #10 0xffffffff8083e07d in vm_object_destroy (object=<optimized out>) at /usr/src/sys/vm/vm_object.c:703
> > 703                     swap_release_by_cred(object->charge, object->cred);
> > (kgdb) p object
> > $8 = <optimized out>
> > (kgdb) p *(vm_object_t)$r13
> > $9 = {
> > ...
> >   cred = 0xffffb8000719a400,
> >   charge = 28672,
> >   umtx_data = 0x0
> > }
> > (kgdb) p *(struct ucred *)0xfffff8000719a400
> > $10 = {
> >   cr_ref = 5737, 
> >   cr_uid = 1001, 
> >   cr_ruid = 1001, 
> >   cr_svuid = 1001, 
> >   cr_ngroups = 7, 
> >   cr_rgid = 1001, 
> >   cr_svgid = 1001, 
> >   cr_uidinfo = 0xfffff80007285500, 
> >   cr_ruidinfo = 0xfffff80007285500, 
> >   cr_prison = 0xffffffff80a9de10 <prison0>, 
> > ... <more sane-looking ucred fields>
> > 
> > That is, flipping one of the bits in the fault address leads me to a
> > valid ucred.  This could in principle be the result of a software bug,
> > but I'd be more inclined to suspect the hardware.
> 
> Mark,
> 
> Thanks for looking into the problem.  This system has
> been running for probably 2 years or so without issues.
> I guess it's time to pull out memtest86+ (or similar)
> to see if hardware is starting to fail.

I'm not sure whether you're using ECC RAM, but if not, the system is
susceptible to silent random bit flips.


More information about the freebsd-current mailing list