bugs in contigmalloc*() related to "page not found in hash" panics

Sean Farley sean-freebsd at farley.org
Fri Nov 12 20:07:50 PST 2004


On Wed, 10 Nov 2004, Matthew Dillon wrote:

> :>    Here is the DragonFly commit.
> :>
> :>    http://www.dragonflybsd.org/cvsweb/src/sys/vm/vm_contig.c.diff?r1=1.10&r2=1.11&f=u
> :>
> :>    FreeBSD-4:
> :>
> :> 	FreeBSD-4 is in the same situation that DFly was in and requires
> :> 	the same fixes as the above patch, though note that in FreeBSD-4
> :> 	the contigmalloc() code is in vm_page.c, not vm_contig.c.
>
> I tried the patch in the hopes it would fix my Nvidia-driver
> crash-on-demand system.  :)  While my system appears stable without
> the Nvidia driver but with this patch, my system can still crash
> easily with the Nvidia driver.  It usually dies with a:
>
>    Point me at the nvidia driver source and I will do a quick audit of
>    it to see if there is anything obviously broken.  This is running
>    on FreeBSD-4.x?  If it's a binary-only driver there isn't much I
>    can do, though.

Unfortunately, it is the binary driver from Nvidia.  Maybe someone using
DragonFly is having similar problems?

<snip>

>    There is a test you can run.  If you have a kernel vmcore and
>    related kernel image that contains the vm page not found in hash
>    panic, you can run this program on it to do a sanity check on the
>    VM page array and hash table.  I have modified this program to work
>    with FreeBSD-4.x (I'd have to rewrite it to make it work with
>    5.x/6.x, which I don't have time to do):

<snip>

>    This program will sanity check the VM page hash table from the core
>    file and tell you if there are any pages missing from the hash
>    table or sitting in the wrong slot.
>
>    My expectation is that it will find a page sitting in the wrong
>    slot.
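
For readers who have not seen the snipped program, here is a rough sketch
of the kind of check described above. It is a toy model only: the Page
tuple, the slot_for() hash function, and the bucket lists are all
hypothetical stand-ins, not the FreeBSD-4 structures and not Matt's
program.

```python
from collections import namedtuple

# Hypothetical page identity: (owning object id, page index within object).
Page = namedtuple("Page", ["obj", "pindex"])


def slot_for(page, nbuckets):
    """Hypothetical hash function; the kernel's real one differs."""
    return (page.obj + page.pindex) % nbuckets


def check_hash(pages, buckets):
    """Return (missing, misplaced) for a page array and its hash buckets."""
    where = {}
    for slot, chain in enumerate(buckets):
        for page in chain:
            where[page] = slot  # remember the slot each page was found in
    missing = [p for p in pages if p not in where]
    misplaced = [p for p in pages
                 if p in where and where[p] != slot_for(p, len(buckets))]
    return missing, misplaced


pages = [Page(1, 0), Page(1, 1), Page(2, 5)]
buckets = [[] for _ in range(4)]
buckets[slot_for(pages[0], 4)].append(pages[0])  # inserted correctly
buckets[0].append(pages[1])                      # deliberately wrong slot
# pages[2] is never inserted, so it should be reported as missing
missing, misplaced = check_hash(pages, buckets)
print("missing:", missing)
print("misplaced:", misplaced)
```

A real checker would walk the page array and hash chains read from the
vmcore rather than Python lists, but the two failure classes it reports
are the same: a page absent from every chain, or a page found on a chain
other than the one its key hashes to.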

I ran the program on the vmcore and debug kernel from the recent crash
since the vmcore with the "page not found in hash" panic has long since
been deleted.  As expected, the program showed no problem with the
vmcore.

> :
> :     Fatal trap 12: page fault while in kernel mode
> :     fault virtual address   = 0x30
> :     fault code              = supervisor read, page not present
> :
>
>    This is a different failure.  I'd need a backtrace or a
>    kernel.debug and vmcore to play with, and a FreeBSD developer would
>    probably be able to help you more with it.  It's obviously a NULL
>    pointer indirection of some sort.
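
A small illustration of why a fault address such as 0x30 usually points
to a NULL pointer: reading a structure member through a NULL base faults
at the member's offset. The FakeInode layout below is hypothetical and
chosen only so that a field lands at 0x30; it is not the real struct
inode. The attached backtrace does list ip = 0x0 in ffs_fsync, which
would fit this pattern, but that is an inference, not a diagnosis.

```python
import ctypes


class FakeInode(ctypes.Structure):
    # Hypothetical layout: 0x30 bytes of padding, then the field being read.
    _fields_ = [
        ("pad", ctypes.c_ubyte * 0x30),
        ("field", ctypes.c_uint32),
    ]


def fault_address(base, offset):
    """Address the CPU would try to read for base->member."""
    return base + offset


offset = FakeInode.field.offset
print(hex(fault_address(0, offset)))  # NULL base plus member offset
```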

I will attach it, and I will also send it to Nvidia as I did once many
moons ago.  One interesting symptom that I just noticed very close to
the time of instability is this message from /var/log/messages:

Nov 10 22:47:14 thor /kernel: stray irq 7

Here is the tail of the strings output from the vmcore, just before the
panic:

<118>Wed Nov 10 22:46:44 CST 2004
<3>stray irq 7
<118>Nov 10 22:47:14 thor /kernel: stray irq 7
<3>stray irq 7
<3>stray irq 7
<118>Nov 10 22:47:46 thor last message repeated 2 times

The parallel port is disabled, and I do not see these messages without
the Nvidia driver.

> :Two "page not found in hash" panics that I believe are related to the
> :Nvidia driver:
> :http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/71086
> :http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/72539
>
>    The 'page not found in hash' bug is *NOT* likely to be related to any
>    of the pmap code, simply because the sanity checks already in the
>    kernel (assuming the kernel is compiled with options INVARIANTS and
>    options INVARIANT_SUPPORT) mostly preclude an error path to this
>    panic from the pmap code.  However, pmap panics could be related to
>    corrupted VM pages.

I have not tried compiling these options into the kernel.  Sometime this
weekend I will give them a shot.

Thank you for your help and the detailed description of the bug
(tricksy, sneaky bug) you fixed.

Sean
-- 
sean-freebsd at farley.org
-------------- next part --------------
IdlePTD at physical address 0x00922000
initial pcb at physical address 0x00326cc0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x30
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc024f311
stack pointer	        = 0x10:0xd861ec14
frame pointer	        = 0x10:0xd861ec3c
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 460 (glblur)
interrupt mask		= none
trap number		= 12
panic: page fault

syncing disks... 22 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 
giving up on 2 buffers
Uptime: 1m56s

dumping to dev #ad/0x30011, offset 1027680
dump ata1: resetting devices .. done

<snipped memory count down from 511 to 0>

---
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
487		if (dumping++) {
(kgdb) where full
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
	error = 0
#1  0xc0165523 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:316
	howto = 256
#2  0xc0165948 in poweroff_wait (junk=0xc02f536c, howto=-1070641553) at /usr/src/sys/kern/kern_shutdown.c:595
	fmt = 0xc02f536c "%s"
	bootopt = 256
	buf = "page fault", '\000' <repeats 245 times>
#3  0xc02a164e in trap_fatal (frame=0xd861ebd4, eva=48) at /usr/src/sys/i386/i386/trap.c:974
	frame = (struct trapframe *) 0x100
	code = -1070640276
	type = 12
	ss = -1070640276
	esp = 0
	softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_xx = 2, ssd_xx1 = 1, 
  ssd_def32 = 1, ssd_gran = 1}
#4  0xc02a1321 in trap_pfault (frame=0xd861ebd4, usermode=0, eva=48) at /usr/src/sys/i386/i386/trap.c:867
	va = 0
	vm = (struct vmspace *) 0x0
	map = 0xd4c5cfc0
	rv = 0
	ftype = 1 '\001'
	p = (struct proc *) 0xd75358e0
#5  0xc02a0f0b in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi = -682403616, tf_esi = 0, 
      tf_ebp = -664671172, tf_isp = -664671232, tf_ebx = -661997632, tf_edx = -661997632, tf_ecx = 0, 
      tf_eax = -1029433088, tf_trapno = 12, tf_err = 0, tf_eip = -1071320303, tf_cs = 8, tf_eflags = 66118, 
      tf_esp = -661997632, tf_ss = 0}) at /usr/src/sys/i386/i386/trap.c:466
	p = (struct proc *) 0xd75358e0
	sticks = 14018746865864933376
	i = 0
	ucode = 0
	type = 12
	code = 0
	eva = 48
#6  0xc024f311 in ffs_fsync (ap=0xd861ec64) at /usr/src/sys/ufs/ffs/ffs_vnops.c:138
	ip = (struct inode *) 0x0
	vp = (struct vnode *) 0xd88ab7c0
	bp = (struct buf *) 0x0
	nbp = (struct buf *) 0xc015ff23
	s = -664671160
	error = 0
	wait = 1
	passes = -664671152
	skipmeta = -664671168
	lbn = -1071292299
#7  0xc0193dac in vinvalbuf (vp=0xd88ab7c0, flags=1, cred=0x0, p=0xd75358e0, slpflag=0, slptimeo=0) at vnode_if.h:558
	a = {a_desc = 0xc02fe8a0, a_vp = 0xd88ab7c0, a_cred = 0x0, a_waitfor = 1, a_p = 0xd75358e0}
	vp = (struct vnode *) 0xd88ab7c0
	cred = (struct ucred *) 0x0
	p = (struct proc *) 0x0
	cred = (struct ucred *) 0x0
	p = (struct proc *) 0x0
	bp = (struct buf *) 0xd88ab7c0
	nbp = (struct buf *) 0x0
	blist = (struct buf *) 0x0
	s = 0
	error = -661997632
	object = 0xd75358e0
#8  0xc01950e3 in vclean (vp=0xd88ab7c0, flags=8, p=0xd75358e0) at /usr/src/sys/kern/vfs_subr.c:1894
	vp = (struct vnode *) 0xd88ab7c0
	flags = 0
	p = (struct proc *) 0xd75358e0
	active = 0
#9  0xc01952f7 in vgonel (vp=0xd88ab7c0, p=0xd75358e0) at /usr/src/sys/kern/vfs_subr.c:2058
	vp = (struct vnode *) 0xd88ab7c0
	s = 0
#10 0xc01952a9 in vrecycle (vp=0xd88ab7c0, inter_lkp=0x0, p=0xd75358e0) at /usr/src/sys/kern/vfs_subr.c:2013
	vp = (struct vnode *) 0x0
#11 0xc0250cdb in ufs_inactive (ap=0xd861ed48) at /usr/src/sys/ufs/ufs/ufs_inode.c:105
	ap = (struct vop_inactive_args *) 0x0
	vp = (struct vnode *) 0xd88ab7c0
	ip = (struct inode *) 0xc2a41900
	p = (struct proc *) 0xd75358e0
	mode = 0
	error = 0
#12 0xc0256075 in ufs_vnoperate (ap=0xd861ed48) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2376
	ap = (struct vop_generic_args *) 0x0
#13 0xc0194e00 in vput (vp=0xd88ab7c0) at vnode_if.h:815
	a = {a_desc = 0xc02feae0, a_vp = 0xd88ab7c0, a_p = 0xd75358e0}
	vp = (struct vnode *) 0xd88ab7c0
	p = (struct proc *) 0x0
	vp = (struct vnode *) 0xd88ab7c0
	p = (struct proc *) 0x0
#14 0xc019b5b7 in vn_open (ndp=0xd861eeb4, fmode=258, cmode=384) at /usr/src/sys/kern/vfs_vnops.c:197
	cmode = 0
	vp = (struct vnode *) 0xd88ab7c0
	p = (struct proc *) 0xd75358e0
	cred = (struct ucred *) 0xc29e5000
	vat = {va_type = 3630296744, va_mode = 3627, va_nlink = -16362, va_uid = 0, va_gid = 11, 
  va_fsid = 3612563680, va_fileid = 11, va_size = 3265677312, va_blocksize = -664670748, va_atime = {tv_sec = 0, 
    tv_nsec = 1}, va_mtime = {tv_sec = -1072270043, tv_nsec = -1029289984}, va_ctime = {tv_sec = -682403293, 
    tv_nsec = 6}, va_gen = 11, va_flags = 3612563680, va_rdev = 11, va_bytes = 29400100336, 
  va_filerev = 15592005793520961536, va_vaflags = 3222666107, va_spare = 0}
	vap = (struct vattr *) 0xd861ed9c
	mode = 128
	error = 13
#15 0xc01676b1 in coredump (p=0xd75358e0) at /usr/src/sys/kern/kern_sig.c:1632
	vp = (struct vnode *) 0xd4c5cfc0
	cred = (struct ucred *) 0xc29e5000
	lf = {l_start = -3114855140103595808, l_len = -2854737733698453504, l_pid = -1071256140, l_type = 10, 
  l_whence = 0}
	nd = {ni_dirp = 0xc2a64800 "glblur.core", ni_segflg = UIO_SYSSPACE, ni_startdir = 0x0, 
  ni_rootdir = 0xd6aa5e00, ni_topdir = 0x0, ni_vp = 0xd88ab7c0, ni_dvp = 0x0, ni_pathlen = 1, 
  ni_next = 0xd754280b "", ni_loopcnt = 0, ni_cnd = {cn_nameiop = 1, cn_flags = 49164, cn_proc = 0xd75358e0, 
    cn_cred = 0xc29e5000, cn_pnbuf = 0xd7542800 "", cn_nameptr = 0xd7542800 "", cn_namelen = 11, cn_consume = 0}}
	vattr = {va_type = 3238354120, va_mode = 1, va_nlink = 0, va_uid = 2, va_gid = 0, va_fsid = 3569733378, 
  va_fileid = -725233728, va_size = 13845635962067087156, va_blocksize = -664670492, va_atime = {tv_sec = 0, 
    tv_nsec = 2}, va_mtime = {tv_sec = -664670488, tv_nsec = -664670500}, va_ctime = {tv_sec = -664670496, 
    tv_nsec = -664670521}, va_gen = 3630296776, va_flags = 3612563680, va_rdev = 3569733568, 
  va_bytes = 13862156936687910912, va_filerev = 3630497248, va_vaflags = 3222666107, va_spare = 2}
	error = 11
	error1 = 0
	name = 0xc2a64800 "glblur.core"
	limit = 9223372036854775807
#16 0xc01673d2 in sigexit (p=0xd75358e0, sig=11) at /usr/src/sys/kern/kern_sig.c:1494
	p = (struct proc *) 0xd75358e0
	sig = 11
#17 0xc01671b0 in postsig (sig=11) at /usr/src/sys/kern/kern_sig.c:1407
	p = (struct proc *) 0xd75358e0
	ps = (struct sigacts *) 0xd861c260
	action = 0
	returnmask = {__bits = {1024, 0, 0, 0}}
	code = 0
#18 0xc02a1113 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077939828, tf_esi = -1077939828, 
      tf_ebp = -1077940160, tf_isp = -664670252, tf_ebx = 134720384, tf_edx = 0, tf_ecx = 14, tf_eax = 1, 
      tf_trapno = 12, tf_err = 1, tf_eip = 672019668, tf_cs = 31, tf_eflags = 66198, tf_esp = -1077940284, 
      tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:174
	s = -664670320
	sig = 0
	p = (struct proc *) 0xd75358e0
	frame = (struct trapframe *) 0xd861efa8
	oticks = 7
	p = (struct proc *) 0xd75358e0
	sticks = 7
	i = -682403324
	ucode = 12
	type = -664670320
	code = 0
	eva = 1
#19 0x280e34d4 in ?? ()
No symbol table info available.
#20 0x804c9f1 in ?? ()
No symbol table info available.
#21 0x804d646 in ?? ()
No symbol table info available.
#22 0x804fd11 in ?? ()
No symbol table info available.
#23 0x804c099 in ?? ()
No symbol table info available.
#24 0x804e800 in ?? ()
No symbol table info available.
#25 0x804bfce in ?? ()
No symbol table info available.
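
A note on reading the trap frame in frame #5 above: kgdb prints the saved
registers as signed 32-bit integers, while the panic message prints them
in hex. The sketch below (the helper name as_u32_hex is made up for this
note) converts two of the values so they can be compared with the
"instruction pointer" and "frame pointer" lines of the panic message.

```python
def as_u32_hex(value):
    """Reinterpret a signed 32-bit integer as unsigned and format it in hex."""
    return hex(value & 0xFFFFFFFF)


# Values copied from the trap frame in frame #5 of the backtrace.
print("tf_eip =", as_u32_hex(-1071320303))
print("tf_ebp =", as_u32_hex(-664671172))
```

The results are 0xc024f311 and 0xd861ec3c, the same instruction pointer
and frame pointer shown in the panic message.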

