'make -j16 universe' gives SIReset

Tue Jul 5 16:07:15 UTC 2011

On Mon, Jul 04, 2011 at 11:41:58PM +0200, Marius Strobl wrote:
> On Sat, Jul 02, 2011 at 02:03:41PM -0500, Alan Cox wrote:
> > On 07/01/2011 19:23, Marius Strobl wrote:
> > >On Fri, Jul 01, 2011 at 08:17:52AM +1000, Peter Jeremy wrote:
> > >>[Moving back on-list]
> > >>
> > >>On 2011-Jun-30 06:30:08 +0800, Marius Strobl <marius at alchemy.franken.de> wrote:
> > >>>On Thu, Jun 30, 2011 at 08:00:10AM +1000, Peter Jeremy wrote:
> > >>>>On 2011-Jun-29 19:54:44 +0200, Marius Strobl <marius at alchemy.franken.de> wrote:
> > >>>>>On Wed, Jun 29, 2011 at 12:54:33PM +1000, Peter Jeremy wrote:
> > >>>>>>My V890 has been running "make -j32 buildworld" in a loop for a
> > >>>>>>week now without problems so I think that was the problem.
> > >>>>OTOH, a V440 that has been running similar load for a similar period
> > >>>>died overnight with:
> > >>>>
> > >>>>panic: uma_small_alloc: free page still has mappings!
> > >>>>VNASSERT failed
> > >>>>cpuid = 3
> > >>>>0xfffff800079643c0: KDB: enter: panic
> > >>...
> > >>>>I'm fairly sure that is the same kernel but will double-check and
> > >>>>investigate that panic further.
> > >>FWIW, that kernel didn't have the latest patchset (adding Zeus support).
> > >That shouldn't make a difference; the later version only adds the
> > >SPARC64 bits as you already noticed and adjusts the boot loader to
> > >compile again. I made no changes to the existing parts apart from
> > >fixing a comment. Besides I see no connection between fixing the
> > >gross user TLB flushing and the below problem so far.
> > >
> > >>>Ok, this appears to be an unrelated problem though. Alan, do you
> > >>>have an idea what could be causing this?
> > >>I managed to get the same panic (though different traceback) on the
> > >>V890 after about an hour of pho@'s stress test with INCARNATIONS=150:
> > >>
> > >>panic: uma_small_alloc: free page still has mappings!
> > >>cpuid = 1
> > >>KDB: enter: panic
> > >>[ thread pid 142 tid 100196 ]
> > >>Stopped at      kdb_enter+0x80: ta              %xcc, 1
> > >>db>  where
> > >>Tracing pid 142 tid 100196 td 0xfffff8a016ace880
> > >>panic() at panic+0x20c
> > >>uma_small_alloc() at uma_small_alloc+0xe8
> > >>keg_alloc_slab() at keg_alloc_slab+0xc8
> > >>keg_fetch_slab() at keg_fetch_slab+0x218
> > >>zone_fetch_slab() at zone_fetch_slab+0x44
> > >>uma_zalloc_arg() at uma_zalloc_arg+0x60c
> > >>m_getm2() at m_getm2+0x134
> > >>m_uiotombuf() at m_uiotombuf+0x4c
> > >>sosend_generic() at sosend_generic+0x420
> > >>sosend() at sosend+0x2c
> > >>soo_write() at soo_write+0x3c
> > >>dofilewrite() at dofilewrite+0x7c
> > >>kern_writev() at kern_writev+0x38
> > >>write() at write+0x4c
> > >>syscallenter() at syscallenter+0x270
> > >>syscall() at syscall+0x74
> > >>-- syscall (4, FreeBSD ELF64, write) %o7=0x101db4 --
> > >>userland() at 0x405936c8
> > >>user trace: trap %o7=0x101db4
> > >>pc 0x405936c8, sp 0x7fdffffd8a1
> > >>pc 0x101f44, sp 0x7fdffffd9a1
> > >>pc 0x104604, sp 0x7fdffffda81
> > >>pc 0x1046f0, sp 0x7fdffffdb51
> > >>pc 0x104994, sp 0x7fdffffdc21
> > >>pc 0x104d90, sp 0x7fdffffdd01
> > >>pc 0x101610, sp 0x7fdffffde41
> > >>pc 0x4020cff4, sp 0x7fdffffdf01
> > >>done
> > >>db>
> > >>
> > >>I've got a crashdump on the V440 but discovered that gdb reports
> > >>"GDB can't read core files on this machine." so it isn't much use.
> > >>Any suggestions on how to debug this?
> > >The VM and its interaction with the MD code are beyond me; I hope
> > >Alan can chime in here. Reading through the code I see a possible
> > >path which could lead to this though; tsb_tte_enter(), which is
> > >the only place where TD_PV ever is set and also only in case of
> > >managed pages, always calls pmap_cache_enter(), which together
> > >with pmap_cache_remove() does the page color handling. In
> > >pmap_remove_all() however, pmap_cache_remove() is only called for
> > >managed pages, so for unmanaged pages we might miss the removal
> > >of the mapping from the color used. I've no idea though if
> > >this actually is relevant, i.e. whether the VM ever calls
> > >pmap_remove_all() for unmanaged pages.
> > 
> > In HEAD, it does not.  Other architectures have an assertion forbidding 
> > pmap_remove_all() calls on unmanaged pages.  (Btw, I'm happy to add this 
> > assertion to sparc64's pmap if you like.)  In older versions, calling 
> > pmap_remove_all() on unmanaged pages is expected to be a harmless NOP 
> > that's just a waste of cycles.
> > 
> > With unmanaged pages, it is expected that pmap_remove() is used to 
> > destroy mappings before the page is freed.
> > 
> > For years, vm_page_free{,_toq}() has asserted that the page has no 
> > managed mappings:
> > 
> >         if ((m->flags & PG_UNMANAGED) == 0) {
> >                 vm_page_lock_assert(m, MA_OWNED);
> >                 KASSERT(!pmap_page_is_mapped(m),
> >                     ("vm_page_free_toq: freeing mapped page %p", m));
> >         }
> > 
> 
> Okay, then my theories don't hold.
> 
> > As a debugging aid, you might want to add an additional check here on 
> > colors.
> 
> I did that and it turns out to trigger rather quickly:
> Trying to mount root from nfs: []...
> NFS ROOT: 192.168.1.40:/usr/data/nfsroot/sparc64
> dc1: link state changed to UP
> panic: vm_page_free_toq: free page 0xfffff80047b8a088 still has mappings!
> cpuid = 0
> KDB: enter: panic
> [ thread pid 1 tid 100001 ]
> Stopped at      kdb_enter+0x80: ta              %xcc, 1
> db> bt
> Tracing pid 1 tid 100001 td 0xfffff80041094000
> panic() at panic+0x20c
> vm_page_free_toq() at vm_page_free_toq+0xb4
> vm_page_free_zero() at vm_page_free_zero+0x10
> pmap_release() at pmap_release+0x170
> vmspace_free() at vmspace_free+0x70
> vmspace_exec() at vmspace_exec+0x48
> exec_new_vmspace() at exec_new_vmspace+0x240
> exec_elf64_imgact() at exec_elf64_imgact+0x598
> kern_execve() at kern_execve+0x398
> execve() at execve+0x34
> start_init() at start_init+0x2ec
> fork_exit() at fork_exit+0x9c
> fork_trampoline() at fork_trampoline+0x8
> db>
> 
> Further debugging shows that the page in question is one of the TSB
> pages entered by pmap_pinit(). In pmap_release() vm_page_free_zero()
> is called on these before pmap_qremove(), so there appears to be a
> race in which these pages can get re-used before their mappings are
> removed. I suspect that this might be related to your change in
> r207648, but simply reverting that one nowadays triggers the
> assertion in vm_page_free_toq() about the page lock not being held.
> Anyway, I'm not sure what the right fix for this is; should
> pmap_release() call pmap_qremove() on these pages one-by-one before
> calling vm_page_free_zero() or maybe just call pmap_qremove() for
> all of them before looping over them and calling vm_page_free_zero()?
> 

Well, given that all uses of pmap_qremove() in the kernel except
the one in the sparc64 pmap_release() and two invocations in vfs_bio.c
remove the mappings before the pages are freed, unwired, etc., this
seems to be a safe thing to do. Does the below patch look correct to you?

Marius

Index: kern/vfs_bio.c
===================================================================

--- kern/vfs_bio.c	(revision 223705)
+++ kern/vfs_bio.c	(working copy)
@@ -1625,6 +1625,7 @@ vfs_vmio_release(struct buf *bp)
 	int i;
 	vm_page_t m;
 
+	pmap_qremove(trunc_page((vm_offset_t) bp->b_data), bp->b_npages);
 	VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
 	for (i = 0; i < bp->b_npages; i++) {
 		m = bp->b_pages[i];
@@ -1658,7 +1659,6 @@ vfs_vmio_release(struct buf *bp)
 		vm_page_unlock(m);
 	}
 	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
-	pmap_qremove(trunc_page((vm_offset_t) bp->b_data), bp->b_npages);
 	
 	if (bp->b_bufsize) {
 		bufspacewakeup();
@@ -3012,6 +3012,10 @@ allocbuf(struct buf *bp, int size)
 			if (desiredpages < bp->b_npages) {
 				vm_page_t m;
 
+				pmap_qremove((vm_offset_t)trunc_page(
+				    (vm_offset_t)bp->b_data) +
+				    (desiredpages << PAGE_SHIFT),
+				    (bp->b_npages - desiredpages));
 				VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
 				for (i = desiredpages; i < bp->b_npages; i++) {
 					/*
@@ -3032,8 +3036,6 @@ allocbuf(struct buf *bp, int size)
 					vm_page_unlock(m);
 				}
 				VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
-				pmap_qremove((vm_offset_t) trunc_page((vm_offset_t)bp->b_data) +
-				    (desiredpages << PAGE_SHIFT), (bp->b_npages - desiredpages));
 				bp->b_npages = desiredpages;
 			}
 		} else if (size > bp->b_bcount) {
Index: sparc64/sparc64/pmap.c
===================================================================
--- sparc64/sparc64/pmap.c	(revision 223705)
+++ sparc64/sparc64/pmap.c	(working copy)
@@ -1286,6 +1289,7 @@ pmap_release(pmap_t pm)
 			pc->pc_pmap = NULL;
 	mtx_unlock_spin(&sched_lock);
 
+	pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES);
 	obj = pm->pm_tsb_obj;
 	VM_OBJECT_LOCK(obj);
 	KASSERT(obj->ref_count == 1, ("pmap_release: tsbobj ref count != 1"));
@@ -1297,7 +1301,6 @@ pmap_release(pmap_t pm)
 		vm_page_free_zero(m);
 	}
 	VM_OBJECT_UNLOCK(obj);
-	pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES);
 	PMAP_LOCK_DESTROY(pm);
 }
 
@@ -1379,6 +1382,8 @@ pmap_remove_all(vm_page_t m)
 	struct tte *tp;
 	vm_offset_t va;
 
+	KASSERT((m->flags & (PG_FICTITIOUS | PG_UNMANAGED)) == 0,
+	    ("pmap_remove_all: page %p is not managed", m));
 	vm_page_lock_queues();
 	for (tp = TAILQ_FIRST(&m->md.tte_list); tp != NULL; tp = tpn) {
 		tpn = TAILQ_NEXT(tp, tte_link);
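
For illustration only, the ordering issue the patch addresses can be sketched
in userland C. This is a toy model, not kernel code: struct toy_page,
toy_qremove(), toy_page_free() and toy_release() are hypothetical names that
merely stand in for the page, pmap_qremove(), the free path with the extra
"still has mappings" check, and a release routine such as pmap_release().
It assumes nothing about the real locking or TSB handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a physical page; not the kernel's struct vm_page. */
struct toy_page {
	bool mapped;	/* a kernel virtual mapping still points at this page */
	bool free;	/* the page has been returned to the free pool */
};

/* Toy model of pmap_qremove(): tear down the mappings of count pages. */
static void
toy_qremove(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i].mapped = false;
}

/*
 * Toy model of a page free routine with the extra debugging check:
 * refuse (return false) if the page is still mapped.
 */
static bool
toy_page_free(struct toy_page *m)
{
	if (m->mapped)
		return (false);
	m->free = true;
	return (true);
}

/*
 * Release count pages using either ordering and return how many free
 * attempts hit a page that was still mapped.
 */
static size_t
toy_release(struct toy_page *pages, size_t count, bool qremove_first)
{
	size_t violations = 0;

	assert(pages != NULL);
	if (qremove_first)
		toy_qremove(pages, count);	/* patched ordering */
	for (size_t i = 0; i < count; i++)
		if (!toy_page_free(&pages[i]))
			violations++;
	if (!qremove_first)
		toy_qremove(pages, count);	/* original ordering */
	return (violations);
}
```

In this model, calling toy_release() with qremove_first set to false trips the
check on every mapped page, while setting it to true frees all of them
cleanly. In the real kernel the check is a panic rather than a return value,
and the window is a race with page reuse rather than a deterministic failure.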