Panic in nfs_putpages() on 6-stable.

Sun Jan 15 12:40:05 PST 2006

I've run into this panic a couple of times over the last few days, while
trying to rebuild ports using an NFS-mounted /usr/ports filesystem.  It
happened again today and this time I had a chance to look at the dump.

The problem is a NULL pointer dereference in nfs_putpages(), when it
tries to look at np->n_size.  np comes from vp->v_data (via VTONFS()),
and it turns out that v_data is NULL on entry to this routine.  Looking
at the stack I see why:

#6  0xc0674e4a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc05eb030 in nfs_putpages (ap=0xe81c6a14)
    at /usr/src/sys/nfsclient/nfs_bio.c:301
#8  0xc0691148 in VOP_PUTPAGES_APV (vop=0x1000, a=0xe81c6a14)
    at vnode_if.c:2164
#9  0xc064fd8e in vnode_pager_putpages (object=0xcafaa840, m=0x1000, count=0x1000, sync=0x5, rtvals=0x1000)
    at vnode_if.h:1119
During symbol reading, Attribute value is not a constant (DW_FORM_ref4).
#10 0xc064b99e in vm_pageout_flush (mc=0xe81c6ab0, count=0x1, flags=0x5)
    at vm_pager.h:147
#11 0xc0647d0c in vm_object_page_collect_flush (object=0xcafaa840, p=0xc19e5218, curgeneration=0x0, pagerflags=0x5)
    at /usr/src/sys/vm/vm_object.c:950
#12 0xc0647800 in vm_object_page_clean (object=0xcafaa840, start=0x0, end=Unhandled dwarf expression opcode 0x93
) at /usr/src/sys/vm/vm_object.c:753
#13 0xc0647525 in vm_object_terminate (object=0xcafaa840)
    at /usr/src/sys/vm/vm_object.c:608
#14 0xc064e5ad in vnode_destroy_vobject (vp=0xcb58c110)
    at /usr/src/sys/vm/vnode_pager.c:166
#15 0xc05ee075 in nfs_reclaim (ap=0x1000)
    at /usr/src/sys/nfsclient/nfs_node.c:247
#16 0xc069095e in VOP_RECLAIM_APV (vop=0x1000, a=0xe81c6c90)
    at vnode_if.c:1589
#17 0xc0587aa5 in vgonel (vp=0xcb58c110) at vnode_if.h:818
#18 0xc0584ac2 in vlrureclaim (mp=0xc9b2e400)
    at /usr/src/sys/kern/vfs_subr.c:612
#19 0xc0584e8b in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:725
#20 0xc052034c in fork_exit (callout=0xc0584d00 <vnlru_proc>, arg=0x0, frame=0xe81c6d38)
    at /usr/src/sys/kern/kern_fork.c:789
#21 0xc0674eac in fork_trampoline ()
    at /usr/src/sys/i386/i386/exception.s:208

In nfs_reclaim(), just before he calls vnode_destroy_vobject(), he
zfrees and clears vp->v_data.  When, down in the guts of vm_object.c, he
tries to flush the associated pages, v_data is already NULL so he goes
boom.
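
To make the ordering concrete, here is a small userland model of it.
This is only a sketch and not the kernel code: every model_* name is
made up for illustration, the dirty_pages counter is an assumption
standing in for the VM object's state, and the NULL check marks the spot
where the real nfs_putpages() would fault.

```c
#include <assert.h>
#include <stdlib.h>

/* Userland stand-ins; these are NOT the kernel definitions. */
struct model_nfsnode {
	long n_size;
};

struct model_vnode {
	void *v_data;      /* per-filesystem private data (the nfsnode) */
	int   dirty_pages; /* pages the VM object still has to write back */
};

/* Same idea as VTONFS(): the nfsnode is simply v_data, cast. */
#define MODEL_VTONFS(vp) ((struct model_nfsnode *)(vp)->v_data)

/*
 * Stands in for nfs_putpages(), which needs np->n_size.
 * Returns 0 on success, -1 where the real code would dereference NULL.
 */
int
model_putpages(struct model_vnode *vp)
{
	struct model_nfsnode *np = MODEL_VTONFS(vp);

	if (np == NULL)
		return (-1);	/* kernel: page fault, panic */
	(void)np->n_size;
	vp->dirty_pages = 0;
	return (0);
}

/* Stands in for vnode_destroy_vobject(): flush only if pages are dirty. */
int
model_destroy_vobject(struct model_vnode *vp)
{
	return (vp->dirty_pages > 0 ? model_putpages(vp) : 0);
}

/* The ordering from the dump: free and clear v_data, then destroy. */
int
model_reclaim_free_first(struct model_vnode *vp)
{
	assert(vp != NULL);
	free(vp->v_data);
	vp->v_data = NULL;
	return (model_destroy_vobject(vp));
}

/* Hypothetical helper: build a vnode with a given number of dirty pages. */
struct model_vnode
model_vnode_new(int dirty_pages)
{
	struct model_vnode vn;

	vn.v_data = calloc(1, sizeof(struct model_nfsnode));
	vn.dirty_pages = dirty_pages;
	return (vn);
}
```

In this model, model_reclaim_free_first() only fails when the vnode
still has dirty pages at reclaim time; with no dirty pages the flush
path is never entered and the early free goes unnoticed.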

Now, why does he do the zfree/clear before vnode_destroy_vobject()?  Is
he assuming that there are no pages associated with this vnode that need
to be flushed?  Should there be any?  I looked at some other file systems
and they do the same thing.  The obvious fix is to move the zfree/clear
to after the vnode_destroy_vobject() call, but if there should be no
pages that need to be flushed on the vnode at this point, that would
just hide the problem.
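
As a rough illustration of that trade-off, the sketch below models the
reordered reclaim in userland.  It is not a patch: the sketch_* names
and the dirty_pages counter are assumptions of mine, not anything in the
tree.  The reclaim routine flushes first and frees afterwards, and it
also reports how many dirty pages it found, so that the reorder does not
silently paper over a state that was never supposed to occur.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userland stand-ins; these are NOT the kernel definitions. */
struct sketch_nfsnode {
	long n_size;
};

struct sketch_vnode {
	void *v_data;      /* private data the flush path depends on */
	int   dirty_pages; /* pages still waiting to be written back */
};

/*
 * Models the page flush.  Returns 0 on success, -1 where the kernel
 * would fault because v_data is already gone.
 */
int
sketch_flush(struct sketch_vnode *vp)
{
	if (vp->dirty_pages == 0)
		return (0);
	if (vp->v_data == NULL)
		return (-1);
	vp->dirty_pages = 0;
	return (0);
}

/*
 * Reordered reclaim: flush while v_data is still valid, free afterwards.
 * Returns the number of dirty pages found at reclaim time (so a caller
 * can notice the unexpected case), or -1 if the flush failed.
 */
int
sketch_reclaim_flush_first(struct sketch_vnode *vp)
{
	int found;

	assert(vp != NULL);
	found = vp->dirty_pages;
	if (found > 0)
		fprintf(stderr, "sketch: %d dirty page(s) at reclaim\n", found);
	if (sketch_flush(vp) != 0)
		return (-1);
	free(vp->v_data);
	vp->v_data = NULL;
	return (found);
}
```

A nonzero return here is the interesting case: the reorder keeps the
model from failing, but the count still tells you that pages were dirty
at reclaim, which is the part that would otherwise be hidden.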

I can keep looking at the code to answer my question but I thought I
would ask here first, in case there's someone who knows the answer right
away.  Thanks.
-- 
Frank Mayhar frank at exit.com     http://www.exit.com/
Exit Consulting                 http://www.gpsclock.com/
                                http://www.exit.com/blog/frank/