sysinstall spec_getpages panic (with VM overtones)
Robert Watson
rwatson at freebsd.org
Sun Aug 24 18:04:26 PDT 2003
On Mon, 25 Aug 2003, Gavin Atkinson wrote:
> On Wed, 20 Aug 2003, Robert Watson wrote:
> > On Wed, 20 Aug 2003, Gavin Atkinson wrote:
> > > _mtx_lock_flags(0,0,c0529513,300,ffffffff) at _mtx_lock_flags+0x43
> > > spec_getpages(cce33b3c,54,0,cce33b2c,0) at spec_getpages+0x26c
> > > ffs_getpages(cce33b80,0,c05459de,274,c05c63e0) at ffs_getpages+0x5f6
> > > vnode_pager_getpages(c0bebafc,cce33c70,1,0,cce33c20) at vnode_pager_getpages+0x73
> > > vm_fault(c1259900,819b000,1,0,c12534c0) at vm_fault+0x8e2
> > > trap_pfault(cce33d48,1,819b004,200,819b004) at trap_pfault+0x109
> > > trap(2f,2f,2f,82e533c,0) at trap+0x1fc
> > > calltrap() at calltrap+0x5
> > >
> > > *c0529513 = "/usr/src/sys/fs/specfs/spec_vnops.c", line 0x300 is line 768:
> > >
> > > 766 gotreqpage = 0;
> > > 767 VM_OBJECT_LOCK(vp->v_object);
> > > 768 vm_page_lock_queues();
> > > 769 for (i = 0, toff = 0; i < pcount; i++, toff = nextoff) {
> >
> > Is it ap->a_vp that's NULL, or vp->v_object that's NULL? vp is
> > dereferenced several times before that in the code, so if vp is really
> > NULL at line 767, we're probably talking about memory corruption. But if
> > vp->v_object is NULL, then it could be we're not creating a VM object
> > along some code path.
>
> Although this panic is 100% reproducible during the initial install
> through sysinstall, I have tried hard but cannot reproduce it once
> the system is installed and running multiuser, even by performing the
> same actions within sysinstall. I have also tried without success
> to get a crash dump of the panic. After a fair bit of head
> scratching, it looks from a grep of the source code like the "dumpdev"
> loader variable documented in loader(8) is not yet implemented, and as
> far as I can tell there is no other way I can get the installer off CD
> to generate a dump.
>
> I'm trying to make a release with extra debugging info, but won't be
> able to test this until at least Wednesday or Thursday. What extra
> debugging info would be useful? Who would be the best person to discuss
> this with? From what kuriyama said, it appears that it is indeed
> vp->v_object that is null, so I have added the following to
> spec_vnops.c just before the lock that fails:
>
>     if (vp->v_object == NULL)
>         panic("vp->v_object is null in %s, rdev=%s", __func__,
>             devtoname(vp->v_rdev));
>
> Hopefully that will help diagnose the cause a little further, but I'm
> really working blind here - this is not an area of the kernel I
> understand at all. If there is any other debugging info I can provide
> that may be useful, I'm happy to have a go. Kuriyama, if you have any
> spare time before I am able to do it, maybe you could add the above code
> and find out what message it panics with?

Alan Cox made a commit a couple of days ago that seems to resolve the
problem for us. Here's the commit message so you can give it a try.

alc         2003/08/22 10:50:32 PDT

  FreeBSD src repository

  Modified files:
    sys/fs/specfs        spec_vnops.c
  Log:
  Use the requested page's object field instead of the vnode's.  In some
  cases, the vnode's object field is not initialized leading to a NULL
  pointer dereference when the object is locked.

  Tested by:      rwatson

  Revision  Changes    Path
  1.208     +5 -2      src/sys/fs/specfs/spec_vnops.c
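
For illustration only, the idea in that log message can be sketched in
user-space C. This is not the actual diff: the structures below are
simplified, hypothetical stand-ins for the kernel's vnode, vm_page and
vm_object, and the helper names (object_from_vnode, object_from_page,
lock_object) are invented for the example. It only shows why taking the
object from the requested page avoids the NULL dereference when the
vnode's object field was never set up.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kernel definitions. */
struct vm_object { int locked; };
struct vm_page   { struct vm_object *object; };
struct vnode     { struct vm_object *v_object; };

/* Old approach: trust the vnode's object field, which may be NULL. */
static struct vm_object *
object_from_vnode(const struct vnode *vp)
{
	return vp->v_object;
}

/* Approach described in the log: use the requested page's object. */
static struct vm_object *
object_from_page(struct vm_page **pages, int reqpage)
{
	return pages[reqpage]->object;
}

/*
 * Stand-in for taking the object lock.  Returns 0 on success and -1 if
 * there is no object; the kernel would fault on the NULL pointer instead.
 */
static int
lock_object(struct vm_object *obj)
{
	if (obj == NULL)
		return -1;
	obj->locked = 1;
	return 0;
}
```

With a vnode whose v_object is NULL and a page that points at a valid
object, lock_object(object_from_vnode(...)) reports failure while
lock_object(object_from_page(...)) succeeds.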