debugging process in bovlbx state
jhb at freebsd.org
Mon Dec 20 14:59:08 UTC 2010
On Sunday, December 19, 2010 7:10:04 pm Benjamin Kaduk wrote:
> Hi all,
> I'm working on bringing the out-of-tree OpenAFS network filesystem
> up-to-date for FreeBSD 7.3-RELEASE, and I think I need some help to fix
> this bug.
> I should preface my discourse with the fact that there is a whole slew of
> lock order reversals that I haven't even tried to track down, but I do not
> believe that this hang is a deadlock, since 'show alllocks' in DDB does not
> show anything that seems interesting.
> Any pointers for things to look at would be appreciated; more details of
> the failing case below.
> In order to get the afs kernel module to load, I needed to tweak a few
> lines of code in getpages(), as I had previously cribbed a bunch of
> changes/updates from the experimental NFS client while getting AFS to work
> on current freebsd. In particular, vm_page_set_valid is not present in
> 7.3, so I am currently running with:
> --- a/src/afs/FBSD/osi_vnodeops.c
> +++ b/src/afs/FBSD/osi_vnodeops.c
> @@ -890,12 +890,8 @@ afs_vop_getpages(struct vop_getpages_args *ap)
> * Read operation filled a partial page.
> m->valid = 0;
> - vm_page_set_valid(m, 0, size - toff);
> -#ifndef AFS_FBSD80_ENV
> - vm_page_undirty(m);
> + vm_page_set_validclean(m, 0, size - toff);
> KASSERT(m->dirty == 0, ("afs_getpages: page %p is dirty", m));
> But my knowledge of vm_page_* is approximately nil, so there's no reason
> to think everything was correct even before that patch.
> Anyway, my test case is running libarchive's configure script with source
> and destination directories in (different places in) AFS. It only gets
> twenty lines in, ending with:
> checking for gcc option to accept ISO C89... none needed
> checking for style of include used by make... GNU
> checking dependency style of gcc...
> ^Tload: 0.04 cmd: cp 1250 [bovlbx] 0.00u 0.00
> procstat -kk reports:
> mega-man# procstat -kk 1250
> PID TID COMM TDNAME KSTACK
> 1250 100060 cp - mi_switch+0x233
> sleepq_switch+0xe9 sleepq_wait+0x44 _sleep+0x3a0 vm_object_pip_wait+0x4e
> bufobj_invalbuf+0x10e afs_GetVCache+0x2f7
> The call to vinvalbuf in afs_GetVCache is here:
> 1646 iheldthelock = VOP_ISLOCKED(vp, curthread);
This is probably wrong. VOP_ISLOCKED() can return four different values:
- LK_SHARED: (someone, possibly curthread) holds a shared lock
- LK_EXCLUSIVE: curthread holds an exclusive lock
- LK_EXCLOTHER: some other thread holds an exclusive lock
- 0: no thread holds any lock.
This means if another thread has the vnode locked, you don't try to lock it.
Do you actually know that this routine can be called without the vnode locked by
the current thread?
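
To make the four cases concrete, here is a minimal userland sketch (not
kernel code). It assumes the caller treats the VOP_ISLOCKED() result as a
boolean, which is what the variable name iheldthelock suggests. The enum
names and numeric values are placeholders invented for this illustration,
not the kernel's definitions, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userland stand-ins for the four VOP_ISLOCKED() results listed above.
 * Names and numeric values are placeholders, NOT the kernel's definitions.
 */
enum model_lockstatus {
	MODEL_UNLOCKED = 0,	/* no thread holds any lock */
	MODEL_LK_SHARED,	/* someone, possibly curthread, holds a shared lock */
	MODEL_LK_EXCLUSIVE,	/* curthread holds an exclusive lock */
	MODEL_LK_EXCLOTHER	/* some other thread holds an exclusive lock */
};

/* Boolean-style check: any nonzero status is read as "I hold the lock". */
bool
naive_i_hold_the_lock(enum model_lockstatus status)
{
	return (status != MODEL_UNLOCKED);
}

/* Stricter check: only LK_EXCLUSIVE proves that curthread owns the lock. */
bool
curthread_holds_exclusive(enum model_lockstatus status)
{
	switch (status) {
	case MODEL_LK_EXCLUSIVE:
		return (true);
	case MODEL_LK_SHARED:
	case MODEL_LK_EXCLOTHER:
	case MODEL_UNLOCKED:
		return (false);
	}
	assert(!"unknown lock status");
	return (false);
}
```

With MODEL_LK_EXCLOTHER the two checks disagree: the boolean-style test
claims the lock is held, so the caller skips locking even though a
different thread owns it. A shared lock is also ambiguous, since the
status alone does not say whether curthread is one of the holders.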