inode deadlock: can't reclaim VLRU: suggestions please [was RE: kernel deadlock]

Dave Dolson ddolson at sandvine.com
Tue Aug 19 05:57:36 PDT 2003


For FreeBSD 4.7

I've discovered the cause of the deadlock, but I can't figure out how to fix
it.
See below for traces.

If the vnode limit has been reached, the vnlru process is kicked 
and the requestor goes to sleep to wait for the vnlru process to 
signal that vnodes are available (10% of the vnodes need to be 
freed).

Under our test, none of the vnodes meet the criteria for freeing,
so the vnlru process goes to sleep for 3 seconds without signaling
anything.  When it wakes it tries again, with the same result.
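
A rough userland model of the handshake described above may make the
failure mode easier to see.  This is only a sketch: the struct and
function names (vn_model, vn_model_pass, and so on) are hypothetical and
do not exist in the kernel, and the 10% rule is taken from the
description in this message rather than from the kernel source.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the FreeBSD kernel. */
struct vn_model {
	int limit;		/* vnode limit */
	int in_use;		/* vnodes currently allocated */
	int reclaimable;	/* vnodes that pass the reclaim checks */
};

/* One vnlru pass: free whatever qualifies and report how many went. */
int
vn_model_pass(struct vn_model *m)
{
	int freed = m->reclaimable;

	m->in_use -= freed;
	m->reclaimable = 0;
	return (freed);
}

/* The requester is signalled only once 10% of the limit is free. */
bool
vn_model_should_wake(const struct vn_model *m)
{
	return (m->in_use <= m->limit - m->limit / 10);
}

/*
 * Run up to max_passes passes.  Returns the pass on which the requester
 * would be woken, or -1 if it is still asleep afterwards.
 */
int
vn_model_passes_until_wake(struct vn_model *m, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		vn_model_pass(m);
		if (vn_model_should_wake(m))
			return (pass);
		/* The real process would sleep for 3 seconds here. */
	}
	return (-1);
}
```

With reclaimable set to 0, every pass frees nothing, so the function
returns -1 no matter how many passes are allowed, which matches the
hang seen in the test.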

Current constraints are:
 - v_type is not VNON or VBAD
 - v_object is NULL or v_object->resident_page_count < trigger
 - VMIGHTFREE(vp) is true
 - can acquire vp->v_interlock 


I tried adding code which uses only the following constraints if 
no nodes could be freed the previous time:
 - VMIGHTFREE(vp) is true
 - can acquire v_interlock

However, few nodes meet these constraints either.
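
The two sets of constraints can be compared side by side with a small
model.  This is a sketch under stated assumptions, not the kernel code:
struct vp_model and the vp_model_* functions are hypothetical, the
might_free field stands in for the result of VMIGHTFREE(vp), and
interlock_free stands in for a successful attempt to take v_interlock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vnode types that matter here. */
enum vt_model { VT_NON, VT_REG, VT_DIR, VT_BAD };

struct vp_model {
	enum vt_model type;
	bool has_object;	/* v_object != NULL */
	int resident_pages;	/* v_object->resident_page_count */
	bool might_free;	/* stands in for VMIGHTFREE(vp) */
	bool interlock_free;	/* stands in for acquiring v_interlock */
};

/* All four current constraints. */
bool
vp_model_strict(const struct vp_model *vp, int trigger)
{
	if (vp->type == VT_NON || vp->type == VT_BAD)
		return (false);
	if (vp->has_object && vp->resident_pages >= trigger)
		return (false);
	return (vp->might_free && vp->interlock_free);
}

/* Fallback: only the VMIGHTFREE and interlock constraints. */
bool
vp_model_relaxed(const struct vp_model *vp)
{
	return (vp->might_free && vp->interlock_free);
}

/* Count how many of n vnodes pass the chosen set of constraints. */
size_t
vp_model_count(const struct vp_model *v, size_t n, int trigger, bool relaxed)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (relaxed ? vp_model_relaxed(&v[i]) :
		    vp_model_strict(&v[i], trigger))
			hits++;
	}
	return (hits);
}
```

The relaxed count can only exceed the strict count by vnodes that were
rejected for their type or resident page count.  Anything failing
VMIGHTFREE() is rejected either way, which is consistent with the
fallback gaining little.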

Which of the following approaches seems best?
1. Can I do away with some of the VMIGHTFREE() criteria?  
I.e., are they constraints or merely heuristics?
#define VMIGHTFREE(vp) \
        (!((vp)->v_flag & (VFREE|VDOOMED|VXLOCK)) &&   \
         LIST_EMPTY(&(vp)->v_cache_src) && !(vp)->v_usecount)

2. If there is a dependency on one of the user processes, 
can I determine the offender (maybe kill it)?

3. Should the vnlru process signal the requestor if as few as one 
vnode has been reclaimed (vs. the 10%)?

4. Why wait for 3 whole seconds?  How about waiting one tick?

If possible, my preference is (1), freeing as much as possible 
when things get bad.
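
Before relaxing anything for option (1), it might help to know which
clause of VMIGHTFREE() is rejecting the vnodes.  The sketch below
mirrors the three clauses of the macro in order and tallies the first
one that fails.  It is a userland model only: struct mf_model, the M_*
flag bits and the mf_model_* functions are hypothetical, and the flag
values are placeholders rather than the real ones from sys/vnode.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits for the model, not the kernel's values. */
#define M_VFREE		0x1
#define M_VDOOMED	0x2
#define M_VXLOCK	0x4

struct mf_model {
	unsigned flags;		/* v_flag */
	bool cache_src_empty;	/* LIST_EMPTY(&vp->v_cache_src) */
	int usecount;		/* v_usecount */
};

enum mf_reason { MF_OK, MF_FLAGS, MF_CACHE_SRC, MF_USECOUNT, MF_NREASONS };

/* Return the first VMIGHTFREE clause that fails, or MF_OK if none do. */
enum mf_reason
mf_model_why(const struct mf_model *vp)
{
	if (vp->flags & (M_VFREE | M_VDOOMED | M_VXLOCK))
		return (MF_FLAGS);
	if (!vp->cache_src_empty)
		return (MF_CACHE_SRC);
	if (vp->usecount != 0)
		return (MF_USECOUNT);
	return (MF_OK);
}

/* Tally the rejection reasons over n vnodes into out[MF_NREASONS]. */
void
mf_model_tally(const struct mf_model *v, size_t n, size_t out[MF_NREASONS])
{
	for (int r = 0; r < MF_NREASONS; r++)
		out[r] = 0;
	for (size_t i = 0; i < n; i++)
		out[mf_model_why(&v[i])]++;
}
```

A tally dominated by MF_USECOUNT would point at vnodes still held by
some process (option 2), while one dominated by MF_CACHE_SRC would
point at name cache entries that hang off the vnode.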

Thanks in advance for your input.

David Dolson (ddolson at sandvine.com, www.sandvine.com)


More information about the freebsd-stable mailing list