Dave Dolson
2003-Aug-19 05:57 UTC
inode deadlock: can't reclaim VLRU: suggestions please [was RE: kernel deadlock]
For FreeBSD 4.7
I've discovered the cause of the deadlock, but I can't figure out how to
fix it.
See below for traces.
If the vnode limit has been reached, the vnlru process is kicked
and the requestor goes to sleep to wait for the vnlru process to
signal that vnodes are available (10% of the vnodes need to be
freed).
Under our test, none of the vnodes meet the criteria for freeing,
so the vnlru process goes to sleep for 3 seconds without signaling
anything. Then it wakes and tries again, with the same result.
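To make the stall concrete, here is a minimal user-space sketch of the
wakeup rule described above. It is not the kernel code. The function name
model_wake_pass and its parameters are hypothetical, and it assumes that
"10% of the vnodes need to be freed" means usage must drop to 90% of the
limit before the requestor is signaled.

```c
#include <stddef.h>

/*
 * Hypothetical model, not the FreeBSD implementation.
 * freed[i] is the number of vnodes that vnlru pass i manages to reclaim.
 * Returns the 0-based pass on which the requestor would be woken,
 * or -1 if it is never woken within npasses passes.
 */
int
model_wake_pass(const int *freed, size_t npasses, int in_use, int limit)
{
    /* Assumption: wake once usage is at most 90% of the limit. */
    int target = limit - limit / 10;

    for (size_t i = 0; i < npasses; i++) {
        in_use -= freed[i];
        if (in_use <= target)
            return (int)i;
        /* Otherwise vnlru sleeps (3 seconds in the post) and retries. */
    }
    return -1;
}
```

If every pass frees zero vnodes the function returns -1, which is the
situation we are stuck in: the requestor is never signaled.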
Current constraints are:
- v_type is not VNON or VBAD
- v_object is NULL or v_object->resident_page_count < trigger
- VMIGHTFREE(vp) is true
- can acquire vp->v_interlock
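The four constraints can be written as a single predicate. The following
is only a sketch against a simplified stand-in structure. The model_vnode
type, its fields, and model_reclaimable are hypothetical names for
illustration and are not the kernel's struct vnode or any kernel function.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; not the kernel definitions. */
enum model_vtype { M_VNON, M_VREG, M_VDIR, M_VBAD };

struct model_vnode {
    enum model_vtype type;    /* models v_type */
    bool has_object;          /* models v_object != NULL */
    int resident_page_count;  /* models v_object->resident_page_count */
    bool might_free;          /* models VMIGHTFREE(vp) */
    bool interlock_available; /* models a successful try-lock on v_interlock */
};

/* Returns true only when all four constraints listed above hold. */
bool
model_reclaimable(const struct model_vnode *vp, int trigger)
{
    if (vp->type == M_VNON || vp->type == M_VBAD)
        return false;
    if (vp->has_object && vp->resident_page_count >= trigger)
        return false;
    if (!vp->might_free)
        return false;
    return vp->interlock_available;
}
```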
I tried adding code which uses only the following constraints if
no nodes could be freed the previous time:
- VMIGHTFREE(vp) is true
- can acquire v_interlock
However, few nodes meet these constraints either.
Which of the following approaches seems best?
1. Can I do away with some of the VMIGHTFREE() criteria?
I.e., are they constraints or merely heuristics?
#define VMIGHTFREE(vp) \
(!((vp)->v_flag & (VFREE|VDOOMED|VXLOCK)) && \
LIST_EMPTY(&(vp)->v_cache_src) && !(vp)->v_usecount)
2. If there is a dependency on one of the user processes,
can I determine the offender (maybe kill it)?
3. Should the vnlru process signal the requestor if as few as one
node has been reclaimed (vs. the 10%)?
4. Why wait for 3 whole seconds? How about waiting one tick?
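For options 1 and 2 it may help to know which VMIGHTFREE() clause is
excluding the vnodes. Below is a minimal user-space sketch of such a
tally. The mf_vnode and mf_tally types, the mf_count function, and the
M_* bit values are all hypothetical. The real flag values are in
sys/vnode.h, and a real version would have to walk the kernel's vnode
lists instead of an array.

```c
#include <stddef.h>

/* Arbitrary bit values for this model; not the kernel's values. */
#define M_VFREE   0x1u
#define M_VDOOMED 0x2u
#define M_VXLOCK  0x4u

struct mf_vnode {
    unsigned flags;      /* models v_flag */
    int cache_src_count; /* models v_cache_src; 0 means LIST_EMPTY */
    int usecount;        /* models v_usecount */
};

struct mf_tally {
    size_t flagged;   /* VFREE, VDOOMED or VXLOCK is set */
    size_t cache_src; /* v_cache_src is not empty */
    size_t in_use;    /* v_usecount is nonzero */
    size_t eligible;  /* all three clauses pass */
};

/* Counts, per clause, how many of the n vnodes fail that clause. */
struct mf_tally
mf_count(const struct mf_vnode *v, size_t n)
{
    struct mf_tally t = { 0, 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        int ok = 1;

        if (v[i].flags & (M_VFREE | M_VDOOMED | M_VXLOCK)) {
            t.flagged++;
            ok = 0;
        }
        if (v[i].cache_src_count != 0) {
            t.cache_src++;
            ok = 0;
        }
        if (v[i].usecount != 0) {
            t.in_use++;
            ok = 0;
        }
        if (ok)
            t.eligible++;
    }
    return t;
}
```

A vnode that fails more than one clause is counted once under each, so
the per-clause counts can add up to more than the number of ineligible
vnodes.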
If possible, my preference is (1), freeing as much as possible
when things get bad.
Thanks in advance for your input.
David Dolson (ddolson@sandvine.com, www.sandvine.com)
