Dave Dolson
2003-Aug-19 05:57 UTC
inode deadlock: can't reclaim VLRU: suggestions please [was RE: kernel deadlock]
For FreeBSD 4.7 I've discovered the cause of the deadlock, but I can't figure out how to fix it. See below for traces.

If the vnode limit has been reached, the vnlru process is kicked and the requestor goes to sleep, waiting for the vnlru process to signal that vnodes are available (10% of the vnodes need to be freed).

Under our test, none of the vnodes meet the criteria for freeing, so the vnlru process goes to sleep for 3 seconds without signaling anything. Then it wakes and tries again, with the same result.

The current constraints for freeing a vnode are:
- v_type is not VNON or VBAD
- v_object is NULL or v_object->resident_page_count < trigger
- VMIGHTFREE(vp) is true
- can acquire vp->v_interlock

I tried adding code which uses only the following constraints if no vnodes could be freed the previous time:
- VMIGHTFREE(vp) is true
- can acquire v_interlock

However, few vnodes meet these constraints either.

Which of the following approaches seems best?

1. Can I do away with some of the VMIGHTFREE() criteria? I.e., are they constraints or merely heuristics?

    #define VMIGHTFREE(vp) \
        (!((vp)->v_flag & (VFREE|VDOOMED|VXLOCK)) && \
         LIST_EMPTY(&(vp)->v_cache_src) && !(vp)->v_usecount)

2. If there is a dependency on one of the user processes, can I determine the offender (and maybe kill it)?

3. Should the vnlru process signal the requestor if as few as one vnode has been reclaimed (vs. the 10%)?

4. Why wait for 3 whole seconds? How about waiting one tick?

If possible, my preference is (1), freeing as much as possible when things get bad.

Thanks in advance for your input.

David Dolson (ddolson@sandvine.com, www.sandvine.com)
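
To make the two sets of constraints above concrete, here is a rough user-space model of the candidate test. It is only a sketch and not the kernel code: struct vnode_model, the M_* names, might_free(), strict_candidate(), relaxed_candidate() and count_reclaimable() are all hypothetical stand-ins, the interlock is reduced to a boolean, and the "relax if nothing could be freed the previous time" rule is simplified to a second pass within the same call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vnode fields and flags named in the post. */
enum vtype_model { M_VNON, M_VREG, M_VDIR, M_VBAD };

#define M_VFREE   0x1u
#define M_VDOOMED 0x2u
#define M_VXLOCK  0x4u

struct vnode_model {
    enum vtype_model type;   /* models v_type */
    unsigned flags;          /* models v_flag */
    int usecount;            /* models v_usecount */
    bool has_cache_src;      /* true when v_cache_src would be non-empty */
    bool has_object;         /* true when v_object would be non-NULL */
    int resident_pages;      /* models v_object->resident_page_count */
    bool interlock_busy;     /* true when v_interlock could not be acquired */
};

/* Mirrors the three conditions of the VMIGHTFREE() macro quoted above. */
static bool might_free(const struct vnode_model *vp)
{
    return !(vp->flags & (M_VFREE | M_VDOOMED | M_VXLOCK)) &&
           !vp->has_cache_src && vp->usecount == 0;
}

/* The full set of current constraints. */
static bool strict_candidate(const struct vnode_model *vp, int trigger)
{
    if (vp->type == M_VNON || vp->type == M_VBAD)
        return false;
    if (vp->has_object && vp->resident_pages >= trigger)
        return false;
    return might_free(vp) && !vp->interlock_busy;
}

/* The reduced set: only VMIGHTFREE() and the interlock. */
static bool relaxed_candidate(const struct vnode_model *vp)
{
    return might_free(vp) && !vp->interlock_busy;
}

/*
 * Count vnodes that pass the strict test; if none do, fall back to the
 * relaxed test. *used_relaxed reports which test produced the count.
 */
size_t count_reclaimable(const struct vnode_model *v, size_t n,
                         int trigger, bool *used_relaxed)
{
    size_t count = 0;

    *used_relaxed = false;
    for (size_t i = 0; i < n; i++)
        if (strict_candidate(&v[i], trigger))
            count++;
    if (count > 0)
        return count;

    *used_relaxed = true;
    for (size_t i = 0; i < n; i++)
        if (relaxed_candidate(&v[i]))
            count++;
    return count;
}
```

In this model a vnode with a nonzero use count or a non-empty cache source list is rejected by both tests, which matches the observation that relaxing the type and page-count checks alone frees very little.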