During our testing, we found this code path where xen attempts to grab the shadow_lock, while holding it - leading to a deadlock. >> free_dom_mem-> >> shadow_sync_and_drop_references-> >> shadow_lock -> ..................... first lock >> shadow_remove_all_access-> >> remove_all_access_in_page-> >> put_page-> >> free_domheap_pages-> >> shadow_drop_references-> >> shadow_lock -> ..................... second lock Questions: - should shadow lock be recursive? - is shadow lock too coarse grained? It seems to have led to a lot of code refactoring (__foo without lock and foo with lock). But there may be more such instances we haven''t found yet. -Arun _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Xiaofeng Ling
2005-May-12 01:57 UTC
[Xen-devel] Re: Should shadow_lock be spin_lock_recursive?
This dead lock happened on VNIF code when enabled shadow mode. The shadow_lock path is so complex and maybe using recurisve lock is the easiest way to avoid dead lock currently before clearing the path or splitting the lock . Arun Sharma wrote:> > During our testing, we found this code path where xen attempts to grab > the shadow_lock, while holding it - leading to a deadlock. > > >> free_dom_mem-> > >> shadow_sync_and_drop_references-> > >> shadow_lock -> ..................... first lock > >> shadow_remove_all_access-> > >> remove_all_access_in_page-> > >> put_page-> > >> free_domheap_pages-> > >> shadow_drop_references-> > >> shadow_lock -> ..................... second lock > > Questions: > > - should shadow lock be recursive? > - is shadow lock too coarse grained? It seems to have led to a lot of > code refactoring (__foo without lock and foo with lock). But there may > be more such instances we haven''t found yet. > > -Arun_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2005-May-12 07:57 UTC
Re: [Xen-devel] Re: Should shadow_lock be spin_lock_recursive?
On 12 May 2005, at 02:57, Xiaofeng Ling wrote:> This dead lock happened on VNIF code when enabled shadow mode. > The shadow_lock path is so complex and maybe using recurisve lock > is the easiest way to avoid dead lock currently before clearing > the path or splitting the lock .shadow_lock() and show_unlock() should just get the per-domain BIGLOCK I think. That''s recursive anyway. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Michael A Fetterman
2005-May-12 12:59 UTC
RE: [Xen-devel] Re: Should shadow_lock be spin_lock_recursive?
I think there''s another bug somewhere that''s provoking this. There may be a flaw in my argument, below, but I currently think this argument is correct: Consider the call tree: free_dom_mem() is trying to get rid of all shadow references to page X, so that it can relinguish page X back to the free list. *Note that free_dom_mem() has done a get_page() on X, so X''s refcount must be >= 1... free_dom_mem() calls shadow_sync_and_drop_references(X), which calls shadow_remove_all_access(X), which calls remove_all_access_in_page(random shadow page, X), which (when it finds references to X) calls put_page(X). However, those calls to put_page(X) should never result in calls to free_domheap_pages(), as X''s refcount should always be >= 1 because of the get_page performed in free_dom_mem(). So that tells me the refcount on X was already broken before we got here... Michael -----Original Message----- From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Xiaofeng Ling Sent: Thursday, May 12, 2005 2:57 AM To: Sharma, Arun Cc: xen-devel@lists.xensource.com Subject: [Xen-devel] Re: Should shadow_lock be spin_lock_recursive? This dead lock happened on VNIF code when enabled shadow mode. The shadow_lock path is so complex and maybe using recurisve lock is the easiest way to avoid dead lock currently before clearing the path or splitting the lock . Arun Sharma wrote:> > During our testing, we found this code path where xen attempts to grab > the shadow_lock, while holding it - leading to a deadlock. > > >> free_dom_mem-> > >> shadow_sync_and_drop_references-> > >> shadow_lock -> ..................... first lock > >> shadow_remove_all_access-> > >> remove_all_access_in_page-> > >> put_page-> > >> free_domheap_pages-> > >> shadow_drop_references-> > >> shadow_lock -> ..................... second lock > > Questions: > > - should shadow lock be recursive? > - is shadow lock too coarse grained? It seems to have led to a lot of > code refactoring (__foo without lock and foo with lock). But there may > be more such instances we haven''t found yet. > > -Arun_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Arun Sharma
2005-May-13 22:40 UTC
Re: [Xen-devel] Re: Should shadow_lock be spin_lock_recursive?
Michael A Fetterman wrote:> > However, those calls to put_page(X) should never result in calls > to free_domheap_pages(), as X''s refcount should always be >= 1 > because of the get_page performed in free_dom_mem(). > > So that tells me the refcount on X was already broken before we got > here... >Thanks for the explanation. It was a patched xen that we were testing and the problem may have been caused by our patch. We''ll try to look closely at the patch to figure out why the refrence counts were wrong. -Arun _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel