Dan Magenheimer
2012-Aug-31 19:58 UTC
[PATCH] [FOR 4.2] tmem: add matching unlock for an about-to-be-destroyed object
(Guidance appreciated if the Xen patch submittal process has changed and I need to do more than send this email.) A 4.2 changeset forces a preempt_disable/enable with every lock/unlock. Tmem has dynamically allocated "objects" that contain a lock. The lock is held when the object is destroyed. No reason to unlock something that''s about to be destroyed! But with the preempt_enable/disable in the generic locking code, and the fact that do_softirq ASSERTs that preempt_count must be zero, a crash occurs soon after any object is destroyed. So force lock to be released before destroying objects. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> diff -r 1967c7c290eb xen/common/tmem.c --- a/xen/common/tmem.c Wed Feb 09 12:03:09 2011 +0000 +++ b/xen/common/tmem.c Fri Aug 31 13:49:51 2012 -0600 @@ -957,6 +957,7 @@ /* use no_rebalance only if all objects are being destroyed anyway */ if ( !no_rebalance ) rb_erase(&obj->rb_tree_node,&pool->obj_rb_root[oid_hash(&old_oid)]); + tmem_spin_unlock(&obj->obj_spinlock); tmem_free(obj,sizeof(obj_t),pool); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Keir Fraser
2012-Aug-31 20:08 UTC
Re: [PATCH] [FOR 4.2] tmem: add matching unlock for an about-to-be-destroyed object
On 31/08/2012 20:58, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:> (Guidance appreciated if the Xen patch submittal process > has changed and I need to do more than send this email.)This is fine, thanks! -- Keir> A 4.2 changeset forces a preempt_disable/enable with > every lock/unlock. > > Tmem has dynamically allocated "objects" that contain a > lock. The lock is held when the object is destroyed. > No reason to unlock something that''s about to be destroyed! > But with the preempt_enable/disable in the generic locking code, > and the fact that do_softirq ASSERTs that preempt_count > must be zero, a crash occurs soon after any object is > destroyed. > > So force lock to be released before destroying objects. > > Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> > > diff -r 1967c7c290eb xen/common/tmem.c > --- a/xen/common/tmem.c Wed Feb 09 12:03:09 2011 +0000 > +++ b/xen/common/tmem.c Fri Aug 31 13:49:51 2012 -0600 > @@ -957,6 +957,7 @@ > /* use no_rebalance only if all objects are being destroyed anyway */ > if ( !no_rebalance ) > rb_erase(&obj->rb_tree_node,&pool->obj_rb_root[oid_hash(&old_oid)]); > + tmem_spin_unlock(&obj->obj_spinlock); > tmem_free(obj,sizeof(obj_t),pool); > } >
Jan Beulich
2012-Sep-05 13:24 UTC
Re: [PATCH] [FOR 4.2] tmem: add matching unlock for an about-to-be-destroyed object
>>> On 31.08.12 at 21:58, Dan Magenheimer <dan.magenheimer@oracle.com> wrote: > A 4.2 changeset forces a preempt_disable/enable with > every lock/unlock. > > Tmem has dynamically allocated "objects" that contain a > lock. The lock is held when the object is destroyed. > No reason to unlock something that''s about to be destroyed! > But with the preempt_enable/disable in the generic locking code, > and the fact that do_softirq ASSERTs that preempt_count > must be zero, a crash occurs soon after any object is > destroyed. > > So force lock to be released before destroying objects.We had noticed this oddity during XSA-15 processing too. What''s the purpose of holding a lock on a to be destroyed object anyway? One would expect a lock of a containing entity to be held for that purpose (or really just while taking the object off whatever data structure it can be looked up through), which would eliminate the need for locking the object itself. That lock generally appears to be pool_rwlock, but in several places that one gets acquired _after_ checking pgp_count to be zero - how is it being guaranteed that this count doesn''t increase between the check and the lock being acquired? Jan
Dan Magenheimer
2012-Sep-05 17:02 UTC
Re: [PATCH] [FOR 4.2] tmem: add matching unlock for an about-to-be-destroyed object
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Wednesday, September 05, 2012 7:25 AM > To: Dan Magenheimer > Cc: xen-devel@lists.xen.org; Konrad Wilk; Zhenzhong Duan; keir@xen.org > Subject: Re: [Xen-devel] [PATCH] [FOR 4.2] tmem: add matching unlock for an about-to-be-destroyed > object > > >>> On 31.08.12 at 21:58, Dan Magenheimer <dan.magenheimer@oracle.com> wrote: > > A 4.2 changeset forces a preempt_disable/enable with > > every lock/unlock. > > > > Tmem has dynamically allocated "objects" that contain a > > lock. The lock is held when the object is destroyed. > > No reason to unlock something that''s about to be destroyed! > > But with the preempt_enable/disable in the generic locking code, > > and the fact that do_softirq ASSERTs that preempt_count > > must be zero, a crash occurs soon after any object is > > destroyed. > > > > So force lock to be released before destroying objects. > > We had noticed this oddity during XSA-15 processing too. > What''s the purpose of holding a lock on a to be destroyed > object anyway? One would expect a lock of a containing > entity to be held for that purpose (or really just while > taking the object off whatever data structure it can be > looked up through), which would eliminate the need for > locking the object itself. That lock generally appears to be > pool_rwlock, but in several places that one gets acquired > _after_ checking pgp_count to be zero - how is it being > guaranteed that this count doesn''t increase between the > check and the lock being acquired?The flush_object code needs to obj_find the object before it can destroy it and a successful obj_find takes the lock. Pushing the lock down to the object level was a misguided attempt to maximize concurrency, especially for early versions of persistent pools (frontswap) where the number of objects was small (one per swap device). I do agree that the tmem locking model deserves a complete rewrite. The locking model of the similar code in upstream Linux (which model was originally suggested by Jeremy Fitzhardinge) is almost certainly superior and less buggy, though probably slightly less concurrent. The linux version probably should be "back ported" into Xen now. Dan