Jan Beulich
2013-Nov-27 08:07 UTC
[PATCH 0/2]

1: fix locking in offline_page()
2: use return value of domain_adjust_tot_pages() where feasible

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Nov-27 08:25 UTC
[PATCH 1/2] fix locking in offline_page()

Coverity ID 1055655

Apart from the Coverity-detected lock order reversal (a domain's
page_alloc_lock taken with the heap lock already held), calling
put_page() with heap_lock is a bad idea too (as a possible descendant
from put_page() is free_heap_pages(), which wants to take this very
lock).

From all I can tell the region over which heap_lock was held was far
too large: All we need to protect are the call to mark_page_offline()
and reserve_heap_page() (and I'd even put under question the need for
the former). Hence by slightly re-arranging the if/else-if chain we
can drop the lock much earlier, at once no longer covering the two
put_page() invocations.

Once at it, do a little bit of other cleanup: Put the "pod_replace"
code path inline rather than at its own label, and drop the effectively
unused variable "ret".

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -957,7 +957,6 @@ int offline_page(unsigned long mfn, int
 {
     unsigned long old_info = 0;
     struct domain *owner;
-    int ret = 0;
     struct page_info *pg;
 
     if ( !mfn_valid(mfn) )
@@ -1007,16 +1006,28 @@ int offline_page(unsigned long mfn, int
     if ( page_state_is(pg, offlined) )
     {
         reserve_heap_page(pg);
-        *status = PG_OFFLINE_OFFLINED;
+
+        spin_unlock(&heap_lock);
+
+        *status = broken ? PG_OFFLINE_OFFLINED | PG_OFFLINE_BROKEN
+                         : PG_OFFLINE_OFFLINED;
+        return 0;
     }
-    else if ( (owner = page_get_owner_and_reference(pg)) )
+
+    spin_unlock(&heap_lock);
+
+    if ( (owner = page_get_owner_and_reference(pg)) )
     {
         if ( p2m_pod_offline_or_broken_hit(pg) )
-            goto pod_replace;
+        {
+            put_page(pg);
+            p2m_pod_offline_or_broken_replace(pg);
+            *status = PG_OFFLINE_OFFLINED;
+        }
         else
         {
             *status = PG_OFFLINE_OWNED | PG_OFFLINE_PENDING |
-              (owner->domain_id << PG_OFFLINE_OWNER_SHIFT);
+                      (owner->domain_id << PG_OFFLINE_OWNER_SHIFT);
             /* Release the reference since it will not be allocated anymore */
             put_page(pg);
         }
@@ -1024,7 +1035,7 @@ int offline_page(unsigned long mfn, int
     else if ( old_info & PGC_xen_heap )
     {
         *status = PG_OFFLINE_XENPAGE | PG_OFFLINE_PENDING |
-          (DOMID_XEN << PG_OFFLINE_OWNER_SHIFT);
+                  (DOMID_XEN << PG_OFFLINE_OWNER_SHIFT);
     }
     else
     {
@@ -1043,21 +1054,7 @@ int offline_page(unsigned long mfn, int
     if ( broken )
         *status |= PG_OFFLINE_BROKEN;
 
-    spin_unlock(&heap_lock);
-
-    return ret;
-
-pod_replace:
-    put_page(pg);
-    spin_unlock(&heap_lock);
-
-    p2m_pod_offline_or_broken_replace(pg);
-    *status = PG_OFFLINE_OFFLINED;
-
-    if ( broken )
-        *status |= PG_OFFLINE_BROKEN;
-
-    return ret;
+    return 0;
 }
 
 /*
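The deadlock described in the commit message reduces to a small,
self-contained sketch. Pthread mutexes stand in for Xen's non-recursive
spinlocks, and the functions with a _sketch suffix are hypothetical
stand-ins, not the real page_alloc.c code:

/* gcc -pthread sketch.c ; illustrative only, not the real Xen code */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for free_heap_pages(): unconditionally takes heap_lock. */
static void free_heap_pages_sketch(void)
{
    pthread_mutex_lock(&heap_lock);
    /* ... merge the freed page back into the heap ... */
    pthread_mutex_unlock(&heap_lock);
}

/* Stand-in for put_page(): dropping the last reference frees the page,
 * i.e. may descend into free_heap_pages_sketch(). */
static void put_page_sketch(int last_ref)
{
    if ( last_ref )
        free_heap_pages_sketch();   /* re-takes heap_lock */
}

int main(void)
{
    /* Old shape: put_page() reached with heap_lock held would
     * self-deadlock on the non-recursive lock (and taking a domain's
     * page_alloc_lock in that region inverts the lock order, which is
     * what Coverity flagged). New shape: unlock first, then release. */
    pthread_mutex_lock(&heap_lock);
    /* mark_page_offline(); reserve_heap_page(); -- the only work that
     * actually needs the heap lock */
    pthread_mutex_unlock(&heap_lock);

    put_page_sketch(1);             /* safe: heap_lock no longer held */
    puts("ok: no reference dropped under heap_lock");
    return 0;
}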
Jan Beulich
2013-Nov-27 08:26 UTC
[PATCH 2/2] use return value of domain_adjust_tot_pages() where feasible
This is generally cheaper than re-reading ->tot_pages.

While doing so I also noticed an improper use (lacking error handling)
of get_domain() as well as lacks of ->is_dying checks in the memory
sharing code, which the patch fixes at once. In the course of doing
this I further noticed other error paths there pointlessly calling
put_page() et al with ->page_alloc_lock still held, which is also being
reversed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -611,9 +611,16 @@ static int page_make_sharable(struct dom
                               struct page_info *page,
                               int expected_refcnt)
 {
-    int drop_dom_ref;
+    bool_t drop_dom_ref;
+
     spin_lock(&d->page_alloc_lock);
 
+    if ( d->is_dying )
+    {
+        spin_unlock(&d->page_alloc_lock);
+        return -EBUSY;
+    }
+
     /* Change page type and count atomically */
     if ( !get_page_and_type(page, d, PGT_shared_page) )
     {
@@ -624,8 +631,8 @@ static int page_make_sharable(struct dom
     /* Check it wasn't already sharable and undo if it was */
     if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
     {
-        put_page_and_type(page);
         spin_unlock(&d->page_alloc_lock);
+        put_page_and_type(page);
         return -EEXIST;
     }
 
@@ -633,15 +640,14 @@ static int page_make_sharable(struct dom
      * the second from get_page_and_type at the top of this function */
     if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) )
     {
+        spin_unlock(&d->page_alloc_lock);
         /* Return type count back to zero */
         put_page_and_type(page);
-        spin_unlock(&d->page_alloc_lock);
         return -E2BIG;
     }
 
     page_set_owner(page, dom_cow);
-    domain_adjust_tot_pages(d, -1);
-    drop_dom_ref = (d->tot_pages == 0);
+    drop_dom_ref = !domain_adjust_tot_pages(d, -1);
     page_list_del(page, &d->page_list);
     spin_unlock(&d->page_alloc_lock);
 
@@ -659,6 +665,13 @@ static int page_make_private(struct doma
 
     spin_lock(&d->page_alloc_lock);
 
+    if ( d->is_dying )
+    {
+        spin_unlock(&d->page_alloc_lock);
+        put_page(page);
+        return -EBUSY;
+    }
+
     /* We can only change the type if count is one */
     /* Because we are locking pages individually, we need to drop
      * the lock here, while the page is typed. We cannot risk the
@@ -666,8 +679,8 @@ static int page_make_private(struct doma
     expected_type = (PGT_shared_page | PGT_validated | PGT_locked | 2);
     if ( page->u.inuse.type_info != expected_type )
     {
-        put_page(page);
         spin_unlock(&d->page_alloc_lock);
+        put_page(page);
         return -EEXIST;
     }
 
@@ -682,7 +695,7 @@ static int page_make_private(struct doma
     page_set_owner(page, d);
 
     if ( domain_adjust_tot_pages(d, 1) == 1 )
-        get_domain(d);
+        get_knownalive_domain(d);
     page_list_add_tail(page, &d->page_list);
     spin_unlock(&d->page_alloc_lock);
 
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -475,8 +475,8 @@ static long memory_exchange(XEN_GUEST_HA
                              (j * (1UL << exch.out.extent_order)));
 
                 spin_lock(&d->page_alloc_lock);
-                domain_adjust_tot_pages(d, -dec_count);
-                drop_dom_ref = (dec_count && !d->tot_pages);
+                drop_dom_ref = (dec_count &&
+                                !domain_adjust_tot_pages(d, -dec_count));
                 spin_unlock(&d->page_alloc_lock);
 
                 if ( drop_dom_ref )
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1516,8 +1516,9 @@ struct page_info *alloc_domheap_pages(
 
 void free_domheap_pages(struct page_info *pg, unsigned int order)
 {
-    int i, drop_dom_ref;
     struct domain *d = page_get_owner(pg);
+    unsigned int i;
+    bool_t drop_dom_ref;
 
     ASSERT(!in_irq());
 
@@ -1545,8 +1546,7 @@ void free_domheap_pages(struct page_info
             page_list_del2(&pg[i], &d->page_list, &d->arch.relmem_list);
         }
 
-        domain_adjust_tot_pages(d, -(1 << order));
-        drop_dom_ref = (d->tot_pages == 0);
+        drop_dom_ref = !domain_adjust_tot_pages(d, -(1 << order));
 
         spin_unlock_recursive(&d->page_alloc_lock);
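Taken together, the hunks apply one idiom: check ->is_dying and adjust
the page count under ->page_alloc_lock, derive drop_dom_ref from the
total that domain_adjust_tot_pages() now returns, and drop references
only after unlocking. A condensed, compilable sketch of that idiom
follows; the struct fields, function bodies and caller are simplified
assumptions, not the real Xen definitions:

#include <errno.h>
#include <stdbool.h>

struct domain {
    long tot_pages;
    bool is_dying;
    /* page_alloc_lock, page lists, ... elided */
};

/* Assumed semantics, per the patch: apply the delta and hand back the
 * updated total, so callers need not re-read ->tot_pages. */
static long domain_adjust_tot_pages(struct domain *d, long pages)
{
    return d->tot_pages += pages;
}

/* Hypothetical caller showing the idiom all four hunks converge on. */
static int give_up_one_page(struct domain *d)
{
    bool drop_dom_ref;

    /* spin_lock(&d->page_alloc_lock); */

    if ( d->is_dying )          /* the newly added early bail-out */
    {
        /* spin_unlock(&d->page_alloc_lock); */
        return -EBUSY;
    }

    /* One statement replaces adjust-then-re-read of ->tot_pages. */
    drop_dom_ref = !domain_adjust_tot_pages(d, -1);

    /* spin_unlock(&d->page_alloc_lock); */

    /* put_page()/put_domain() only happen here, with the lock gone,
     * mirroring the reordered error paths above. */
    return drop_dom_ref ? 1 : 0;
}

int main(void)
{
    struct domain d = { .tot_pages = 1, .is_dying = false };
    return give_up_one_page(&d) == 1 ? 0 : 1;  /* last page: drop ref */
}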
On 27/11/13 08:25, Jan Beulich wrote:
>          else
>          {
>              *status = PG_OFFLINE_OWNED | PG_OFFLINE_PENDING |
> -              (owner->domain_id << PG_OFFLINE_OWNER_SHIFT);
> +                      (owner->domain_id << PG_OFFLINE_OWNER_SHIFT);

domain_id will be promoted from a uint16_t to an int32_t, then shifted
left by 16 bits, which is undefined if the top bit of domain_id was set.

Do we care about the likelihood of a domain_id with the top bit set? I
certainly can't see how one would get into that state.

~Andrew
>>> On 27.11.13 at 11:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> domain_id will be promoted from a uint16_t to an int32_t, then shifted
> left by 16 bits, which is undefined if the top bit of domain_id was set.

That promotion is done by zero extension, so I don't figure what
you think is undefined here.

Furthermore I suppose you realize that all the patch does here is
adjust indentation.

Jan
Andrew Cooper
2013-Nov-27 10:07 UTC
Re: [PATCH 2/2] use return value of domain_adjust_tot_pages() where feasible
On 27/11/13 08:26, Jan Beulich wrote:
> This is generally cheaper than re-reading ->tot_pages.
>
> While doing so I also noticed an improper use (lacking error handling)
> of get_domain() as well as lacks of ->is_dying checks in the memory
> sharing code, which the patch fixes at once. In the course of doing
> this I further noticed other error paths there pointlessly calling
> put_page() et al with ->page_alloc_lock still held, which is also being
> reversed.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
At 10:05 +0000 on 27 Nov (1385543154), Jan Beulich wrote:
> >>> On 27.11.13 at 11:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > domain_id will be promoted from a uint16_t to an int32_t, then shifted
> > left by 16 bits, which is undefined if the top bit of domain_id was set.
>
> That promotion is done by zero extension, so I don't figure what
> you think is undefined here.

If the domid has bit 15 set, it will be shifted into the sign bit of
the promoted integer; it should be explicitly cast to an unsigned type
before the shift.

Tim.
At 08:07 +0000 on 27 Nov (1385536046), Jan Beulich wrote:
> 1: fix locking in offline_page()
> 2: use return value of domain_adjust_tot_pages() where feasible
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Tim Deegan <tim@xen.org>
>>> On 27.11.13 at 11:34, Tim Deegan <tim@xen.org> wrote:
> At 10:05 +0000 on 27 Nov (1385543154), Jan Beulich wrote:
>> That promotion is done by zero extension, so I don't figure what
>> you think is undefined here.
>
> If the domid has bit 15 set, it will be shifted into the sign bit of
> the promoted integer; it should be explicitly cast to an unsigned type
> before the shift.

Again - I don't see this. Promotion happens before the shift,
i.e. 0x8000 -> 0x00008000 -> 0x80000000.

Jan
At 10:43 +0000 on 27 Nov (1385545399), Jan Beulich wrote:
> >>> On 27.11.13 at 11:34, Tim Deegan <tim@xen.org> wrote:
> > If the domid has bit 15 set, it will be shifted into the sign bit of
> > the promoted integer; it should be explicitly cast to an unsigned type
> > before the shift.
>
> Again - I don't see this. Promotion happens before the shift,
> i.e. 0x8000 -> 0x00008000 -> 0x80000000.

AIUI the default promotion is to a signed integer if the value will
fit, i.e.:
  (unsigned short) 0x8000
  promoted     (signed int) 0x00008000
  shifted left (signed int) 0x80000000  (undefined behaviour)

(but I don't have my copy of the standard here to check).

Tim.
At 11:48 +0100 on 27 Nov (1385549319), Tim Deegan wrote:
> AIUI the default promotion is to a signed integer if the value will
> fit, i.e.:
>   (unsigned short) 0x8000
>   promoted     (signed int) 0x00008000
>   shifted left (signed int) 0x80000000  (undefined behaviour)
>
> (but I don't have my copy of the standard here to check).

(and AFAICT there's no way for a real domain's domain_id to have the
top bit set so it's a moot point).

Tim.
>>> On 27.11.13 at 11:48, Tim Deegan <tim@xen.org> wrote:
> At 10:43 +0000 on 27 Nov (1385545399), Jan Beulich wrote:
>> Again - I don't see this. Promotion happens before the shift,
>> i.e. 0x8000 -> 0x00008000 -> 0x80000000.
>
> AIUI the default promotion is to a signed integer if the value will
> fit, i.e.:
>   (unsigned short) 0x8000
>   promoted     (signed int) 0x00008000
>   shifted left (signed int) 0x80000000  (undefined behaviour)
>
> (but I don't have my copy of the standard here to check).

Right - but the promotion (as you also show) is done via zero
extension. Hence, and because left shifts are ignorant of signed-ness,
no need for a cast.

Jan
George Dunlap
2013-Nov-27 14:44 UTC
Re: [PATCH 2/2] use return value of domain_adjust_tot_pages() where feasible
On 11/27/2013 08:26 AM, Jan Beulich wrote:
> This is generally cheaper than re-reading ->tot_pages.
>
> While doing so I also noticed an improper use (lacking error handling)
> of get_domain() as well as lacks of ->is_dying checks in the memory
> sharing code, which the patch fixes at once. In the course of doing
> this I further noticed other error paths there pointlessly calling
> put_page() et al with ->page_alloc_lock still held, which is also being
> reversed.

Thus hiding two very important changes underneath a summary that looks
completely unimportant.

If this patch only did as the title suggests, I would question whether
it should be included for 4.4, since it seems to have little benefit.
Are the other two changes bug fixes?

In any case, the summary should indicate to someone just browsing
through what important changes might be inside.

 -George
On 11/27/2013 08:25 AM, Jan Beulich wrote:> Coverity ID 1055655 > > Apart from the Coverity-detected lock order reversal (a domain''s > page_alloc_lock taken with the heap lock already held), calling > put_page() with heap_lock is a bad idea too (as a possible descendant > from put_page() is free_heap_pages(), which wants to take this very > lock).This sounds like a bug fix, so doesn''t need a freeze exception at this time. -George> > From all I can tell the region over which heap_lock was held was far > too large: All we need to protect are the call to mark_page_offline() > and reserve_heap_page() (and I''d even put under question the need for > the former). Hence by slightly re-arranging the if/else-if chain we > can drop the lock much earlier, at once no longer covering the two > put_page() invocations. > > Once at it, do a little bit of other cleanup: Put the "pod_replace" > code path inline rather than at its own label, and drop the effectively > unused variable "ret". > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/common/page_alloc.c > +++ b/xen/common/page_alloc.c > @@ -957,7 +957,6 @@ int offline_page(unsigned long mfn, int > { > unsigned long old_info = 0; > struct domain *owner; > - int ret = 0; > struct page_info *pg; > > if ( !mfn_valid(mfn) ) > @@ -1007,16 +1006,28 @@ int offline_page(unsigned long mfn, int > if ( page_state_is(pg, offlined) ) > { > reserve_heap_page(pg); > - *status = PG_OFFLINE_OFFLINED; > + > + spin_unlock(&heap_lock); > + > + *status = broken ? PG_OFFLINE_OFFLINED | PG_OFFLINE_BROKEN > + : PG_OFFLINE_OFFLINED; > + return 0; > } > - else if ( (owner = page_get_owner_and_reference(pg)) ) > + > + spin_unlock(&heap_lock); > + > + if ( (owner = page_get_owner_and_reference(pg)) ) > { > if ( p2m_pod_offline_or_broken_hit(pg) ) > - goto pod_replace; > + { > + put_page(pg); > + p2m_pod_offline_or_broken_replace(pg); > + *status = PG_OFFLINE_OFFLINED; > + } > else > { > *status = PG_OFFLINE_OWNED | PG_OFFLINE_PENDING | > - (owner->domain_id << PG_OFFLINE_OWNER_SHIFT); > + (owner->domain_id << PG_OFFLINE_OWNER_SHIFT); > /* Release the reference since it will not be allocated anymore */ > put_page(pg); > } > @@ -1024,7 +1035,7 @@ int offline_page(unsigned long mfn, int > else if ( old_info & PGC_xen_heap ) > { > *status = PG_OFFLINE_XENPAGE | PG_OFFLINE_PENDING | > - (DOMID_XEN << PG_OFFLINE_OWNER_SHIFT); > + (DOMID_XEN << PG_OFFLINE_OWNER_SHIFT); > } > else > { > @@ -1043,21 +1054,7 @@ int offline_page(unsigned long mfn, int > if ( broken ) > *status |= PG_OFFLINE_BROKEN; > > - spin_unlock(&heap_lock); > - > - return ret; > - > -pod_replace: > - put_page(pg); > - spin_unlock(&heap_lock); > - > - p2m_pod_offline_or_broken_replace(pg); > - *status = PG_OFFLINE_OFFLINED; > - > - if ( broken ) > - *status |= PG_OFFLINE_BROKEN; > - > - return ret; > + return 0; > } > > /* > > >
Jan Beulich
2013-Nov-27 15:46 UTC
Re: [PATCH 2/2] use return value of domain_adjust_tot_pages() where feasible
>>> On 27.11.13 at 15:44, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Thus hiding two very important changes underneath a summary that looks
> completely unimportant.
>
> If this patch only did as the title suggests, I would question whether
> it should be included for 4.4, since it seems to have little benefit.
> Are the other two changes bug fixes?

I'm sure the missing ->is_dying check is one. I'm not sure the
put vs unlock ordering is just inefficient or could actively cause
issues.

> In any case, the summary should indicate to someone just browsing
> through what important changes might be inside.

The issue is that you can't really put all three things in the title.
And hence I decided to keep the title what it was supposed to be
when I started correcting this code.

With - in my understanding - the memory sharing code still being
experimental, it also didn't really seem worthwhile splitting these
changes out into separate patches (I generally try to do so when
spotting isolated issues, but tend to keep things together when
in the course of doing other adjustments I recognize further small
issues in code that's not production ready anyway, i.e. not
needing to be backported, and hence the title not needing to hint
at that).

In any event, as to the freeze: The code was written and would
have been submitted before the code freeze if I hadn't been
required to hold it back until after XSA-74 went public (which
goes back to our [unwritten?] policy of not publishing changes
that were found necessary/desirable in the course of researching
a specific security issue).

Jan
George Dunlap
2013-Nov-27 16:51 UTC
Re: [PATCH 2/2] use return value of domain_adjust_tot_pages() where feasible
On 11/27/2013 03:46 PM, Jan Beulich wrote:
>>>> On 27.11.13 at 15:44, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> If this patch only did as the title suggests, I would question whether
>> it should be included for 4.4, since it seems to have little benefit.
>> Are the other two changes bug fixes?
>
> I'm sure the missing ->is_dying check is one. I'm not sure the
> put vs unlock ordering is just inefficient or could actively cause
> issues.
>
>> In any case, the summary should indicate to someone just browsing
>> through what important changes might be inside.
>
> The issue is that you can't really put all three things in the title.
> And hence I decided to keep the title what it was supposed to be
> when I started correcting this code.

I think I would call it something like, "Various fix-ups to mm-related
code". That would allow anyone scanning it to know that there were a
number of fix-ups, and they were in the MM code; and would prompt them
to look further if it seemed like something they might be looking for
(either fixes to backport, or the source of a bug that was introduced).

> With - in my understanding - the memory sharing code still being
> experimental, it also didn't really seem worthwhile splitting these
> changes out into separate patches (I generally try to do so when
> spotting isolated issues, but tend to keep things together when
> in the course of doing other adjustments I recognize further small
> issues in code that's not production ready anyway, i.e. not
> needing to be backported, and hence the title not needing to hint
> at that).
>
> In any event, as to the freeze: The code was written and would
> have been submitted before the code freeze if I hadn't been
> required to hold it back until after XSA-74 went public (which
> goes back to our [unwritten?] policy of not publishing changes
> that were found necessary/desirable in the course of researching
> a specific security issue).

Unfortunately, the "unfairness" of having been held back for a security
embargo (or in the dom0 PVH case, having been submitted a year ago)
doesn't change the realities of the situation: that making changes
risks introducing bugs which delay the release, or worse, are not found
until after the release. That may be a reason to consider it "having
been submitted before the feature freeze", but it can't tilt the scales
of a cost/benefits analysis.

Anyway, we're only half-way through the code freeze, and these do look
like bug fixes; I'm inclined to say they're OK.

 -George
At 10:55 +0000 on 27 Nov (1385546118), Jan Beulich wrote:
> >>> On 27.11.13 at 11:48, Tim Deegan <tim@xen.org> wrote:
> > AIUI the default promotion is to a signed integer if the value will
> > fit, i.e.:
> >   (unsigned short) 0x8000
> >   promoted     (signed int) 0x00008000
> >   shifted left (signed int) 0x80000000  (undefined behaviour)
>
> Right - but the promotion (as you also show) is done via zero
> extension. Hence, and because left shifts are ignorant of signed-ness,
> no need for a cast.

No: left-shifting that set bit into the sign bit of the promoted value
is undefined behaviour. I still don't have my standard reference to
hand, but e.g. http://blog.regehr.org/archives/738

Tim.
>>> Tim Deegan <tim@xen.org> 11/28/13 11:26 AM >>>
>At 10:55 +0000 on 27 Nov (1385546118), Jan Beulich wrote:
>> Right - but the promotion (as you also show) is done via zero
>> extension. Hence, and because left shifts are ignorant of signed-ness,
>> no need for a cast.
>
>No: left-shifting that set bit into the sign bit of the promoted value
>is undefined behaviour. I still don't have my standard reference to
>hand, but e.g. http://blog.regehr.org/archives/738

Ah, indeed. I can certainly add a cast there, but as said before
- the value can't be negative as we only permit 2^^15 domains,
- the change to the line in question was only white space adjustment.

Jan
At 14:38 +0000 on 28 Nov (1385645907), Jan Beulich wrote:
> >>> Tim Deegan <tim@xen.org> 11/28/13 11:26 AM >>>
> >No: left-shifting that set bit into the sign bit of the promoted value
> >is undefined behaviour. I still don't have my standard reference to
> >hand, but e.g. http://blog.regehr.org/archives/738
>
> Ah, indeed. I can certainly add a cast there, but as said before
> - the value can't be negative as we only permit 2^^15 domains,

Absolutely. My reviewed-by stands on the original patch.

Tim.
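Tim's promotion chain can be checked with a few standalone lines.
PG_OFFLINE_OWNER_SHIFT is assumed to be 16 here, matching the "shifted
left by 16 bits" in Andrew's original note, and the cast shown is the
conventional fix:

#include <stdint.h>
#include <stdio.h>

#define PG_OFFLINE_OWNER_SHIFT 16   /* assumed; "shifted left by 16 bits" */

int main(void)
{
    uint16_t domid = 0x8000;        /* hypothetical: top bit set */

    /* Integer promotion: uint16_t fits in int, so the left operand of
     * << becomes a *signed* int holding 0x00008000 (zero-extended, as
     * Jan says). Shifting a 1 into the sign bit of that signed int is
     * undefined behaviour (C99 6.5.7), hence Tim's cast: */
    uint32_t status = (uint32_t)domid << PG_OFFLINE_OWNER_SHIFT;

    printf("owner field: %#010x\n", (unsigned int)status); /* 0x80000000 */
    return 0;
}

As the thread concludes, the point is moot in practice: valid domain
IDs never have bit 15 set.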
On 27/11/2013 08:07, "Jan Beulich" <JBeulich@suse.com> wrote:

> 1: fix locking in offline_page()
> 2: use return value of domain_adjust_tot_pages() where feasible
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Keir Fraser <keir@xen.org>