Kouya Shimura
2012-Nov-29 07:02 UTC
[PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
There is a race condition between the XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY
and HVMOP_track_dirty_vram hypercalls.  Although HVMOP_track_dirty_vram is
called many times from a qemu-dm that has a VNC client connected,
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY is called only once, from a migration
process (e.g. xc_save, libxl-save-helper).  So the race seldom happens, but
the following cases are possible.  (The arrows ---> (X) / <--- (X) mark
where execution interleaves between the two hypercalls.)

=======================================================================[case-1]
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY hypercall
 -> paging_enable_logdirty()
   -> hap_logdirty_init()
     -> paging_log_dirty_disable()
        dirty_vram = NULL
     -> paging_log_dirty_init(d, hap_enable_log_dirty, ...)      ---> (A)
 -> paging_log_dirty_enable()
**************************************************************************
                                                              /* <--- (B) */
   -> hap_enable_vram_tracking()  // should be hap_enable_log_dirty() !!!
      return -EINVAL
 <- return -EINVAL                // live-migration failure !!!
**************************************************************************
        HVMOP_track_dirty_vram
         -> hap_track_dirty_vram()
            /* (!paging_mode_log_dirty(d) && !dirty_vram) */
            -> hap_vram_tracking_init()                       /* <--- (A) */
              -> paging_log_dirty_init(d, hap_enable_vram_tracking, ...)
                                                                 ---> (B)
            -> paging_log_dirty_enable()

=======================================================================[case-2]
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY hypercall
 -> paging_enable_logdirty()
   -> hap_logdirty_init()
     -> paging_log_dirty_disable()
        /* d->arch.hvm_domain.dirty_vram != NULL */
        d->arch.paging.mode &= ~PG_log_dirty;                    ---> (C)
        /* d->arch.hvm_domain.dirty_vram is freed */
        dereference dirty_vram  // access to freed memory !!!  /* <--- (D) */

        HVMOP_track_dirty_vram
         -> hap_track_dirty_vram()
            /* (!paging_mode_log_dirty(d) && dirty_vram) */   /* <--- (C) */
            rc = -EINVAL
            xfree(dirty_vram);                                   ---> (D)
            dirty_vram = d->arch.hvm_domain.dirty_vram = NULL;
            return rc;
=======================================================================

I actually hit a Xen crash (NULL pointer dereference) from this in
xen-3.4.  FYI, in xen-4.x, c/s 19738:8dd5c3cae086 mitigated it.

I'm not sure why paging_lock() is used only partially in the
hap_XXX_vram_tracking functions.  Thus, this patch introduces a new lock.
It would be better to use paging_lock() instead of the new lock, since
shadow paging mode (not HAP mode) uses the paging_lock to avoid this race
condition.

Thanks,
Kouya

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
lists.xen.org/xen-devel
Tim Deegan
2012-Nov-29 15:40 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
At 16:02 +0900 on 29 Nov (1354204941), Kouya Shimura wrote:
> I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
> functions. Thus, this patch introduces a new lock.
> It would be better to use paging_lock() instead of the new lock
> since shadow paging mode (not HAP mode) uses paging_lock to avoid
> this race condition.

I think you're right - it would be better to use the paging_lock.

Cc'ing Robert Phillips, who's got a big patch outstanding that touches
the locking in this code.  I think the right thing to do is to make sure
his patch fixes the issue and then backport just the locking parts of it
to older trees.

Robert, in your patch you do wrap this all in the paging_lock, but then
unlock to call various enable and disable routines.  Is there a version
of this race condition there, where some other CPU might call
LOG_DIRTY_ENABLE while you've temporarily dropped the lock?

Cheers,

Tim.
Robert Phillips
2012-Dec-03 17:59 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
Tim,

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, November 29, 2012 10:41 AM
> To: Kouya Shimura
> Cc: Robert Phillips; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] x86/hap: fix race condition between
> ENABLE_LOGDIRTY and track_dirty_vram hypercall
>
> At 16:02 +0900 on 29 Nov (1354204941), Kouya Shimura wrote:
> > I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
> > functions. Thus, this patch introduces a new lock.
> > It would be better to use paging_lock() instead of the new lock
> > since shadow paging mode (not HAP mode) uses paging_lock to avoid
> > this race condition.
>
> I think you're right - it would be better to use the paging_lock.
>
> Cc'ing Robert Phillips, who's got a big patch outstanding that touches
> the locking in this code.  I think the right thing to do is make sure
> his patch fixes the issue and the backport just the locking parts of it
> to older trees.
>
> Robert, in your patch you do wrap this all in the paging_lock, but then
> unlock to call various enable and disable routines.  Is there a version
> of this race condition there, where some other CPU might call
> LOG_DIRTY_ENABLE while you've temporarily dropped the lock?

My proposed patch does not modify the problematic locking code so,
unfortunately, it preserves the race condition that Kouya Shimura has
discovered.

I question whether his proposed patch would be suitable for the multiple
frame buffer situation that my proposed patch addresses.  It is possible
that a guest might be updating its frame buffers when live migration
starts, and the same race would result.

I think the domain.arch.paging.log_dirty function pointers are
problematic.  They are modified and executed without benefit of locking.

I am uncomfortable with adding another lock.

I will look at updating my patch to avoid the race and will (hopefully)
avoid adding another lock.

-- rsp

> Cheers,
>
> Tim.
Tim Deegan
2012-Dec-06 09:32 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
Hi,

At 12:59 -0500 on 03 Dec (1354539567), Robert Phillips wrote:
> > Robert, in your patch you do wrap this all in the paging_lock, but then
> > unlock to call various enable and disable routines.  Is there a version
> > of this race condition there, where some other CPU might call
> > LOG_DIRTY_ENABLE while you've temporarily dropped the lock?
>
> My proposed patch does not modify the problematic locking code so,
> unfortunately, it preserves the race condition that Kouya Shimura
> has discovered.
>
> I question whether his proposed patch would be suitable for the
> multiple frame buffer situation that my proposed patch addresses.
> It is possible that a guest might be updating its frame buffers when
> live migration starts, and the same race would result.
>
> I think the domain.arch.paging.log_dirty function pointers are problematic.
> They are modified and executed without benefit of locking.
>
> I am uncomfortable with adding another lock.
>
> I will look at updating my patch to avoid the race and will (hopefully)
> avoid adding another lock.

Thanks.  I think the paging_lock can probably cover everything we need
here.  These are toolstack operations and should be fairly rare, and HAP
can do most of its work without the paging_lock.

Also, in the next version can you please update this section:

+int hap_track_dirty_vram(struct domain *d,
+                         unsigned long begin_pfn,
+                         unsigned long nr,
+                         XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
+{
+    long rc = 0;
+    dv_dirty_vram_t *dirty_vram;
+
+    paging_lock(d);
+    dirty_vram = d->arch.hvm_domain.dirty_vram;
+    if ( nr )
+    {
+        dv_range_t *range = NULL;
+        int size = ( nr + BITS_PER_LONG - 1 ) & ~( BITS_PER_LONG - 1 );
+        uint8_t dirty_bitmap[size];

not to allocate a guest-specified amount of stack memory.  This is one
of the things recently found and fixed in the existing code as XSA-27.
xenbits.xen.org/hg/staging/xen-4.1-testing.hg/rev/53ef1f35a0f8

Cheers,

Tim.
Robert Phillips
2012-Dec-06 10:36 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
Hi Tim,

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, December 06, 2012 4:32 AM
> To: Robert Phillips
> Cc: Kouya Shimura; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] x86/hap: fix race condition between
> ENABLE_LOGDIRTY and track_dirty_vram hypercall
>
> Thanks.  I think the paging_lock can probably cover everything we need
> here.  These are toolstack operations and should be fairly rare, and
> HAP can do most of its work without the paging_lock.

I agree.  I am currently working on a fix.  Should be ready in a short
while.

> Also, in the next version can you please update this section:
>
> +int hap_track_dirty_vram(struct domain *d,
> +                         unsigned long begin_pfn,
> +                         unsigned long nr,
> +                         XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
> +{
> +    long rc = 0;
> +    dv_dirty_vram_t *dirty_vram;
> +
> +    paging_lock(d);
> +    dirty_vram = d->arch.hvm_domain.dirty_vram;
> +    if ( nr )
> +    {
> +        dv_range_t *range = NULL;
> +        int size = ( nr + BITS_PER_LONG - 1 ) & ~( BITS_PER_LONG - 1 );
> +        uint8_t dirty_bitmap[size];
>
> not to allocate a guest-specified amount of stack memory.  This is one
> of the things recently found and fixed in the existing code as XSA-27.
> xenbits.xen.org/hg/staging/xen-4.1-testing.hg/rev/53ef1f35a0f8

I'll include this improvement too.

-- rsp

> Cheers,
>
> Tim.