Kouya Shimura
2012-Nov-29 07:02 UTC
[PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
There is a race condition between the XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY
and HVMOP_track_dirty_vram hypercalls.  Although HVMOP_track_dirty_vram is
called many times from a qemu-dm that has a VNC client connected,
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY is called only once, from a migration
process (e.g. xc_save, libxl-save-helper).  So the race seldom happens, but
the following cases are possible.  (The arrows ---> (X) / <--- (X) mark
where execution interleaves between the two hypercalls.)

=======================================================================[case-1]
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY hypercall
 -> paging_enable_logdirty()
   -> hap_logdirty_init()
     -> paging_log_dirty_disable()
        dirty_vram = NULL
     -> paging_log_dirty_init(d, hap_enable_log_dirty, ...)      ---> (A)
 -> paging_log_dirty_enable()
**************************************************************************
                                                              /* <--- (B) */
   -> hap_enable_vram_tracking()  // should be hap_enable_log_dirty() !!!
      return -EINVAL
 <- return -EINVAL                // live-migration failure !!!
**************************************************************************
        HVMOP_track_dirty_vram
         -> hap_track_dirty_vram()
            /* (!paging_mode_log_dirty(d) && !dirty_vram) */
            -> hap_vram_tracking_init()                       /* <--- (A) */
              -> paging_log_dirty_init(d, hap_enable_vram_tracking, ...)
                                                                 ---> (B)
            -> paging_log_dirty_enable()

=======================================================================[case-2]
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY hypercall
 -> paging_enable_logdirty()
   -> hap_logdirty_init()
     -> paging_log_dirty_disable()
        /* d->arch.hvm_domain.dirty_vram != NULL */
        d->arch.paging.mode &= ~PG_log_dirty;                    ---> (C)
        /* d->arch.hvm_domain.dirty_vram is freed */
        dereference dirty_vram  // access to freed memory !!!  /* <--- (D) */

        HVMOP_track_dirty_vram
         -> hap_track_dirty_vram()
            /* (!paging_mode_log_dirty(d) && dirty_vram) */   /* <--- (C) */
            rc = -EINVAL
            xfree(dirty_vram);                                   ---> (D)
            dirty_vram = d->arch.hvm_domain.dirty_vram = NULL;
            return rc;
=======================================================================

I actually hit a Xen crash (NULL pointer dereference) from this in
xen-3.4.  FYI, in xen-4.x, c/s 19738:8dd5c3cae086 mitigated it.

I'm not sure why paging_lock() is used only partially in the
hap_XXX_vram_tracking functions.  Thus, this patch introduces a new lock.
It would be better to use paging_lock() instead of the new lock, since
shadow paging mode (not HAP mode) uses the paging_lock to avoid this race
condition.

Thanks,
Kouya

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
lists.xen.org/xen-devel
Tim Deegan
2012-Nov-29 15:40 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
At 16:02 +0900 on 29 Nov (1354204941), Kouya Shimura wrote:
> I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
> functions. Thus, this patch introduces a new lock.
> It would be better to use paging_lock() instead of the new lock
> since shadow paging mode (not HAP mode) uses paging_lock to avoid
> this race condition.

I think you're right - it would be better to use the paging_lock.

Cc'ing Robert Phillips, who's got a big patch outstanding that touches
the locking in this code.  I think the right thing to do is to make sure
his patch fixes the issue and then backport just the locking parts of it
to older trees.

Robert, in your patch you do wrap this all in the paging_lock, but then
unlock to call various enable and disable routines.  Is there a version
of this race condition there, where some other CPU might call
LOG_DIRTY_ENABLE while you've temporarily dropped the lock?

Cheers,

Tim.
Robert Phillips
2012-Dec-03 17:59 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
Tim,

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, November 29, 2012 10:41 AM
> To: Kouya Shimura
> Cc: Robert Phillips; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] x86/hap: fix race condition between
> ENABLE_LOGDIRTY and track_dirty_vram hypercall
>
> At 16:02 +0900 on 29 Nov (1354204941), Kouya Shimura wrote:
> > I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
> > functions. Thus, this patch introduces a new lock.
> > It would be better to use paging_lock() instead of the new lock
> > since shadow paging mode (not HAP mode) uses paging_lock to avoid
> > this race condition.
>
> I think you're right - it would be better to use the paging_lock.
>
> Cc'ing Robert Phillips, who's got a big patch outstanding that touches
> the locking in this code.  I think the right thing to do is make sure
> his patch fixes the issue and the backport just the locking parts of it
> to older trees.
>
> Robert, in your patch you do wrap this all in the paging_lock, but then
> unlock to call various enable and disable routines.  Is there a version
> of this race condition there, where some other CPU might call
> LOG_DIRTY_ENABLE while you've temporarily dropped the lock?

My proposed patch does not modify the problematic locking code so,
unfortunately, it preserves the race condition that Kouya Shimura has
discovered.

I question whether his proposed patch would be suitable for the multiple
frame buffer situation that my proposed patch addresses.  It is possible
that a guest might be updating its frame buffers when live migration
starts, and the same race would result.

I think the domain.arch.paging.log_dirty function pointers are
problematic.  They are modified and executed without benefit of locking.

I am uncomfortable with adding another lock.

I will look at updating my patch to avoid the race and will (hopefully)
avoid adding another lock.

-- rsp

> Cheers,
>
> Tim.
Tim Deegan
2012-Dec-06 09:32 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
Hi,

At 12:59 -0500 on 03 Dec (1354539567), Robert Phillips wrote:
> > Robert, in your patch you do wrap this all in the paging_lock, but then
> > unlock to call various enable and disable routines.  Is there a version
> > of this race condition there, where some other CPU might call
> > LOG_DIRTY_ENABLE while you've temporarily dropped the lock?
>
> My proposed patch does not modify the problematic locking code so,
> unfortunately, it preserves the race condition that Kouya Shimura
> has discovered.
>
> I question whether his proposed patch would be suitable for the
> multiple frame buffer situation that my proposed patch addresses.
> It is possible that a guest might be updating its frame buffers when
> live migration starts, and the same race would result.
>
> I think the domain.arch.paging.log_dirty function pointers are problematic.
> They are modified and executed without benefit of locking.
>
> I am uncomfortable with adding another lock.
>
> I will look at updating my patch to avoid the race and will (hopefully)
> avoid adding another lock.

Thanks.  I think the paging_lock can probably cover everything we need
here.  These are toolstack operations and should be fairly rare, and HAP
can do most of its work without the paging_lock.

Also, in the next version can you please update this section:

+int hap_track_dirty_vram(struct domain *d,
+                         unsigned long begin_pfn,
+                         unsigned long nr,
+                         XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
+{
+    long rc = 0;
+    dv_dirty_vram_t *dirty_vram;
+
+    paging_lock(d);
+    dirty_vram = d->arch.hvm_domain.dirty_vram;
+    if ( nr )
+    {
+        dv_range_t *range = NULL;
+        int size = ( nr + BITS_PER_LONG - 1 ) & ~( BITS_PER_LONG - 1 );
+        uint8_t dirty_bitmap[size];

not to allocate a guest-specified amount of stack memory.  This is one
of the things recently found and fixed in the existing code as XSA-27.
xenbits.xen.org/hg/staging/xen-4.1-testing.hg/rev/53ef1f35a0f8

Cheers,

Tim.
Robert Phillips
2012-Dec-06 10:36 UTC
Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
Hi Tim,

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, December 06, 2012 4:32 AM
> To: Robert Phillips
> Cc: Kouya Shimura; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] x86/hap: fix race condition between
> ENABLE_LOGDIRTY and track_dirty_vram hypercall
>
> Thanks.  I think the paging_lock can probably cover everything we need
> here.  These are toolstack operations and should be fairly rare, and
> HAP can do most of its work without the paging_lock.

I agree.  I am currently working on a fix.  Should be ready in a short
while.

> Also, in the next version can you please update this section:
>
> +int hap_track_dirty_vram(struct domain *d,
> +                         unsigned long begin_pfn,
> +                         unsigned long nr,
> +                         XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
> +{
> +    long rc = 0;
> +    dv_dirty_vram_t *dirty_vram;
> +
> +    paging_lock(d);
> +    dirty_vram = d->arch.hvm_domain.dirty_vram;
> +    if ( nr )
> +    {
> +        dv_range_t *range = NULL;
> +        int size = ( nr + BITS_PER_LONG - 1 ) & ~( BITS_PER_LONG - 1 );
> +        uint8_t dirty_bitmap[size];
>
> not to allocate a guest-specified amount of stack memory.  This is one
> of the things recently found and fixed in the existing code as XSA-27.
> xenbits.xen.org/hg/staging/xen-4.1-testing.hg/rev/53ef1f35a0f8

I'll include this improvement too.

-- rsp

> Cheers,
>
> Tim.