George Dunlap
2011-Apr-01 11:02 UTC
LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> # HG changeset patch
> # User Ian Campbell <ian.campbell@citrix.com>
> # Date 1283766891 -3600
> # Node ID bdf8ce09160d715451e1204babe5f80886ea6183
> # Parent 5f96f36feebdb87eaadbbcab0399f32eda86f735
> libxc: provide notification of final checkpoint to restore end
>
> When the restore code sees this notification it will restore the
> currently in-progress checkpoint when it completes.
>
> This allows the restore end to finish up without waiting for a
> spurious timeout on the receive fd and thereby avoids unnecessary
> error logging in the case of a successful migration or restore.
>
> In the normal migration or restore case the first checkpoint is always
> the last. For a rolling checkpoint (such as Remus) the notification is
> currently unused but could be used in the future for example to
> provide a controlled failover for reasons other than error

Unfortunately, this introduces a backwards-compatibility problem, since
older versions of the tools never send LAST_CHECKPOINT.

Would it make sense to invert the logic on this -- i.e., default to
only one checkpoint, and send "MORE_CHECKPOINTS" if the sender expects
this *not* to be the last (either by expecting the checkpoint()
function to do it, or by sending it from xc_domain_save)?

Now that 4.1 is out, we have to consider version compatibilities. But
we should be able to make it so that there would only be a problem if:
 1) You're running REMUS
 2) One of your toolstacks is 4.1.0, and one is not.

That seems like it shouldn't be too bad.

Thoughts?

 -George

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Acked-by: Brendan Cully <brendan@cs.ubc.ca>
>
> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
> --- a/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
> +++ b/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
> @@ -42,6 +42,7 @@ struct restore_ctx {
>      xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
>      xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region. */
>      int completed; /* Set when a consistent image is available */
> +    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>      struct domain_info_context dinfo;
>  };
>
> @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
>          // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>
> +    case XC_SAVE_ID_LAST_CHECKPOINT:
> +        ctx->last_checkpoint = 1;
> +        // DPRINTF("last checkpoint indication received");
> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> +
>      default:
>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>              ERROR("Max batch size exceeded (%d). Giving up.", count);
> @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
>                  goto out;
>              }
>              ctx->completed = 1;
> -            /* shift into nonblocking mode for the remainder */
> -            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
> -                flags = 0;
> -            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
> +
> +            /*
> +             * If more checkpoints are expected then shift into
> +             * nonblocking mode for the remainder.
> +             */
> +            if ( !ctx->last_checkpoint )
> +            {
> +                if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
> +                    flags = 0;
> +                fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
> +            }
> +        }
> +
> +        if ( ctx->last_checkpoint )
> +        {
> +            // DPRINTF("Last checkpoint, finishing\n");
> +            goto finish;
>          }
>
>          // DPRINTF("Buffered checkpoint\n");
> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
> --- a/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
> +++ b/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
> @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
>          }
>      }
>
> +    if ( !callbacks->checkpoint )
> +    {
> +        /*
> +         * If this is not a checkpointed save then this must be the first and
> +         * last checkpoint.
> +         */
> +        i = XC_SAVE_ID_LAST_CHECKPOINT;
> +        if ( wrexact(io_fd, &i, sizeof(int)) )
> +        {
> +            PERROR("Error when writing last checkpoint chunk");
> +            goto out;
> +        }
> +    }
> +
>      /* Zero terminate */
>      i = 0;
>      if ( wrexact(io_fd, &i, sizeof(int)) )
> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
> --- a/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
> +++ b/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
> @@ -131,6 +131,7 @@
>  #define XC_SAVE_ID_TMEM_EXTRA         -6
>  #define XC_SAVE_ID_TSC_INFO           -7
>  #define XC_SAVE_ID_HVM_CONSOLE_PFN    -8 /* (HVM-only) */
> +#define XC_SAVE_ID_LAST_CHECKPOINT    -9 /* Commit to restoring after completion of current iteration. */
>
>  /*
>  ** We process save/restore/migrate in batches of pages; the below
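A rough sketch of the inversion proposed above -- illustrative only, not
a tested patch. XC_SAVE_ID_MORE_CHECKPOINTS is an invented chunk ID (it
does not exist in the tree); the surrounding names and structure are
assumed from the patch quoted above:

    /* xg_save_restore.h (sketch) */
    #define XC_SAVE_ID_MORE_CHECKPOINTS  -10 /* hypothetical: further checkpoints will follow */

    /* xc_domain_save.c (sketch): a rolling checkpoint announces itself up front. */
    if ( callbacks->checkpoint )
    {
        i = XC_SAVE_ID_MORE_CHECKPOINTS;
        if ( wrexact(io_fd, &i, sizeof(int)) )
        {
            PERROR("Error when writing more-checkpoints chunk");
            goto out;
        }
    }

    /* xc_domain_restore.c (sketch): default to a single checkpoint, so
     * streams from older senders (which send neither chunk) finish
     * cleanly without waiting for a timeout. */
    ctx->last_checkpoint = 1;

    /* ... and in pagebuf_get_one(): */
    case XC_SAVE_ID_MORE_CHECKPOINTS:
        ctx->last_checkpoint = 0;
        return pagebuf_get_one(xch, ctx, buf, fd, dom);

The point of the inversion is that a stream from an older sender, which
emits neither chunk, is then treated as a single restore by default.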
Ian Campbell
2011-Apr-01 11:58 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
> Unfortunately, this introduces a backwards-compatibility problem,
> since older versions of the tools never send LAST_CHECKPOINT.

Out of interest what actually happens? (I think you mentioned something
about failing when restoring from a file?)

My intention at the time was that in the absence of a LAST_CHECKPOINT
things would continue to behave as before, e.g. you would see the
spurious timeout on the receive fd but still restore correctly.

Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
various stages is what broke this for the non-Remus migrations?

> Would it make sense to invert the logic on this -- i.e., default to
> only one checkpoint, and send "MORE_CHECKPOINTS" if the sender expects
> this *not* to be the last (either by expecting the checkpoint()
> function to do it, or by sending it from xc_domain_save)?

I'd quite like to understand what I broke and investigate the
trivialness (or otherwise) of a fix, but this change would probably
independently make sense too.

> Now that 4.1 is out, we have to consider version compatibilities. But
> we should be able to make it so that there would only be a problem if:
> 1) You're running REMUS

Is it the case that Remus only cares about checkpointing between like
versions of the toolstack?

> 2) One of your toolstacks is 4.1.0, and one is not.
>
> That seems like it shouldn't be too bad.
>
> Thoughts?
>
> -George
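For reference, the blocking-mode switch under discussion is the standard
fcntl() dance; the helpers below are illustrative only (libxc open-codes
the equivalent inline rather than using helpers like these):

    #include <fcntl.h>

    /* Illustrative only: flip an fd between blocking and non-blocking
     * mode.  In the patch above the restore fd only goes non-blocking
     * once the first consistent checkpoint has arrived and further
     * (Remus) checkpoints are expected. */
    static int fd_set_nonblock(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if ( flags < 0 )
            flags = 0;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

    static int fd_set_block(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if ( flags < 0 )
            flags = 0;
        return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
    }

Since an old sender never sends LAST_CHECKPOINT, the receiver stays on
the non-blocking path after the first checkpoint and falls back to the
read timeout, which is the "spurious timeout" behaviour described above.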
George Dunlap
2011-Apr-01 13:49 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Fri, Apr 1, 2011 at 12:58 PM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> Unfortunately, this introduces a backwards-compatibility problem,
>> since older versions of the tools never send LAST_CHECKPOINT.
>
> Out of interest what actually happens? (I think you mentioned something
> about failing when restoring from a file?)

Steps to reproduce:
 * Install last release of XenServer (based on Xen 3.4)
 * Install a VM, suspend it.
 * Upgrade to development version of XenServer (based on 4.1)
 * Attempt to resume VM; resume fails.

libxc output is below:

Apr 1 14:27:29 localhost xenguest: xc: detail: xc_domain_restore start: p2m_size = fefff
Apr 1 14:27:33 localhost xenguest: xc: error: Max batch size exceeded (1970103633). Giving up.: Internal error
Apr 1 14:27:33 localhost xenguest: xc: error: error when buffering batch, finishing (90 = Message too long): Internal error
Apr 1 14:27:33 localhost xenguest: xc: detail: Restore exit with rc=0

I had added debugging to print out the "batch" values read, and
confirmed that this is *after* the page transfer and everything else
was done, and after the "0" entry indicating the end of a checkpoint.
So it's reading through one full checkpoint, and then reading some
garbage value and failing. Initializing "ctx->last_checkpoint" to 1
makes the problem go away.

Ah -- hmm. You know what I bet -- xc_domain_save() doesn't write the
qemu checkpoint, but xc_domain_restore() reads it (again because of
REMUS). Libxl handles this properly, but making xapi write in such a
way that xc_domain_restore() could read it properly turned out to be
non-trivial; making xc_domain_restore() not read the qemu file was the
simplest solution. But of course, that means that there's still data in
the file after you've finished a "checkpoint", so the next read
succeeds in reading in the qemu save file. Not surprising that it
doesn't look right. :-)

> My intention at the time was that in the absence of a LAST_CHECKPOINT
> things would continue to behave as before, e.g. you would see the
> spurious timeout on the receive fd but still restore correctly.
>
> Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
> various stages is what broke this for the non-Remus migrations?
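One small corroborating detail for the diagnosis above: the bogus batch
count in the log, 1970103633, is 0x756d6551, i.e. the bytes 'Q' 'e' 'm'
'u' read as a little-endian int -- consistent with the restore loop
having read the start of a qemu device-model record as if it were a
batch count. A tiny standalone check (illustrative only):

    #include <stdio.h>

    int main(void)
    {
        unsigned int batch = 1970103633u;   /* value reported by xenguest above */
        unsigned char b[4];
        int i;

        /* xc_domain_restore reads the count as a host-endian int, so on
         * x86 the low byte is the first byte that was in the stream. */
        for ( i = 0; i < 4; i++ )
            b[i] = (batch >> (8 * i)) & 0xff;

        printf("0x%08x = '%c%c%c%c'\n", batch, b[0], b[1], b[2], b[3]);
        /* prints: 0x756d6551 = 'Qemu' */
        return 0;
    }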
Shriram Rajagopalan
2011-Apr-02 03:57 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> Now that 4.1 is out, we have to consider version compatibilities. But
>> we should be able to make it so that there would only be a problem if:
>> 1) You're running REMUS
>
> Is it the case that Remus only cares about checkpointing between like
> versions of the toolstack?

Can you please elaborate this statement?

Shriram
Ian Campbell
2011-Apr-02 08:13 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Sat, 2011-04-02 at 04:57 +0100, Shriram Rajagopalan wrote:
> On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> > Is it the case that Remus only cares about checkpointing between like
> > versions of the toolstack?
>
> Can you please elaborate this statement?

For standard suspend/resume or migration we support migrating from
version N to version N+1 (but not vice versa), to support upgrades.

In the case of Remus, though, are we interested in supporting a rolling
checkpoint from a version N system to a version N+1 fallback system? Or
is Remus only interested in supporting checkpoints between systems
running the same version of Xen?

Bear in mind that if you did support N->N+1 checkpoints you wouldn't be
able to migrate the VM back after a failover...

FWIW I think George got to the bottom of the specific issue he was
seeing and that the LAST_CHECKPOINT thing was a red herring, although
flipping the protocol to use a MORE_CHECKPOINTS scheme perhaps makes
sense anyway.

Ian.
Shriram Rajagopalan
2011-Apr-02 14:03 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Sat, Apr 2, 2011 at 1:13 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Sat, 2011-04-02 at 04:57 +0100, Shriram Rajagopalan wrote:
> > On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> > > Is it the case that Remus only cares about checkpointing between like
> > > versions of the toolstack?
> >
> > Can you please elaborate this statement?
>
> For standard suspend/resume or migration we support migrating from
> version N to version N+1 (but not vice versa), to support upgrades.
>
> In the case of Remus, though, are we interested in supporting a rolling
> checkpoint from a version N system to a version N+1 fallback system? Or
> is Remus only interested in supporting checkpoints between systems
> running the same version of Xen?

N->N+1 should work for Remus even now, without the LAST_CHECKPOINT
patch. But as you said, failback from N+1 -> N won't work.

I agree with George's suggestion on MORE_CHECKPOINTS, for backwards
compatibility wrt live migration. But for HA, it doesn't make much
sense if a user is able to do HA only one way and cannot fail back.
This is not an upgrade scenario. So that would require some exception
to be thrown when a version incompatibility is detected. IMO, it's
better to let the user handle this limitation than to let him/her do
the N->N+1 HA and then find out that they cannot fail back.

shriram

> Bear in mind that if you did support N->N+1 checkpoints you wouldn't be
> able to migrate the VM back after a failover...
>
> FWIW I think George got to the bottom of the specific issue he was
> seeing and that the LAST_CHECKPOINT thing was a red herring, although
> flipping the protocol to use a MORE_CHECKPOINTS scheme perhaps makes
> sense anyway.
>
> Ian.
George Dunlap
2011-Apr-04 09:42 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Sat, Apr 2, 2011 at 3:03 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:
> I agree with George's suggestion on MORE_CHECKPOINTS, for backwards
> compatibility wrt live migration.

It turns out that there actually isn't (I don't believe) a
compatibility issue with live migration. The issue I was seeing was due
to a modification I'd made to libxc to get the XenServer / XCP
toolstack (xapi) working with Xen 4.1. As long as you're using only
open-source tools, you should still be able to migrate 4.0->4.1 --
there will just be a short delay while the toolstack waits for more
checkpoints.

> But for HA, it doesn't make much sense if a user is able to do HA only
> one way and cannot fail back. This is not an upgrade scenario. So that
> would require some exception to be thrown when a version
> incompatibility is detected. IMO, it's better to let the user handle
> this limitation than to let him/her do the N->N+1 HA and then find out
> that they cannot fail back.

Well, you can't actually do HA while one of the two hosts is being
rebooted anyway. :-) So I think with backwards compatibility, you could
do it with 3 hosts: run on A with B as a fallback while upgrading C,
then run on A with C as a fallback while upgrading B, then migrate to C
(or just pull the plug on A and let it fall back normally) and run on C
with B as a fallback while upgrading A. Without backwards
compatibility, I think you could do it with 4 (so you always have 2
hosts of a given version).

Or you could just run it in non-HA mode while the hosts are upgraded
serially; for XenServer / XCP, it normally wouldn't take more than 20
minutes to do both, after which point you could turn on HA again.

 -George