George Dunlap
2011-Apr-01 11:02 UTC
LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> # HG changeset patch
> # User Ian Campbell <ian.campbell@citrix.com>
> # Date 1283766891 -3600
> # Node ID bdf8ce09160d715451e1204babe5f80886ea6183
> # Parent 5f96f36feebdb87eaadbbcab0399f32eda86f735
> libxc: provide notification of final checkpoint to restore end
>
> When the restore code sees this notification it will restore the
> currently in-progress checkpoint when it completes.
>
> This allows the restore end to finish up without waiting for a
> spurious timeout on the receive fd and thereby avoids unnecessary
> error logging in the case of a successful migration or restore.
>
> In the normal migration or restore case the first checkpoint is always
> the last. For a rolling checkpoint (such as Remus) the notification is
> currently unused but could be used in the future for example to
> provide a controlled failover for reasons other than error

Unfortunately, this introduces a backwards-compatibility problem, since
older versions of the tools never send LAST_CHECKPOINT.

Would it make sense to invert the logic on this -- i.e., default to
only one checkpoint, and send "MORE_CHECKPOINTS" if the sender expects
this *not* to be the last (either by expecting the checkpoint()
function to do it, or by sending it from xc_domain_save)?

Now that 4.1 is out, we have to consider version compatibilities. But
we should be able to make it so that there would only be a problem if:
 1) You're running REMUS
 2) One of your toolstacks is 4.1.0, and one is not.

That seems like it shouldn't be too bad.

Thoughts?

 -George

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Acked-by: Brendan Cully <brendan@cs.ubc.ca>
>
> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
> --- a/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
> +++ b/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
> @@ -42,6 +42,7 @@ struct restore_ctx {
>      xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
>      xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region. */
>      int completed; /* Set when a consistent image is available */
> +    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>      struct domain_info_context dinfo;
>  };
>
> @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
>          // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>
> +    case XC_SAVE_ID_LAST_CHECKPOINT:
> +        ctx->last_checkpoint = 1;
> +        // DPRINTF("last checkpoint indication received");
> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> +
>      default:
>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>              ERROR("Max batch size exceeded (%d). Giving up.", count);
> @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
>                  goto out;
>              }
>              ctx->completed = 1;
> -            /* shift into nonblocking mode for the remainder */
> -            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
> -                flags = 0;
> -            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
> +
> +            /*
> +             * If more checkpoints are expected then shift into
> +             * nonblocking mode for the remainder.
> +             */
> +            if ( !ctx->last_checkpoint )
> +            {
> +                if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
> +                    flags = 0;
> +                fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
> +            }
> +        }
> +
> +        if ( ctx->last_checkpoint )
> +        {
> +            // DPRINTF("Last checkpoint, finishing\n");
> +            goto finish;
>          }
>
>          // DPRINTF("Buffered checkpoint\n");
> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
> --- a/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
> +++ b/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
> @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
>          }
>      }
>
> +    if ( !callbacks->checkpoint )
> +    {
> +        /*
> +         * If this is not a checkpointed save then this must be the first and
> +         * last checkpoint.
> +         */
> +        i = XC_SAVE_ID_LAST_CHECKPOINT;
> +        if ( wrexact(io_fd, &i, sizeof(int)) )
> +        {
> +            PERROR("Error when writing last checkpoint chunk");
> +            goto out;
> +        }
> +    }
> +
>      /* Zero terminate */
>      i = 0;
>      if ( wrexact(io_fd, &i, sizeof(int)) )
> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
> --- a/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
> +++ b/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
> @@ -131,6 +131,7 @@
>  #define XC_SAVE_ID_TMEM_EXTRA         -6
>  #define XC_SAVE_ID_TSC_INFO           -7
>  #define XC_SAVE_ID_HVM_CONSOLE_PFN    -8 /* (HVM-only) */
> +#define XC_SAVE_ID_LAST_CHECKPOINT    -9 /* Commit to restoring after completion of current iteration. */
>
>  /*
>  ** We process save/restore/migrate in batches of pages; the below
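A rough sketch of the inversion proposed above -- illustrative only, not
a tested patch. XC_SAVE_ID_MORE_CHECKPOINTS is an invented chunk ID (it
does not exist in the tree); the surrounding names and structure are
assumed from the patch quoted above:

    /* xg_save_restore.h (sketch) */
    #define XC_SAVE_ID_MORE_CHECKPOINTS  -10 /* hypothetical: further checkpoints will follow */

    /* xc_domain_save.c (sketch): a rolling checkpoint announces itself up front. */
    if ( callbacks->checkpoint )
    {
        i = XC_SAVE_ID_MORE_CHECKPOINTS;
        if ( wrexact(io_fd, &i, sizeof(int)) )
        {
            PERROR("Error when writing more-checkpoints chunk");
            goto out;
        }
    }

    /* xc_domain_restore.c (sketch): default to a single checkpoint, so
     * streams from older senders (which send neither chunk) finish
     * cleanly without waiting for a timeout. */
    ctx->last_checkpoint = 1;

    /* ... and in pagebuf_get_one(): */
    case XC_SAVE_ID_MORE_CHECKPOINTS:
        ctx->last_checkpoint = 0;
        return pagebuf_get_one(xch, ctx, buf, fd, dom);

The point of the inversion is that a stream from an older sender, which
emits neither chunk, is then treated as a single restore by default.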
Ian Campbell
2011-Apr-01 11:58 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
> Unfortunately, this introduces a backwards-compatibility problem,
> since older versions of the tools never send LAST_CHECKPOINT.

Out of interest what actually happens? (I think you mentioned something
about failing when restoring from a file?)

My intention at the time was that in the absence of a LAST_CHECKPOINT
things would continue to behave as before, e.g. you would see the
spurious timeout on the receive fd but still restore correctly.

Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
various stages is what broke this for the non-Remus migrations?

> Would it make sense to invert the logic on this -- i.e., default to
> only one checkpoint, and send "MORE_CHECKPOINTS" if the sender expects
> this *not* to be the last (either by expecting the checkpoint()
> function to do it, or by sending it from xc_domain_save)?

I'd quite like to understand what I broke and investigate the
trivialness (or otherwise) of a fix, but this change would probably
independently make sense too.

> Now that 4.1 is out, we have to consider version compatibilities. But
> we should be able to make it so that there would only be a problem if:
> 1) You're running REMUS

Is it the case that Remus only cares about checkpointing between like
versions of the toolstack?

> 2) One of your toolstacks is 4.1.0, and one is not.
>
> That seems like it shouldn't be too bad.
>
> Thoughts?
>
> -George
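For reference, the blocking-mode switch under discussion is the standard
fcntl() dance; the helpers below are illustrative only (libxc open-codes
the equivalent inline rather than using helpers like these):

    #include <fcntl.h>

    /* Illustrative only: flip an fd between blocking and non-blocking
     * mode.  In the patch above the restore fd only goes non-blocking
     * once the first consistent checkpoint has arrived and further
     * (Remus) checkpoints are expected. */
    static int fd_set_nonblock(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if ( flags < 0 )
            flags = 0;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

    static int fd_set_block(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if ( flags < 0 )
            flags = 0;
        return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
    }

Since an old sender never sends LAST_CHECKPOINT, the receiver stays on
the non-blocking path after the first checkpoint and falls back to the
read timeout, which is the "spurious timeout" behaviour described above.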
George Dunlap
2011-Apr-01 13:49 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Fri, Apr 1, 2011 at 12:58 PM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> Unfortunately, this introduces a backwards-compatibility problem,
>> since older versions of the tools never send LAST_CHECKPOINT.
>
> Out of interest what actually happens? (I think you mentioned something
> about failing when restoring from a file?)

Steps to reproduce:
 * Install last release of XenServer (based on Xen 3.4)
 * Install a VM, suspend it.
 * Upgrade to development version of XenServer (based on 4.1)
 * Attempt to resume VM; resume fails.

libxc output is below:

Apr 1 14:27:29 localhost xenguest: xc: detail: xc_domain_restore start: p2m_size = fefff
Apr 1 14:27:33 localhost xenguest: xc: error: Max batch size exceeded (1970103633). Giving up.: Internal error
Apr 1 14:27:33 localhost xenguest: xc: error: error when buffering batch, finishing (90 = Message too long): Internal error
Apr 1 14:27:33 localhost xenguest: xc: detail: Restore exit with rc=0

I had added debugging to print out the "batch" values read, and
confirmed that this is *after* the page transfer and everything else
was done, and after the "0" entry indicating the end of a checkpoint.
So it's reading through one full checkpoint, and then reading some
garbage value and failing. Initializing "ctx->last_checkpoint" to 1
makes the problem go away.

Ah -- hmm. You know what I bet -- xc_domain_save() doesn't write the
qemu checkpoint, but xc_domain_restore() reads it (again because of
REMUS). Libxl handles this properly, but making xapi write in such a
way that xc_domain_restore() could read it properly turned out to be
non-trivial; making xc_domain_restore() not read the qemu file was the
simplest solution. But of course, that means that there's still data in
the file after you've finished a "checkpoint", so the next read
succeeds in reading in the qemu save file. Not surprising that it
doesn't look right. :-)

> My intention at the time was that in the absence of a LAST_CHECKPOINT
> things would continue to behave as before, e.g. you would see the
> spurious timeout on the receive fd but still restore correctly.
>
> Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
> various stages is what broke this for the non-Remus migrations?
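One small corroborating detail for the diagnosis above: the bogus batch
count in the log, 1970103633, is 0x756d6551, i.e. the bytes 'Q' 'e' 'm'
'u' read as a little-endian int -- consistent with the restore loop
having read the start of a qemu device-model record as if it were a
batch count. A tiny standalone check (illustrative only):

    #include <stdio.h>

    int main(void)
    {
        unsigned int batch = 1970103633u;   /* value reported by xenguest above */
        unsigned char b[4];
        int i;

        /* xc_domain_restore reads the count as a host-endian int, so on
         * x86 the low byte is the first byte that was in the stream. */
        for ( i = 0; i < 4; i++ )
            b[i] = (batch >> (8 * i)) & 0xff;

        printf("0x%08x = '%c%c%c%c'\n", batch, b[0], b[1], b[2], b[3]);
        /* prints: 0x756d6551 = 'Qemu' */
        return 0;
    }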
Shriram Rajagopalan
2011-Apr-02 03:57 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> Now that 4.1 is out, we have to consider version compatibilities. But
>> we should be able to make it so that there would only be a problem if:
>> 1) You're running REMUS
>
> Is it the case that Remus only cares about checkpointing between like
> versions of the toolstack?

Can you please elaborate this statement?

Shriram
Ian Campbell
2011-Apr-02 08:13 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Sat, 2011-04-02 at 04:57 +0100, Shriram Rajagopalan wrote:
> On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> > Is it the case that Remus only cares about checkpointing between like
> > versions of the toolstack?
>
> Can you please elaborate this statement?

For standard suspend/resume or migration we support migrating from
version N to version N+1 (but not vice versa), to support upgrades.

In the case of Remus, though, are we interested in supporting a rolling
checkpoint from a version N system to a version N+1 fallback system? Or
is Remus only interested in supporting checkpoints between systems
running the same version of Xen?

Bear in mind that if you did support N->N+1 checkpoints you wouldn't be
able to migrate the VM back after a failover...

FWIW I think George got to the bottom of the specific issue he was
seeing and that the LAST_CHECKPOINT thing was a red herring, although
flipping the protocol to use a MORE_CHECKPOINTS scheme perhaps makes
sense anyway.

Ian.
Shriram Rajagopalan
2011-Apr-02 14:03 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Sat, Apr 2, 2011 at 1:13 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Sat, 2011-04-02 at 04:57 +0100, Shriram Rajagopalan wrote:
> > On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> > > Is it the case that Remus only cares about checkpointing between like
> > > versions of the toolstack?
> >
> > Can you please elaborate this statement?
>
> For standard suspend/resume or migration we support migrating from
> version N to version N+1 (but not vice versa), to support upgrades.
>
> In the case of Remus, though, are we interested in supporting a rolling
> checkpoint from a version N system to a version N+1 fallback system? Or
> is Remus only interested in supporting checkpoints between systems
> running the same version of Xen?

N->N+1 should work for Remus even now, without the LAST_CHECKPOINT
patch. But as you said, failback from N+1 -> N won't work.

I agree with George's suggestion on MORE_CHECKPOINTS, for backwards
compatibility wrt live migration. But for HA, it doesn't make much
sense if a user is able to do HA only one way and cannot fail back.
This is not an upgrade scenario. So that would require some exception
to be thrown when a version incompatibility is detected. IMO, it's
better to let the user handle this limitation than to let him/her do
the N->N+1 HA and then find out that they cannot fail back.

shriram

> Bear in mind that if you did support N->N+1 checkpoints you wouldn't be
> able to migrate the VM back after a failover...
>
> FWIW I think George got to the bottom of the specific issue he was
> seeing and that the LAST_CHECKPOINT thing was a red herring, although
> flipping the protocol to use a MORE_CHECKPOINTS scheme perhaps makes
> sense anyway.
>
> Ian.
George Dunlap
2011-Apr-04 09:42 UTC
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Sat, Apr 2, 2011 at 3:03 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:
> I agree with George's suggestion on MORE_CHECKPOINTS, for backwards
> compatibility wrt live migration.

It turns out that there actually isn't (I don't believe) a
compatibility issue with live migration. The issue I was seeing was due
to a modification I'd made to libxc to get the XenServer / XCP
toolstack (xapi) working with Xen 4.1. As long as you're using only
open-source tools, you should still be able to migrate 4.0->4.1 --
there will just be a short delay while the toolstack waits for more
checkpoints.

> But for HA, it doesn't make much sense if a user is able to do HA only
> one way and cannot fail back. This is not an upgrade scenario. So that
> would require some exception to be thrown when a version
> incompatibility is detected. IMO, it's better to let the user handle
> this limitation than to let him/her do the N->N+1 HA and then find out
> that they cannot fail back.

Well, you can't actually do HA while one of the two hosts is being
rebooted anyway. :-) So I think with backwards compatibility, you could
do it with 3 hosts: run on A with B as a fallback while upgrading C,
then run on A with C as a fallback while upgrading B, then migrate to C
(or just pull the plug on A and let it fall back normally) and run on C
with B as a fallback while upgrading A. Without backwards
compatibility, I think you could do it with 4 (so you always have 2
hosts of a given version).

Or you could just run it in non-HA mode while the hosts are upgraded
serially; for XenServer / XCP, it normally wouldn't take more than 20
minutes to do both, after which point you could turn on HA again.

 -George