Andrew Cooper
2013-Jul-10 19:19 UTC
[PATCH] tools/migrate: Fix regression when migrating from older version of Xen
Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 "libxc: provide notification of final checkpoint to restore end" broke migration from any version of Xen using tools from prior to that commit Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer tools xc_domain_restore() to start reading the qemu save record, as ctx->last_checkpoint is 0. The failure looks like: xc: error: Max batch size exceeded (1970103633). Giving up. where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu" Sadly, the simple fix of just setting ctx->last_checkpoint = 1 will cause an opposite function regresson for anyone using the current behaviour of save_callbacks->checkpoint(). The only safe fix is to rely on the toolstack to provide this information. Passing 0 results in unchanged behaviour, while passing nonzero means "the other end of the migration stream does not know about XC_SAVE_ID_LAST_CHECKPOINT but is performing a normal migrate" Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- tools/libxc/xc_domain_restore.c | 3 ++- tools/libxc/xc_nomigrate.c | 2 +- tools/libxc/xenguest.h | 4 +++- tools/libxl/libxl_save_helper.c | 2 +- tools/xcutils/xc_restore.c | 2 +- 5 files changed, 8 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c index 63d36cd..40c9282 100644 --- a/tools/libxc/xc_domain_restore.c +++ b/tools/libxc/xc_domain_restore.c @@ -1401,7 +1401,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, domid_t store_domid, unsigned int console_evtchn, unsigned long *console_mfn, domid_t console_domid, unsigned int hvm, unsigned int pae, int superpages, - int no_incr_generationid, + int no_incr_generationid, int last_checkpoint_unaware, unsigned long *vm_generationid_addr, struct restore_callbacks *callbacks) { @@ -1472,6 +1472,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, ctx->superpages = superpages; ctx->hvm = hvm; + ctx->last_checkpoint = !!last_checkpoint_unaware; ctxt = xc_hypercall_buffer_alloc(xch, ctxt, sizeof(*ctxt)); diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 73e7566..1d3aec5 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, domid_t store_domid, unsigned int console_evtchn, unsigned long *console_mfn, domid_t console_domid, unsigned int hvm, unsigned int pae, int superpages, - int no_incr_generationid, + int no_incr_generationid, int last_checkpoint_unaware, unsigned long *vm_generationid_addr, struct restore_callbacks *callbacks) { diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h index 4714bd2..977dde3 100644 --- a/tools/libxc/xenguest.h +++ b/tools/libxc/xenguest.h @@ -114,6 +114,8 @@ struct restore_callbacks { * @parm pae non-zero if this HVM domain has PAE support enabled * @parm superpages non-zero to allocate guest memory with superpages * @parm no_incr_generationid non-zero if generation id is NOT to be incremented + * @parm last_checkpoint_unaware non-zero if far end of the migration is unaware + * of the XC_SAVE_ID_LAST_CHECKPOINT hunk in the migration stream. * @parm vm_generationid_addr returned with the address of the generation id buffer * @parm callbacks non-NULL to receive a callback to restore toolstack * specific data @@ -124,7 +126,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, domid_t store_domid, unsigned int console_evtchn, unsigned long *console_mfn, domid_t console_domid, unsigned int hvm, unsigned int pae, int superpages, - int no_incr_generationid, + int no_incr_generationid, int last_checkpoint_unaware, unsigned long *vm_generationid_addr, struct restore_callbacks *callbacks); /** diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index 772251a..8f14b8b 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -264,7 +264,7 @@ int main(int argc, char **argv) r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn, store_domid, console_evtchn, &console_mfn, console_domid, hvm, pae, superpages, - no_incr_genidad, &genidad, + no_incr_genidad, 0, &genidad, &helper_restore_callbacks); helper_stub_restore_results(store_mfn,console_mfn,genidad,0); complete(r); diff --git a/tools/xcutils/xc_restore.c b/tools/xcutils/xc_restore.c index 35d725c..e657c7f 100644 --- a/tools/xcutils/xc_restore.c +++ b/tools/xcutils/xc_restore.c @@ -52,7 +52,7 @@ main(int argc, char **argv) ret = xc_domain_restore(xch, io_fd, domid, store_evtchn, &store_mfn, 0, console_evtchn, &console_mfn, 0, hvm, pae, superpages, - 0, NULL, NULL); + 0, 0, NULL, NULL); if ( ret == 0 ) { -- 1.7.10.4
George Dunlap
2013-Jul-11 09:46 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On Wed, Jul 10, 2013 at 8:19 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 > "libxc: provide notification of final checkpoint to restore end" > broke migration from any version of Xen using tools from prior to that commit > > Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer > tools xc_domain_restore() to start reading the qemu save record, as > ctx->last_checkpoint is 0. > > The failure looks like: > xc: error: Max batch size exceeded (1970103633). Giving up. > where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu" > > Sadly, the simple fix of just setting ctx->last_checkpoint = 1 will cause an > opposite function regresson for anyone using the current behaviour of > save_callbacks->checkpoint(). > > The only safe fix is to rely on the toolstack to provide this information. > > Passing 0 results in unchanged behaviour, while passing nonzero means "the > other end of the migration stream does not know about > XC_SAVE_ID_LAST_CHECKPOINT but is performing a normal migrate" > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>Shouldn''t there also be a way to actually use the flag? -George
Ian Campbell
2013-Jul-11 09:55 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On Wed, 2013-07-10 at 20:19 +0100, Andrew Cooper wrote:> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 > "libxc: provide notification of final checkpoint to restore end" > broke migration from any version of Xen using tools from prior to that commit > > Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer > tools xc_domain_restore() to start reading the qemu save record, as > ctx->last_checkpoint is 0.Can the receive not distinguish the case where it is receiving a checkpoint stream from a regular one? Either via some property of the stream or the parameters with which it was called? ctx->last_checkpoint should (be made to) only matter if you are doing check pointing and I think we can mandate that checking pointing is only supported between like versions of Xen. TBH it''s a bit odd to send the LAST_CHECKPOINT in anon-checkpoint stream anyway, but what''s done is done. Ian.> > The failure looks like: > xc: error: Max batch size exceeded (1970103633). Giving up. > where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu" > > Sadly, the simple fix of just setting ctx->last_checkpoint = 1 will cause an > opposite function regresson for anyone using the current behaviour of > save_callbacks->checkpoint(). > > The only safe fix is to rely on the toolstack to provide this information. > > Passing 0 results in unchanged behaviour, while passing nonzero means "the > other end of the migration stream does not know about > XC_SAVE_ID_LAST_CHECKPOINT but is performing a normal migrate" > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > tools/libxc/xc_domain_restore.c | 3 ++- > tools/libxc/xc_nomigrate.c | 2 +- > tools/libxc/xenguest.h | 4 +++- > tools/libxl/libxl_save_helper.c | 2 +- > tools/xcutils/xc_restore.c | 2 +- > 5 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c > index 63d36cd..40c9282 100644 > --- a/tools/libxc/xc_domain_restore.c > +++ b/tools/libxc/xc_domain_restore.c > @@ -1401,7 +1401,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > domid_t store_domid, unsigned int console_evtchn, > unsigned long *console_mfn, domid_t console_domid, > unsigned int hvm, unsigned int pae, int superpages, > - int no_incr_generationid, > + int no_incr_generationid, int last_checkpoint_unaware, > unsigned long *vm_generationid_addr, > struct restore_callbacks *callbacks) > { > @@ -1472,6 +1472,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > > ctx->superpages = superpages; > ctx->hvm = hvm; > + ctx->last_checkpoint = !!last_checkpoint_unaware; > > ctxt = xc_hypercall_buffer_alloc(xch, ctxt, sizeof(*ctxt)); > > diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c > index 73e7566..1d3aec5 100644 > --- a/tools/libxc/xc_nomigrate.c > +++ b/tools/libxc/xc_nomigrate.c > @@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > domid_t store_domid, unsigned int console_evtchn, > unsigned long *console_mfn, domid_t console_domid, > unsigned int hvm, unsigned int pae, int superpages, > - int no_incr_generationid, > + int no_incr_generationid, int last_checkpoint_unaware, > unsigned long *vm_generationid_addr, > struct restore_callbacks *callbacks) > { > diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h > index 4714bd2..977dde3 100644 > --- a/tools/libxc/xenguest.h > +++ b/tools/libxc/xenguest.h > @@ -114,6 +114,8 @@ struct restore_callbacks { > * @parm pae non-zero if this HVM domain has PAE support enabled > * @parm superpages non-zero to allocate guest memory with superpages > * @parm no_incr_generationid non-zero if generation id is NOT to be incremented > + * @parm last_checkpoint_unaware non-zero if far end of the migration is unaware > + * of the XC_SAVE_ID_LAST_CHECKPOINT hunk in the migration stream. > * @parm vm_generationid_addr returned with the address of the generation id buffer > * @parm callbacks non-NULL to receive a callback to restore toolstack > * specific data > @@ -124,7 +126,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > domid_t store_domid, unsigned int console_evtchn, > unsigned long *console_mfn, domid_t console_domid, > unsigned int hvm, unsigned int pae, int superpages, > - int no_incr_generationid, > + int no_incr_generationid, int last_checkpoint_unaware, > unsigned long *vm_generationid_addr, > struct restore_callbacks *callbacks); > /** > diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c > index 772251a..8f14b8b 100644 > --- a/tools/libxl/libxl_save_helper.c > +++ b/tools/libxl/libxl_save_helper.c > @@ -264,7 +264,7 @@ int main(int argc, char **argv) > r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn, > store_domid, console_evtchn, &console_mfn, > console_domid, hvm, pae, superpages, > - no_incr_genidad, &genidad, > + no_incr_genidad, 0, &genidad, > &helper_restore_callbacks); > helper_stub_restore_results(store_mfn,console_mfn,genidad,0); > complete(r); > diff --git a/tools/xcutils/xc_restore.c b/tools/xcutils/xc_restore.c > index 35d725c..e657c7f 100644 > --- a/tools/xcutils/xc_restore.c > +++ b/tools/xcutils/xc_restore.c > @@ -52,7 +52,7 @@ main(int argc, char **argv) > > ret = xc_domain_restore(xch, io_fd, domid, store_evtchn, &store_mfn, 0, > console_evtchn, &console_mfn, 0, hvm, pae, superpages, > - 0, NULL, NULL); > + 0, 0, NULL, NULL); > > if ( ret == 0 ) > {
Andrew Cooper
2013-Jul-11 11:09 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On 11/07/13 10:46, George Dunlap wrote:> On Wed, Jul 10, 2013 at 8:19 PM, Andrew Cooper > <andrew.cooper3@citrix.com> wrote: >> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 >> "libxc: provide notification of final checkpoint to restore end" >> broke migration from any version of Xen using tools from prior to that commit >> >> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer >> tools xc_domain_restore() to start reading the qemu save record, as >> ctx->last_checkpoint is 0. >> >> The failure looks like: >> xc: error: Max batch size exceeded (1970103633). Giving up. >> where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu" >> >> Sadly, the simple fix of just setting ctx->last_checkpoint = 1 will cause an >> opposite function regresson for anyone using the current behaviour of >> save_callbacks->checkpoint(). >> >> The only safe fix is to rely on the toolstack to provide this information. >> >> Passing 0 results in unchanged behaviour, while passing nonzero means "the >> other end of the migration stream does not know about >> XC_SAVE_ID_LAST_CHECKPOINT but is performing a normal migrate" >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Shouldn''t there also be a way to actually use the flag? > > -GeorgeHow do you mean? Plumb it up through to xl? xl (in any non-experemental form) post-dates this regression. xend predates it, but I can''t find any way of working out whether the far end predates the regression. ~Andrew
Andrew Cooper
2013-Jul-11 11:19 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On 11/07/13 10:55, Ian Campbell wrote:> On Wed, 2013-07-10 at 20:19 +0100, Andrew Cooper wrote: >> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 >> "libxc: provide notification of final checkpoint to restore end" >> broke migration from any version of Xen using tools from prior to that commit >> >> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer >> tools xc_domain_restore() to start reading the qemu save record, as >> ctx->last_checkpoint is 0. > Can the receive not distinguish the case where it is receiving a > checkpoint stream from a regular one? Either via some property of the > stream or the parameters with which it was called? > > ctx->last_checkpoint should (be made to) only matter if you are doing > check pointing and I think we can mandate that checking pointing is only > supported between like versions of Xen. TBH it''s a bit odd to send the > LAST_CHECKPOINT in anon-checkpoint stream anyway, but what''s done is > done. > > Ian.Sending LAST_CHECKPOINT is the only thing which has allowed normal migration to work since this regression. I cant find any way of distinguishing a normal vs checkpointed migrate from the stream alone. The save_callback->checkpoint function pointer may or may not be filled in by a toolstack, but is completely out-of-band as far as the migration stream is concerned. ~Andrew>> The failure looks like: >> xc: error: Max batch size exceeded (1970103633). Giving up. >> where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu" >> >> Sadly, the simple fix of just setting ctx->last_checkpoint = 1 will cause an >> opposite function regresson for anyone using the current behaviour of >> save_callbacks->checkpoint(). >> >> The only safe fix is to rely on the toolstack to provide this information. >> >> Passing 0 results in unchanged behaviour, while passing nonzero means "the >> other end of the migration stream does not know about >> XC_SAVE_ID_LAST_CHECKPOINT but is performing a normal migrate" >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> tools/libxc/xc_domain_restore.c | 3 ++- >> tools/libxc/xc_nomigrate.c | 2 +- >> tools/libxc/xenguest.h | 4 +++- >> tools/libxl/libxl_save_helper.c | 2 +- >> tools/xcutils/xc_restore.c | 2 +- >> 5 files changed, 8 insertions(+), 5 deletions(-) >> >> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c >> index 63d36cd..40c9282 100644 >> --- a/tools/libxc/xc_domain_restore.c >> +++ b/tools/libxc/xc_domain_restore.c >> @@ -1401,7 +1401,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, >> domid_t store_domid, unsigned int console_evtchn, >> unsigned long *console_mfn, domid_t console_domid, >> unsigned int hvm, unsigned int pae, int superpages, >> - int no_incr_generationid, >> + int no_incr_generationid, int last_checkpoint_unaware, >> unsigned long *vm_generationid_addr, >> struct restore_callbacks *callbacks) >> { >> @@ -1472,6 +1472,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, >> >> ctx->superpages = superpages; >> ctx->hvm = hvm; >> + ctx->last_checkpoint = !!last_checkpoint_unaware; >> >> ctxt = xc_hypercall_buffer_alloc(xch, ctxt, sizeof(*ctxt)); >> >> diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c >> index 73e7566..1d3aec5 100644 >> --- a/tools/libxc/xc_nomigrate.c >> +++ b/tools/libxc/xc_nomigrate.c >> @@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, >> domid_t store_domid, unsigned int console_evtchn, >> unsigned long *console_mfn, domid_t console_domid, >> unsigned int hvm, unsigned int pae, int superpages, >> - int no_incr_generationid, >> + int no_incr_generationid, int last_checkpoint_unaware, >> unsigned long *vm_generationid_addr, >> struct restore_callbacks *callbacks) >> { >> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h >> index 4714bd2..977dde3 100644 >> --- a/tools/libxc/xenguest.h >> +++ b/tools/libxc/xenguest.h >> @@ -114,6 +114,8 @@ struct restore_callbacks { >> * @parm pae non-zero if this HVM domain has PAE support enabled >> * @parm superpages non-zero to allocate guest memory with superpages >> * @parm no_incr_generationid non-zero if generation id is NOT to be incremented >> + * @parm last_checkpoint_unaware non-zero if far end of the migration is unaware >> + * of the XC_SAVE_ID_LAST_CHECKPOINT hunk in the migration stream. >> * @parm vm_generationid_addr returned with the address of the generation id buffer >> * @parm callbacks non-NULL to receive a callback to restore toolstack >> * specific data >> @@ -124,7 +126,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, >> domid_t store_domid, unsigned int console_evtchn, >> unsigned long *console_mfn, domid_t console_domid, >> unsigned int hvm, unsigned int pae, int superpages, >> - int no_incr_generationid, >> + int no_incr_generationid, int last_checkpoint_unaware, >> unsigned long *vm_generationid_addr, >> struct restore_callbacks *callbacks); >> /** >> diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c >> index 772251a..8f14b8b 100644 >> --- a/tools/libxl/libxl_save_helper.c >> +++ b/tools/libxl/libxl_save_helper.c >> @@ -264,7 +264,7 @@ int main(int argc, char **argv) >> r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn, >> store_domid, console_evtchn, &console_mfn, >> console_domid, hvm, pae, superpages, >> - no_incr_genidad, &genidad, >> + no_incr_genidad, 0, &genidad, >> &helper_restore_callbacks); >> helper_stub_restore_results(store_mfn,console_mfn,genidad,0); >> complete(r); >> diff --git a/tools/xcutils/xc_restore.c b/tools/xcutils/xc_restore.c >> index 35d725c..e657c7f 100644 >> --- a/tools/xcutils/xc_restore.c >> +++ b/tools/xcutils/xc_restore.c >> @@ -52,7 +52,7 @@ main(int argc, char **argv) >> >> ret = xc_domain_restore(xch, io_fd, domid, store_evtchn, &store_mfn, 0, >> console_evtchn, &console_mfn, 0, hvm, pae, superpages, >> - 0, NULL, NULL); >> + 0, 0, NULL, NULL); >> >> if ( ret == 0 ) >> { >
Ian Campbell
2013-Jul-11 11:43 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On Thu, 2013-07-11 at 12:19 +0100, Andrew Cooper wrote:> On 11/07/13 10:55, Ian Campbell wrote: > > On Wed, 2013-07-10 at 20:19 +0100, Andrew Cooper wrote: > >> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 > >> "libxc: provide notification of final checkpoint to restore end" > >> broke migration from any version of Xen using tools from prior to that commit > >> > >> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer > >> tools xc_domain_restore() to start reading the qemu save record, as > >> ctx->last_checkpoint is 0. > > Can the receive not distinguish the case where it is receiving a > > checkpoint stream from a regular one? Either via some property of the > > stream or the parameters with which it was called? > > > > ctx->last_checkpoint should (be made to) only matter if you are doing > > check pointing and I think we can mandate that checking pointing is only > > supported between like versions of Xen. TBH it''s a bit odd to send the > > LAST_CHECKPOINT in anon-checkpoint stream anyway, but what''s done is > > done. > > > > Ian. > > Sending LAST_CHECKPOINT is the only thing which has allowed normal > migration to work since this regression. > > I cant find any way of distinguishing a normal vs checkpointed migrate > from the stream alone. The save_callback->checkpoint function pointer > may or may not be filled in by a toolstack, but is completely > out-of-band as far as the migration stream is concerned.I suppose your new last_generation-aware flag is similar to the "is a checkpoint restore" lag which I was thinking of, but it somewhat easier for the toolstack to know that it is receiving a checkpoint vs. normal migration compared with magically knowing the version on the other end of the connection. Ian.
Andrew Cooper
2013-Jul-15 14:24 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On 11/07/13 12:43, Ian Campbell wrote:> On Thu, 2013-07-11 at 12:19 +0100, Andrew Cooper wrote: >> On 11/07/13 10:55, Ian Campbell wrote: >>> On Wed, 2013-07-10 at 20:19 +0100, Andrew Cooper wrote: >>>> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 >>>> "libxc: provide notification of final checkpoint to restore end" >>>> broke migration from any version of Xen using tools from prior to that commit >>>> >>>> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer >>>> tools xc_domain_restore() to start reading the qemu save record, as >>>> ctx->last_checkpoint is 0. >>> Can the receive not distinguish the case where it is receiving a >>> checkpoint stream from a regular one? Either via some property of the >>> stream or the parameters with which it was called? >>> >>> ctx->last_checkpoint should (be made to) only matter if you are doing >>> check pointing and I think we can mandate that checking pointing is only >>> supported between like versions of Xen. TBH it''s a bit odd to send the >>> LAST_CHECKPOINT in anon-checkpoint stream anyway, but what''s done is >>> done. >>> >>> Ian. >> Sending LAST_CHECKPOINT is the only thing which has allowed normal >> migration to work since this regression. >> >> I cant find any way of distinguishing a normal vs checkpointed migrate >> from the stream alone. The save_callback->checkpoint function pointer >> may or may not be filled in by a toolstack, but is completely >> out-of-band as far as the migration stream is concerned. > I suppose your new last_generation-aware flag is similar to the "is a > checkpoint restore" lag which I was thinking of, but it somewhat easier > for the toolstack to know that it is receiving a checkpoint vs. normal > migration compared with magically knowing the version on the other end > of the connection. > > Ian. >So is this an indication of this patch being a good fix for the problem? (I guess this is a pseudo-ping given no specific objection) ~Andrew
Ian Campbell
2013-Jul-16 09:32 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On Mon, 2013-07-15 at 15:24 +0100, Andrew Cooper wrote:> On 11/07/13 12:43, Ian Campbell wrote: > > On Thu, 2013-07-11 at 12:19 +0100, Andrew Cooper wrote: > >> On 11/07/13 10:55, Ian Campbell wrote: > >>> On Wed, 2013-07-10 at 20:19 +0100, Andrew Cooper wrote: > >>>> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 > >>>> "libxc: provide notification of final checkpoint to restore end" > >>>> broke migration from any version of Xen using tools from prior to that commit > >>>> > >>>> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer > >>>> tools xc_domain_restore() to start reading the qemu save record, as > >>>> ctx->last_checkpoint is 0. > >>> Can the receive not distinguish the case where it is receiving a > >>> checkpoint stream from a regular one? Either via some property of the > >>> stream or the parameters with which it was called? > >>> > >>> ctx->last_checkpoint should (be made to) only matter if you are doing > >>> check pointing and I think we can mandate that checking pointing is only > >>> supported between like versions of Xen. TBH it''s a bit odd to send the > >>> LAST_CHECKPOINT in anon-checkpoint stream anyway, but what''s done is > >>> done. > >>> > >>> Ian. > >> Sending LAST_CHECKPOINT is the only thing which has allowed normal > >> migration to work since this regression. > >> > >> I cant find any way of distinguishing a normal vs checkpointed migrate > >> from the stream alone. The save_callback->checkpoint function pointer > >> may or may not be filled in by a toolstack, but is completely > >> out-of-band as far as the migration stream is concerned. > > I suppose your new last_generation-aware flag is similar to the "is a > > checkpoint restore" lag which I was thinking of, but it somewhat easier > > for the toolstack to know that it is receiving a checkpoint vs. normal > > migration compared with magically knowing the version on the other end > > of the connection. > > > > Ian. > > > > So is this an indication of this patch being a good fix for the problem? > (I guess this is a pseudo-ping given no specific objection)If it implements the semantics I describe then all you need to do is change the name. The reason I prefer a "incoming checkpointed stream" flag over a "understands last checkpoint" is that the former is something the toolstack knows about independently of the versions of Xen on either end of the socket. Ian.
Andrew Cooper
2013-Jul-16 09:32 UTC
Re: [PATCH] tools/migrate: Fix regression when migrating from older version of Xen
On 16/07/13 10:32, Ian Campbell wrote:> On Mon, 2013-07-15 at 15:24 +0100, Andrew Cooper wrote: >> On 11/07/13 12:43, Ian Campbell wrote: >>> On Thu, 2013-07-11 at 12:19 +0100, Andrew Cooper wrote: >>>> On 11/07/13 10:55, Ian Campbell wrote: >>>>> On Wed, 2013-07-10 at 20:19 +0100, Andrew Cooper wrote: >>>>>> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010 >>>>>> "libxc: provide notification of final checkpoint to restore end" >>>>>> broke migration from any version of Xen using tools from prior to that commit >>>>>> >>>>>> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer >>>>>> tools xc_domain_restore() to start reading the qemu save record, as >>>>>> ctx->last_checkpoint is 0. >>>>> Can the receive not distinguish the case where it is receiving a >>>>> checkpoint stream from a regular one? Either via some property of the >>>>> stream or the parameters with which it was called? >>>>> >>>>> ctx->last_checkpoint should (be made to) only matter if you are doing >>>>> check pointing and I think we can mandate that checking pointing is only >>>>> supported between like versions of Xen. TBH it''s a bit odd to send the >>>>> LAST_CHECKPOINT in anon-checkpoint stream anyway, but what''s done is >>>>> done. >>>>> >>>>> Ian. >>>> Sending LAST_CHECKPOINT is the only thing which has allowed normal >>>> migration to work since this regression. >>>> >>>> I cant find any way of distinguishing a normal vs checkpointed migrate >>>> from the stream alone. The save_callback->checkpoint function pointer >>>> may or may not be filled in by a toolstack, but is completely >>>> out-of-band as far as the migration stream is concerned. >>> I suppose your new last_generation-aware flag is similar to the "is a >>> checkpoint restore" lag which I was thinking of, but it somewhat easier >>> for the toolstack to know that it is receiving a checkpoint vs. normal >>> migration compared with magically knowing the version on the other end >>> of the connection. >>> >>> Ian. >>> >> So is this an indication of this patch being a good fix for the problem? >> (I guess this is a pseudo-ping given no specific objection) > If it implements the semantics I describe then all you need to do is > change the name. > > The reason I prefer a "incoming checkpointed stream" flag over a > "understands last checkpoint" is that the former is something the > toolstack knows about independently of the versions of Xen on either end > of the socket. > > Ian. > >Ok - I will respin to that effect. ~Andrew