Hi Saving a domain without enough available space with xl in 4.2-rc5 fails to cleanup completely. The domain informations remains in Xenstore, the domain itself seems to remain in a suspended state and there is not xl process for it anymore. Reason is the use of the MUST macro in save_domain. It exits the process prematurely instead of cleaning up the external visible changes. Also the MUST macro is broken. exit() only takes "values & 0xff", so this macro may exit the process with 0 if the error value is chosen unwisely. Bastian -- A princess should not be afraid -- not with a brave knight to protect her. -- McCoy, "Shore Leave", stardate 3025.3
On Sun, Sep 16, 2012 at 04:58:30PM +0200, Bastian Blank wrote:> Saving a domain without enough available space with xl in 4.2-rc5 fails > to cleanup completely. The domain informations remains in Xenstore, > the domain itself seems to remain in a suspended state and there is not > xl process for it anymore.This patch fixes this bug. It resumes the domain in case of errors in the save process. Bastian Signed-off-by: Bastian Blank <waldi@debian.org> diff -r 4027d31caeb0 tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c Thu Sep 13 12:21:09 2012 +0100 +++ b/tools/libxl/xl_cmdimpl.c Sun Sep 16 17:33:25 2012 +0200 @@ -2990,15 +2990,18 @@ save_domain_core_writeconfig(fd, filename, config_data, config_len); - MUST(libxl_domain_suspend(ctx, domid, fd, 0, NULL)); + int rc = libxl_domain_suspend(ctx, domid, fd, 0, NULL); close(fd); - if (checkpoint) + if (rc < 0) + fprintf(stderr, "Failed to save domain, resuming domain\n"); + + if (checkpoint || rc < 0) libxl_domain_resume(ctx, domid, 1, 0); else libxl_domain_destroy(ctx, domid, 0); - exit(0); + exit(rc < 0 ? 1 : 0); } static pid_t create_migration_child(const char *rune, int *send_fd, -- There are certain things men must do to remain men. -- Kirk, "The Ultimate Computer", stardate 4929.4
On Sun, 2012-09-16 at 16:36 +0100, Bastian Blank wrote:> On Sun, Sep 16, 2012 at 04:58:30PM +0200, Bastian Blank wrote: > > Saving a domain without enough available space with xl in 4.2-rc5 fails > > to cleanup completely. The domain informations remains in Xenstore, > > the domain itself seems to remain in a suspended state and there is not > > xl process for it anymore. > > This patch fixes this bug. It resumes the domain in case of errors in > the save process.Thanks, I wrote the following commit message for you and committed. Please supply one yourself in the future. xl: resume the domain on suspend failure The MUST macro calls exit(3) on failure but we need to cleanup and resume. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>> > Bastian > > Signed-off-by: Bastian Blank <waldi@debian.org> > > diff -r 4027d31caeb0 tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c Thu Sep 13 12:21:09 2012 +0100 > +++ b/tools/libxl/xl_cmdimpl.c Sun Sep 16 17:33:25 2012 +0200 > @@ -2990,15 +2990,18 @@ > > save_domain_core_writeconfig(fd, filename, config_data, config_len); > > - MUST(libxl_domain_suspend(ctx, domid, fd, 0, NULL)); > + int rc = libxl_domain_suspend(ctx, domid, fd, 0, NULL); > close(fd); > > - if (checkpoint) > + if (rc < 0) > + fprintf(stderr, "Failed to save domain, resuming domain\n"); > + > + if (checkpoint || rc < 0) > libxl_domain_resume(ctx, domid, 1, 0); > else > libxl_domain_destroy(ctx, domid, 0); > > - exit(0); > + exit(rc < 0 ? 1 : 0); > } > > static pid_t create_migration_child(const char *rune, int *send_fd,