All,

I have long wanted to be able to save a domain and leave it in a paused state, so that I can take an LVM snapshot to go with the checkpoint file. I know I can achieve this with a non-checkpoint save followed by a restore, but it seems a bit silly to reload the domain's memory from disk, and doing so doubles the "suspension" time of the domain.

So, ideally, I would like an xl save -p that leaves the domain paused. I did submit this as a feature request a long time ago, but it never made the cut.

To this end, and as more of a starting point, I have written my own basic patch. While this appears to work, I see a (very small) opportunity for the domU to run for a short time between the libxl_domain_resume and libxl_domain_pause calls. This defeats the object, as I am trying to keep a disk snapshot that is exactly in sync with the saved state.

Can anyone please offer some thoughts on how I can implement this properly? I have looked at the corresponding xc calls, but meddling with those is way beyond my knowledge. Another way of looking at the problem would be to perform an xl save on a paused domain, as this would achieve the same result.

Thanks for reading, and thanks for any suggestions that are forthcoming. I am not a C guru and even less of a Xen dev guru, so please treat me somewhat like an idiot. :)

Thanks,

Ian.
(Against RELEASE-4.2.2)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 7780426..d0394df 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2976,7 +2976,7 @@ static void save_domain_core_writeconfig(int fd, const char *source,
               hdr.optional_data_len);
 }
 
-static int save_domain(const char *p, const char *filename, int checkpoint,
+static int save_domain(const char *p, const char *filename, int checkpoint, int leavepaused,
                 const char *override_config_file)
 {
     int fd;
@@ -3003,10 +3003,13 @@ static int save_domain(const char *p, const char *filename, int checkpoint,
     if (rc < 0)
         fprintf(stderr, "Failed to save domain, resuming domain\n");
 
-    if (checkpoint || rc < 0)
-        libxl_domain_resume(ctx, domid, 1, 0);
+    if (leavepaused || checkpoint || rc < 0) {
+        libxl_domain_resume(ctx, domid, 1, 0);
+        if (leavepaused && ! (rc < 0))
+            libxl_domain_pause(ctx, domid);
+    }
     else
-        libxl_domain_destroy(ctx, domid, 0);
+        libxl_domain_destroy(ctx, domid, 0);
 
     exit(rc < 0 ? 1 : 0);
 }
On Thu, 2013-05-30 at 22:46 +0100, Ian Murray wrote:
> To this end and as more of a starting point, I have written my own basic
> patch. While this appears to work, I see a (very small) opportunity for
> the domU to run for a short time between the libxl_domain_resume and
> libxl_domain_pause calls. This defeats the object as I am trying to
> maintain a disk snapshot that is exactly in synch with the save state.
>
> Can anyone please offer some thoughts on how I can implement this
> properly.

Does it work if you simply do the pause before the resume? Looking at the hypervisor side, it appears that pauses are reference counted, and it looks (based on a cursory glance) as though it will do the right thing.

> I have looked at the corresponding xc calls but meddling with
> those is way beyond my knowledge. Another way of looking at the problem
> would be able to perform an xl save on a paused domain, as this would
> achieve the same result.

OOI what happens if you try that?

> (Against RELEASE-4.2.2)

FYI any eventual patch will need to be against unstable and then considered separately for backporting.

Ian.
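To illustrate why the ordering matters, here is a minimal toy model of the idea, not Xen code. It assumes, following the cursory reading described above, that a pause request is counted separately from the "suspended for save" state, so a domain only runs when neither is in effect. Every name in it (toy_domain, toy_pause, toy_resume, ran_after_save) is hypothetical and exists only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not Xen's internal structures. */
struct toy_domain {
    int  pause_count;  /* outstanding pause requests (reference count) */
    bool suspended;    /* true while frozen by the save */
    bool ever_ran;     /* set if the guest ever became runnable */
};

/* In this model the guest runs only if nothing is holding it stopped. */
static bool toy_runnable(const struct toy_domain *d)
{
    return d->pause_count == 0 && !d->suspended;
}

/* Record whether the last state change let the guest run. */
static void toy_note(struct toy_domain *d)
{
    if (toy_runnable(d))
        d->ever_ran = true;
}

static void toy_pause(struct toy_domain *d)
{
    d->pause_count++;
    toy_note(d);
}

static void toy_resume(struct toy_domain *d)
{
    d->suspended = false;
    toy_note(d);
}

/* Start from the state just after a save (suspended, not paused) and
 * apply the two calls in the requested order. Returns true if the guest
 * had a window in which it could run. */
static bool ran_after_save(bool pause_first)
{
    struct toy_domain d = { .pause_count = 0, .suspended = true,
                            .ever_ran = false };

    if (pause_first) {
        toy_pause(&d);
        toy_resume(&d);
    } else {
        toy_resume(&d);
        toy_pause(&d);
    }
    return d.ever_ran;
}

/* Usage: resume-then-pause leaves a window, pause-then-resume does not. */
static void toy_check_orderings(void)
{
    assert(ran_after_save(false));
    assert(!ran_after_save(true));
}
```

Whether the real hypervisor behaves like this model is exactly what needs testing; the sketch only shows the reasoning behind trying the pause first.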
Thanks for reading and responding.

> Does it work if you simply do the pause before the resume? Looking at
> the hypervisor side it appears that pauses are reference counted and it
> looks (based on a cursory glance) that it will do the right thing.

I tried this before and it didn't seem to work. However, having repeated the exercise, it seems to be working now.

From memory, it seemed to leave the domain in either a "pss" or similar unhealthy state (I don't remember exactly). I have since discovered that this may be a different issue around suspending/checkpointing with a CentOS 5.x Xen kernel, and I was also working with a domain I had broken through experimentation.

Anyway, I will spend more time testing this solution.

>> I have looked at the corresponding xc calls but meddling with
>> those is way beyond my knowledge. Another way of looking at the problem
>> would be able to perform an xl save on a paused domain, as this would
>> achieve the same result.
>
> OOI what happens if you try that?

As far as I remember, it complained that the domain didn't respond to the suspend request. I was going to repeat the exercise, but that seems like a moot point now.

>> (Against RELEASE-4.2.2)
>
> FYI any eventual patch will need to be against unstable and then
> considered separately for backporting.

Sure. I just needed a working place to start.

If I were to prepare a patch against unstable, would it be accepted?

The code I ended up with (which is what I think you expected) was:

    if (leavepaused || checkpoint || rc < 0) {
        if (leavepaused && ! (rc < 0)) {
            libxl_domain_pause(ctx, domid);
            fprintf(stderr, "Pausing before resume\n");
        }
        libxl_domain_resume(ctx, domid, 1, 0);
    }
    else
        libxl_domain_destroy(ctx, domid, 0);

> Ian.
On Sun, 2013-06-09 at 22:41 +0100, Ian Murray wrote:
> Thanks for reading and responding
>
> > Does it work if you simply do the pause before the resume? Looking at
> > the hypervisor side it appears that pauses are reference counted and it
> > looks (based on a cursory glance) that it will do the right thing.
>
> I tried this before and it didn't seems to work. However, it seems to
> working now, having repeated the exercise.
>
> From memory It seemed to leave the domain in either a "pss" or similar
> unhealthy state (I don't remember exactly). I have since discovered this
> may be a different issue around suspending/checkpointing using a CentOS
> 5.x Xen kernel and also I was working with a domain I broke because of
> experimentation.
>
> Anyway, I will spend more time testing this solution.
>
> >> I have looked at the corresponding xc calls but meddling with
> >> those is way beyond my knowledge. Another way of looking at the problem
> >> would be able to perform an xl save on a paused domain, as this would
> >> achieve the same result.
> >
> > OOI what happens if you try that?
>
> As far as I remember it complained that the domain didn't respond to the
> suspend request. I was going to repeat the exercise but that seems like
> a moot point now.
>
> >> (Against RELEASE-4.2.2)
> >
> > FYI any eventual patch will need to be against unstable and then
> > considered separately for backporting.
>
> Sure. I just needed a working place to start.
>
> If I was to prepare a patch against unstable, would it be accepted?

It seems like useful functionality to me, so subject to reviewing the actual implementation I think it more than likely would.

> The code I ended up with was (which is what I think you expected):-

Yep, minus the fprintf which I don't think is needed.

> if (leavepaused || checkpoint || rc < 0) {
>     if (leavepaused && ! (rc < 0)) {

I think the libxl coding style would be to cuddle the ! against the bracket.

>         libxl_domain_pause(ctx, domid);
>         fprintf(stderr, "Pausing before resume\n");
>     }
>     libxl_domain_resume(ctx, domid, 1, 0);
> }
> else
>     libxl_domain_destroy(ctx, domid, 0);
>
> > Ian.
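For reference, the branch logic with both review comments applied (no fprintf, and the ! cuddled against the bracket) can be sketched as a small pure decision function. This is only an illustration: the enum and decide_post_save() are hypothetical names that do not exist in xl, and the real code would call libxl_domain_pause, libxl_domain_resume and libxl_domain_destroy directly, as in the quoted snippet.

```c
#include <assert.h>

/* Hypothetical helper type: what to do with the domain after the save. */
enum post_save_action {
    POST_SAVE_DESTROY,            /* plain save succeeded: tear down     */
    POST_SAVE_RESUME,             /* checkpoint, or save failed: resume  */
    POST_SAVE_PAUSE_THEN_RESUME   /* leave paused: pause first, resume   */
};

/* Mirrors the structure of the reviewed snippet. rc < 0 means the save
 * failed; checkpoint and leavepaused are the option flags. */
static enum post_save_action decide_post_save(int rc, int checkpoint,
                                              int leavepaused)
{
    if (leavepaused || checkpoint || rc < 0) {
        if (leavepaused && !(rc < 0))
            return POST_SAVE_PAUSE_THEN_RESUME;
        return POST_SAVE_RESUME;
    }
    return POST_SAVE_DESTROY;
}

/* Usage: the cases discussed in the thread. */
static void check_post_save_table(void)
{
    assert(decide_post_save(0, 0, 0) == POST_SAVE_DESTROY);
    assert(decide_post_save(0, 1, 0) == POST_SAVE_RESUME);
    assert(decide_post_save(0, 0, 1) == POST_SAVE_PAUSE_THEN_RESUME);
    /* A failed save is resumed without pausing, even with leavepaused. */
    assert(decide_post_save(-1, 0, 1) == POST_SAVE_RESUME);
}
```

Note that a failed save never leaves the domain paused, which matches the !(rc < 0) guard in the snippet under review.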
> It seems like useful functionality to me, so subject to reviewing the
> actual implementation I think it more than likely would.

I see there is a long-term goal for snapshotting:

"Full-VM snapshotting
   owner: ?
   status: none
   prognosis: Probably delay until 4.4

   Have a way of coordinating the taking and restoring of VM memory and
   disk snapshots. This would involve some investigation into the best
   way to accomplish this."

I don't know if my proposed solution meets that requirement. I would argue that snapshotting of the virtual disk is beyond the scope of the hypervisor tools, because there are many different ways to implement a virtual disk, each with its own "snapshotting" method. So perhaps it does meet the requirement.

>> The code I ended up with was (which is what I think you expected):-
>
> Yep, minus the fprintf which I don't think is needed.

Yeah, that was in there to make sure I was definitely running the right code, as pausing straight after the suspend produces the same output.

>> if (leavepaused || checkpoint || rc < 0) {
>>     if (leavepaused && ! (rc < 0)) {
>
> I think the libxl coding style would be to cuddle the ! against the
> bracket.

Lol, I will take a look at the rest of the code to make sure mine is in the same style.

>>         libxl_domain_pause(ctx, domid);
>>         fprintf(stderr, "Pausing before resume\n");
>>     }
>>     libxl_domain_resume(ctx, domid, 1, 0);
>> }
>> else
>>     libxl_domain_destroy(ctx, domid, 0);
>>
>> > Ian.
On Wed, 2013-06-12 at 11:21 +0100, Ian Murray wrote:
> > It seems like useful functionality to me, so subject to reviewing the
> > actual implementation I think it more than likely would.
>
> I see there is a longterm goal for snapshoting....
>
> "
>
> Full-VM snapshotting owner: ? status: none prognosis: Probably delay
> until 4.4 Have a way of coordinating the taking and restoring of VM
> memory and disk snapshots. This would involve some investigation into
> the best way to accomplish this."
>
> I don't know if my proposed solution meets that requirement. I would
> argue that snapshotting of the virtual disk is beyond the scope of the
> hypervisor tools because there are many different ways to implement a
> virtual disk, each with their own "snapshotting" methods....

I would imagine we would end up with something like libxl supporting callbacks for the relevant events, with xl implementing them as script callouts and other toolstacks doing whatever they need to do. But as the goal says, there needs to be some investigation of what the toolstacks want/need and what the best way to achieve things is.

> So perhaps it does meet the requirement.

I think it is at least complementary or orthogonal to it, and it seems to me like useful functionality in its own right.

Ian.
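Purely as an illustration of the callback idea floated above, and not a proposal for the actual libxl interface, a hook of this kind might look something like the sketch below. Nothing here exists in libxl: toy_snapshot_hooks, toy_save_with_hooks and the stand-in save functions are all hypothetical, and the point at which the hook fires (after memory is saved, before the guest is allowed to continue) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table; libxl does not define this type. */
struct toy_snapshot_hooks {
    /* Called after memory has been saved, while the guest is assumed to
     * be still stopped. Returns 0 on success, non-zero on failure. */
    int (*disk_snapshot)(unsigned int domid, void *opaque);
    void *opaque;  /* caller data handed back to the hook */
};

/* Stand-in for a toolstack save path. save_memory is a placeholder for
 * the real memory save; the hook runs only if that step succeeded. */
static int toy_save_with_hooks(unsigned int domid,
                               int (*save_memory)(unsigned int domid),
                               const struct toy_snapshot_hooks *hooks)
{
    int rc = save_memory(domid);

    if (rc < 0)
        return rc;
    if (hooks != NULL && hooks->disk_snapshot != NULL)
        rc = hooks->disk_snapshot(domid, hooks->opaque);
    return rc;
}

/* Example stand-ins used to exercise the sketch. */
static int toy_save_ok(unsigned int domid)   { (void)domid; return 0; }
static int toy_save_fail(unsigned int domid) { (void)domid; return -1; }

/* A toolstack such as xl might run a script here; this one only counts. */
static int toy_count_snapshot(unsigned int domid, void *opaque)
{
    (void)domid;
    ++*(int *)opaque;
    return 0;
}

/* Usage: the hook fires once per successful save and never on failure. */
static void toy_hooks_demo(void)
{
    int calls = 0;
    struct toy_snapshot_hooks hooks = { toy_count_snapshot, &calls };

    assert(toy_save_with_hooks(1, toy_save_ok, &hooks) == 0);
    assert(toy_save_with_hooks(1, toy_save_fail, &hooks) == -1);
    assert(calls == 1);
}
```

Keeping the disk-specific work behind a callback fits the earlier point that the disk snapshot method varies by storage backend, so the toolstack would only need to provide the right moment to take it.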