Hello Everyone,

FIRST OF ALL, this is an RFC and, more importantly, it implements a new
feature (although it's a small one), so it is not meant for 4.2. Of course,
comments are more than welcome, if you happen to find the time to look at it
(after all, that is why I'm releasing it)... Just don't rush if you're busy
with some pressing release-related stuff! :-P

The above being stated, this series tries to tackle the problem of moving an
existing domain from one (or some) NUMA node(s) to other(s). In fact,
changing vcpu affinity affects the node affinity of the domain itself, but
this is only effective wrt the memory that will be allocated in the future,
while all the existing pages are untouched. This of course makes a lot of
sense, but there might be the need to _fully_ move a domain, both vcpu and
memory wise, and that is what this series tries to achieve.

Just let me add that the whole process looks a lot like a suspend/resume (or
a local migration) to me, and that is why I implemented it like that for
now. I went for the suspend/resume approach, but I think I can turn it into
a more migration-oriented solution if wanted, as that would reduce the
downtime (although, in that case, we'd need to have twice the memory of the
domain available during the process). Of course, I can also add the proper
logic and interface to the hypervisor, but I'm not sure it buys that much,
as there will still be something very similar to suspension or migration,
just called and used a little bit differently (but of course I'm open to
changing my mind :-)).

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-Jul-06 09:54 UTC
[PATCH 1 of 2 RFC] xl: parse extra_config options even when restoring
In preparation for the next patch. Of course, if no extra_config options are
provided by the caller, this is harmless and causes no functional changes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1706,21 +1706,6 @@ static int create_domain(struct domain_c
         if (ret) { fprintf(stderr, "Failed to read config file: %s: %s\n",
                            config_file, strerror(errno)); return ERROR_FAIL; }
         }
-        if (!restoring && extra_config && strlen(extra_config)) {
-            if (config_len > INT_MAX - (strlen(extra_config) + 2 + 1)) {
-                fprintf(stderr, "Failed to attach extra configuration\n");
-                return ERROR_FAIL;
-            }
-            /* allocate space for the extra config plus two EOLs plus \0 */
-            config_data = realloc(config_data, config_len
-                                  + strlen(extra_config) + 2 + 1);
-            if (!config_data) {
-                fprintf(stderr, "Failed to realloc config_data\n");
-                return ERROR_FAIL;
-            }
-            config_len += sprintf(config_data + config_len, "\n%s\n",
-                                  extra_config);
-        }
         config_source=config_file;
     } else {
         if (!config_data) {
@@ -1731,6 +1716,22 @@ static int create_domain(struct domain_c
         config_source = "<saved>";
     }
 
+    if (extra_config && strlen(extra_config)) {
+        if (config_len > INT_MAX - (strlen(extra_config) + 2 + 1)) {
+            fprintf(stderr, "Failed to attach extra configuration\n");
+            return ERROR_FAIL;
+        }
+        /* allocate space for the extra config plus two EOLs plus \0 */
+        config_data = realloc(config_data, config_len
+                              + strlen(extra_config) + 2 + 1);
+        if (!config_data) {
+            fprintf(stderr, "Failed to realloc config_data\n");
+            return ERROR_FAIL;
+        }
+        config_len += sprintf(config_data + config_len, "\n%s\n",
+                              extra_config);
+    }
+
     if (!dom_info->quiet)
         printf("Parsing config from %s\n", config_source);
 
Dario Faggioli
2012-Jul-06 09:54 UTC
[PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
memory "movement" is achieved by suspending the domain to a temporary file
and resuming it with the new vcpu-affinity.

Signed-off-by: dario.faggioli <dario.faggioli@citrix.com>

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -606,7 +606,7 @@ after B<vcpu-set>, go to B<SEE ALSO> sec
 Lists VCPU information for a specific domain. If no domain is
 specified, VCPU information for all domains will be provided.
 
-=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus>
+=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus>
 
 Pins the VCPU to only run on the specific CPUs. The keyword
 B<all> can be used to apply the I<cpus> list to all VCPUs in the
@@ -617,6 +617,22 @@ different run state is appropriate. Pin
 this, by ensuring certain VCPUs can only run on certain physical
 CPUs.
 
+B<OPTIONS>
+
+=over 4
+
+=item I<-M>
+
+If and only if using this together with I<vcpu> equal to B<all>, not only
+pin the VCPUs to I<cpus>, it also tries to make sure all the memory for
+the domain comes from the NUMA nodes I<cpus> belong to. This involves
+suspending and resuming the domain, so some (hopefully small) downtime
+should be expected, as well as the change of the domain id.
+
+=back
+
+=back
+
 =item B<vm-list>
 
 Prints information about guests. This list excludes information about
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4061,7 +4061,52 @@ int main_vcpulist(int argc, char **argv)
     return 0;
 }
 
-static void vcpupin(const char *d, const char *vcpu, char *cpu)
+/*
+ * Easiest way for having all the memory allocated in such a way that it
+ * reflects the vcpu affinity is to save and re-create the domain (as it
+ * happens when migrating and suspend/resuming).
+ */
+static int vcpupin_move(const char *d, char *cpu)
+{
+    struct domain_create dom_info;
+    char savefile[] = "/tmp/vcpupin-autosave.XXXXXX";
+    char extra_config[1024];
+    uint8_t *config_data;
+    int config_len;
+    int save_fd;
+
+    /* Setup a temporary core file for the domain */
+    save_fd = mkostemp(savefile, O_WRONLY|O_CREAT|O_TRUNC);
+    if (save_fd < 0)
+        return -1;
+
+    /* Save and destroy the current incarnation of the domain */
+    MUST(libxl_userdata_retrieve(ctx, domid, "xl", &config_data, &config_len));
+    save_domain_core_writeconfig(save_fd, savefile, config_data, config_len);
+
+    MUST(libxl_domain_suspend(ctx, domid, save_fd, LIBXL_SUSPEND_LIVE, NULL));
+    MUST(libxl_domain_destroy(ctx, domid));
+    close(save_fd);
+    domid = INVALID_DOMID;
+
+    /* Default parameters plus some extra_config for domain re-creation */
+    memset(&dom_info, 0, sizeof(dom_info));
+    dom_info.quiet = 1;
+    dom_info.migrate_fd = -1;
+    dom_info.monitor = dom_info.daemonize = 1;
+    dom_info.incr_generationid = 1;
+    dom_info.restore_file = savefile;
+    sprintf(extra_config, "cpus=\"%s\"\n", cpu);
+    dom_info.extra_config = extra_config;
+
+    /* Finally, re-create the domain */
+    MUST(create_domain(&dom_info));
+    unlink(savefile);
+
+    return 0;
+}
+
+static void vcpupin(const char *d, const char *vcpu, char *cpu, int move)
 {
     libxl_vcpuinfo *vcpuinfo;
     libxl_cpumap cpumap;
@@ -4078,6 +4123,10 @@ static void vcpupin(const char *d, const
         }
         vcpuid = -1;
     }
+    if (vcpuid != -1 && move) {
+        fprintf(stderr, "Error: can only move when VCPU is \"all\"\n");
+        return;
+    }
 
     find_domain(d);
 
@@ -4094,18 +4143,22 @@ static void vcpupin(const char *d, const
         }
     }
     else {
-        if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) {
-            fprintf(stderr, "libxl_list_vcpu failed.\n");
-            goto vcpupin_out1;
-        }
-        for (i = 0; i < nb_vcpu; i++) {
-            if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
-                                       &cpumap) == -1) {
-                fprintf(stderr, "libxl_set_vcpuaffinity failed"
-                        " on vcpu `%u'.\n", vcpuinfo[i].vcpuid);
+        if (move && vcpupin_move(d, cpu))
+            fprintf(stderr, "Could not move the domain\n");
+        else {
+            if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) {
+                fprintf(stderr, "libxl_list_vcpu failed.\n");
+                goto vcpupin_out1;
             }
-        }
-        libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu);
+            for (i = 0; i < nb_vcpu; i++) {
+                if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
+                                           &cpumap) == -1) {
+                    fprintf(stderr, "libxl_set_vcpuaffinity failed"
+                            " on vcpu `%u'.\n", vcpuinfo[i].vcpuid);
+                }
+            }
+            libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu);
+        }
     }
  vcpupin_out1:
     libxl_cpumap_dispose(&cpumap);
@@ -4115,12 +4168,20 @@ static void vcpupin(const char *d, const
 
 int main_vcpupin(int argc, char **argv)
 {
+    int move_memory = 0;
     int opt;
 
-    if ((opt = def_getopt(argc, argv, "", "vcpu-pin", 3)) != -1)
-        return opt;
-
-    vcpupin(argv[optind], argv[optind+1] , argv[optind+2]);
+    if ((opt = def_getopt(argc, argv, "M", "vcpu-pin", 3)) != -1) {
+        switch(opt) {
+        case 0: case 2:
+            return opt;
+        case 'M':
+            move_memory = 1;
+            break;
+        }
+    }
+
+    vcpupin(argv[optind], argv[optind+1] , argv[optind+2], move_memory);
     return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -207,7 +207,10 @@ struct cmd_spec cmd_table[] = {
     { "vcpu-pin",
       &main_vcpupin, 0, 1,
       "Set which CPUs a VCPU can use",
-      "<Domain> <VCPU|all> <CPUs|all>",
+      "[options] <Domain> <VCPU|all> <CPUs|all>",
+      "-M         Move the memory to the nodes corresponding to CPUs\n"
+      "           (involves suspending/resuming the domain, so some\n"
+      "           downtime is to be expected)"
     },
     { "vcpu-set",
       &main_vcpuset, 0, 1,
George Dunlap
2012-Jul-06 12:53 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 10:54, Dario Faggioli wrote:
> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
> memory "movement" is achieved by suspending the domain to a temporary file
> and resuming it with the new vcpu-affinity.

Hmm... this will work and be reliable, but it seems a bit clunky. Long
term we want to be able to do node migration in the background without
shutting down a VM, right? If that can be done in the 4.3 timeframe,
then it seems unnecessary to implement something like this.

 -George

> [...]
Ian Campbell
2012-Jul-06 13:25 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
> On 06/07/12 10:54, Dario Faggioli wrote:
>> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
>> memory "movement" is achieved by suspending the domain to a temporary file
>> and resuming it with the new vcpu-affinity.
> Hmm... this will work and be reliable, but it seems a bit clunky. Long
> term we want to be able to do node migration in the background without
> shutting down a VM, right? If that can be done in the 4.3 timeframe,
> then it seems unnecessary to implement something like this.

We could do something cleverer for HVM (or hybrid) guests to migrate
pages while the guest is live, but migrating a page under a PV guest's
feet requires quiescing it in the style of a migrate.

We could probably manage to make it such that you just need the
pause/frob/unpause phases without actually writing RAM to disk, and that
infrastructure might be useful for other reasons, I suppose (e.g. I
think bad page offlining currently uses a similar save/restore trick).

Ian.
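Just to make the pause/frob/unpause idea concrete, here is a minimal sketch
of what a chunked loop could look like on the toolstack side. To be clear
about what is real and what is not: xc_interface_open(), xc_domain_pause()
and xc_domain_unpause() are the existing libxc API, while
move_chunk_to_node() is purely hypothetical -- it stands for exactly the
missing piece (exchanging a range of pages for pages on the target node and
fixing up the P2M), and the 1MB chunk size is arbitrary.

#include <stdint.h>
#include <xenctrl.h>

#define CHUNK_PAGES 256 /* 1MB of 4k pages: bounds the extra memory needed */

/* HYPOTHETICAL: no such libxc call exists today; a real version would
 * exchange [first_pfn, first_pfn + count) for pages allocated on `node'
 * and update the P2M/M2P accordingly. */
static int move_chunk_to_node(xc_interface *xch, uint32_t domid,
                              unsigned long first_pfn, unsigned long count,
                              unsigned int node)
{
    (void)xch; (void)domid; (void)first_pfn; (void)count; (void)node;
    return -1; /* placeholder body so the sketch compiles */
}

static int move_domain_memory(uint32_t domid, unsigned int node,
                              unsigned long nr_pages)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    unsigned long pfn;
    int rc = 0;

    if (!xch)
        return -1;

    for (pfn = 0; pfn < nr_pages && !rc; pfn += CHUNK_PAGES) {
        unsigned long count = nr_pages - pfn < CHUNK_PAGES ?
                              nr_pages - pfn : CHUNK_PAGES;

        /* quiesce the guest only while one chunk is in flight, so it is
         * never paused for more than a chunk's worth of copying */
        rc = xc_domain_pause(xch, domid);
        if (!rc) {
            rc = move_chunk_to_node(xch, domid, pfn, count, node);
            xc_domain_unpause(xch, domid);
        }
    }

    xc_interface_close(xch);
    return rc;
}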
George Dunlap
2012-Jul-06 13:30 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 14:25, Ian Campbell wrote:
> On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
>> On 06/07/12 10:54, Dario Faggioli wrote:
>>> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
>>> memory "movement" is achieved by suspending the domain to a temporary file
>>> and resuming it with the new vcpu-affinity.
>> Hmm... this will work and be reliable, but it seems a bit clunky. Long
>> term we want to be able to do node migration in the background without
>> shutting down a VM, right? If that can be done in the 4.3 timeframe,
>> then it seems unnecessary to implement something like this.
> We could do something cleverer for HVM (or hybrid) guests to migrate
> pages while the guest is live, but migrating a page under a PV guest's
> feet requires quiescing it in the style of a migrate.
>
> We could probably manage to make it such that you just need the
> pause/frob/unpause phases without actually writing RAM to disk, and that
> infrastructure might be useful for other reasons, I suppose (e.g. I
> think bad page offlining currently uses a similar save/restore trick).

Yes, I think doing this a chunk at a time is likely to be much
preferable to writing the whole thing to disk and reading it out again.
Doing it "live" may end up with more total overhead, but it won't
require the VM being actually down for the suspend/resume cycle, which
on a large guest may be tens of seconds.

 -George
Ian Campbell
2012-Jul-06 13:38 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 14:30 +0100, George Dunlap wrote:
> On 06/07/12 14:25, Ian Campbell wrote:
> > On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
> > > On 06/07/12 10:54, Dario Faggioli wrote:
> > > > By introducing a new '-M' option to the `xl vcpu-pin' command. The
> > > > actual memory "movement" is achieved by suspending the domain to a
> > > > temporary file and resuming it with the new vcpu-affinity.
> > > Hmm... this will work and be reliable, but it seems a bit clunky. Long
> > > term we want to be able to do node migration in the background without
> > > shutting down a VM, right? If that can be done in the 4.3 timeframe,
> > > then it seems unnecessary to implement something like this.
> > We could do something cleverer for HVM (or hybrid) guests to migrate
> > pages while the guest is live, but migrating a page under a PV guest's
> > feet requires quiescing it in the style of a migrate.
> >
> > We could probably manage to make it such that you just need the
> > pause/frob/unpause phases without actually writing RAM to disk, and that
> > infrastructure might be useful for other reasons, I suppose (e.g. I
> > think bad page offlining currently uses a similar save/restore trick).
> Yes, I think doing this a chunk at a time is likely to be much
> preferable to writing the whole thing to disk and reading it out again.
> Doing it "live" may end up with more total overhead, but it won't
> require the VM being actually down for the suspend/resume cycle, which
> on a large guest may be tens of seconds.

Tim also suggested a PV protocol whereby the tools hint to the guest "if
you XENMEM_exchange this MFN you will get back an MFN which is much
better for your performance".

(at least I think that's what Tim suggested, I may have misunderstood)

Ian.
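For illustration only, the guest side of such a protocol might boil down to
one XENMEM_exchange per hinted page, along these lines. XENMEM_exchange,
XENMEMF_node() and HYPERVISOR_memory_op() are real interfaces, but the rest
is assumption: no such hint delivery mechanism exists,
exchange_hinted_page() is invented for this sketch, and a real PV guest
would additionally have to update its P2M and remap any PTEs still pointing
at the old MFN.

#include <string.h>
#include <xen/memory.h>   /* XENMEM_exchange, struct xen_memory_exchange */

/* HYPOTHETICAL: called when the toolstack hints that `gmfn' should be
 * traded in for a page on `node'. */
static int exchange_hinted_page(xen_pfn_t gmfn, unsigned int node)
{
    struct xen_memory_exchange ex;
    xen_pfn_t new_mfn;

    memset(&ex, 0, sizeof(ex));

    /* the page we were hinted to give back */
    set_xen_guest_handle(ex.in.extent_start, &gmfn);
    ex.in.nr_extents   = 1;
    ex.in.extent_order = 0;
    ex.in.domid        = DOMID_SELF;

    /* ask for a replacement allocated on the "better" node */
    set_xen_guest_handle(ex.out.extent_start, &new_mfn);
    ex.out.nr_extents   = 1;
    ex.out.extent_order = 0;
    ex.out.mem_flags    = XENMEMF_node(node);
    ex.out.domid        = DOMID_SELF;

    if (HYPERVISOR_memory_op(XENMEM_exchange, &ex) || ex.nr_exchanged != 1)
        return -1;

    /* before touching the page again, the guest must update its P2M and
     * remap anything that still points at the old MFN */
    return 0;
}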
Dario Faggioli
2012-Jul-06 13:57 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
> On 06/07/12 10:54, Dario Faggioli wrote:
> > By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
> > memory "movement" is achieved by suspending the domain to a temporary file
> > and resuming it with the new vcpu-affinity.
> Hmm... this will work and be reliable, but it seems a bit clunky.
>
If I can ask, the idea or the implementation? :-)

> Long term we want to be able to do node migration in the background
> without shutting down a VM, right?
>
Definitely, and we also want to do that automatically, according to some
load/performance/whatever run-time measurement, without any user
intervention. However ...

> If that can be done in the 4.3 timeframe, then it seems unnecessary to
> implement something like this.
>
... I think something like this could still be useful.

IOW, I think it would be worthwhile to have both the automatic memory
migration happening in the background and something like this explicit,
do-it-all-now mechanism, as they serve different purposes.

It's sort of like in cpu scheduling, where you have free tasks/vcpus that
run wherever the scheduler wants, but you also have pinning, if at some
point you want to confine them to some subset of cpus for whatever reason.
Also, as it happens for pinning, you can do it at domain creation time, but
then you might change your mind and you'd need something to make the system
reflect your actual needs.

So, if you allow me, initial vcpu pinning is like NUMA automatic placement,
and automatic memory migration happening in the background is like "free
scheduling"; then this feature is like what we get when running `xl
vcpu-pin'. (I hope the comparison helped in clarifying my view rather than
making it even more obscure :-)).

Also, as an example, what happens if you want to create a new VM and you
have enough memory, but not in a single node because of memory
fragmentation? Ideally, that won't happen thanks to the automatic memory
migration, but that's not guaranteed to be possible (there might be VMs
pinned, cpupools, and all sorts of stuff), or to happen effectively and
soon enough (it will be a heuristic after all). In such a situation, I
figure this feature could be a useful tool for a system administrator.

Then again, I'm talking about the feature. The implementation might well
change, and I really expect being able to migrate memory in the background
to help beautify it (the implementation) quite a bit.

Sorry for writing so much. :-P

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2012-Jul-06 14:04 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 14:57, Dario Faggioli wrote:
> On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
>> On 06/07/12 10:54, Dario Faggioli wrote:
>>> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
>>> memory "movement" is achieved by suspending the domain to a temporary file
>>> and resuming it with the new vcpu-affinity.
>> Hmm... this will work and be reliable, but it seems a bit clunky.
>>
> If I can ask, the idea or the implementation? :-)
>
>> Long term we want to be able to do node migration in the background
>> without shutting down a VM, right?
>>
> Definitely, and we also want to do that automatically, according to some
> load/performance/whatever run-time measurement, without any user
> intervention. However ...
>
>> If that can be done in the 4.3 timeframe, then it seems unnecessary to
>> implement something like this.
>>
> ... I think something like this could still be useful.
>
> IOW, I think it would be worthwhile to have both the automatic memory
> migration happening in the background and something like this explicit,
> do-it-all-now mechanism, as they serve different purposes.

Well, I think you can have an explicit, do-it-now mechanism that doesn't
require suspending the VM; and in fact, I think such a mechanism would
be much preferable. Compare it to vcpu pinning: if you decided you wanted
to change the vcpu pinning, would you rather just be able to change the
maps while the VM was running, and have it take effect immediately (as it
does now), or would you rather have to suspend and resume the VM, taking
it out of commission for 10 seconds or so?

I'm not opposed to the -M option meaning, "...and please move the memory
based on the new nodemask". But I don't think implementing it as
suspend/resume is really that useful, except as a stopgap measure until
we can get "live" memory migration working. If we can get that before
4.3, there's no need for the stopgap measure.

 -George
Dario Faggioli
2012-Jul-06 14:05 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 14:25 +0100, Ian Campbell wrote:
> We could do something cleverer for HVM (or hybrid) guests to migrate
> pages while the guest is live, but migrating a page under a PV guest's
> feet requires quiescing it in the style of a migrate.
>
Which, as we're talking about the same host, means reaching a point where
I have two full copies of the guest memory in RAM, or not?

(Just trying to figure out if I understood things sensibly.)

> We could probably manage to make it such that you just need the
> pause/frob/unpause phases without actually writing RAM to disk,
>
Of course, that was in my e-mail too (it's about the above that I'm not
100% sure...).

> and that infrastructure might be useful for other reasons I suppose.
>
Definitely.

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2012-Jul-06 14:07 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 15:05, Dario Faggioli wrote:
> On Fri, 2012-07-06 at 14:25 +0100, Ian Campbell wrote:
>> We could do something cleverer for HVM (or hybrid) guests to migrate
>> pages while the guest is live, but migrating a page under a PV guest's
>> feet requires quiescing it in the style of a migrate.
>>
> Which, as we're talking about the same host, means reaching a point
> where I have two full copies of the guest memory in RAM, or not?
>
> (Just trying to figure out if I understood things sensibly.)

No, you can do batches at a time, based on how much free memory you
have. If you have enough free memory to keep two copies in RAM at a
time, you might as well. But even as little as moving 1MB at a time
would probably be easily achievable and not too much overhead.

 -George
Dario Faggioli
2012-Jul-06 14:14 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 15:04 +0100, George Dunlap wrote:
> > IOW, I think it would be worthwhile to have both the automatic memory
> > migration happening in the background and something like this explicit,
> > do-it-all-now mechanism, as they serve different purposes.
> Well, I think you can have an explicit, do-it-now mechanism that doesn't
> require suspending the VM; and in fact, I think such a mechanism would
> be much preferable.
>
I fully agree on this.

> I'm not opposed to the -M option meaning, "...and please move the memory
> based on the new nodemask". But I don't think implementing it as
> suspend/resume is really that useful, except as a stopgap measure until
> we can get "live" memory migration working.
>
I see, then we agree on everything, basically. :-P

> If we can get that before 4.3, there's no need for the stopgap measure.
>
Sure, that was the reason why it was an RFC after all... And it at least
triggered this discussion. :-)

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Ian Campbell
2012-Jul-06 14:42 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 15:05 +0100, Dario Faggioli wrote:
> On Fri, 2012-07-06 at 14:25 +0100, Ian Campbell wrote:
> > We could do something cleverer for HVM (or hybrid) guests to migrate
> > pages while the guest is live, but migrating a page under a PV guest's
> > feet requires quiescing it in the style of a migrate.
> >
> Which, as we're talking about the same host, means reaching a point
> where I have two full copies of the guest memory in RAM, or not?
>
> (Just trying to figure out if I understood things sensibly.)

You could move it over in chunks, freeing as you go, so if the chunk size
was 1M you would only need 1M of extra memory, not the full domain size.

Ian.