Hello Everyone,

FIRST OF ALL, this is an RFC and, more importantly, it implements a new
feature (although it's a small one), so it is not meant for 4.2. Of course,
comments are more than welcome, if you happen to find the time to look at it
(after all, that is why I'm releasing it)... Just don't rush if you're busy
with some pressing release-related stuff! :-P

The above being stated, this series tries to tackle the problem of moving an
existing domain from one (or some) NUMA node(s) to other(s). In fact,
changing vcpu affinity affects the node affinity of the domain itself, but
this is only effective wrt the memory that will be allocated in the future,
while all the existing pages are untouched. This of course makes a lot of
sense, but there might be the need to _fully_ move a domain, both vcpu and
memory wise, and that is what this series tries to achieve.

Just let me add that the whole process looks a lot like a suspend/resume (or
a local migration) to me, and that is why I implemented it like that for
now. I went for the suspend/resume approach, but I think I can turn it into
a more migration-oriented solution if wanted, as that would reduce the
downtime (although, in that case, we'd need to have twice the memory of the
domain available during the process). Of course, I can also add the proper
logic and interface to the hypervisor, but I'm not sure it buys that much,
as there will still be something very similar to suspension or migration,
just called and used a little bit differently (but of course I'm open to
changing my mind :-)).

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-Jul-06 09:54 UTC
[PATCH 1 of 2 RFC] xl: parse extra_config options even when restoring
In preparation for the next patch. Of course, if no extra_config options are
provided by the caller, this is harmless and causes no functional changes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1706,21 +1706,6 @@ static int create_domain(struct domain_c
         if (ret) { fprintf(stderr, "Failed to read config file: %s: %s\n",
                            config_file, strerror(errno)); return ERROR_FAIL; }
         }
-        if (!restoring && extra_config && strlen(extra_config)) {
-            if (config_len > INT_MAX - (strlen(extra_config) + 2 + 1)) {
-                fprintf(stderr, "Failed to attach extra configuration\n");
-                return ERROR_FAIL;
-            }
-            /* allocate space for the extra config plus two EOLs plus \0 */
-            config_data = realloc(config_data, config_len
-                                  + strlen(extra_config) + 2 + 1);
-            if (!config_data) {
-                fprintf(stderr, "Failed to realloc config_data\n");
-                return ERROR_FAIL;
-            }
-            config_len += sprintf(config_data + config_len, "\n%s\n",
-                                  extra_config);
-        }
         config_source=config_file;
     } else {
         if (!config_data) {
@@ -1731,6 +1716,22 @@ static int create_domain(struct domain_c
         config_source = "<saved>";
     }
 
+    if (extra_config && strlen(extra_config)) {
+        if (config_len > INT_MAX - (strlen(extra_config) + 2 + 1)) {
+            fprintf(stderr, "Failed to attach extra configuration\n");
+            return ERROR_FAIL;
+        }
+        /* allocate space for the extra config plus two EOLs plus \0 */
+        config_data = realloc(config_data, config_len
+                              + strlen(extra_config) + 2 + 1);
+        if (!config_data) {
+            fprintf(stderr, "Failed to realloc config_data\n");
+            return ERROR_FAIL;
+        }
+        config_len += sprintf(config_data + config_len, "\n%s\n",
+                              extra_config);
+    }
+
     if (!dom_info->quiet)
         printf("Parsing config from %s\n", config_source);
 
Dario Faggioli
2012-Jul-06 09:54 UTC
[PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
memory "movement" is achieved by suspending the domain to a temporary file
and resuming it with the new vcpu-affinity.

Signed-off-by: dario.faggioli <dario.faggioli@citrix.com>

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -606,7 +606,7 @@ after B<vcpu-set>, go to B<SEE ALSO> sec
 Lists VCPU information for a specific domain. If no domain is
 specified, VCPU information for all domains will be provided.
 
-=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus>
+=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus>
 
 Pins the VCPU to only run on the specific CPUs. The keyword
 B<all> can be used to apply the I<cpus> list to all VCPUs in the
@@ -617,6 +617,22 @@ different run state is appropriate. Pin
 this, by ensuring certain VCPUs can only run on certain physical
 CPUs.
 
+B<OPTIONS>
+
+=over 4
+
+=item I<-M>
+
+If and only if using this together with I<vcpu> equal to B<all>, not only
+pin the VCPUs to I<cpus>, it also tries to make sure all the memory for
+the domain comes from the NUMA nodes I<cpus> belong to. This involves
+suspending and resuming the domain, so some (hopefully small) downtime
+should be expected, as well as the change of the domain id.
+
+=back
+
+=back
+
 =item B<vm-list>
 
 Prints information about guests. This list excludes information about
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4061,7 +4061,52 @@ int main_vcpulist(int argc, char **argv)
     return 0;
 }
 
-static void vcpupin(const char *d, const char *vcpu, char *cpu)
+/*
+ * Easiest way for having all the memory allocated in such a way that it
+ * reflects the vcpu affinity is to save and re-create the domain (as it
+ * happens when migrating and suspend/resuming).
+ */
+static int vcpupin_move(const char *d, char *cpu)
+{
+    struct domain_create dom_info;
+    char savefile[] = "/tmp/vcpupin-autosave.XXXXXX";
+    char extra_config[1024];
+    uint8_t *config_data;
+    int config_len;
+    int save_fd;
+
+    /* Setup a temporary core file for the domain */
+    save_fd = mkostemp(savefile, O_WRONLY|O_CREAT|O_TRUNC);
+    if (save_fd < 0)
+        return -1;
+
+    /* Save and destroy the current incarnation of the domain */
+    MUST(libxl_userdata_retrieve(ctx, domid, "xl", &config_data, &config_len));
+    save_domain_core_writeconfig(save_fd, savefile, config_data, config_len);
+
+    MUST(libxl_domain_suspend(ctx, domid, save_fd, LIBXL_SUSPEND_LIVE, NULL));
+    MUST(libxl_domain_destroy(ctx, domid));
+    close(save_fd);
+    domid = INVALID_DOMID;
+
+    /* Default parameters plus some extra_config for domain re-creation */
+    memset(&dom_info, 0, sizeof(dom_info));
+    dom_info.quiet = 1;
+    dom_info.migrate_fd = -1;
+    dom_info.monitor = dom_info.daemonize = 1;
+    dom_info.incr_generationid = 1;
+    dom_info.restore_file = savefile;
+    sprintf(extra_config, "cpus=\"%s\"\n", cpu);
+    dom_info.extra_config = extra_config;
+
+    /* Finally, re-create the domain */
+    MUST(create_domain(&dom_info));
+    unlink(savefile);
+
+    return 0;
+}
+
+static void vcpupin(const char *d, const char *vcpu, char *cpu, int move)
 {
     libxl_vcpuinfo *vcpuinfo;
     libxl_cpumap cpumap;
@@ -4078,6 +4123,10 @@ static void vcpupin(const char *d, const
         }
         vcpuid = -1;
     }
+    if (vcpuid != -1 && move) {
+        fprintf(stderr, "Error: can only move when VCPU is \"all\"\n");
+        return;
+    }
 
     find_domain(d);
 
@@ -4094,18 +4143,22 @@ static void vcpupin(const char *d, const
         }
     }
     else {
-        if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) {
-            fprintf(stderr, "libxl_list_vcpu failed.\n");
-            goto vcpupin_out1;
-        }
-        for (i = 0; i < nb_vcpu; i++) {
-            if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
-                                       &cpumap) == -1) {
-                fprintf(stderr, "libxl_set_vcpuaffinity failed"
-                        " on vcpu `%u'.\n", vcpuinfo[i].vcpuid);
+        if (move && vcpupin_move(d, cpu))
+            fprintf(stderr, "Could not move the domain\n");
+        else {
+            if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) {
+                fprintf(stderr, "libxl_list_vcpu failed.\n");
+                goto vcpupin_out1;
             }
-        }
-        libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu);
+            for (i = 0; i < nb_vcpu; i++) {
+                if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
+                                           &cpumap) == -1) {
+                    fprintf(stderr, "libxl_set_vcpuaffinity failed"
+                            " on vcpu `%u'.\n", vcpuinfo[i].vcpuid);
+                }
+            }
+            libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu);
+        }
     }
  vcpupin_out1:
     libxl_cpumap_dispose(&cpumap);
@@ -4115,12 +4168,20 @@ static void vcpupin(const char *d, const
 
 int main_vcpupin(int argc, char **argv)
 {
+    int move_memory = 0;
     int opt;
 
-    if ((opt = def_getopt(argc, argv, "", "vcpu-pin", 3)) != -1)
-        return opt;
-
-    vcpupin(argv[optind], argv[optind+1] , argv[optind+2]);
+    if ((opt = def_getopt(argc, argv, "M", "vcpu-pin", 3)) != -1) {
+        switch(opt) {
+        case 0: case 2:
+            return opt;
+        case 'M':
+            move_memory = 1;
+            break;
+        }
+    }
+
+    vcpupin(argv[optind], argv[optind+1] , argv[optind+2], move_memory);
     return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -207,7 +207,10 @@ struct cmd_spec cmd_table[] = {
     { "vcpu-pin",
       &main_vcpupin, 0, 1,
       "Set which CPUs a VCPU can use",
-      "<Domain> <VCPU|all> <CPUs|all>",
+      "[options] <Domain> <VCPU|all> <CPUs|all>",
+      "-M         Move the memory to the nodes corresponding to CPUs\n"
+      "           (involves suspending/resuming the domain, so some\n"
+      "           downtime is to be expected)"
     },
     { "vcpu-set",
       &main_vcpuset, 0, 1,
George Dunlap
2012-Jul-06 12:53 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 10:54, Dario Faggioli wrote:
> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
> memory "movement" is achieved by suspending the domain to a temporary file
> and resuming it with the new vcpu-affinity.

Hmm... this will work and be reliable, but it seems a bit clunky. Long
term we want to be able to do node migration in the background without
shutting down a VM, right? If that can be done in the 4.3 timeframe,
then it seems unnecessary to implement something like this.

 -George

> [...]
Ian Campbell
2012-Jul-06 13:25 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
> On 06/07/12 10:54, Dario Faggioli wrote:
>> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
>> memory "movement" is achieved by suspending the domain to a temporary file
>> and resuming it with the new vcpu-affinity.
> Hmm... this will work and be reliable, but it seems a bit clunky. Long
> term we want to be able to do node migration in the background without
> shutting down a VM, right? If that can be done in the 4.3 timeframe,
> then it seems unnecessary to implement something like this.

We could do something cleverer for HVM (or hybrid) guests to migrate
pages while the guest is live, but migrating a page under a PV guest's
feet requires quiescing it in the style of a migrate.

We could probably manage to make it such that you just need the
pause/frob/unpause phases without actually writing RAM to disk, and that
infrastructure might be useful for other reasons, I suppose (e.g. I
think bad page offlining currently uses a similar save/restore trick).

Ian.
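Just to make the pause/frob/unpause idea concrete, here is a minimal sketch
of what a chunked loop could look like on the toolstack side. To be clear
about what is real and what is not: xc_interface_open(), xc_domain_pause()
and xc_domain_unpause() are the existing libxc API, while
move_chunk_to_node() is purely hypothetical -- it stands for exactly the
missing piece (exchanging a range of pages for pages on the target node and
fixing up the P2M), and the 1MB chunk size is arbitrary.

#include <stdint.h>
#include <xenctrl.h>

#define CHUNK_PAGES 256 /* 1MB of 4k pages: bounds the extra memory needed */

/* HYPOTHETICAL: no such libxc call exists today; a real version would
 * exchange [first_pfn, first_pfn + count) for pages allocated on `node'
 * and update the P2M/M2P accordingly. */
static int move_chunk_to_node(xc_interface *xch, uint32_t domid,
                              unsigned long first_pfn, unsigned long count,
                              unsigned int node)
{
    (void)xch; (void)domid; (void)first_pfn; (void)count; (void)node;
    return -1; /* placeholder body so the sketch compiles */
}

static int move_domain_memory(uint32_t domid, unsigned int node,
                              unsigned long nr_pages)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    unsigned long pfn;
    int rc = 0;

    if (!xch)
        return -1;

    for (pfn = 0; pfn < nr_pages && !rc; pfn += CHUNK_PAGES) {
        unsigned long count = nr_pages - pfn < CHUNK_PAGES ?
                              nr_pages - pfn : CHUNK_PAGES;

        /* quiesce the guest only while one chunk is in flight, so it is
         * never paused for more than a chunk's worth of copying */
        rc = xc_domain_pause(xch, domid);
        if (!rc) {
            rc = move_chunk_to_node(xch, domid, pfn, count, node);
            xc_domain_unpause(xch, domid);
        }
    }

    xc_interface_close(xch);
    return rc;
}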
George Dunlap
2012-Jul-06 13:30 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 14:25, Ian Campbell wrote:
> On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
>> On 06/07/12 10:54, Dario Faggioli wrote:
>>> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
>>> memory "movement" is achieved by suspending the domain to a temporary file
>>> and resuming it with the new vcpu-affinity.
>> Hmm... this will work and be reliable, but it seems a bit clunky. Long
>> term we want to be able to do node migration in the background without
>> shutting down a VM, right? If that can be done in the 4.3 timeframe,
>> then it seems unnecessary to implement something like this.
> We could do something cleverer for HVM (or hybrid) guests to migrate
> pages while the guest is live, but migrating a page under a PV guest's
> feet requires quiescing it in the style of a migrate.
>
> We could probably manage to make it such that you just need the
> pause/frob/unpause phases without actually writing RAM to disk, and that
> infrastructure might be useful for other reasons, I suppose (e.g. I
> think bad page offlining currently uses a similar save/restore trick).

Yes, I think doing this a chunk at a time is likely to be much
preferable to writing the whole thing to disk and reading it out again.
Doing it "live" may end up with more total overhead, but it won't
require the VM being actually down for the suspend/resume cycle, which
on a large guest may be tens of seconds.

 -George
Ian Campbell
2012-Jul-06 13:38 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 14:30 +0100, George Dunlap wrote:
> On 06/07/12 14:25, Ian Campbell wrote:
> > On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
> > > On 06/07/12 10:54, Dario Faggioli wrote:
> > > > By introducing a new '-M' option to the `xl vcpu-pin' command. The
> > > > actual memory "movement" is achieved by suspending the domain to a
> > > > temporary file and resuming it with the new vcpu-affinity.
> > > Hmm... this will work and be reliable, but it seems a bit clunky. Long
> > > term we want to be able to do node migration in the background without
> > > shutting down a VM, right? If that can be done in the 4.3 timeframe,
> > > then it seems unnecessary to implement something like this.
> > We could do something cleverer for HVM (or hybrid) guests to migrate
> > pages while the guest is live, but migrating a page under a PV guest's
> > feet requires quiescing it in the style of a migrate.
> >
> > We could probably manage to make it such that you just need the
> > pause/frob/unpause phases without actually writing RAM to disk, and that
> > infrastructure might be useful for other reasons, I suppose (e.g. I
> > think bad page offlining currently uses a similar save/restore trick).
> Yes, I think doing this a chunk at a time is likely to be much
> preferable to writing the whole thing to disk and reading it out again.
> Doing it "live" may end up with more total overhead, but it won't
> require the VM being actually down for the suspend/resume cycle, which
> on a large guest may be tens of seconds.

Tim also suggested a PV protocol whereby the tools hint to the guest "if
you XENMEM_exchange this MFN you will get back an MFN which is much
better for your performance".

(at least I think that's what Tim suggested, I may have misunderstood)

Ian.
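For illustration only, the guest side of such a protocol might boil down to
one XENMEM_exchange per hinted page, along these lines. XENMEM_exchange,
XENMEMF_node() and HYPERVISOR_memory_op() are real interfaces, but the rest
is assumption: no such hint delivery mechanism exists,
exchange_hinted_page() is invented for this sketch, and a real PV guest
would additionally have to update its P2M and remap any PTEs still pointing
at the old MFN.

#include <string.h>
#include <xen/memory.h>   /* XENMEM_exchange, struct xen_memory_exchange */

/* HYPOTHETICAL: called when the toolstack hints that `gmfn' should be
 * traded in for a page on `node'. */
static int exchange_hinted_page(xen_pfn_t gmfn, unsigned int node)
{
    struct xen_memory_exchange ex;
    xen_pfn_t new_mfn;

    memset(&ex, 0, sizeof(ex));

    /* the page we were hinted to give back */
    set_xen_guest_handle(ex.in.extent_start, &gmfn);
    ex.in.nr_extents   = 1;
    ex.in.extent_order = 0;
    ex.in.domid        = DOMID_SELF;

    /* ask for a replacement allocated on the "better" node */
    set_xen_guest_handle(ex.out.extent_start, &new_mfn);
    ex.out.nr_extents   = 1;
    ex.out.extent_order = 0;
    ex.out.mem_flags    = XENMEMF_node(node);
    ex.out.domid        = DOMID_SELF;

    if (HYPERVISOR_memory_op(XENMEM_exchange, &ex) || ex.nr_exchanged != 1)
        return -1;

    /* before touching the page again, the guest must update its P2M and
     * remap anything that still points at the old MFN */
    return 0;
}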
Dario Faggioli
2012-Jul-06 13:57 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
> On 06/07/12 10:54, Dario Faggioli wrote:
> > By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
> > memory "movement" is achieved by suspending the domain to a temporary file
> > and resuming it with the new vcpu-affinity.
> Hmm... this will work and be reliable, but it seems a bit clunky.
>
If I can ask, the idea or the implementation? :-)

> Long term we want to be able to do node migration in the background
> without shutting down a VM, right?
>
Definitely, and we also want to do that automatically, according to some
load/performance/whatever run-time measurement, without any user
intervention. However ...

> If that can be done in the 4.3 timeframe, then it seems unnecessary to
> implement something like this.
>
... I think something like this could still be useful.

IOW, I think it would be worthwhile to have both the automatic memory
migration happening in the background and something like this explicit,
do-it-all-now mechanism, as they serve different purposes.

It's sort of like in cpu scheduling, where you have free tasks/vcpus that
run wherever the scheduler wants, but you also have pinning, if at some
point you want to confine them to some subset of cpus for whatever reason.
Also, as it happens for pinning, you can do it at domain creation time, but
then you might change your mind and you'd need something to make the system
reflect your actual needs.

So, if you allow me, initial vcpu pinning is like NUMA automatic placement,
and automatic memory migration happening in the background is like "free
scheduling"; then this feature is like what we get when running `xl
vcpu-pin'. (I hope the comparison helped in clarifying my view rather than
making it even more obscure :-)).

Also, as an example, what happens if you want to create a new VM and you
have enough memory, but not in a single node because of memory
fragmentation? Ideally, that won't happen thanks to the automatic memory
migration, but that's not guaranteed to be possible (there might be VMs
pinned, cpupools, and all sorts of stuff), or to happen effectively and
soon enough (it will be a heuristic after all). In such a situation, I
figure this feature could be a useful tool for a system administrator.

Then again, I'm talking about the feature. The implementation might well
change, and I really expect being able to migrate memory in the background
to help beautify it (the implementation) quite a bit.

Sorry for writing so much. :-P

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2012-Jul-06 14:04 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 14:57, Dario Faggioli wrote:
> On Fri, 2012-07-06 at 13:53 +0100, George Dunlap wrote:
>> On 06/07/12 10:54, Dario Faggioli wrote:
>>> By introducing a new '-M' option to the `xl vcpu-pin' command. The actual
>>> memory "movement" is achieved by suspending the domain to a temporary file
>>> and resuming it with the new vcpu-affinity.
>> Hmm... this will work and be reliable, but it seems a bit clunky.
>>
> If I can ask, the idea or the implementation? :-)
>
>> Long term we want to be able to do node migration in the background
>> without shutting down a VM, right?
>>
> Definitely, and we also want to do that automatically, according to some
> load/performance/whatever run-time measurement, without any user
> intervention. However ...
>
>> If that can be done in the 4.3 timeframe, then it seems unnecessary to
>> implement something like this.
>>
> ... I think something like this could still be useful.
>
> IOW, I think it would be worthwhile to have both the automatic memory
> migration happening in the background and something like this explicit,
> do-it-all-now mechanism, as they serve different purposes.

Well, I think you can have an explicit, do-it-now mechanism that doesn't
require suspending the VM; and in fact, I think such a mechanism would
be much preferable. Compare it to vcpu pinning: if you decided you wanted
to change the vcpu pinning, would you rather just be able to change the
maps while the VM was running, and have it take effect immediately (as it
does now), or would you rather have to suspend and resume the VM, taking
it out of commission for 10 seconds or so?

I'm not opposed to the -M option meaning, "...and please move the memory
based on the new nodemask". But I don't think implementing it as
suspend/resume is really that useful, except as a stopgap measure until
we can get "live" memory migration working. If we can get that before
4.3, there's no need for the stopgap measure.

 -George
Dario Faggioli
2012-Jul-06 14:05 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 14:25 +0100, Ian Campbell wrote:
> We could do something cleverer for HVM (or hybrid) guests to migrate
> pages while the guest is live, but migrating a page under a PV guest's
> feet requires quiescing it in the style of a migrate.
>
Which, as we're talking about the same host, means reaching a point where
I have two full copies of the guest memory in RAM, or not?

(Just trying to figure out if I understood things sensibly.)

> We could probably manage to make it such that you just need the
> pause/frob/unpause phases without actually writing RAM to disk,
>
Of course, that was in my e-mail too (it's about the above that I'm not
100% sure...).

> and that infrastructure might be useful for other reasons I suppose.
>
Definitely.

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2012-Jul-06 14:07 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On 06/07/12 15:05, Dario Faggioli wrote:
> On Fri, 2012-07-06 at 14:25 +0100, Ian Campbell wrote:
>> We could do something cleverer for HVM (or hybrid) guests to migrate
>> pages while the guest is live, but migrating a page under a PV guest's
>> feet requires quiescing it in the style of a migrate.
>>
> Which, as we're talking about the same host, means reaching a point
> where I have two full copies of the guest memory in RAM, or not?
>
> (Just trying to figure out if I understood things sensibly.)

No, you can do batches at a time, based on how much free memory you
have. If you have enough free memory to keep two copies in RAM at a
time, you might as well. But even as little as moving 1MB at a time
would probably be easily achievable and not too much overhead.

 -George
Dario Faggioli
2012-Jul-06 14:14 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 15:04 +0100, George Dunlap wrote:
> > IOW, I think it would be worthwhile to have both the automatic memory
> > migration happening in the background and something like this explicit,
> > do-it-all-now mechanism, as they serve different purposes.
> Well, I think you can have an explicit, do-it-now mechanism that doesn't
> require suspending the VM; and in fact, I think such a mechanism would
> be much preferable.
>
I fully agree on this.

> I'm not opposed to the -M option meaning, "...and please move the memory
> based on the new nodemask". But I don't think implementing it as
> suspend/resume is really that useful, except as a stopgap measure until
> we can get "live" memory migration working.
>
I see, then we agree on everything, basically. :-P

> If we can get that before 4.3, there's no need for the stopgap measure.
>
Sure, that was the reason why it was an RFC after all... And it at least
triggered this discussion. :-)

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Ian Campbell
2012-Jul-06 14:42 UTC
Re: [PATCH 2 of 2 RFC] xl: allow for moving the domain's memory when changing vcpu affinity
On Fri, 2012-07-06 at 15:05 +0100, Dario Faggioli wrote:
> On Fri, 2012-07-06 at 14:25 +0100, Ian Campbell wrote:
> > We could do something cleverer for HVM (or hybrid) guests to migrate
> > pages while the guest is live, but migrating a page under a PV guest's
> > feet requires quiescing it in the style of a migrate.
> >
> Which, as we're talking about the same host, means reaching a point
> where I have two full copies of the guest memory in RAM, or not?
>
> (Just trying to figure out if I understood things sensibly.)

You could move it over in chunks, freeing as you go, so if the chunk size
was 1M you would only need 1M of extra memory, not the full domain size.

Ian.