We received a customer report about a long-running and ultimately failing live migration of a busy guest.

The guest has 64G of memory and is busy with its set of applications, so there will always be dirty pages left to transfer. While some of this can be solved with a faster network connection, the underlying issue is that tools/libxc/xc_domain_save.c:xc_domain_save will suspend a domain after a given number of iterations in order to transfer the remaining dirty pages. From what I understand, this pausing of the guest (I don't know how long it is actually paused) causes issues within the guest: the applications start to fail (again, no details).

Their suggestion is to add some knob to the overall live migration process to avoid the suspend. If the guest could not be transferred with the parameters passed to xc_domain_save(), abort the migration and leave it running on the old host.

My questions are:
Has such an issue ever been seen elsewhere?
Should 'xm migrate --live' and 'xl migrate' get something like a --no-suspend option?

Olaf
On 06/11/2012 20:28, "Olaf Hering" <olaf@aepfle.de> wrote:

> My questions are:
> Has such an issue ever been seen elsewhere?

It's known that if you have a workload that is dirtying lots of pages quickly, the final stop-and-copy phase will necessarily be large. A VM that is busy dirtying lots of pages can dirty them much more quickly than they can be transferred over the LAN.

> Should 'xm migrate --live' and 'xl migrate' get something like a
> --no-suspend option?

Well, it is not really possible to avoid the suspend altogether; there is always going to be some minimal 'dirty working set'. But we could provide parameters to require the dirty working set to be smaller than X pages within Y rounds of dirty-page copying.

-- Keir
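A minimal sketch of the policy being discussed, covering both Olaf's abort-instead-of-suspend request and Keir's X-pages-within-Y-rounds idea. The tunables and helper functions below are hypothetical illustrations, not the actual xc_domain_save() parameters or existing libxc calls:

    /* Sketch only: send_dirty_pages() and count_dirty_pages() stand in
     * for the libxc shadow-log-dirty plumbing. */
    #include <stdbool.h>

    extern void send_dirty_pages(void);
    extern unsigned long count_dirty_pages(void);

    struct precopy_policy {
        unsigned int max_rounds;        /* Y: rounds of dirty-page copying */
        unsigned long max_final_dirty;  /* X: pages we accept to stop-and-copy */
        bool abort_instead_of_suspend;  /* the requested "--no-suspend" knob */
    };

    enum precopy_result { PRECOPY_SUSPEND, PRECOPY_ABORT };

    static enum precopy_result
    precopy_loop(const struct precopy_policy *p)
    {
        for (unsigned int round = 0; round < p->max_rounds; round++) {
            send_dirty_pages();               /* one copy round; round 0 sends all RAM */
            unsigned long dirty = count_dirty_pages(); /* dirtied while we were sending */

            if (dirty <= p->max_final_dirty)
                return PRECOPY_SUSPEND;       /* small enough: pause and send the rest */
        }

        /* The dirty working set never converged below the threshold. */
        return p->abort_instead_of_suspend ? PRECOPY_ABORT : PRECOPY_SUSPEND;
    }

With a policy like this, 'xl migrate' could expose the round limit, the final dirty-set threshold and the abort flag directly as command-line knobs.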
On Tue, Nov 06, Keir Fraser wrote:

> It's known that if you have a workload that is dirtying lots of pages
> quickly, the final stop-and-copy phase will necessarily be large. A VM
> that is busy dirtying lots of pages can dirty them much more quickly
> than they can be transferred over the LAN.

In my opinion such a migration should be done at the application level.

> > Should 'xm migrate --live' and 'xl migrate' get something like a
> > --no-suspend option?
>
> Well, it is not really possible to avoid the suspend altogether; there is
> always going to be some minimal 'dirty working set'. But we could provide
> parameters to require the dirty working set to be smaller than X pages
> within Y rounds of dirty-page copying.

Should such knobs be exposed to the tools, like x[lm] migrate --knob1 val --knob2 val?

Olaf
On 06/11/12 22:18, Olaf Hering wrote:

> On Tue, Nov 06, Keir Fraser wrote:
>
>> Well, it is not really possible to avoid the suspend altogether; there is
>> always going to be some minimal 'dirty working set'. But we could provide
>> parameters to require the dirty working set to be smaller than X pages
>> within Y rounds of dirty-page copying.
>
> Should such knobs be exposed to the tools, like x[lm] migrate --knob1 val --knob2 val?

We (Citrix) are currently looking at some fairly serious performance issues with migration with both classic and pvops dom0 kernels (patches to follow in due course). While that will make the situation better, it won't solve the problem you have described.

As far as I understand (so please correct me if I am wrong), migration works by transmitting pages until the rate of dirty pages per round approaches a constant, at which point the domain gets paused and all remaining dirty pages are transmitted. (With the proviso that there is currently a maximum number of rounds before the automatic pause - this becomes increasingly problematic as guest sizes grow.)

Having these knobs tweakable by the admin/toolstack seems like a very sensible idea. The application problem you described could possibly be something crashing because of a sufficiently large jump in time?

As potential food for thought:

Is there wisdom in having a new kind of live migrate which, when pausing the VM on the source host, resumes the VM on the destination host? Xen would have to track not-yet-sent pages, pause the guest on a pagefault, and request the required page as a matter of priority.

The advantage of this approach would be that timing-sensitive workloads would be paused for far less time. Even if the guest were frequently paused for pagefaults, fetching a single page over the LAN is far quicker than transferring the entire dirty set, and on each resume the interrupt paths would fire again, so the timing paths would quickly become fully populated. Further to that, a busy workload in the guest dirtying a page which has already been sent would not result in any further network traffic.

The disadvantages would be that Xen would need two-way communication with the toolstack to prioritise which page is needed to resolve a pagefault, and presumably the toolstack->toolstack protocol would be more complicated. In addition, it would be much harder to "roll back" the migrate; once you resume the guest on the destination host, you are committed to completing the migrate.

I presume there are other issues I have overlooked, but this idea has literally just occurred to me upon reading this thread so far. Comments?

~Andrew

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
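A minimal sketch of the destination-side handling in the post-copy scheme Andrew outlines. All functions and types below are hypothetical stand-ins, not existing Xen or libxc interfaces:

    /* Sketch only: handling a fault on a guest frame that has not yet
     * been transferred from the source host. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;

    extern bool page_already_received(pfn_t pfn);    /* per-pfn bitmap lookup */
    extern void pause_faulting_vcpu(void);
    extern void request_page_from_source(pfn_t pfn); /* priority fetch over the LAN */
    extern void wait_for_page(pfn_t pfn);
    extern void install_page(pfn_t pfn);
    extern void resume_faulting_vcpu(void);

    /* Called when the resumed guest touches a guest-physical frame. */
    void handle_postcopy_fault(pfn_t pfn)
    {
        if (page_already_received(pfn))
            return;                        /* ordinary fault, nothing to do */

        pause_faulting_vcpu();             /* only this vCPU stalls */
        request_page_from_source(pfn);     /* jumps the background-copy queue */
        wait_for_page(pfn);                /* one page over the LAN, not the whole dirty set */
        install_page(pfn);
        resume_faulting_vcpu();
    }

The background copy of not-yet-sent pages would continue in parallel, so faults become rarer as the transfer proceeds.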
Dan Magenheimer
2012-Nov-06 23:41 UTC
Re: reliable live migration of large and busy guests
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, November 06, 2012 4:19 PM
> To: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>
> As potential food for thought:
>
> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host? Xen
> would have to track not-yet-sent pages, pause the guest on a pagefault,
> and request the required page as a matter of priority.
>
> The advantage of this approach would be that timing-sensitive workloads
> would be paused for far less time. Even if the guest were frequently
> paused for pagefaults, fetching a single page over the LAN is far quicker
> than transferring the entire dirty set, and on each resume the interrupt
> paths would fire again, so the timing paths would quickly become fully
> populated. Further to that, a busy workload in the guest dirtying a page
> which has already been sent would not result in any further network
> traffic.

Something like this?

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
On 06/11/12 23:41, Dan Magenheimer wrote:

> Something like this?
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368

Oh wow - something quite like that. Thank you very much. I will read the paper in full when I get a free moment, but the abstract looks very interesting.

From an idealistic point of view, it might be quite nice to have several live-migrate mechanisms, so the user can choose whether they value minimum downtime, minimum network utilisation, or maximum safety.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
On Tue, Nov 06, Andrew Cooper wrote:

> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host? Xen
> would have to track not-yet-sent pages, pause the guest on a pagefault,
> and request the required page as a matter of priority.

On the receiving side all missing pages could be handled as "paged" (just nominating a missing pfn should be enough). A pager can then request them from the sender host.

Olaf
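A rough sketch of that receiving-side idea, in the spirit of the existing xenpaging tool; every helper below is a hypothetical stand-in rather than an existing libxc or xenpaging call:

    /* Sketch only: after the initial restore, every pfn that has not yet
     * arrived is nominated as paged-out, so the first guest access traps
     * to a pager that pulls the page from the sending host. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;

    extern pfn_t guest_max_pfn(void);
    extern bool  pfn_already_received(pfn_t pfn);
    extern void  nominate_as_paged(pfn_t pfn);      /* mark pfn paged-out in the p2m */
    extern pfn_t wait_for_pager_event(void);        /* guest faulted on a paged pfn */
    extern void  fetch_page_from_sender(pfn_t pfn); /* over the migration connection */
    extern void  page_in_and_resume(pfn_t pfn);

    void postcopy_receiver(void)
    {
        /* Mark everything that has not arrived yet as "paged". */
        for (pfn_t pfn = 0; pfn <= guest_max_pfn(); pfn++)
            if (!pfn_already_received(pfn))
                nominate_as_paged(pfn);

        /* The pager resolves faults by pulling pages from the sender. */
        for (;;) {
            pfn_t pfn = wait_for_pager_event();
            fetch_page_from_sender(pfn);
            page_in_and_resume(pfn);
        }
    }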
Dan Magenheimer
2012-Nov-07 15:10 UTC
Re: reliable live migration of large and busy guests
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>
> On 06/11/12 23:41, Dan Magenheimer wrote:
> > Something like this?
> >
> > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
>
> Oh wow - something quite like that. Thank you very much. I will read
> the paper in full when I get a free moment, but the abstract looks very
> interesting.

Hi Andrew --

FYI, selfballooning is now built into the Linux kernel (since about summer of 2011, so it may not be in many distros yet). It is currently tied to tmem (transcendent memory), which is not turned on by default, but if you start developing something like post-copy migration, let me know. AFAIK there is no way to do selfballooning in Windows (not even in userspace, I think, since IIRC, unlike Linux sysfs, there is no way to adjust the balloon size outside the kernel... but I know nothing about Windows ;-)

> From an idealistic point of view, it might be quite nice to have several
> live-migrate mechanisms, so the user can choose whether they value
> minimum downtime, minimum network utilisation, or maximum safety.

Agreed. IIRC, when post-copy was suggested for Xen years ago, Ian Pratt was against it, though I don't recall why, so Michael Hines' work was never pursued (outside of academia). Probably worth asking IanP before investing too much time into it.

Dan
On Wed, Nov 7, 2012 at 3:10 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:

> > From an idealistic point of view, it might be quite nice to have several
> > live-migrate mechanisms, so the user can choose whether they value
> > minimum downtime, minimum network utilisation, or maximum safety.
>
> Agreed. IIRC, when post-copy was suggested for Xen years ago,
> Ian Pratt was against it, though I don't recall why, so Michael
> Hines' work was never pursued (outside of academia). Probably
> worth asking IanP before investing too much time into it.

Was he against a hybrid approach, where you "push" things first and then "pull" things later? Or just against a pure "pull" approach? I'm pretty sure a pure "pull" approach would result in lower performance during the migration.

Just tossing another idea out there: what about throttling the VM if it's dirtying too many pages? You can use the "cap" feature of the credit1 scheduler to reduce the amount of cpu time a given VM gets, even if there is free cpu time available. You could play around with doing N iterations and then cranking down the cap on each iteration after that; then the application wouldn't have a several-second pause, it would just be "running slowly" for some period of time.

Overall, doing a hybrid "send dirty pages for a while, then move and pull the rest in" seems like the best approach in the long run, but it's fairly complicated. A throttling approach is probably less optimal but simpler to get working as a temporary measure.

-George
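A minimal sketch of the throttling George describes, assuming the libxc credit-scheduler calls of the time (xc_sched_credit_domain_get/set); the round threshold and cap values are arbitrary, and the caller would need to save and restore the domain's original cap once the migration completes or aborts:

    /* Sketch only: progressively cap a domain's CPU time during later
     * pre-copy rounds so it dirties pages more slowly. */
    #include <xenctrl.h>

    static void throttle_for_round(xc_interface *xch, uint32_t domid,
                                   unsigned int round)
    {
        struct xen_domctl_sched_credit sdom;

        if (round < 5)
            return;                     /* first N rounds: run unthrottled */

        if (xc_sched_credit_domain_get(xch, domid, &sdom))
            return;

        /* Halve the cap each extra round, bottoming out at 10% of one CPU.
         * A cap of 0 means "no cap", so start from 100 if none was set. */
        if (sdom.cap == 0)
            sdom.cap = 100;
        sdom.cap /= 2;
        if (sdom.cap < 10)
            sdom.cap = 10;

        xc_sched_credit_domain_set(xch, domid, &sdom);
    }

The save loop would call this once per iteration; because the cap is only tightened after the first few rounds, a guest whose dirty set converges quickly is never slowed down at all.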
Dan Magenheimer
2012-Nov-12 17:12 UTC
Re: reliable live migration of large and busy guests
> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>
> Was he against a hybrid approach, where you "push" things first and then
> "pull" things later? Or just against a pure "pull" approach? I'm pretty
> sure a pure "pull" approach would result in lower performance during the
> migration.

Sorry, I don't recall.

> Just tossing another idea out there: what about throttling the VM if it's
> dirtying too many pages? You can use the "cap" feature of the credit1
> scheduler to reduce the amount of cpu time a given VM gets, even if there
> is free cpu time available. You could play around with doing N iterations
> and then cranking down the cap on each iteration after that; then the
> application wouldn't have a several-second pause, it would just be
> "running slowly" for some period of time.
>
> Overall, doing a hybrid "send dirty pages for a while, then move and pull
> the rest in" seems like the best approach in the long run, but it's fairly
> complicated. A throttling approach is probably less optimal but simpler
> to get working as a temporary measure.

I agree there are lots of interesting hybrid possibilities worth exploring. There are side effects to be considered, though... for example, the current push approach is the only one that should be used when the goal of the migration is to evacuate a physical machine so that it can be powered off ASAP for maintenance or power management.