We received a customer report about a long-running and ultimately failing live migration of a busy guest.

The guest has 64G of memory and is busy with its set of applications, so there will always be dirty pages left to transfer. While some of this can be solved with a faster network connection, the underlying issue is that tools/libxc/xc_domain_save.c:xc_domain_save will suspend a domain after a given number of iterations in order to transfer the remaining dirty pages. From what I understand, this pausing of the guest (I don't know how long it is actually paused) causes issues within the guest: the applications start to fail (again, no details).

Their suggestion is to add some knob to the overall live migration process to avoid the suspend. If the guest could not be transferred with the parameters passed to xc_domain_save(), abort the migration and leave it running on the old host.

My questions are:
Has such an issue ever been seen elsewhere?
Should 'xm migrate --live' and 'xl migrate' get something like a --no-suspend option?

Olaf
On 06/11/2012 20:28, "Olaf Hering" <olaf@aepfle.de> wrote:

> My questions are:
> Has such an issue ever been seen elsewhere?

It's known that if you have a workload that is dirtying lots of pages quickly, the final stop-and-copy phase will necessarily be large. A VM that is busy dirtying lots of pages can dirty them much more quickly than they can be transferred over the LAN.

> Should 'xm migrate --live' and 'xl migrate' get something like a
> --no-suspend option?

Well, it is not really possible to avoid the suspend altogether; there is always going to be some minimal 'dirty working set'. But we could provide parameters to require the dirty working set to be smaller than X pages within Y rounds of dirty-page copying.

-- Keir
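A minimal sketch of the policy being discussed, covering both Olaf's abort-instead-of-suspend request and Keir's X-pages-within-Y-rounds idea. The tunables and helper functions below are hypothetical illustrations, not the actual xc_domain_save() parameters or existing libxc calls:

    /* Sketch only: send_dirty_pages() and count_dirty_pages() stand in
     * for the libxc shadow-log-dirty plumbing. */
    #include <stdbool.h>

    extern void send_dirty_pages(void);
    extern unsigned long count_dirty_pages(void);

    struct precopy_policy {
        unsigned int max_rounds;        /* Y: rounds of dirty-page copying */
        unsigned long max_final_dirty;  /* X: pages we accept to stop-and-copy */
        bool abort_instead_of_suspend;  /* the requested "--no-suspend" knob */
    };

    enum precopy_result { PRECOPY_SUSPEND, PRECOPY_ABORT };

    static enum precopy_result
    precopy_loop(const struct precopy_policy *p)
    {
        for (unsigned int round = 0; round < p->max_rounds; round++) {
            send_dirty_pages();               /* one copy round; round 0 sends all RAM */
            unsigned long dirty = count_dirty_pages(); /* dirtied while we were sending */

            if (dirty <= p->max_final_dirty)
                return PRECOPY_SUSPEND;       /* small enough: pause and send the rest */
        }

        /* The dirty working set never converged below the threshold. */
        return p->abort_instead_of_suspend ? PRECOPY_ABORT : PRECOPY_SUSPEND;
    }

With a policy like this, 'xl migrate' could expose the round limit, the final dirty-set threshold and the abort flag directly as command-line knobs.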
On Tue, Nov 06, Keir Fraser wrote:

> It's known that if you have a workload that is dirtying lots of pages
> quickly, the final stop-and-copy phase will necessarily be large. A VM
> that is busy dirtying lots of pages can dirty them much more quickly
> than they can be transferred over the LAN.

In my opinion such a migration should be done at the application level.

> > Should 'xm migrate --live' and 'xl migrate' get something like a
> > --no-suspend option?
>
> Well, it is not really possible to avoid the suspend altogether; there is
> always going to be some minimal 'dirty working set'. But we could provide
> parameters to require the dirty working set to be smaller than X pages
> within Y rounds of dirty-page copying.

Should such knobs be exposed to the tools, like x[lm] migrate --knob1 val --knob2 val?

Olaf
On 06/11/12 22:18, Olaf Hering wrote:

> On Tue, Nov 06, Keir Fraser wrote:
>
>> Well, it is not really possible to avoid the suspend altogether; there is
>> always going to be some minimal 'dirty working set'. But we could provide
>> parameters to require the dirty working set to be smaller than X pages
>> within Y rounds of dirty-page copying.
>
> Should such knobs be exposed to the tools, like x[lm] migrate --knob1 val --knob2 val?

We (Citrix) are currently looking at some fairly serious performance issues with migration with both classic and pvops dom0 kernels (patches to follow in due course). While that will make the situation better, it won't solve the problem you have described.

As far as I understand (so please correct me if I am wrong), migration works by transmitting pages until the rate of dirty pages per round approaches a constant, at which point the domain gets paused and all remaining dirty pages are transmitted. (With the proviso that there is currently a maximum number of rounds before the automatic pause - this becomes increasingly problematic as guest sizes grow.)

Having these knobs tweakable by the admin/toolstack seems like a very sensible idea. The application problem you described could possibly be something crashing because of a sufficiently large jump in time?

As potential food for thought:

Is there wisdom in having a new kind of live migrate which, when pausing the VM on the source host, resumes the VM on the destination host? Xen would have to track not-yet-sent pages, pause the guest on a pagefault, and request the required page as a matter of priority.

The advantage of this approach would be that timing-sensitive workloads would be paused for far less time. Even if the guest were frequently paused for pagefaults, fetching a single page over the LAN is far quicker than transferring the entire dirty set, and on each resume the interrupt paths would fire again, so the timing paths would quickly become fully populated. Further to that, a busy workload in the guest dirtying a page which has already been sent would not result in any further network traffic.

The disadvantages would be that Xen would need two-way communication with the toolstack to prioritise which page is needed to resolve a pagefault, and presumably the toolstack->toolstack protocol would be more complicated. In addition, it would be much harder to "roll back" the migrate; once you resume the guest on the destination host, you are committed to completing the migrate.

I presume there are other issues I have overlooked, but this idea has literally just occurred to me upon reading this thread so far. Comments?

~Andrew

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
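A minimal sketch of the destination-side handling in the post-copy scheme Andrew outlines. All functions and types below are hypothetical stand-ins, not existing Xen or libxc interfaces:

    /* Sketch only: handling a fault on a guest frame that has not yet
     * been transferred from the source host. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;

    extern bool page_already_received(pfn_t pfn);    /* per-pfn bitmap lookup */
    extern void pause_faulting_vcpu(void);
    extern void request_page_from_source(pfn_t pfn); /* priority fetch over the LAN */
    extern void wait_for_page(pfn_t pfn);
    extern void install_page(pfn_t pfn);
    extern void resume_faulting_vcpu(void);

    /* Called when the resumed guest touches a guest-physical frame. */
    void handle_postcopy_fault(pfn_t pfn)
    {
        if (page_already_received(pfn))
            return;                        /* ordinary fault, nothing to do */

        pause_faulting_vcpu();             /* only this vCPU stalls */
        request_page_from_source(pfn);     /* jumps the background-copy queue */
        wait_for_page(pfn);                /* one page over the LAN, not the whole dirty set */
        install_page(pfn);
        resume_faulting_vcpu();
    }

The background copy of not-yet-sent pages would continue in parallel, so faults become rarer as the transfer proceeds.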
Dan Magenheimer
2012-Nov-06 23:41 UTC
Re: reliable live migration of large and busy guests
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, November 06, 2012 4:19 PM
> To: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>
> As potential food for thought:
>
> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host? Xen
> would have to track not-yet-sent pages, pause the guest on a pagefault,
> and request the required page as a matter of priority.
>
> The advantage of this approach would be that timing-sensitive workloads
> would be paused for far less time. Even if the guest were frequently
> paused for pagefaults, fetching a single page over the LAN is far quicker
> than transferring the entire dirty set, and on each resume the interrupt
> paths would fire again, so the timing paths would quickly become fully
> populated. Further to that, a busy workload in the guest dirtying a page
> which has already been sent would not result in any further network
> traffic.

Something like this?

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
On 06/11/12 23:41, Dan Magenheimer wrote:

> Something like this?
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368

Oh wow - something quite like that. Thank you very much. I will read the paper in full when I get a free moment, but the abstract looks very interesting.

From an idealistic point of view, it might be quite nice to have several live-migrate mechanisms, so the user can choose whether they value minimum downtime, minimum network utilisation, or maximum safety.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
On Tue, Nov 06, Andrew Cooper wrote:

> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host? Xen
> would have to track not-yet-sent pages, pause the guest on a pagefault,
> and request the required page as a matter of priority.

On the receiving side all missing pages could be handled as "paged" (just nominating a missing pfn should be enough). A pager can then request them from the sender host.

Olaf
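A rough sketch of that receiving-side idea, in the spirit of the existing xenpaging tool; every helper below is a hypothetical stand-in rather than an existing libxc or xenpaging call:

    /* Sketch only: after the initial restore, every pfn that has not yet
     * arrived is nominated as paged-out, so the first guest access traps
     * to a pager that pulls the page from the sending host. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;

    extern pfn_t guest_max_pfn(void);
    extern bool  pfn_already_received(pfn_t pfn);
    extern void  nominate_as_paged(pfn_t pfn);      /* mark pfn paged-out in the p2m */
    extern pfn_t wait_for_pager_event(void);        /* guest faulted on a paged pfn */
    extern void  fetch_page_from_sender(pfn_t pfn); /* over the migration connection */
    extern void  page_in_and_resume(pfn_t pfn);

    void postcopy_receiver(void)
    {
        /* Mark everything that has not arrived yet as "paged". */
        for (pfn_t pfn = 0; pfn <= guest_max_pfn(); pfn++)
            if (!pfn_already_received(pfn))
                nominate_as_paged(pfn);

        /* The pager resolves faults by pulling pages from the sender. */
        for (;;) {
            pfn_t pfn = wait_for_pager_event();
            fetch_page_from_sender(pfn);
            page_in_and_resume(pfn);
        }
    }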
Dan Magenheimer
2012-Nov-07 15:10 UTC
Re: reliable live migration of large and busy guests
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>
> On 06/11/12 23:41, Dan Magenheimer wrote:
> > Something like this?
> >
> > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
>
> Oh wow - something quite like that. Thank you very much. I will read
> the paper in full when I get a free moment, but the abstract looks very
> interesting.

Hi Andrew --

FYI, selfballooning is now built into the Linux kernel (since about summer of 2011, so it may not be in many distros yet). It is currently tied to tmem (transcendent memory), which is not turned on by default, but if you start developing something like post-copy migration, let me know. AFAIK there is no way to do selfballooning in Windows (not even in userspace, I think, since IIRC, unlike Linux sysfs, there is no way to adjust the balloon size outside the kernel... but I know nothing about Windows ;-)

> From an idealistic point of view, it might be quite nice to have several
> live-migrate mechanisms, so the user can choose whether they value
> minimum downtime, minimum network utilisation, or maximum safety.

Agreed. IIRC, when post-copy was suggested for Xen years ago, Ian Pratt was against it, though I don't recall why, so Michael Hines' work was never pursued (outside of academia). Probably worth asking IanP before investing too much time into it.

Dan
On Wed, Nov 7, 2012 at 3:10 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:

> > From an idealistic point of view, it might be quite nice to have several
> > live-migrate mechanisms, so the user can choose whether they value
> > minimum downtime, minimum network utilisation, or maximum safety.
>
> Agreed. IIRC, when post-copy was suggested for Xen years ago,
> Ian Pratt was against it, though I don't recall why, so Michael
> Hines' work was never pursued (outside of academia). Probably
> worth asking IanP before investing too much time into it.

Was he against a hybrid approach, where you "push" things first and then "pull" things later? Or just against a pure "pull" approach? I'm pretty sure a pure "pull" approach would result in lower performance during the migration.

Just tossing another idea out there: what about throttling the VM if it's dirtying too many pages? You can use the "cap" feature of the credit1 scheduler to reduce the amount of cpu time a given VM gets, even if there is free cpu time available. You could play around with doing N iterations and then cranking down the cap on each iteration after that; then the application wouldn't have a several-second pause, it would just be "running slowly" for some period of time.

Overall, doing a hybrid "send dirty pages for a while, then move and pull the rest in" seems like the best approach in the long run, but it's fairly complicated. A throttling approach is probably less optimal but simpler to get working as a temporary measure.

-George
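A minimal sketch of the throttling George describes, assuming the libxc credit-scheduler calls of the time (xc_sched_credit_domain_get/set); the round threshold and cap values are arbitrary, and the caller would need to save and restore the domain's original cap once the migration completes or aborts:

    /* Sketch only: progressively cap a domain's CPU time during later
     * pre-copy rounds so it dirties pages more slowly. */
    #include <xenctrl.h>

    static void throttle_for_round(xc_interface *xch, uint32_t domid,
                                   unsigned int round)
    {
        struct xen_domctl_sched_credit sdom;

        if (round < 5)
            return;                     /* first N rounds: run unthrottled */

        if (xc_sched_credit_domain_get(xch, domid, &sdom))
            return;

        /* Halve the cap each extra round, bottoming out at 10% of one CPU.
         * A cap of 0 means "no cap", so start from 100 if none was set. */
        if (sdom.cap == 0)
            sdom.cap = 100;
        sdom.cap /= 2;
        if (sdom.cap < 10)
            sdom.cap = 10;

        xc_sched_credit_domain_set(xch, domid, &sdom);
    }

The save loop would call this once per iteration; because the cap is only tightened after the first few rounds, a guest whose dirty set converges quickly is never slowed down at all.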
Dan Magenheimer
2012-Nov-12 17:12 UTC
Re: reliable live migration of large and busy guests
> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>
> Was he against a hybrid approach, where you "push" things first and then
> "pull" things later? Or just against a pure "pull" approach? I'm pretty
> sure a pure "pull" approach would result in lower performance during the
> migration.

Sorry, I don't recall.

> Just tossing another idea out there: what about throttling the VM if it's
> dirtying too many pages? You can use the "cap" feature of the credit1
> scheduler to reduce the amount of cpu time a given VM gets, even if there
> is free cpu time available. You could play around with doing N iterations
> and then cranking down the cap on each iteration after that; then the
> application wouldn't have a several-second pause, it would just be
> "running slowly" for some period of time.
>
> Overall, doing a hybrid "send dirty pages for a while, then move and pull
> the rest in" seems like the best approach in the long run, but it's fairly
> complicated. A throttling approach is probably less optimal but simpler
> to get working as a temporary measure.

I agree there are lots of interesting hybrid possibilities worth exploring. There are side effects to be considered, though... for example, the current push approach is the only one that should be used when the goal of the migration is to evacuate a physical machine so that it can be powered off ASAP for maintenance or power management.