Hi,

With xen-unstable changeset 10333:360f9dc71f51, live migration is not reliable. Migrating an active domain (I use a kernel build in my test) back and forth between two machines results in the build or the domain crashing. I tweaked xc_linux_save.c to enable the verify pass without outputting all the debugging messages, and the log shows that one or two pages fail to get a data match.

I have yet to see a failure of the domain with non-live migration, but I sometimes see a data mismatch on a page during the verification. That would indicate that either suspend doesn't mean what I think it does, or pages of a suspended VM are being altered when they shouldn't be.

So, I guess I'll start with the easy question: should non-live migration ever have a page fail to verify? If not, how can I identify the source of the problem?

The harder question: how to identify the source of the corruption in live migration?

Thanks,

John Byrne

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
I should have made clear I am testing on x86_64.

John
On 14 Jun 2006, at 01:32, John Byrne wrote:

> So, I guess I'll start with the easy question: should non-live
> migration ever have a page fail to verify? If not, how can I identify
> the source of the problem?

They are probably pages shared with backend drivers (xenstore, blkback, netback, etc.). Since domain teardown is asynchronous, those backend drivers may still have those pages mapped and be able to update them while the save is in progress. It's harmless (but of course false positives on the verify test are rather annoying!).

 -- Keir
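
To illustrate the point about false positives, here is a minimal sketch of a page-by-page verify that skips pages known to be shared with backend drivers. It is not the verify code in xc_linux_save.c. The function name find_mismatched_pages, the shared_flags array, and the way the two snapshots are held in memory are all assumptions made for illustration. How the set of shared pages would actually be obtained is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxc: compare two snapshots of guest
 * memory page by page and record the indices of the pages that differ.
 *
 * shared_flags may be NULL. Otherwise shared_flags[i] != 0 marks page i as
 * shared with a backend driver (an assumption of this sketch), and such
 * pages are skipped because a backend may legitimately still update them
 * while the save is in progress.
 *
 * Returns the total number of mismatched, non-shared pages. At most
 * max_mismatches indices are written to mismatches[].
 */
size_t find_mismatched_pages(const unsigned char *first,
                             const unsigned char *second,
                             size_t num_pages, size_t page_size,
                             const unsigned char *shared_flags,
                             size_t *mismatches, size_t max_mismatches)
{
    size_t count = 0;

    assert(page_size > 0);

    for (size_t i = 0; i < num_pages; i++) {
        const size_t off = i * page_size;

        /* Skip pages a backend driver may still be writing to. */
        if (shared_flags != NULL && shared_flags[i])
            continue;

        if (memcmp(first + off, second + off, page_size) != 0) {
            if (count < max_mismatches)
                mismatches[count] = i;
            count++;
        }
    }
    return count;
}
```

With shared_flags set to NULL every differing page is reported, which corresponds to the behaviour described in the original report. With the shared pages flagged, any mismatch still reported would be worth investigating.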