Andreas Olsowski
2011-Aug-19 17:56 UTC
[Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
I have 2 servers, both installed with Debian 6.0.2 stable (squeeze).

I took xen-4.1.1.tar.gz and the very latest xen/stable-2.6.32.x from Jeremy's git.

For the dom0 .config I used one derived from those suggested on the pvops wiki page. It has worked fine before.

For domU I use 3 different kernels: a 2.6.39 one that is running fine on ~80 paravirtualized guests in my production environment, and the latest 3.0.3 and 3.1-rc2 from the kernel.org git.

The config was updated for them via make oldconfig at different times. 3.0.3 explicitly has DEBUG symbols in it, the others don't.

I made damn sure my two test servers were as close to identical as they can possibly get.
Everything installed by make install-xen and make install-tools is binary identical.
The kernels have been copied over via scp. (scp /boot/*2.6.32.45* ...)

It all boils down to this:
BUG: unable to handle kernel paging request at ...

This happens when I migrate one of my 3 test virtual machines (testvm-2.6, testvm-3.0 and testvm-3.1) from host1 to host2.
host1 is called xenturio1, host2 is called tarballerina.
config-2.6.32.45-xen0: http://pastebin.com/DLC3BcCF
config-2.6.39-xenU: http://pastebin.com/r5KBpumE
config-3.0.3-xenU: http://pastebin.com/DDjrYANv
config-3.1-rc2-xenU+: http://pastebin.com/tWbt16yR

System information on host1 and host2: http://pastebin.com/zs89a1rQ
(cpuinfo, xl info, uname -a, md5sums of xen and kernel)

Here come the logs:

testvm-2.6@host1 to host2:
  xl console: http://pastebin.com/mUKugaYu
  vm-state after migration: "r-----"
  xenctx: http://pastebin.com/viQzfwT1

testvm-3.0@host1 to host2:
  xl console: http://pastebin.com/iswQFN2a
  vm-state after migration: "r-----"
  xenctx: http://pastebin.com/8VdSUrYd

testvm-3.1@host1 to host2:
  xl console: did not produce any output
  vm-state after migration: "---sc-"
  xenctx: http://pastebin.com/ymT0Rxhz

xl-testvm.*.log output after killing them: http://pastebin.com/0L4905ft

testvm-2.6@host2 to host1:
  xl console: http://pastebin.com/nNqUeJNR
  vm-state after migration: "-b----"
  xenctx: http://pastebin.com/gfAVWe2v

testvm-3.0@host2 to host1:
  xl console: did not produce any output
  vm-state after migration: "-b----"
  xenctx: http://pastebin.com/nPiTTLEz

testvm-3.1@host2 to host1:
  xl console: http://pastebin.com/3tqB4Zet
  vm-state after migration: "-b----"
  xenctx: http://pastebin.com/bBtxePmr

xl-testvm.*.log output after killing them: http://pastebin.com/3i4XzFsv

Local migration works (migrate to localhost).

I first encountered this on servers running 4.2, where one of 3 hosts could not migrate machines that had been created on it.

As usual: input is greatly appreciated. If you want me to try any other kernel .config entries or want some different output, tell me exactly what you would like to see and I will provide it a.s.a.p.

If you are running xen4.1.1 with 2.6.32-jeremy kernels and you don't experience this problem, I would like to have your dom0 and domU .config files so I can test them.
With best regards -- Andreas Olsowski _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andreas Olsowski
2011-Aug-20 02:37 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
I have now tested Linux 3.0.3 as the dom0 kernel and it has the same problem.

Migration of HVM guests also does not work, and the kernel of the HVM guest shows the same output as my PV domUs.

I took another look at my dom0 kernel .config after make oldconfig'ing it for 3.0.3. I now have every possible XEN flag set in the kernel:
http://pastebin.com/YxB8mkSU

Next I will check xen4.2, maybe the results are different.

--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
Andreas Olsowski
2011-Aug-20 03:49 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On 20.08.2011 04:37, Andreas Olsowski wrote:
> Next i will check xen4.2, maybe the results are different.

No, they are not.

But while trying to find the last 2.6.32.x kernel to boot bare-metal I found out that this migration problem does NOT exist in 2.6.32.43!

I will try 2.6.32.44 and 2.6.33.42 tomorrow, too tired now.
Jan Beulich
2011-Aug-22 07:32 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
>>> On 19.08.11 at 19:56, Andreas Olsowski <andreas.olsowski@leuphana.de> wrote:
> I have 2 servers, both were installed with Debian 6.0.2 stable(squeeze).
>
> I took the xen-4.1.1.tar.gz and the very latest xen/stable-2.6.32.x from
> jeremys git.
>
> For dom0 .config i used one that was derived from the ones suggested on
> the pvops wiki page. It has worked fine before.
>
> For domU i use 3 different kernels, a 2.6.39 one, that is running fine
> on ~80 paravirtualized guests in my production envrionment.
> Also the lastest 3.0.3 and 3.1-rc2 from the kernel.org git.
>
> The config was updated for them via make oldconfig at different times.
> 3.0.3 has explicitly has DEBUG symbols in it, the others dont.
>
> I made damn sure my two test servers where as close to identical as they
> can possbily get.
> Everything installed by make install-xen and make install-tools is
> binary identical.
> The kernels have been copied over via scp. (scp /boot/*2.6.32.45* ...)
>
> It all boils down to this:
> BUG: unable to handle kernel paging request at ...
>
> This happens when i migrate one of my 3 test virtual machines
> (testvm-2.6 testvm-3.0 and testvm-3.1) from host1 to host2.
> host1 is called xenturio1, host2 is called tarballerina.
>
> config-2.6.32.45-xen0:
> http://pastebin.com/DLC3BcCF
>
> config-2.6.39-xenU:
> http://pastebin.com/r5KBpumE
>
> config-3.0.3-xenU:
> http://pastebin.com/DDjrYANv
>
> config-3.1-rc2-xenU+:
> http://pastebin.com/tWbt16yR
>
> sytem information on host1 and host2:
> http://pastebin.com/zs89a1rQ
> (cpuinfo, xl info, uname -a, md5sums of xen and kernel)

Does it also fail the other way round (host2 -> host1)? If not, your issue is likely fixed with 23102:1c7b601b1b35 on 4.1-testing (and with you posting on xen-devel rather than xen-users I would really have expected that you would have looked for similar reports or eventual fixes before complaining).

Jan
Andreas Olsowski
2011-Aug-22 13:56 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
> Does it also fail the other way round (host2 -> host1)? If not, your
> issue is likely fixed with 23102:1c7b601b1b35 on 4.1-testing (and
> with you posting on xen-devel rather than xen-users I would really
> have expected that you would have looked for similar reports or
> eventual fixes before complaining).

It did happen host2->host1 and host1->host2 with xen4.1.1.

I have now set up 2 servers with identical hardware, and in fact I don't have any problems migrating machines between them.

I went on to upgrade all 3 servers (2x 32GB, 1x 96GB) to xen-4.1-testing.

Now I can migrate 32gbhost->32gbhost and 32gbhost->96gbhost, but 96gbhost->32gbhost still fails:

BUG: unable to handle kernel paging request at fffffffffffffff8

with 2.6.39 and 3.1 guest kernels; 3.0 didn't produce any output on its tty0 anymore.

This issue may be a little more than a memory size mismatch, since I have 3 servers running xen4.2 with 96GB RAM, two Dell R610s and one R710, where the R610s can migrate guests between each other just fine.
They can also migrate to the R710 and back.
But a guest created on the R710 can't be migrated to an R610.

The same exact thing happens with 4.1-testing.
A guest created on a 32GB host can be migrated to the 96GB host and back to any 32GB host.
But a guest created on the 96GB host cannot be migrated to a 32GB host.

Here is my server list:
host1: Dell PE2950 32GB RAM 4.1.1/4.1-testing/4.2 available for testing
host2: Dell PE2950 32GB RAM 4.1.1/4.1-testing/4.2 available for testing
host3: Dell R710 96GB RAM 4.1.1/4.1-testing/4.2 available for testing
host4: Dell R610 96GB RAM xen4.2
host5: Dell R610 96GB RAM xen4.2

--
Andreas Olsowski
Konrad Rzeszutek Wilk
2011-Aug-24 20:34 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Mon, Aug 22, 2011 at 03:56:43PM +0200, Andreas Olsowski wrote:
> >Does it also fail the other way round (host2 -> host1)? If not, your
> >issue is likely fixed with 23102:1c7b601b1b35 on 4.1-testing (and
> >with you posting on xen-devel rather than xen-users I would really
> >have expected that you would have looked for similar reports or
> >eventual fixes before complaining).
>
> It did happen host2->host1 and host1->host2 with xen4.1.1-
>
> I did set up 2 servers with identical hardware now and in fact i
> dont have any problems with them migrating machines.
>
> I went on to upgrade all 3 servers (2x 32gb 1x96gb) to xen-4.1-testing.

Did you check that your xen-4.1-testing had the patch above?

> Now i can migrate 32gbhost->32gbhost and 32gbhost->96gbhost but
> 96gbhost->32gbhost still fails.
>
> BUG: unable to handle kernel paging request at fffffffffffffff8
> with 2.6.39 and 3.1 guest kernels, 3.0 didnt produce any output on
> its tty0 anymore.
>
> This issue may be a little more then a memory size mismatch, since i
> have 3 servers running xen4.2 with 96gb ram two Dell R610s and one
> R710, where the R610s can migrate guests between each other just
> fine.
> They can also migrate to the R710 and back.
> But a host created on the R710 cant be migrated to a R610.
>
> The same exact thing happens with 4.1-testing.
> A guest created on a 32gb host can be migrated to the 96gb host and
> back to any 32gb host.
> But a guest created on the 96gb host can not be migrated to a 32gb host.

Which sounds like what the patch above should have fixed. Again, did you check your binary and source tree to see if you have the mentioned patch?
Andreas Olsowski
2011-Aug-25 07:15 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
> Which sounds like the patch above should have fixed. Again, did you
> check your binary and source tree to see if you have the mentioned
> patch?

Yes, at the time I tested it the patch was in 4.1-testing and 4.2, so I do have the patch applied. It had the desired effect: I can migrate TO the host with more RAM, but I still cannot migrate guests that have been created on it FROM that host.

I can however create the guest somewhere else, migrate it there and then migrate it back.

--
Andreas Olsowski
Konrad Rzeszutek Wilk
2011-Aug-26 15:00 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Thu, Aug 25, 2011 at 09:15:14AM +0200, Andreas Olsowski wrote:
> > Which sounds like the patch above should have fixed. Again, did you
> > check your binary and source tree to see if you have the mentioned
> > patch?
>
> Yes at the time i tested it the patch was in 4.1-testing and 4.2, so
> i do have the patch applied. It had the desired effect, i can
> migrate TO the host with more RAM, but i still cannot migrate guests
> that have been created on it FROM that host.

Ok, so you do have a workaround for that right now - and we kind of know that something is still amiss with the MFN calculations when migrating.

My todo list is not getting any shorter sadly so not sure when I will get to try this out. But let me do that when I get my 32GB machine working again.

> I can however create the guest somewhere else, migrate it there and
> then migrate it back.

Yeah, that really points to either the tools not liking the MFN being too high or the hypervisor. Or the save/resume path in the Linux kernel is failing silently and sticking in invalid MFNs as it can't deal with higher MFNs.

In other words - need to run this to figure out.

Unless you are up for helping out by debugging the code a bit and seeing if you can come up with a fix?
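To put rough numbers on the "MFN being too high" hypothesis, here is a small sketch of how many machine frames each host size covers. It assumes plain 4 KiB pages and ignores holes in the physical address map, so the real highest MFN on a box can be somewhat higher than these figures. The function name `frame_count` is made up for this illustration.

```python
PAGE_SIZE = 4096   # bytes per machine frame (assumption: 4 KiB pages only)
GIB = 1 << 30

def frame_count(ram_gib):
    """Number of 4 KiB machine frames needed to cover ram_gib GiB of RAM."""
    return ram_gib * GIB // PAGE_SIZE

# The two host sizes discussed in this thread.
for ram in (32, 96):
    print(f"{ram} GiB -> {frame_count(ram):#x} frames")
```

Under these assumptions, any MFN from the larger host that survives a migration untranslated could point outside the memory of the smaller host.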
Andreas Olsowski
2011-Aug-26 17:26 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
> My todo list is not getting any shorter sadly so not sure when I will
> get to try this out. But let me do that when I get my 32GB machine
> working again.

It would certainly be interesting to know if you experience the same thing on your platforms. This may or may not have something to do with the hardware in play.

> Yeah, that really points to either the tools not liking the
> MFN being too high or the hypervisor. Or the save/resume path in the
> Linux kernel is failing silently and sticking in invalid MFNs
> as it can't deal with higher MFNs.
>
> In other words - need to run this to figure out.
>
> Unless you are up for helping out by debugging the code a bit and
> seeing if you can come with a fix?

Although I am willing, I probably won't be able to, since I lack the necessary understanding of the low-level workings of Xen and I am not very experienced at debugging C code/programs.

However, I did some additional testing, this time with xen4.2, and things have gotten worse:

The two servers involved BOTH have 96GB RAM and are both running the latest xen 4.2 but are of different hardware (R710 and R610):
http://pastebin.com/AaSpWZdg

And this is what happens when I throw a 32GB server (PE2950) in the mix:
http://pastebin.com/7X8t022R

So with 4.2 there are still migration errors, but what's worse, now I can't migrate anything anywhere anymore when the platform is different.

Within the same platform everything works fine (2x R610):
http://pastebin.com/ZWByjjY5

What is going on here? Could this be an xl toolstack problem after all? And why does half of it work in 4.1 and not with 4.2?

Stuff like:
xc: error: Failed to pin batch of 493 page tables (22 = Invalid argument): Internal error
and
xc: error: Couldn't set eXtended States for vcpu0 (22 = Invalid argument): Internal error
looks like simple-to-debug errors.

There is still one more thing left to test: xen 4.1-testing on an R610. For that I have to migrate the guests away to the other R610.
I will probably get around to doing it this weekend or at least on Monday. I'll just reply with my findings to this email once I have them.

It would seem you are overloaded with too many different things; I hope you still find some time to relax, and I am sorry for adding more stuff to your list.

I will focus my future testing solely on 4.1-testing. I just thought checking out 4.2 might help me understand ... instead I am even more confused.

Have a nice weekend.

With best regards
Andreas
Konrad Rzeszutek Wilk
2011-Aug-29 19:49 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Fri, Aug 26, 2011 at 07:26:29PM +0200, Andreas Olsowski wrote:
> >My todo list is not getting any shorter sadly so not sure when I will
> >get to try this out. But let me do that when I get my 32GB machine
> >working again.
> It would certainly be interesting to know if you experience the same
> thing on your platforms. This may or may not have sth to do with the
> hardware in play.

OK, got my box online. Getting closer to trying to reproduce the problem.

> >Yeah, that really points to either the tools not liking the
> >MFN being too high or the hypervisor. Or the save/resume path in the
> >Linux kernel is failing silently and sticking in invalid MFNs
> >as it can't deal with higher MFNs.
> >
> >In other words - need to run this to figure out.
> >
> >Unless you are up for helping out by debugging the code a bit and
> >seeing if you can come with a fix?
>
> Allthough i am willing, i probably wont be able to, since i lack the
> neccessary understanding of the low level workings of Xen and i am
> not very experienced at debugging C code/programs.

OK.

> However i did some additional testing, this time with xen4.2 and
> things have gotten worse:

Yeah, xen-unstable past c/s 23379 is doing a lot of weird stuff for me.

> The two servers involved do BOTH have 96GB ram and are both running
> the latest xen 4.2 but are of different hardware (R710 and R610):
> http://pastebin.com/AaSpWZdg
>
> And this is happens when i throw a 32GB server (PE2950) in the mix:
> http://pastebin.com/7X8t022R
>
> So with 4.2 there are still migration errors, but whats worse, now i
> cant migrate anything anywhere anymore when the platform is
> different.
>
> Within the same platform everything works fine (2x R610):
> http://pastebin.com/ZWByjjY5
>
> What is going on here?

<sigh> Development - and not all developers test everything in the mix.
Andreas Olsowski
2011-Aug-31 13:07 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
A little update: I now have all machines running on xen-4.1-testing with xen/stable-2.6.32.x.
That gave me the possibility for additional tests.

(I also tested xm/xend in addition to xl/libxl, to make sure it's not an xl/libxl problem.)

I took the liberty to create a new test result matrix that should provide a better overview (in case someone else wants to get the whole picture):

####################################################################
##### xen 4.1 live migration fails between different platforms #####
####################################################################
XEN: xen-4.1-testing.hg
dom0: xen/stable-2.6.32.x
domU: linux-2.6.39 vanilla (also 3.0.3 and 3.1)

toolstack: xl/libxl
(at least FAIL type1 also occurs with xm/xend)

# create means the guest has been created by this host
# received means the guest has been migrate-received by this host

# Dell PE 2950 and Dell PE 2950
create   pe2950-1 -> pe2950-2 OK
received pe2950-2 -> pe2950-1 OK
create   pe2950-2 -> pe2950-1 OK
received pe2950-1 -> pe2950-2 OK

# Dell PE 2950 and Dell R710
create   pe2950-1 -> r710 OK
received r710 -> pe2950-1 OK
create   r710 -> pe2950-1 FAIL (type 1): http://pastebin.com/iUeNPQyY

# Dell PE 2950 and Dell R610
create   pe2950-1 -> r610-1 FAIL (type 2): http://pastebin.com/fzMkuS5s
create   r610-1 -> pe2950-1 FAIL (type 1): http://pastebin.com/Lq6SGVPj

# Dell R610 and Dell R610
create   r610-1 -> r610-2 OK
received r610-2 -> r610-1 OK
create   r610-2 -> r610-1 OK
received r610-1 -> r610-2 OK

# Dell R610 and Dell R710
create   r610-1 -> r710 OK
received r710 -> r610-1 OK
create   r710 -> r610-1 FAIL (type 2): http://pastebin.com/eff5Yx0C

# Dell PE 2950 and Dell R710 and Dell R610
create   pe2950-2 -> r710 OK
received r710 -> r610 FAIL (type 2): http://pastebin.com/it7QPsJk
create   r610 -> r710 OK
received r710 -> pe2950-2 FAIL (type 1 derived?): http://pastebin.com/R6pXSJpU

#EOF

With best regards
Andreas
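For readers who want to query the matrix above rather than eyeball it, here is a minimal sketch that encodes the reported results and lists which platform pairs fail. Host names are collapsed to their platform (pe2950-1 and pe2950-2 both become "pe2950"); that collapsing, the tuple layout and the helper name `failing_pairs` are choices made for this illustration, not anything the tools provide.

```python
# Each row: (guest origin on the sending host, source platform,
#            destination platform, result), transcribed from the matrix.
RESULTS = [
    ("create",   "pe2950", "pe2950", "OK"),
    ("received", "pe2950", "pe2950", "OK"),
    ("create",   "pe2950", "pe2950", "OK"),
    ("received", "pe2950", "pe2950", "OK"),
    ("create",   "pe2950", "r710",   "OK"),
    ("received", "r710",   "pe2950", "OK"),
    ("create",   "r710",   "pe2950", "FAIL type 1"),
    ("create",   "pe2950", "r610",   "FAIL type 2"),
    ("create",   "r610",   "pe2950", "FAIL type 1"),
    ("create",   "r610",   "r610",   "OK"),
    ("received", "r610",   "r610",   "OK"),
    ("create",   "r610",   "r610",   "OK"),
    ("received", "r610",   "r610",   "OK"),
    ("create",   "r610",   "r710",   "OK"),
    ("received", "r710",   "r610",   "OK"),
    ("create",   "r710",   "r610",   "FAIL type 2"),
    ("create",   "pe2950", "r710",   "OK"),
    ("received", "r710",   "r610",   "FAIL type 2"),
    ("create",   "r610",   "r710",   "OK"),
    ("received", "r710",   "pe2950", "FAIL type 1 (derived?)"),
]

def failing_pairs(results):
    """Sorted (source, destination) platform pairs with at least one FAIL."""
    return sorted({(src, dst) for _, src, dst, res in results
                   if res.startswith("FAIL")})

for src, dst in failing_pairs(RESULTS):
    print(f"{src} -> {dst}")
```

Read this way, the reported data shows no failure between hosts of the same platform and no failed migration into the R710; every failure crosses a platform boundary.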
Konrad Rzeszutek Wilk
2011-Sep-07 13:50 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
> A little update, i now have all machines running on xen-4.1-testing
> with xen/stable-2.6.32.x
> That gave me the possiblity for additional tests.
>
> (I also tested xm/xend in addtion to xl/libxl, to make sure its not
> a xl/libxl problem.)
>
> I took the liberty to create a new test result matrix that should
> provide a better overview (in case someone else wants to get the
> whole picture):

So.. I don't think the issue I am seeing is exactly the same. This is what 'xl' gives me:

:~/> xl migrate 3 tst010
root@tst010's password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/326)
Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/326)
Savefile contains xl domain config
xc: Saving memory: iter 0 (last sent 0 skipped 0): 262400/262400 100%
xc: Saving memory: iter 2 (last sent 1105 skipped 23): 262400/262400 100%
xc: Saving memory: iter 3 (last sent 74 skipped 0): 262400/262400 100%
xc: Saving memory: iter 4 (last sent 0 skipped 0): 262400/262400 100%
xc: error: unexpected PFN mapping failure pfn 19d0 map_mfn 4e7e04 p2m_mfn 4e7e04: Internal error
libxl: error: libxl_dom.c:363:libxl__domain_restore_common: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:483:do_domain_create: cannot (re-)build domain: -3
libxl: error: libxl.c:733:libxl_domain_destroy: non-existant domain 4
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:410:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:125:libxl_report_child_exitstatus: migration target process [5810] exited with error status 3
Migration failed, resuming at sender.

And on the receiving side (tst010) I get a monster off:

(XEN) mm.c:945:d0 Error getting mfn 4e7e04 (pfn ffffffffffffffff) from L1 entry 80000004e7e04627 for l1e_owner=0, pg_owner=4
(XEN) mm.c:945:d0 Error getting mfn 36fd19 (pfn ffffffffffffffff) from L1 entry 800000036fd19627 for l1e_owner=0, pg_owner=4
(XEN) mm.c:945:d0 Error getting mfn 36f583 (pfn ffffffffffffffff) from L1 entry 800000036f583627 for l1e_owner=0, pg_owner=4
..
(XEN) mm.c:945:d0 Error getting mfn 4e7d09 (pfn ffffffffffffffff) from L1 entry 80000004e7d09627 for l1e_owner=0, pg_owner=4
(XEN) event_channel.c:250:d3 EVTCHNOP failure: error -17

The migration is from a 4GB box to a 32GB box (worked), then back to the 4GB (worked) and then back to the 32GB (boom!).

Anyhow, let me try this with the 4.1-testing branch. Running on the bleeding edge might not be the best idea sometimes.
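The "L1 entry" values in the hypervisor log above can be picked apart by hand. The sketch below assumes the standard x86-64 page table entry layout (frame number in bits 12..51, flag bits in the low 12 bits, no-execute in bit 63) and 4 KiB pages; `decode_l1e` is a name invented for this example and not a Xen function.

```python
PAGE_SHIFT = 12
# Bits 12..51 of an x86-64 page table entry hold the frame number.
ADDR_MASK = ((1 << 52) - 1) & ~((1 << PAGE_SHIFT) - 1)

def decode_l1e(entry):
    """Split a 64-bit L1 entry into (mfn, low flag bits, NX bit)."""
    mfn = (entry & ADDR_MASK) >> PAGE_SHIFT
    flags = entry & 0xFFF
    nx = bool(entry >> 63)
    return mfn, flags, nx

# First entry from the log above.
mfn, flags, nx = decode_l1e(0x80000004e7e04627)
print(f"mfn={mfn:#x} flags={flags:#x} nx={nx}")

# Machine address of that frame in GiB (roughly 19.6).
print(f"{mfn * 4096 / 2**30:.1f} GiB")
```

The decoded frame number matches the "mfn 4e7e04" that Xen prints, and the frame sits at roughly 19.6 GiB of machine address space, beyond anything a 4GB box could own.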
Konrad Rzeszutek Wilk
2011-Sep-08 17:32 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Wed, Sep 07, 2011 at 09:50:47AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
> > A little update, i now have all machines running on xen-4.1-testing
> > with xen/stable-2.6.32.x
> > That gave me the possiblity for additional tests.
> >
> > (I also tested xm/xend in addtion to xl/libxl, to make sure its not
> > a xl/libxl problem.)
> >
> > I took the liberty to create a new test result matrix that should
> > provide a better overview (in case someone else wants to get the
> > whole picture):
>
> So.. I don't think the issue I am seeing is exactly the same. This is
> what 'xl' gives me:

Scratch that. I am seeing the error below if I:

1) Create guest on 4GB machine
2) Migrate it to the 32GB box (guest still works)
3) Migrate it to the 4GB box (guest dies - error below shows up and guest is dead).

With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.

I tried just creating a guest on the 32GB and migrating it - and while it did migrate it was stuck in a hypercall_page call or crashed later on.

Andreas,

Thanks for reporting this.

> :~/> xl migrate 3 tst010
> root@tst010's password:
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x0/0x0/326)
> Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/326)
> Savefile contains xl domain config
> xc: Saving memory: iter 0 (last sent 0 skipped 0): 262400/262400 100%
> xc: Saving memory: iter 2 (last sent 1105 skipped 23): 262400/262400 100%
> xc: Saving memory: iter 3 (last sent 74 skipped 0): 262400/262400 100%
> xc: Saving memory: iter 4 (last sent 0 skipped 0): 262400/262400 100%
> xc: error: unexpected PFN mapping failure pfn 19d0 map_mfn 4e7e04 p2m_mfn 4e7e04: Internal error
> libxl: error: libxl_dom.c:363:libxl__domain_restore_common: restoring domain: Resource temporarily unavailable
> libxl: error: libxl_create.c:483:do_domain_create: cannot (re-)build domain: -3
> libxl: error: libxl.c:733:libxl_domain_destroy: non-existant domain 4
> migration target: Domain creation failed (code -3).
> libxl: error: libxl_utils.c:410:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
> libxl: info: libxl_exec.c:125:libxl_report_child_exitstatus: migration target process [5810] exited with error status 3
> Migration failed, resuming at sender.
>
> And on the receiving side (tst010) I get a monster off:
>
> (XEN) mm.c:945:d0 Error getting mfn 4e7e04 (pfn ffffffffffffffff) from L1 entry 80000004e7e04627 for l1e_owner=0, pg_owner=4
> (XEN) mm.c:945:d0 Error getting mfn 36fd19 (pfn ffffffffffffffff) from L1 entry 800000036fd19627 for l1e_owner=0, pg_owner=4
> (XEN) mm.c:945:d0 Error getting mfn 36f583 (pfn ffffffffffffffff) from L1 entry 800000036f583627 for l1e_owner=0, pg_owner=4
> ..
> (XEN) mm.c:945:d0 Error getting mfn 4e7d09 (pfn ffffffffffffffff) from L1 entry 80000004e7d09627 for l1e_owner=0, pg_owner=4
> (XEN) event_channel.c:250:d3 EVTCHNOP failure: error -17
>
> The migration is from a 4GB box to a 32GB box (worked), then back to the 4GB( worked)
> and then back to the 32GB (boom!).
>
> anyhow, let me try this with 4.1-testing branch. Running on the bleeding
> edge might not be the best idea sometimes.
Konrad Rzeszutek Wilk
2011-Sep-08 18:12 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
> Scratch that. I am seeing the error below if I:
>
> 1) Create guest on 4GB machine
> 2) Migrate it to the 32GB box (guest still works)
> 3) Migrate it to the 4GB box (guest dies - error below shows up and
> guest is dead).
>
> With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.
>
> I tried just creating a guest on the 32GB and migrating it - and while
> it did migrate it was stuck in a hypercall_page call or crashed later on.
>
> Andreas,
>
> Thanks for reporting this.

Oh wait. At some point you said that 2.6.32.43 worked for you.. Is that still the case?
Konrad Rzeszutek Wilk
2011-Sep-08 19:50 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On Thu, Sep 08, 2011 at 02:12:27PM -0400, Konrad Rzeszutek Wilk wrote:
> Oh wait. At some point you said that 2.6.32.43 worked for you.. Is that still
> the case?

Can you please try one thing for me - can you make sure the boxes have the exact same amount of memory? You can do 'mem=X' on the Xen hypervisor line to set that.

I think the problem you are running into is that you are migrating between different CPU families... Is the /proc/cpuinfo drastically different between the boxes?
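One way to answer the /proc/cpuinfo question without comparing the files by eye is to diff the CPU feature flags of the two boxes. The sketch below is only an illustration: the two cpuinfo excerpts are made-up placeholders, not output from the hosts in this thread, and the helper names are hypothetical. In practice the text would come from /proc/cpuinfo on each host.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def flag_diff(a_text, b_text):
    """Flags present only on host A and only on host B, each sorted."""
    a, b = cpu_flags(a_text), cpu_flags(b_text)
    return sorted(a - b), sorted(b - a)

# Made-up excerpts for illustration only.
HOST_A = "model name\t: Example CPU A\nflags\t\t: fpu vme sse sse2 ssse3\n"
HOST_B = ("model name\t: Example CPU B\n"
          "flags\t\t: fpu vme sse sse2 ssse3 sse4_1 sse4_2 popcnt\n")

only_a, only_b = flag_diff(HOST_A, HOST_B)
print("only on A:", only_a)
print("only on B:", only_b)
```

A guest kernel that detected a feature at boot on the host with the larger flag set may keep using it after landing on a host that lacks it, which is one reason the direction of a migration can matter.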
Andreas Olsowski
2011-Sep-09 05:59 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
On 09/08/2011 09:50 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 08, 2011 at 02:12:27PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Sep 07, 2011 at 09:50:47AM -0400, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
>>>>> A little update: I now have all machines running on xen-4.1-testing
>>>>> with xen/stable-2.6.32.x.
>>>>> That gave me the possibility for additional tests.
>>>>>
>>>>> (I also tested xm/xend in addition to xl/libxl, to make sure it's not
>>>>> an xl/libxl problem.)
>>>>>
>>>>> I took the liberty to create a new test result matrix that should
>>>>> provide a better overview (in case someone else wants to get the
>>>>> whole picture):
>>>>
>>>> So.. I don't think the issue I am seeing is exactly the same. This is
>>>> what 'xl' gives me:
>>>
>>> Scratch that. I am seeing the error below if I:
>>>
>>> 1) Create guest on 4GB machine
>>> 2) Migrate it to the 32GB box (guest still works)
>>> 3) Migrate it to the 4GB box (guest dies - error below shows up and
>>> guest is dead).
>>>
>>> With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.
>>>
>>> I tried just creating a guest on the 32GB and migrating it - and while
>>> it did migrate it was stuck in a hypercall_page call or crashed later on.
>>>
>>> Andreas,
>>>
>>> Thanks for reporting this.
>>
>> Oh wait. At some point you said that 2.6.32.43 worked for you.. Is that still
>> the case?
>

(Ignore the e-mail from a few minutes ago, I accidentally did not reply-all.)

Did I? I will have to check my sent emails, but I'm pretty sure that if I had found a way that works I would normally use it. But I can try an older version later today.

Btw., although you get the same error as I do, the circumstances are slightly different. This does not necessarily have something to do with the amount of memory.
I do see this on hosts where both have the same amount of RAM but are a different hardware platform.

> Can you please try one thing for me - can you make sure the boxes have the exact same
> amount of memory? You can do 'mem=X' on the Xen hypervisor line to set that.

Running mem=8g and have turned ballooning of dom0 off.

multiboot /boot/xen.gz placeholder dom0_mem=8192M
module /boot/vmlinuz-2.6.32.45-xen0 placeholder root=UUID=216ff902-b505-45c4-9bcb-9d63b4cb8992 ro mem=8G nomodeset console=tty0 console=ttyS1,57600 earlyprintk=xen

For some reason though, the two r610s show:

root@netcatarina:~# cat /proc/meminfo
MemTotal: 8378236 kB
root@netcatarina:~# xl list |grep Domain-0
Domain-0 0 7445 8 r----- 124304.7

root@memoryana:~# cat /proc/meminfo
MemTotal: 8378236 kB
root@memoryana:~# xl list |grep Domain-0
Domain-0 0 7445 8 r----- 132125.0

whereas the r710:

root@tarballerina:~# cat /proc/meminfo
MemTotal: 7886716 kB
root@tarballerina:~# xl list |grep Domain-0
Domain-0 0 7221 8 r----- 64497.0

On a side note:

root@tarballerina:~# xl mem-set Domain-0 8192
libxl: error: libxl.c:2119:libxl_set_memory_target cannot get memory info from /local/domain/0/memory/static-max : No such file or directory

The two r610s can set their memory with xl just fine.

> I think the problem you are running into is that you are migrating between
> different CPU families...
> Is the /proc/cpuinfo drastically different between
> the boxes?

diff:
< model : 26
< model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
< stepping : 5
< cpu MHz : 2261.074
< cache size : 8192 KB
---
> model : 44
> model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
> stepping : 2
> cpu MHz : 2660.050
> cache size : 12288 KB
13,14c13,14
< flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nonstop_tsc aperfmperf pni est ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm ida
< bogomips : 4522.14
---
> flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc rep_good nonstop_tsc aperfmperf pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat
> bogomips : 5320.10

different flags are: nx and aes

And that's r610 and r710. The CPU in the 2950 is older: a completely different platform, different chipset, no on-chip memory controller.

-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
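The flag comparison in the diff above can also be done mechanically, which avoids missing a flag when reading two long lines by eye. The following is a minimal sketch, not something from the thread: the function names `cpu_flags` and `flag_diff` are hypothetical, and the two flag strings are copied from the diff quoted in this message. Run on those strings it reports `nx` only on the E5520 side, and `aes`, `arat` and `pclmulqdq` only on the E5640 side.

```python
# Flags lines copied verbatim from the diff in this thread.
E5520_CPUINFO = (
    "flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush "
    "acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good "
    "nonstop_tsc aperfmperf pni est ssse3 cx16 sse4_1 sse4_2 popcnt "
    "hypervisor lahf_lm ida"
)
E5640_CPUINFO = (
    "flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush "
    "acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc rep_good "
    "nonstop_tsc aperfmperf pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 "
    "popcnt aes hypervisor lahf_lm ida arat"
)


def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(cpuinfo_a, cpuinfo_b):
    """Return (flags only in a, flags only in b), each as a sorted list."""
    a, b = cpu_flags(cpuinfo_a), cpu_flags(cpuinfo_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    only_e5520, only_e5640 = flag_diff(E5520_CPUINFO, E5640_CPUINFO)
    print("only on E5520:", " ".join(only_e5520))
    print("only on E5640:", " ".join(only_e5640))
```

To compare two live hosts, the same functions could be fed the contents of each host's /proc/cpuinfo instead of the pasted strings.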
Andreas Olsowski
2011-Sep-09 09:18 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
>>> On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Oh wait. At some point you said that 2.6.32.43 worked for you.. Is
>>> that still the case?

I tested 2.6.32.43 and 2.6.32.40 (to be sure) again; they don't work either.

>> Can you please try one thing for me - can you make sure the boxes have
>> the exact same amount of memory? You can do 'mem=X' on the Xen
>> hypervisor line to set that.
> Running mem=8g and have turned ballooning of dom0 off.
>
> multiboot /boot/xen.gz placeholder dom0_mem=8192M
> module /boot/vmlinuz-2.6.32.45-xen0 placeholder
> root=UUID=216ff902-b505-45c4-9bcb-9d63b4cb8992 ro mem=8G nomodeset
> console=tty0 console=ttyS1,57600 earlyprintk=xen
>
>
> For some reason though, the two r610s show:
> root@netcatarina:~# cat /proc/meminfo
> MemTotal: 8378236 kB
> root@netcatarina:~# xl list |grep Domain-0
> Domain-0 0 7445 8 r----- 124304.7
>
> root@memoryana:~# cat /proc/meminfo
> MemTotal: 8378236 kB
> root@memoryana:~# xl list |grep Domain-0
> Domain-0 0 7445 8 r----- 132125.0
>
> whereas the r710:
> root@tarballerina:~# cat /proc/meminfo
> MemTotal: 7886716 kB
> root@tarballerina:~# xl list |grep Domain-0
> Domain-0 0 7221 8 r----- 64497.0

After a reboot it went back up to 8378236 kB. I don't understand why the dom0 memory sometimes changes. The two 32 GB PE2950s show 8378236 kB after boot and then drop to something like 6575996 kB. The R610s stay at 8378236 kB, always.

-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
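To put the MemTotal figures above in the same unit as the `dom0_mem=8192M` boot option, the small sketch below converts them to MiB and shows how far each falls short of the 8192 MiB target. It is only arithmetic on the numbers reported in this message and says nothing about why the values differ; the function name `shortfall_mib` and the dictionary labels are made up for illustration, and the PE2950 value is the approximate one ("something like") given above.

```python
DOM0_MEM_KB = 8192 * 1024  # dom0_mem=8192M expressed in kB

# MemTotal values (kB) as reported in this message.
MEMTOTAL_KB = {
    "R610 (netcatarina, memoryana)": 8378236,
    "R710 (tarballerina), before reboot": 7886716,
    "PE2950, after the drop (approximate)": 6575996,
}


def shortfall_mib(memtotal_kb, target_kb=DOM0_MEM_KB):
    """How many MiB MemTotal lies below the dom0_mem target."""
    return (target_kb - memtotal_kb) / 1024


if __name__ == "__main__":
    for host, kb in MEMTOTAL_KB.items():
        print(f"{host}: {kb / 1024:.2f} MiB, {shortfall_mib(kb):.2f} MiB below 8192 MiB")
```

Note that even the stable R610 value sits slightly below the full 8192 MiB, so a fair comparison between hosts is against 8378236 kB rather than against the boot option itself.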
Konrad Rzeszutek Wilk
2011-Sep-12 16:47 UTC
Re: [Xen-devel] xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2.. between different physical machines and CPUs.
> This does not necessarily have something to do with the amount of memory.
> I do see this on hosts where both have the same amount of RAM but
> are a different hardware platform.

<nods> Let me modify the subject a bit to reflect this.

> >I think the problem you are running into is that you are migrating between
> >different CPU families... Is the /proc/cpuinfo drastically different between
> >the boxes?
> diff:
> < model : 26
> < model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
> < stepping : 5
> < cpu MHz : 2261.074
> < cache size : 8192 KB
> ---
> > model : 44
> > model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
> > stepping : 2
> > cpu MHz : 2660.050
> > cache size : 12288 KB
> 13,14c13,14
> < flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat
> clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
> rep_good nonstop_tsc aperfmperf pni est ssse3 cx16 sse4_1 sse4_2
> popcnt hypervisor lahf_lm ida
> < bogomips : 4522.14
> ---
> > flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat
> clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc
> rep_good nonstop_tsc aperfmperf pni pclmulqdq est ssse3 cx16 sse4_1
> sse4_2 popcnt aes hypervisor lahf_lm ida arat
> > bogomips : 5320.10
>
> different flags are: nx and aes

On the Linux command line, try using 'noexec=off' - that should take care of the 'nx' bit.

The aes.. the 'xl' command has a somewhat easier syntax for setting the CPUID:

cpuid='host,family=15,model=26,stepping=5,aes=s'

That ought to take care of that. I don't really understand how the old 'cpuid=['...']' syntax worked (the one that 'xm' used). It looks quite arcane - so I think doing some Google searching is the only way to figure that out.

But co-workers of mine remind me that the CPUID instruction is trapped by the hypervisor (both HVM and PV - PV via a special opcode - look in arch/x86/include/asm/xen/interface.h for details) for the kernel _only_. There is no such guarantee for applications.
Meaning that if the application uses the 'cpuid' instruction to figure out if 'aes' is available instead of using /proc/cpuinfo, it _will_ get the 'aes' on one machine. This problem, an application using CPUID and not getting the correctly filtered value, is not present with HVM guests - as the CPUID instruction is trapped there regardless of whether it is running in the kernel or user-land.
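The distinction above suggests that, in a PV guest, an application that wants to honour the filtered feature set should consult what the kernel reports rather than executing CPUID itself. The following is a rough sketch of that idea, assuming the guest kernel's /proc/cpuinfo reflects the filtered flags as described in the message; the function names `flags_per_cpu` and `feature_usable` are hypothetical and the sample text is invented for the demonstration.

```python
def flags_per_cpu(cpuinfo_text):
    """Return one flag set per processor entry in /proc/cpuinfo-style text."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    return per_cpu


def feature_usable(cpuinfo_text, flag):
    """True only if every listed processor advertises the flag."""
    per_cpu = flags_per_cpu(cpuinfo_text)
    return bool(per_cpu) and all(flag in flags for flags in per_cpu)


if __name__ == "__main__":
    # Invented two-vCPU sample; a real program would read /proc/cpuinfo instead.
    sample = (
        "processor : 0\nflags : fpu sse2 aes\n\n"
        "processor : 1\nflags : fpu sse2 aes\n"
    )
    print("aes usable:", feature_usable(sample, "aes"))
    print("nx usable:", feature_usable(sample, "nx"))
```

This does not help with existing binaries or libraries that probe CPUID directly; for those, the host-side choice of which machines to migrate between still matters.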