Diana Crisan
2013-May-14 13:12 UTC
HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
This is problem 2 of 3 problems we are having with live migration and/or ACPI on Xen-4.3 and Xen-4.2. Any help would be appreciated.

Detailed description of problem:
We are using Xen-4.3-rc1 with dom0 running Ubuntu Precise and the 3.5.0-23-generic kernel, and domU running Ubuntu Precise (12.04) cloud images with the 3.2.0-39-virtual kernel. We are using the xl.conf below with qemu-upstream-dm and HVM, on two identical (hardware and software) sending and receiving machines.

When live migration is initiated between the two machines using 'xl migrate', the migration completes, but the xenstore entries on the sending and receiving sides differ. Before the migration, the lines 'platform = ""', 'acpi = "1"', 'acpi_s3 = "1"' and 'acpi_s4 = "1"' are present in xenstore (even though xl.conf does not explicitly specify them). After the migration succeeds, those lines are missing on the receiving side. This is reproducible every time.

How to replicate:
1. Take two machines with identical hardware and software, running the xen-4.3-rc1 version of Xen on Ubuntu Precise with the 3.5.0-23-generic kernel.
2. Use the xl.conf below as the configuration file.
3. Create a VM using Ubuntu Precise and 3.5.0-23-generic.
4. Start the VM.
5. Run xenstore-ls and save the output to a file for later comparison.
6. Run xl migrate from one machine to the other.
7. Wait until the VM resumes on the receiving side.
8. Run xenstore-ls on the receiving side and save the output to a file.
9. Compare the two files and notice that the lines referring to the platform ACPI configuration in xenstore are missing.

Expected results: The platform ACPI details should be present in xenstore after a migration.
Actual results: The platform ACPI details are missing from xenstore after a migration.
Notes: On xen-4.2, a similar thing happens.
--xl.conf--
builder='hvm'
memory = 512
name = "416-vm"
vcpus=1
disk = [ 'tap:qcow2:/root/diana.qcow2,xvda,w' ]
vif = ['mac=00:16:3f:1d:6a:c0, bridge=defaultbr']
sdl=0
opengl=1
vnc=1
vnclisten="0.0.0.0"
vncdisplay=0
vncunused=0
vncpasswd='p'
stdvga=0
serial='pty'
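Step 9 of the reproduction (comparing the two dumps) can be scripted. The sketch below is not part of Xen. It assumes each dump is the output of xenstore-ls run on the guest's own subtree (e.g. xenstore-ls /local/domain/<domid>), because the domid can differ between the sending and receiving hosts, and it assumes the indented name = "value" layout that xenstore-ls prints, where a child is indented further than its parent. It prints every path present before the migration but absent afterwards.

```python
"""Report xenstore paths that disappeared between two saved xenstore-ls dumps."""
import sys


def parse_xenstore_ls(text):
    """Map full slash-separated paths to values, using indentation for nesting."""
    entries = {}
    stack = []  # (indent, name) for each ancestor of the current line
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank or unrecognised lines
        indent = len(line) - len(line.lstrip(" "))
        name, _, value = line.strip().partition(" = ")
        # Drop entries that are siblings or deeper than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        path = "/".join(n for _, n in stack)
        entries[path] = value.strip('"')
    return entries


def missing_keys(before_text, after_text):
    """Sorted paths present in the 'before' dump but absent from the 'after' dump."""
    before = parse_xenstore_ls(before_text)
    after = parse_xenstore_ls(after_text)
    return sorted(set(before) - set(after))


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for path in missing_keys(f_before.read(), f_after.read()):
            print(path)
```

Run as python3 xsdiff.py before.txt after.txt (the file name is hypothetical). On dumps matching this report, the output should include platform, platform/acpi, platform/acpi_s3 and platform/acpi_s4.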
Ian Campbell
2013-May-17 17:33 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
On Tue, 2013-05-14 at 14:12 +0100, Diana Crisan wrote:
> Prior to issuing a migration the lines 'platform = ""', 'acpi = "1"',
> 'acpi_s3 = "1"' and 'acpi_s4 = "1"' are present in xenstore (despite
> the fact xl.conf does not explicitly specify them). However, after the
> migration succeeds on the receiving side, those lines are missing.

Is the lack of these keys causing you a problem? IIRC they are used by the builder to communicate with hvmloader (the pre-BIOS loader used in HVM guests) so it can set up ACPI tables etc. as appropriate. Nothing else should be using them. They are documented as INTERNAL in docs/misc/xenstore-paths.markdown.

Ian.
Alex Bligh
2013-May-18 09:52 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
Ian,

--On 17 May 2013 18:33:48 +0100 Ian Campbell <ian.campbell@citrix.com> wrote:

> On Tue, 2013-05-14 at 14:12 +0100, Diana Crisan wrote:
>> Prior to issuing a migration the lines 'platform = ""', 'acpi = "1"',
>> 'acpi_s3 = "1"' and 'acpi_s4 = "1"' are present in xenstore (despite
>> the fact xl.conf does not explicitly specify them). However, after the
>> migration succeeds on the receiving side, those lines are missing.
>
> Is the lack of these keys causing you a problem? IIRC they are used by
> the builder to communicate with hvmloader (the pre-BIOS loader used in
> HVM guests) so it can set up ACPI tables etc. as appropriate. Nothing else
> should be using them. They are documented as INTERNAL in
> docs/misc/xenstore-paths.markdown.

(Diana is my colleague)

We don't know whether it causes a problem, but we were looking for something that might explain the stuck clock on migration that Diana reported alongside this on ACPI-enabled HVM:
http://lists.xen.org/archives/html/xen-devel/2013-05/msg01472.html

We figured that if ACPI wasn't being set up correctly on the recipient (migrated) domain, this might be the problem (given that the stuck clock only appears if you use ACPI).

How does the recipient upstream QEMU / Xen know whether to emulate ACPI if this is not transferred?

--
Alex Bligh
Ian Campbell
2013-May-18 11:02 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
On Sat, 2013-05-18 at 10:52 +0100, Alex Bligh wrote:
> Ian,
>
> --On 17 May 2013 18:33:48 +0100 Ian Campbell <ian.campbell@citrix.com>
> wrote:
>
> > On Tue, 2013-05-14 at 14:12 +0100, Diana Crisan wrote:
> >> Prior to issuing a migration the lines 'platform = ""', 'acpi = "1"',
> >> 'acpi_s3 = "1"' and 'acpi_s4 = "1"' are present in xenstore (despite
> >> the fact xl.conf does not explicitly specify them). However, after the
> >> migration succeeds on the receiving side, those lines are missing.
> >
> > Is the lack of these keys causing you a problem? IIRC they are used by
> > the builder to communicate with hvmloader (the pre-BIOS loader used in
> > HVM guests) so it can set up ACPI tables etc. as appropriate. Nothing else
> > should be using them. They are documented as INTERNAL in
> > docs/misc/xenstore-paths.markdown.
>
> (Diana is my colleague)
>
> We don't know whether it causes a problem, but we were looking to
> find something that might explain the stuck clock on migration
> Diana reported alongside this on ACPI enabled hvm:
> http://lists.xen.org/archives/html/xen-devel/2013-05/msg01472.html
>
> We figured if ACPI wasn't being set up right on the recipient (migrated)
> domain, this might be the problem (given the stuck clock only appears
> if you use ACPI).
>
> How does the recipient upstream QEMU / Xen know whether to emulate
> ACPI if this is not transferred?

These keys have nothing to do with that; all they do is cause hvmloader to expose ACPI tables to the guest or to tweak the content of those tables. That state is preserved as part of the memory image of the guest. The qemu state is also pickled as part of the save image.

ACPI is just a set of tables describing the hardware; there's no "emulation" to turn off and on. Whatever magic I/O ports the ACPI AML references are always on; the setting just controls whether the guest gets to see that via the AML.

Ian.
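The argument above can be restated as a toy model. This is not Xen or hvmloader code, and every name in it (ToyGuest, toy_hvmloader, toy_migrate, the "base"/"s3"/"s4" table labels) is hypothetical. It encodes only the two claims in the message: the platform/acpi* keys are consulted once at boot to decide which ACPI tables end up in guest RAM, and a save or migrate carries the memory image rather than those keys.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGuest:
    xenstore: dict                           # stands in for the domain's xenstore keys
    ram: dict = field(default_factory=dict)  # stands in for the guest memory image


def toy_hvmloader(guest):
    """Runs once at boot: reads the keys and 'writes' tables into guest RAM."""
    if guest.xenstore.get("platform/acpi") != "1":
        return  # no ACPI tables exposed to the guest
    tables = ["base"]  # placeholder label, not a real ACPI table name
    if guest.xenstore.get("platform/acpi_s3") == "1":
        tables.append("s3")
    if guest.xenstore.get("platform/acpi_s4") == "1":
        tables.append("s4")
    guest.ram["acpi_tables"] = tables


def toy_migrate(guest):
    """Copy the memory image; the toy receiver gets none of the platform/* keys."""
    return ToyGuest(xenstore={}, ram={k: list(v) for k, v in guest.ram.items()})
```

Because toy_hvmloader never runs on the receiving side, the missing keys there cannot change what the migrated guest sees; its tables arrive with its RAM.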
Alex Bligh
2013-May-18 11:17 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
Ian,

--On 18 May 2013 12:02:23 +0100 Ian Campbell <Ian.Campbell@citrix.com> wrote:

> These keys have nothing to do with that, all they do is cause hvmloader
> to expose ACPI tables to the guest or to tweak the content of those
> tables. That state is preserved as part of the memory image of the
> guest. The qemu state is also pickled as part of the save image.
>
> ACPI is just a set of tables describing the hardware, there's no
> "emulation" to turn off and on. Whatever magic I/O ports the ACPI AML
> references are always on, the setting just controls whether the guest
> gets to see that via the AML.

Thanks. So it could not be that the guest gets to see that via the AML pre-migration, but not post-migration?

In that case I can only conclude that some part of the qemu state is not migrating correctly, and the fact that the stuck clock only happens if ACPI is enabled in xl.conf is only relevant because it influences how the guest (Linux in this case) chooses its clock source (i.e. it's broken in any case; the guest just does not notice if the relevant ACPI tables aren't exposed).

Any ideas on how to debug this further? It is odd that the date command (used to set a date) will unstick the clock.

--
Alex Bligh
Ian Campbell
2013-May-20 08:40 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
On Sat, 2013-05-18 at 12:17 +0100, Alex Bligh wrote:
> Ian,
>
> --On 18 May 2013 12:02:23 +0100 Ian Campbell <Ian.Campbell@citrix.com>
> wrote:
>
> > These keys have nothing to do with that, all they do is cause hvmloader
> > to expose ACPI tables to the guest or to tweak the content of those
> > tables. That state is preserved as part of the memory image of the
> > guest. The qemu state is also pickled as part of the save image.
> >
> > ACPI is just a set of tables describing the hardware, there's no
> > "emulation" to turn off and on. Whatever magic I/O ports the ACPI AML
> > references are always on, the setting just controls whether the guest
> > gets to see that via the AML.
>
> Thanks. So it could not be that the guest gets to see that via the AML
> pre migration, but not post migration?

AML lives in guest RAM, so no.

> In that case I can only conclude that some part of the qemu state
> is not migrating correctly, and the fact that the stuck clock
> only happens if ACPI is enabled in xl.conf is only relevant as
> it influences how the guest (linux in this case) chooses its clock
> source (i.e. it's broken in any case, just the guest does not notice
> if the relevant ACPI tables aren't exposed).

You could perhaps verify this somewhat by playing with the kernel's clocksource= option.

Most (all?) of the clocks are actually emulated by the hypervisor rather than qemu for performance reasons, but that state is also pickled over a migrate.

> Any ideas on how to debug this further? It is odd that the date command
> (used to set a date) will unstick the clock.

I'd have thought that would only poke the RTC, but to be honest I'm not sure.

Ian.
Alex Bligh
2013-May-20 11:50 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
Ian,

--On 20 May 2013 09:40:01 +0100 Ian Campbell <Ian.Campbell@citrix.com> wrote:

> You could perhaps verify this somewhat by playing with the kernel's
> clocksource= option.

clocksource=[hpet|pit|tsc|acpi_pm|cyclone|scx200_hrt]

I'm guessing clocksource=tsc is the least dependent on 'other stuff'.
We'll have a play.

>> Any ideas on how to debug this further? It is odd that the date command
>> (used to set a date) will unstick the clock.
>
> I'd have thought that would only poke the RTC, but to be honest I'm not
> sure.

I'm pretty sure date itself only changes the wallclock (the clock command normally being needed to write to the RTC). If the images are running ntp, that may notice the wallclock change and write to CMOS, I guess.

--
Alex Bligh
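To separate the wallclock from the RTC without relying on date and clock -r, the two can be compared directly. This sketch assumes a Linux RTC exposed as /sys/class/rtc/rtc0 whose since_epoch attribute reports RTC time in whole seconds since the epoch, and that the RTC is kept in UTC. The function names and the 1-second default threshold are illustrative choices, not anything from the thread.

```python
import time
from pathlib import Path

# Linux RTC sysfs attribute; assumes rtc0 exists and the RTC is kept in UTC.
RTC_SINCE_EPOCH = Path("/sys/class/rtc/rtc0/since_epoch")


def wallclock_minus_rtc(rtc_since_epoch_text, wallclock):
    """Seconds by which the system wallclock is ahead of the RTC (negative if behind)."""
    return wallclock - int(rtc_since_epoch_text.strip())


def rtc_in_sync(threshold=1.0):
    """Read both clocks now; True if they agree within `threshold` seconds.

    The RTC only has 1-second resolution, so small offsets are expected.
    """
    offset = wallclock_minus_rtc(RTC_SINCE_EPOCH.read_text(), time.time())
    print(f"wallclock - RTC = {offset:+.1f} s")
    return abs(offset) <= threshold
```

Running rtc_in_sync() before and after a migration (or after using date to unstick the clock) would show whether only the wallclock moved.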
Ian Campbell
2013-May-20 12:01 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
On Mon, 2013-05-20 at 12:50 +0100, Alex Bligh wrote:
> Ian,
>
> --On 20 May 2013 09:40:01 +0100 Ian Campbell <Ian.Campbell@citrix.com>
> wrote:
>
> > You could perhaps verify this somewhat by playing with the kernel's
> > clocksource= option.
>
> clocksource=[hpet|pit|tsc|acpi_pm|cyclone|scx200_hrt]
>
> I'm guessing clocksource=tsc is the least dependent on 'other stuff'.

It'd be a good one to start with. You should be able to confirm under sysfs which one is used now; I'm guessing it is acpi_pm.

It'd be worth trying each of the first 4 though. cyclone and scx200 seem a bit specific...

> We'll have a play.
>
> >> Any ideas on how to debug this further? It is odd that the date command
> >> (used to set a date) will unstick the clock.
> >
> > I'd have thought that would only poke the RTC, but to be honest I'm not
> > sure.
>
> I'm pretty sure date itself only changes the wallclock (the clock command
> normally being needed to write to the RTC).

Yes, I think you are right.

> If the images are running
> ntp, that may notice the wallclock change and write to CMOS I guess.
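The sysfs check can be combined with a look at the kernel command line to see whether a clocksource= request was actually honoured. This sketch assumes the standard Linux paths /proc/cmdline and /sys/devices/system/clocksource/clocksource0/ (files current_clocksource and available_clocksource). The helper names are hypothetical, and treating the last clocksource= option as the effective one is an assumption.

```python
from pathlib import Path

CS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def requested_clocksource(cmdline):
    """Value of the last clocksource= option on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("clocksource="):
            value = token.split("=", 1)[1]
    return value


def clocksource_report(cmdline, current, available):
    """Compare what was requested at boot with what the kernel is using."""
    wanted = requested_clocksource(cmdline)
    in_use = current.strip()
    return {
        "requested": wanted,
        "current": in_use,
        "available": available.split(),
        "honoured": wanted is None or wanted == in_use,
    }


def read_live_report():
    """Build the report from the running kernel (Linux with sysfs mounted)."""
    return clocksource_report(
        Path("/proc/cmdline").read_text(),
        (CS_DIR / "current_clocksource").read_text(),
        (CS_DIR / "available_clocksource").read_text(),
    )
```

As root, writing one of the names from available_clocksource into current_clocksource switches the clocksource at runtime, which can save a reboot per experiment.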
Diana Crisan
2013-May-21 10:08 UTC
Re: HVM Migration of domU on Qemu-upstream DM loses ACPI data in xenstore
Hello Ian,

>On Mon, 2013-05-20 at 12:50 +0100, Alex Bligh wrote:
>> Ian,
>>
>> --On 20 May 2013 09:40:01 +0100 Ian Campbell <Ian.Campbell@citrix.com>
>> wrote:
>>
>> > You could perhaps verify this somewhat by playing with the kernel's
>> > clocksource= option.
>>
>> clocksource=[hpet|pit|tsc|acpi_pm|cyclone|scx200_hrt]
>>
>> I'm guessing clocksource=tsc is the least dependent on 'other stuff'.

>It'd be a good one to start with. You should be able to confirm under
>sysfs which one is used now, I'm guessing it is acpi_pm.

>It'd be worth trying each of the first 4 though. cyclone and scx200 seem
>a bit specific...

I tested the first 4 options with both Xen 4.2 and Xen 4.3. My findings are below.

Under 4.2:
tsc: cannot use this, as it is not HRT compatible ("can't switch while in HRT/NOHZ mode").
hpet: activated according to dmesg. After migration I got this error: "i8042 no controller found; pm: device i8042 failed to restore: error -19", and the clock got stuck again (possibly because of this error?).
pit: no error, but the request did not seem to take effect: dmesg showed it reading my request to switch to clocksource=pit, but it chose to set the xen clocksource instead.

Under 4.3:
tsc: as above, not HRT compatible, so I cannot use it.
hpet: the clock still gets stuck after several migrations (3-4).
pit: as above, the request is ignored.
acpi_pm: the problem seems easier to reproduce with this clocksource; I got it on the first migration.

Notes: the wallclock (date) and RTC (clock -r) on the hosts were in sync to within a second.

>> We'll have a play.
>>
>> >> Any ideas on how to debug this further? It is odd that the date command
>> >> (used to set a date) will unstick the clock.
>> >
>> > I'd have thought that would only poke the RTC, but to be honest I'm not
>> > sure.
>>
>> I'm pretty sure date itself only changes the wallclock (the clock command
>> normally being needed to write to the RTC).

>Yes, I think you are right.

>> If the images are running
>> ntp, that may notice the wallclock change and write to CMOS I guess.

Please let me know if you need any more details.

--
Diana Crisan
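To repeat the several-migrations experiment without eyeballing date, a small check could run in the guest after each xl migrate. This is a sketch, not a tool from the thread: it busy-waits rather than sleeping, on the assumption that a sleep might not return if the guest's timekeeping is stuck, and it reports whether the monotonic clock and the wallclock each advanced. The function names and the spin count are arbitrary choices.

```python
import time


def clock_advances(clock, spins=5_000_000, rounds=3):
    """True if `clock` moved forward in every round of busy-waiting.

    Busy-waits instead of calling time.sleep(), in case sleeping depends on
    the same timekeeping that has stopped.
    """
    previous = clock()
    for _ in range(rounds):
        for _ in range(spins):  # burn CPU time without arming any timer
            pass
        current = clock()
        if current <= previous:
            return False
        previous = current
    return True


def check_guest_clocks(spins=5_000_000):
    """Report whether the monotonic clock and the wallclock are advancing."""
    return {
        "monotonic": clock_advances(time.monotonic, spins),
        "wallclock": clock_advances(time.time, spins),
    }
```

Calling check_guest_clocks() after each migration and logging the result alongside the clocksource in use would record which clocks stop, and after which migration.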