Hello, Sorry to be the bearer of bad news, but there is further fallout from the RTC changes in Xen 4.3 and Win2k3 guests. Since updating to Xen 4.3 in XenServer trunk, we have been seeing these issues a handful of times on each nightly test. We are currently running stable-4.3 with no relevant patches in this area. The issue exists solely with Win2k3 Enterprise Edition SP2 64bit, and occurs sporadically in a simple reboot loop of the VM. The VM sits at 100% CPU performing: d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 d98v0 io write port 70 val c d98v0 vmentry cycles 1944 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 d98v0 io read port 71 val c0 d98v0 vmentry cycles 3120 I am attempting to narrow down which conditions cause it to fall into this loop, but am really picking blindly at factors. One possibility is the domain time offset being set to -24 seconds, but I cant immediately see why that would have an effect. I have attached xen-hvmctx from the affected domain, and do have one example of a VM in this loop so can poke for other state, if there are any sensible suggestions ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 03.09.13 at 15:37, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > Since updating to Xen 4.3 in XenServer trunk, we have been seeing these > issues a handful of times on each nightly test. We are currently > running stable-4.3 with no relevant patches in this area. > > The issue exists solely with Win2k3 Enterprise Edition SP2 64bit, and > occurs sporadically in a simple reboot loop of the VM. > > The VM sits at 100% CPU performing: > > d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > d98v0 io write port 70 val c > d98v0 vmentry cycles 1944 > d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > d98v0 io read port 71 val c0 > d98v0 vmentry cycles 3120 > > I am attempting to narrow down which conditions cause it to fall into > this loop, but am really picking blindly at factors. One possibility is > the domain time offset being set to -24 seconds, but I cant immediately > see why that would have an effect.So based on my code inspection last time I''d expect this to be a tight loop in the RTC interrupt handler, waiting for REG_C to read as zero. Which raises two questions: Does that specific version of Windows not honor the WAET flags saying that REG_C reads are unnecessary? Or does this only occur during very early boot (where iirc a first, temporary RTC interrupt handler gets installed for a very brief period of time that doesn''t pay attention to the WAET flag)?> I have attached xen-hvmctx from the affected domain, and do have one > example of a VM in this loop so can poke for other state, if there are > any sensible suggestionsThe REG_A value says 64Hz for the periodic interrupt if I''m not mistaken, so RTC_PF getting re-set between two iterations would first of all hint at a significantly overcommitted system (such that no two iterations of the loop can complete within 1/64 second). Jan
On 04/09/13 10:38, Jan Beulich wrote:>>>> On 03.09.13 at 15:37, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> Since updating to Xen 4.3 in XenServer trunk, we have been seeing these >> issues a handful of times on each nightly test. We are currently >> running stable-4.3 with no relevant patches in this area. >> >> The issue exists solely with Win2k3 Enterprise Edition SP2 64bit, and >> occurs sporadically in a simple reboot loop of the VM. >> >> The VM sits at 100% CPU performing: >> >> d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 >> d98v0 io write port 70 val c >> d98v0 vmentry cycles 1944 >> d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 >> d98v0 io read port 71 val c0 >> d98v0 vmentry cycles 3120 >> >> I am attempting to narrow down which conditions cause it to fall into >> this loop, but am really picking blindly at factors. One possibility is >> the domain time offset being set to -24 seconds, but I cant immediately >> see why that would have an effect. > So based on my code inspection last time I''d expect this to be a > tight loop in the RTC interrupt handler, waiting for REG_C to read > as zero.The timing data and samples of %eip from xentrace would agree with your expectation.> > Which raises two questions: Does that specific version of Windows > not honor the WAET flags saying that REG_C reads are unnecessary? > Or does this only occur during very early boot (where iirc a first, > temporary RTC interrupt handler gets installed for a very brief period > of time that doesn''t pay attention to the WAET flag)?When the VM falls into the loop, it is still in text mode with "Starting windows..." and a block progress bar which is full. This means that ntldr has finished loading the base drivers using int 13h. From Xentrace, we do see that it is in 64 bit mode, so execution is probably right at the beginning of the kernel, even before switching the VGA mode.> >> I have attached xen-hvmctx from the affected domain, and do have one >> example of a VM in this loop so can poke for other state, if there are >> any sensible suggestions > The REG_A value says 64Hz for the periodic interrupt if I''m not > mistaken, so RTC_PF getting re-set between two iterations would > first of all hint at a significantly overcommitted system (such that > no two iterations of the loop can complete within 1/64 second). > > Jan >This is part of our automatic testing. There are two VMs (32 and 64bit variants) running the same set of tests, being basic lifecycle/migrate etc loops. The hosts are not overcommitted in the slightest. ~Andrew
>>> On 04.09.13 at 14:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 04/09/13 10:38, Jan Beulich wrote: >> Which raises two questions: Does that specific version of Windows >> not honor the WAET flags saying that REG_C reads are unnecessary? >> Or does this only occur during very early boot (where iirc a first, >> temporary RTC interrupt handler gets installed for a very brief period >> of time that doesn''t pay attention to the WAET flag)? > > When the VM falls into the loop, it is still in text mode with "Starting > windows..." and a block progress bar which is full. This means that > ntldr has finished loading the base drivers using int 13h. From > Xentrace, we do see that it is in 64 bit mode, so execution is probably > right at the beginning of the kernel, even before switching the VGA mode.So that might then be the early probing interrupt handler that I had found they install transiently.>>> I have attached xen-hvmctx from the affected domain, and do have one >>> example of a VM in this loop so can poke for other state, if there are >>> any sensible suggestions >> The REG_A value says 64Hz for the periodic interrupt if I''m not >> mistaken, so RTC_PF getting re-set between two iterations would >> first of all hint at a significantly overcommitted system (such that >> no two iterations of the loop can complete within 1/64 second). > > This is part of our automatic testing. There are two VMs (32 and 64bit > variants) running the same set of tests, being basic lifecycle/migrate > etc loops. The hosts are not overcommitted in the slightest.In which case, unless there''s some scheduler anomaly involved, I see no explanation for the behavior. Not knowing what precise data Xentrace produces - does that include any timing information? If so, what''s the smallest delta between two of these REG_C reads? Jan
On 04/09/13 13:44, Jan Beulich wrote:>>>> On 04.09.13 at 14:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 04/09/13 10:38, Jan Beulich wrote: >>> Which raises two questions: Does that specific version of Windows >>> not honor the WAET flags saying that REG_C reads are unnecessary? >>> Or does this only occur during very early boot (where iirc a first, >>> temporary RTC interrupt handler gets installed for a very brief period >>> of time that doesn''t pay attention to the WAET flag)? >> When the VM falls into the loop, it is still in text mode with "Starting >> windows..." and a block progress bar which is full. This means that >> ntldr has finished loading the base drivers using int 13h. From >> Xentrace, we do see that it is in 64 bit mode, so execution is probably >> right at the beginning of the kernel, even before switching the VGA mode. > So that might then be the early probing interrupt handler that I > had found they install transiently. > >>>> I have attached xen-hvmctx from the affected domain, and do have one >>>> example of a VM in this loop so can poke for other state, if there are >>>> any sensible suggestions >>> The REG_A value says 64Hz for the periodic interrupt if I''m not >>> mistaken, so RTC_PF getting re-set between two iterations would >>> first of all hint at a significantly overcommitted system (such that >>> no two iterations of the loop can complete within 1/64 second). >> This is part of our automatic testing. There are two VMs (32 and 64bit >> variants) running the same set of tests, being basic lifecycle/migrate >> etc loops. The hosts are not overcommitted in the slightest. > In which case, unless there''s some scheduler anomaly involved, I > see no explanation for the behavior. > > Not knowing what precise data Xentrace produces - does that include > any timing information? If so, what''s the smallest delta between two > of these REG_C reads? > > Jan >Yes. Here is a larger sample: ] 2.862018629 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 2.862018629 d98v0 io write port 70 val c ] 2.862019461 d98v0 vmentry cycles 1996 ] 2.862019936 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 2.862019936 d98v0 io read port 71 val c0 ] 2.862021275 d98v0 vmentry cycles 3214 ] 2.862021752 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 2.862021752 d98v0 io write port 70 val c ] 2.862022560 d98v0 vmentry cycles 1938 ] 2.862023068 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 2.862023068 d98v0 io read port 71 val c0 ] 2.862024410 d98v0 vmentry cycles 3220 ] 2.862024886 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 2.862024886 d98v0 io write port 70 val c ] 2.862025735 d98v0 vmentry cycles 2037 ] 2.862026207 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 2.862026207 d98v0 io read port 71 val c0 ] 2.862027537 d98v0 vmentry cycles 3190 ] 2.862028012 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 2.862028012 d98v0 io write port 70 val c ] 2.862028854 d98v0 vmentry cycles 2021 ] 2.862029322 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 2.862029322 d98v0 io read port 71 val c0 ] 2.862030668 d98v0 vmentry cycles 3232 ] 2.862031145 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 2.862031145 d98v0 io write port 70 val c ] 2.862031954 d98v0 vmentry cycles 1941 ] 2.862032462 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 2.862032462 d98v0 io read port 71 val c0 ] 2.862033762 d98v0 vmentry cycles 3118 ] 2.862034233 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 2.862034233 d98v0 io write port 70 val c ] 2.862035082 d98v0 vmentry cycles 2036 ] 2.862035558 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 2.862035558 d98v0 io read port 71 val c0 ] 2.862036937 d98v0 vmentry cycles 3309 George will know for certain, but as far as I am aware, the timestamp column is calculated from raw TSC counter values embedded in the trace records. Either way, it is on the order of 2 microseconds difference around iterations of the loops, with a substantial proportion of that being vmextry/vmexit. As an easy starting point of reference, I will completely disable writing the WAET table to see whether that makes a difference. ~Andrew
>>> On 04.09.13 at 15:37, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 04/09/13 13:44, Jan Beulich wrote: >> Not knowing what precise data Xentrace produces - does that include >> any timing information? If so, what''s the smallest delta between two >> of these REG_C reads? > > Yes. Here is a larger sample: > > ] 2.862018629 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862018629 d98v0 io write port 70 val c > ] 2.862019461 d98v0 vmentry cycles 1996 > ] 2.862019936 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862019936 d98v0 io read port 71 val c0 > ] 2.862021275 d98v0 vmentry cycles 3214 > ] 2.862021752 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862021752 d98v0 io write port 70 val c > ] 2.862022560 d98v0 vmentry cycles 1938 > ] 2.862023068 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862023068 d98v0 io read port 71 val c0 > ] 2.862024410 d98v0 vmentry cycles 3220 > ] 2.862024886 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862024886 d98v0 io write port 70 val c > ] 2.862025735 d98v0 vmentry cycles 2037 > ] 2.862026207 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862026207 d98v0 io read port 71 val c0 > ] 2.862027537 d98v0 vmentry cycles 3190 > ] 2.862028012 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862028012 d98v0 io write port 70 val c > ] 2.862028854 d98v0 vmentry cycles 2021 > ] 2.862029322 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862029322 d98v0 io read port 71 val c0 > ] 2.862030668 d98v0 vmentry cycles 3232 > ] 2.862031145 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862031145 d98v0 io write port 70 val c > ] 2.862031954 d98v0 vmentry cycles 1941 > ] 2.862032462 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862032462 d98v0 io read port 71 val c0 > ] 2.862033762 d98v0 vmentry cycles 3118 > ] 2.862034233 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862034233 d98v0 io write port 70 val c > ] 2.862035082 d98v0 vmentry cycles 2036 > ] 2.862035558 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862035558 d98v0 io read port 71 val c0 > ] 2.862036937 d98v0 vmentry cycles 3309 > > George will know for certain, but as far as I am aware, the timestamp > column is calculated from raw TSC counter values embedded in the trace > records.Given how close the entries are, it doesn''t even matter whether that''s TSC ticks or nanoseconds - they''re definitely way too close together to represent a state that should be permitted to get into with a periodic timer tick of 64Hz.> As an easy starting point of reference, I will completely disable > writing the WAET table to see whether that makes a difference.That''s going to be fruitless afaict - as said before, the early RTC interrupt handler they install doesn''t even look at the WAET info. What you will want to check is the rate at which rtc_periodic_interrupt() gets invoked in this case, as that''s the only place that can set RTC_PF (which you observe constantly set), and how this (presumably) ends up being much faster than intended (i.e. presumably some issue in pt_update_irq()). Jan
On 04/09/13 14:37, Andrew Cooper wrote:> On 04/09/13 13:44, Jan Beulich wrote: >>>>> On 04.09.13 at 14:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 04/09/13 10:38, Jan Beulich wrote: >>>> Which raises two questions: Does that specific version of Windows >>>> not honor the WAET flags saying that REG_C reads are unnecessary? >>>> Or does this only occur during very early boot (where iirc a first, >>>> temporary RTC interrupt handler gets installed for a very brief period >>>> of time that doesn''t pay attention to the WAET flag)? >>> When the VM falls into the loop, it is still in text mode with "Starting >>> windows..." and a block progress bar which is full. This means that >>> ntldr has finished loading the base drivers using int 13h. From >>> Xentrace, we do see that it is in 64 bit mode, so execution is probably >>> right at the beginning of the kernel, even before switching the VGA mode. >> So that might then be the early probing interrupt handler that I >> had found they install transiently. >> >>>>> I have attached xen-hvmctx from the affected domain, and do have one >>>>> example of a VM in this loop so can poke for other state, if there are >>>>> any sensible suggestions >>>> The REG_A value says 64Hz for the periodic interrupt if I''m not >>>> mistaken, so RTC_PF getting re-set between two iterations would >>>> first of all hint at a significantly overcommitted system (such that >>>> no two iterations of the loop can complete within 1/64 second). >>> This is part of our automatic testing. There are two VMs (32 and 64bit >>> variants) running the same set of tests, being basic lifecycle/migrate >>> etc loops. The hosts are not overcommitted in the slightest. >> In which case, unless there''s some scheduler anomaly involved, I >> see no explanation for the behavior. >> >> Not knowing what precise data Xentrace produces - does that include >> any timing information? If so, what''s the smallest delta between two >> of these REG_C reads? >> >> Jan >> > Yes. Here is a larger sample: > > ] 2.862018629 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862018629 d98v0 io write port 70 val c > ] 2.862019461 d98v0 vmentry cycles 1996 > ] 2.862019936 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862019936 d98v0 io read port 71 val c0 > ] 2.862021275 d98v0 vmentry cycles 3214 > ] 2.862021752 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862021752 d98v0 io write port 70 val c > ] 2.862022560 d98v0 vmentry cycles 1938 > ] 2.862023068 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862023068 d98v0 io read port 71 val c0 > ] 2.862024410 d98v0 vmentry cycles 3220 > ] 2.862024886 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862024886 d98v0 io write port 70 val c > ] 2.862025735 d98v0 vmentry cycles 2037 > ] 2.862026207 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862026207 d98v0 io read port 71 val c0 > ] 2.862027537 d98v0 vmentry cycles 3190 > ] 2.862028012 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862028012 d98v0 io write port 70 val c > ] 2.862028854 d98v0 vmentry cycles 2021 > ] 2.862029322 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862029322 d98v0 io read port 71 val c0 > ] 2.862030668 d98v0 vmentry cycles 3232 > ] 2.862031145 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862031145 d98v0 io write port 70 val c > ] 2.862031954 d98v0 vmentry cycles 1941 > ] 2.862032462 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862032462 d98v0 io read port 71 val c0 > ] 2.862033762 d98v0 vmentry cycles 3118 > ] 2.862034233 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814176 > 2.862034233 d98v0 io write port 70 val c > ] 2.862035082 d98v0 vmentry cycles 2036 > ] 2.862035558 d98v0 vmexit exit_reason IO_INSTRUCTION eip fffff80000814177 > 2.862035558 d98v0 io read port 71 val c0 > ] 2.862036937 d98v0 vmentry cycles 3309 > > George will know for certain, but as far as I am aware, the timestamp > column is calculated from raw TSC counter values embedded in the trace > records.Yes, with the following caveats: 1. Not all records have timestamps; if a record does not have a timestamp, the processing engine will use the most recent timestamp available. vmexit and vmentry have timestamps, but records about handling (such as the io record above) typically don''t. 2. In order for the conversion from cycles to seconds to be accurate, you have to specify the cpu hz of the processor on which the trace ran when running xenalyze. If you don''t specify anything, it will default to 2GHz. This will produce results within 2x of being correct, and therefore *qualitatively* similar. If Andy has not specified the cpu hz, then the actual delta between reads may be 0.5-2x what the trace shows, but still way too short for 64Hz. -George