Hello,

In what we believe is now the final regression discovered when upgrading XenServer from Xen 4.1 to 4.3, there is an issue with RTC emulation.

Win2003 SP2 is a WAET-unaware operating system, whose RTC access pattern triggers Xen's rtc_mode_no_ack logic. The result is that the domain falls into a tight loop reading RTC RegC, whose value is always 0xc0.

I have confirmed that switching Xen back to RTC strict mode fixes the regression, but I am presuming that this alone would be an unpopular fix upstream.

At the moment, HVMloader unconditionally advertises the RTC_NO_ACK bit in the WAET table, and Xen unconditionally decides that the domain has been informed that it should not ack RTC interrupts as per the specification.

This logic is broken. There is no guarantee that the domain has read the WAET table, but even if it has, there is no guarantee that it will act on the information it has been given.

One option would be for the toolstack to set this parameter up; it is in the best place to know whether a certain domain will correctly use the newly available mode. This would involve moving the RTC mode in Xen to a per-domain setting.

On the other hand, there are obvious advantages from Xen's point of view with rtc_mode_no_ack, which couldn't be taken with the above toolstack changes.

So, I would like people's opinions on what the best course of action is. Playing with this code has proved incredibly fragile in the past, and I am not sure whether it is better to fix up Xen's detection logic and hope nothing else breaks, or defer the decision to the toolstack and take the performance hit of running extra timers in Xen.

~Andrew
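To make the symptom concrete, here is a toy model in C of a guest that keeps reading RegC until the interrupt request flag clears. It is not Xen's RTC emulation and not the Windows handler: the `toy_` names, the `sticky` switch, and the assumption that the guest polls until IRQF (0x80) clears are all illustrative. The only values taken from the report are RegC and 0xc0, which is IRQF (0x80) plus the periodic flag PF (0x40).

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_REG_C_IRQF 0x80u /* interrupt request flag */
#define RTC_REG_C_PF   0x40u /* periodic interrupt flag */

/* Toy device state; hypothetical, not Xen's data structure. */
struct toy_rtc {
    uint8_t reg_c;
    bool sticky; /* assumption: models the observed "always 0xc0" symptom */
};

/* Read RegC. On real MC146818-style hardware a read clears the flags. */
static uint8_t toy_rtc_read_reg_c(struct toy_rtc *rtc)
{
    uint8_t val = rtc->reg_c;

    if (!rtc->sticky)
        rtc->reg_c = 0;
    return val;
}

/*
 * Guest-style drain loop (assumed access pattern): read RegC until IRQF
 * is clear. Returns the number of reads needed, or -1 if max_reads was
 * reached without IRQF clearing, i.e. the guest would spin forever.
 */
static int toy_guest_drain(struct toy_rtc *rtc, int max_reads)
{
    for (int reads = 1; reads <= max_reads; reads++) {
        if (!(toy_rtc_read_reg_c(rtc) & RTC_REG_C_IRQF))
            return reads;
    }
    return -1;
}
```

With `sticky` false the loop finishes on the second read; with `sticky` true it never terminates, which is the behaviour seen in the affected domain.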
>>> On 19.11.13 at 13:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> In what we believe is now the final regression discovered when upgrading
> XenServer from Xen 4.1 to 4.3, there is an issue with RTC emulation.
>
> Win2003 SP2 is a WAET-unaware operating system, whose RTC access pattern
> triggers Xen's rtc_mode_no_ack logic. The result is that the domain
> falls into a tight loop reading RTC RegC, whose value is always 0xc0.

I specifically tested with w2k3, and know (from disassembling) that it reads and evaluates the WAET table. While not impossible, it seems unlikely that this would have been switched back in SP2.

> I have confirmed that switching Xen back to RTC strict mode fixes the
> regression, but I am presuming that this alone would be an unpopular fix
> upstream.
>
> At the moment, HVMloader unconditionally advertises the RTC_NO_ACK bit
> in the WAET table, and Xen unconditionally decides that the domain has
> been informed that it should not ack RTC interrupts as per the
> specification.

No, it's not "should not" but "doesn't need to".

> This logic is broken. There is no guarantee that the domain has read
> the WAET table, but even if it has, there is no guarantee that it will
> act on the information it has been given.

And again, there is no requirement to omit the ACKs; Xen just knows that it shouldn't rely on them being issued. At least that's been the intention.

> One option would be for the
> toolstack to set this parameter up; it is in the best place to know
> whether a certain domain will correctly use the newly available mode.
> This would involve moving the RTC mode in Xen to a per-domain setting.

Yes, this was always the plan, and a first partial patch to do that had been posted quite a while ago. The thing disliked (by Tim, with me agreeing) was that the emulation mode got tied there to Viridian mode. As noted in a reply to one of George's 4.4 status mails, I simply haven't found time so far to convert the patch to one using a distinct HVM param.

Jan
On 19/11/2013 13:13, Jan Beulich wrote:
>>>> On 19.11.13 at 13:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> In what we believe is now the final regression discovered when upgrading
>> XenServer from Xen 4.1 to 4.3, there is an issue with RTC emulation.
>>
>> Win2003 SP2 is a WAET-unaware operating system, whose RTC access pattern
>> triggers Xen's rtc_mode_no_ack logic. The result is that the domain
>> falls into a tight loop reading RTC RegC, whose value is always 0xc0.
> I specifically tested with w2k3, and know (from disassembling) that
> it reads and evaluates the WAET table. While not impossible, it
> seems unlikely that this would have been switched back in SP2.

Hmm - that does indeed seem unlikely.

We test w2k3sp1 and w2k3sp2 side-by-side. w2k3sp1 is completely fine with Xen as is, and w2k3sp2 (almost) always falls into this infinite loop. Our test of 20 reboots always encounters the issue.

>
>> I have confirmed that switching Xen back to RTC strict mode fixes the
>> regression, but I am presuming that this alone would be an unpopular fix
>> upstream.
>>
>> At the moment, HVMloader unconditionally advertises the RTC_NO_ACK bit
>> in the WAET table, and Xen unconditionally decides that the domain has
>> been informed that it should not ack RTC interrupts as per the
>> specification.
> No, it's not "should not" but "doesn't need to".
>
>> This logic is broken. There is no guarantee that the domain has read
>> the WAET table, but even if it has, there is no guarantee that it will
>> act on the information it has been given.
> And again, there is no requirement to omit the ACKs; Xen just
> knows that it shouldn't rely on them being issued. At least that's
> been the intention.
>
>> One option would be for the
>> toolstack to set this parameter up; it is in the best place to know
>> whether a certain domain will correctly use the newly available mode.
>> This would involve moving the RTC mode in Xen to a per-domain setting.
> Yes, this was always the plan, and a first partial patch to do that
> had been posted quite a while ago. The thing disliked (by Tim, with
> me agreeing) was that the emulation mode got tied there to Viridian
> mode. As noted in a reply to one of George's 4.4 status mails, I
> simply haven't found time so far to convert the patch to one using a
> distinct HVM param.
>
> Jan

Ok - then I will do a series to make this properly controlled by the toolstack.

~Andrew
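As a rough illustration of the per-domain approach agreed on in this thread, the following C sketch shows one way the pieces could fit together. Every name in it (`toy_rtc_mode`, `toy_domain`, and the three helpers) is hypothetical; it does not reflect the eventual patch series, any existing HVM param, or Xen's real structures.

```c
#include <stdbool.h>

/* Hypothetical per-domain RTC emulation modes. */
enum toy_rtc_mode {
    TOY_RTC_MODE_STRICT, /* wait for the guest to read RegC (ack) */
    TOY_RTC_MODE_NO_ACK  /* do not rely on the guest acking */
};

/* Hypothetical per-domain state; a stand-in for a real domain struct. */
struct toy_domain {
    enum toy_rtc_mode rtc_mode;
};

/* Toolstack side: choose the mode from what it knows about the guest. */
static void toy_domain_set_rtc_mode(struct toy_domain *d,
                                    bool guest_copes_with_no_ack)
{
    d->rtc_mode = guest_copes_with_no_ack ? TOY_RTC_MODE_NO_ACK
                                          : TOY_RTC_MODE_STRICT;
}

/* Firmware side: advertise RTC_NO_ACK in WAET only for no-ack domains. */
static bool toy_waet_advertise_rtc_no_ack(const struct toy_domain *d)
{
    return d->rtc_mode == TOY_RTC_MODE_NO_ACK;
}

/* Hypervisor side: must the next interrupt wait for a RegC read? */
static bool toy_rtc_waits_for_ack(const struct toy_domain *d)
{
    return d->rtc_mode == TOY_RTC_MODE_STRICT;
}
```

The point of the sketch is that the WAET advertisement and the emulation behaviour are derived from the same per-domain value, so they cannot disagree the way the current unconditional logic allows.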