We've got reports of that change causing HVM data corruption issues. While I
can't see what's wrong with the patch, I'd suggest at least reverting it from
the 3.3 tree (which is what our code is based upon) for the time being.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
I think the issue is that I did a bad backport to 3.3. The 'case 0xc3'
should be under twobyte_special_insn rather than twobyte_insn, right? The
two separate paths got merged into one in xen-unstable.

Of course this data corruption ought only to happen in cases where we'd
previously have failed an mmio emulation (and hence probably killed the
guest kernel?).

 -- Keir

On 20/11/08 16:38, "Jan Beulich" <jbeulich@novell.com> wrote:

> We've got reports of that change causing HVM data corruption issues. While
> I can't see what's wrong with the patch, I'd suggest at least reverting it
> from the 3.3 tree (which is what our code is based upon) for the time being.
>
> Jan
At 17:13 +0000 on 20 Nov (1227201181), Keir Fraser wrote:
> I think the issue is that I did a bad backport to 3.3. The 'case 0xc3'
> should be under twobyte_special_insn rather than twobyte_insn, right? The
> two separate paths got merged into one in xen-unstable.
>
> Of course this data corruption ought only to happen in cases where we'd
> previously have failed an mmio emulation (and hence probably killed the
> guest kernel?).

A more likely culprit is that some OSes use movnti to zero pages that
used to be pagetables; when we couldn't emulate it we just (correctly)
unshadowed those pages.

Cheers,

Tim.

> -- Keir
>
> On 20/11/08 16:38, "Jan Beulich" <jbeulich@novell.com> wrote:
>
> > We've got reports of that change causing HVM data corruption issues. While
> > I can't see what's wrong with the patch, I'd suggest at least reverting it
> > from the 3.3 tree (which is what our code is based upon) for the time being.
> >
> > Jan

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
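The unshadowing behaviour Tim describes can be illustrated with a toy model. This is only a sketch under stated assumptions, not Xen's shadow code: `struct page`, `emulate_write` and `handle_pt_write_fault` are hypothetical names, and the `insn_supported` flag merely stands in for whether the emulator knows the faulting instruction (for example movnti). It shows the assumed control flow only: a failed emulation of a write to a shadowed page-table page leads to the page being unshadowed, while a successful emulation leaves it shadowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical emulator result codes (not Xen's real constants). */
enum emul_result { EMUL_OKAY, EMUL_UNHANDLEABLE };

/* Hypothetical per-page state: is the page currently shadowed as a pagetable? */
struct page {
    bool shadowed;
};

/* Stand-in for the instruction emulator: succeeds only for known instructions. */
static enum emul_result emulate_write(bool insn_supported)
{
    return insn_supported ? EMUL_OKAY : EMUL_UNHANDLEABLE;
}

/* Assumed fault path for a guest write that hits a shadowed page. */
static void handle_pt_write_fault(struct page *pg, bool insn_supported)
{
    assert(pg != NULL);
    if (emulate_write(insn_supported) == EMUL_UNHANDLEABLE) {
        /* Emulation failed: stop treating the page as a pagetable. */
        pg->shadowed = false;
    }
    /* On success the write was emulated and the page stays shadowed. */
}
```

In this model, once movnti becomes emulatable, a page scrubber zeroing former pagetables no longer reaches the unshadow branch, which is why relying on emulation failures as an unshadowing hint is fragile.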
On 20/11/08 17:16, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> At 17:13 +0000 on 20 Nov (1227201181), Keir Fraser wrote:
>> I think the issue is that I did a bad backport to 3.3. The 'case 0xc3'
>> should be under twobyte_special_insn rather than twobyte_insn, right? The
>> two separate paths got merged into one in xen-unstable.
>>
>> Of course this data corruption ought only to happen in cases where we'd
>> previously have failed an mmio emulation (and hence probably killed the
>> guest kernel?).
>
> A more likely culprit is that some OSes use movnti to zero pages that
> used to be pagetables; when we couldn't emulate it we just (correctly)
> unshadowed those pages.

Yes, you're probably right. I wonder if we are relying on emulation failures
to inform unshadowing at all often? We might have to revisit constraining
x86_emulate() when called by shadow code, do you think?

 -- Keir
Keir Fraser wrote:
> I think the issue is that I did a bad backport to 3.3. The 'case 0xc3'
> should be under twobyte_special_insn rather than twobyte_insn, right? The
> two separate paths got merged into one in xen-unstable.

The other way round, but yes, this seems to have caused the corruption.

Kevin
>>> Keir Fraser <keir.fraser@eu.citrix.com> 20.11.08 18:13 >>>
>I think the issue is that I did a bad backport to 3.3. The 'case 0xc3'
>should be under twobyte_special_insn rather than twobyte_insn, right? The
>two separate paths got merged into one in xen-unstable.

Oh, indeed - if you mean it the other way around.

>Of course this data corruption ought only to happen in cases where we'd
>previously have failed an mmio emulation (and hence probably killed the
>guest kernel?).

Yes, we previously saw emulation failure messages. The guest wasn't
killed because of that, however. I have to admit it's been a while since
I last looked at mmio emulation - is it eagerly trying to emulate successive
instructions, and return to native execution when emulation failed? If
not, I could neither explain why only some data got corrupted here, nor
why the guest didn't get killed.

Jan
On 21/11/08 11:04, "Jan Beulich" <jbeulich@novell.com> wrote:

>> Of course this data corruption ought only to happen in cases where we'd
>> previously have failed an mmio emulation (and hence probably killed the
>> guest kernel?).
>
> Yes, we previously saw emulation failure messages. The guest wasn't
> killed because of that, however. I have to admit it's been a while since
> I last looked at mmio emulation - is it eagerly trying to emulate successive
> instructions, and return to native execution when emulation failed? If
> not, I could neither explain why only some data got corrupted here, nor
> why the guest didn't get killed.

TimD had the correct explanation -- page-table pages getting recycled via
Windows' page scrubber.

 -- Keir
At 17:43 +0000 on 20 Nov (1227202988), Keir Fraser wrote:
> Yes, you're probably right. I wonder if we are relying on emulation failures
> to inform unshadowing at all often? We might have to revisit constraining
> x86_emulate() when called by shadow code, do you think?

Yes, I think it would probably be worth looking at that.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
>>> Gianluca Guida <gianluca.guida@eu.citrix.com> 24.11.08 17:18 >>>
>Tim Deegan wrote:
>> At 17:43 +0000 on 20 Nov (1227202988), Keir Fraser wrote:
>>> Yes, you're probably right. I wonder if we are relying on emulation failures
>>> to inform unshadowing at all often? We might have to revisit constraining
>>> x86_emulate() when called by shadow code, do you think?
>>
>> Yes, I think it would probably be worth looking at that.
>
>In what kind of guests/workloads were we experiencing this corruption?

SLE11 installation.

Jan
Tim Deegan wrote:
> At 17:43 +0000 on 20 Nov (1227202988), Keir Fraser wrote:
>> Yes, you're probably right. I wonder if we are relying on emulation failures
>> to inform unshadowing at all often? We might have to revisit constraining
>> x86_emulate() when called by shadow code, do you think?
>
> Yes, I think it would probably be worth looking at that.

In what kind of guests/workloads were we experiencing this corruption?

Thanks,
Gianluca
Keir Fraser wrote:
> I think the issue is that I did a bad backport to 3.3. The 'case 0xc3'
> should be under twobyte_special_insn rather than twobyte_insn, right? The
> two separate paths got merged into one in xen-unstable.

This actually seems to be the case. With the current patch the move from
src.val to dst.val never happened, so movnti wrote the original dst.val
value to memory, leading to memory corruption.

Moving the switch case into twobyte_insn makes the problem go away.

Gianluca
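The failure mode Gianluca describes can be reproduced in a few lines. The following is a minimal sketch, not the x86_emulate() source: `struct operand`, `writeback` and both `emulate_movnti_*` functions are hypothetical, and the `stale` parameter is an assumption standing in for whatever dst.val happened to hold before the instruction was handled. The buggy variant models a handler placed on a path where the src-to-dst copy is skipped; the fixed variant performs the copy before the shared writeback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified operand (not Xen's real struct operand). */
struct operand {
    uint32_t val;   /* value that writeback will store */
    uint32_t *mem;  /* destination in "guest memory", NULL for a register source */
};

/* Shared writeback stage: store dst.val to the destination address. */
static void writeback(const struct operand *dst)
{
    assert(dst->mem != NULL);
    *dst->mem = dst->val;
}

/* Buggy placement: the src -> dst copy is never executed, so the
 * stale dst.val is what reaches memory. */
static void emulate_movnti_buggy(uint32_t src_val, uint32_t stale, uint32_t *mem)
{
    struct operand src = { .val = src_val, .mem = NULL };
    struct operand dst = { .val = stale, .mem = mem };
    (void)src;  /* the missing step would be: dst.val = src.val; */
    writeback(&dst);
}

/* Fixed placement: copy the source value before the writeback. */
static void emulate_movnti_fixed(uint32_t src_val, uint32_t stale, uint32_t *mem)
{
    struct operand src = { .val = src_val, .mem = NULL };
    struct operand dst = { .val = stale, .mem = mem };
    dst.val = src.val;
    writeback(&dst);
}
```

With the buggy variant the destination ends up holding the stale value rather than the register the guest asked to store, which matches the symptom of silently corrupted data instead of an emulation failure.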