I guess I may be missing something important, but I don't understand the
code below at the end of fixup_page_fault:

    if ( paging_mode_enabled(d) )
    {
        int ret = paging_fault(addr, regs);
        if ( ret == EXCRET_fault_fixed )
            trace_trap_two_addr(TRC_PV_PAGING_FIXUP, regs->eip, addr);
        return ret;
    }

Shouldn't a check on guest_mode(regs) be required, since the fault may be
triggered by Xen itself? For example, a null pointer dereference by Xen
code (not passed from the guest) may not get the chance to reach "fatal
page fault"; instead a guest page fault could be injected, per my
reading...

Thanks,
Kevin
On 17/12/2008 03:17, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> I guess I may be missing something important, but I don't understand the
> code below at the end of fixup_page_fault:
>
>     if ( paging_mode_enabled(d) )
>     {
>         int ret = paging_fault(addr, regs);
>         if ( ret == EXCRET_fault_fixed )
>             trace_trap_two_addr(TRC_PV_PAGING_FIXUP, regs->eip, addr);
>         return ret;
>     }
>
> Shouldn't a check on guest_mode(regs) be required, since the fault may be
> triggered by Xen itself? For example, a null pointer dereference by Xen
> code (not passed from the guest) may not get the chance to reach "fatal
> page fault"; instead a guest page fault could be injected, per my
> reading...

Consider copy_from_guest() applied to a PV guest with dirty logging
enabled. The #PF handler should fix up faults when accessing guest address
space via shadow page tables, even when the access happens within Xen. I
haven't looked, but I'm sure the shadow code will only propagate the
fault, and return EXCRET_fault_fixed, if it sees that the fault occurred
in guest context.

 -- Keir
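To make Keir's scenario concrete: under dirty logging the shadow code
write-protects guest pages, so the first write to each page faults and can
be recorded in the dirty bitmap; that write may well be issued by Xen
itself, e.g. copy_to_guest() filling in hypercall results. Below is a toy
model of the fixup decision, assuming nothing beyond what the thread
describes (the struct and function names are illustrative, not Xen's):

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the PV log-dirty case -- not Xen source. Under dirty
     * logging, shadow PTEs are write-protected so the first write to
     * each page faults; the handler must mark the page dirty and restore
     * write access, even when the write came from Xen itself. */
    struct pte { bool present, writable, dirty_logged; };

    /* Stand-in for paging_fault(): returns true (think
     * EXCRET_fault_fixed) if this was a log-dirty fault it fixed up. */
    static bool fixup_log_dirty_write(struct pte *p)
    {
        if (p->present && !p->writable && p->dirty_logged) {
            p->dirty_logged = false;  /* record page in the dirty bitmap */
            p->writable = true;       /* re-allow the write */
            return true;              /* retry the faulting access */
        }
        return false;                 /* genuine fault: crash or inject */
    }

    int main(void)
    {
        struct pte guest_page = { .present = true, .writable = false,
                                  .dirty_logged = true };

        /* A write from Xen context into PV guest space faults once;
         * since Xen and a PV guest share an address space, the fixup
         * must run regardless of guest_mode(regs). */
        if (fixup_log_dirty_write(&guest_page))
            puts("fault fixed: page marked dirty, copy can be restarted");
        printf("writable now: %d\n", guest_page.writable);
        return 0;
    }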
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, December 17, 2008 4:23 PM
>
> On 17/12/2008 03:17, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> I guess I may be missing something important, but I don't understand the
>> code below at the end of fixup_page_fault:
>>
>>     if ( paging_mode_enabled(d) )
>>     {
>>         int ret = paging_fault(addr, regs);
>>         if ( ret == EXCRET_fault_fixed )
>>             trace_trap_two_addr(TRC_PV_PAGING_FIXUP, regs->eip, addr);
>>         return ret;
>>     }
>>
>> Shouldn't a check on guest_mode(regs) be required, since the fault may be
>> triggered by Xen itself? For example, a null pointer dereference by Xen
>> code (not passed from the guest) may not get the chance to reach "fatal
>> page fault"; instead a guest page fault could be injected, per my
>> reading...
>
> Consider copy_from_guest() applied to a PV guest with dirty logging
> enabled. The #PF handler should fix up faults when accessing guest
> address space via shadow page tables, even when the access happens
> within Xen.

If Xen accesses guest address space intentionally, as for a hypercall
parameter, such fixup is desired. But what about a random illegal access
in Xen whose faulting address happens to fall into guest address space?

Thanks,
Kevin
On 17/12/2008 08:32, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> Consider copy_from_guest() applied to a PV guest with dirty logging
>> enabled. The #PF handler should fix up faults when accessing guest
>> address space via shadow page tables, even when the access happens
>> within Xen.
>
> If Xen accesses guest address space intentionally, as for a hypercall
> parameter, such fixup is desired. But what about a random illegal access
> in Xen whose faulting address happens to fall into guest address space?

Well, HVM guests obviously have a separate address space, so no issue
there. For a PV guest -- yes, Xen will then erroneously access guest
address space instead of crashing. But this is no worse than what would
happen if running without shadow page tables (i.e., dirty logging
disabled). Fortunately Xen has no bugs. ;-)

 -- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, December 17, 2008 4:35 PM
>
> On 17/12/2008 08:32, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> Consider copy_from_guest() applied to a PV guest with dirty logging
>>> enabled. The #PF handler should fix up faults when accessing guest
>>> address space via shadow page tables, even when the access happens
>>> within Xen.
>>
>> If Xen accesses guest address space intentionally, as for a hypercall
>> parameter, such fixup is desired. But what about a random illegal access
>> in Xen whose faulting address happens to fall into guest address space?
>
> Well, HVM guests obviously have a separate address space, so no issue
> there. For a PV guest -- yes, Xen will then erroneously access guest
> address space instead of crashing. But this is no worse than what would
> happen if running without shadow page tables (i.e., dirty logging
> disabled). Fortunately Xen has no bugs. ;-)

For PV it looks OK, since fixing up the guest address space also allows
Xen to make forward progress, as Xen and a PV guest share one address
space. However, given the separate address spaces in the HVM shadow case,
isn't it wrong to search the shadow page tables for a page fault that
should instead be checked against the monitor page table? It's possible
for a faulting address to have a valid mapping in the shadow tables but
not in the monitor table, putting the faulting CPU in a dead loop (fault,
check shadow, re-execute, fault again...).

The dead loop above was observed while one of my colleagues was fixing a
xenoprof issue, where a null pointer was dereferenced without a check in
Xen. Yes, the cause could be deduced by dumping the CPU stack, but is it
possible to check for such a condition and then report a 'fatal page
fault' on the console, which is more informative? Of course this is not a
bug per se, just more useful to developers. :-)

Thanks,
Kevin
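A toy model of the dead loop Kevin describes (illustrative names only, not
Xen source): the handler finds the faulting address validly mapped in the
guest's shadow tables and reports the fault as fixed, but Xen executes on
the monitor tables, which never change, so the retried access faults
again:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the HVM shadow dead loop -- not Xen source. */
    static bool mapped_in_shadow(unsigned long addr)
    {
        return addr == 0;   /* guest happens to map page 0 */
    }

    static bool mapped_in_monitor(unsigned long addr)
    {
        (void)addr;
        return false;       /* Xen's own tables have no such mapping */
    }

    int main(void)
    {
        unsigned long addr = 0;   /* NULL dereference inside Xen */
        int faults = 0;

        while (!mapped_in_monitor(addr)) {   /* access keeps faulting */
            if (!mapped_in_shadow(addr)) {
                puts("fatal page fault");    /* the informative outcome */
                return 1;
            }
            /* Handler updates the *shadow* tables and returns
             * EXCRET_fault_fixed; the monitor tables never change... */
            if (++faults == 5) {             /* cap the demo loop */
                printf("still faulting after %d retries: dead loop\n",
                       faults);
                return 1;
            }
        }
        return 0;
    }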
On 17/12/2008 08:50, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> For PV it looks OK, since fixing up the guest address space also allows
> Xen to make forward progress, as Xen and a PV guest share one address
> space. However, given the separate address spaces in the HVM shadow case,
> isn't it wrong to search the shadow page tables for a page fault that
> should instead be checked against the monitor page table? It's possible
> for a faulting address to have a valid mapping in the shadow tables but
> not in the monitor table, putting the faulting CPU in a dead loop (fault,
> check shadow, re-execute, fault again...).
>
> The dead loop above was observed while one of my colleagues was fixing a
> xenoprof issue, where a null pointer was dereferenced without a check in
> Xen. Yes, the cause could be deduced by dumping the CPU stack, but is it
> possible to check for such a condition and then report a 'fatal page
> fault' on the console, which is more informative? Of course this is not a
> bug per se, just more useful to developers. :-)

A Xen fault shouldn't cause a lookup in guest tables for HVM guests.

I think the issue here is actually that shadow code places some mapping of
its own at address 0. We've had this issue before, where it stops NULL
dereferences from crashing...

It is surely something like that, since most guests are (sensibly) not
going to have a mapping at address 0. So it's unlikely that a mapping has
actually erroneously been propagated from the guest.

CC'ing Tim and Gianluca. They probably know what this 0-based mapping is,
and also whether it is feasible to move it.

 -- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, December 17, 2008 5:04 PM
>
> On 17/12/2008 08:50, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> For PV it looks OK, since fixing up the guest address space also allows
>> Xen to make forward progress, as Xen and a PV guest share one address
>> space. However, given the separate address spaces in the HVM shadow
>> case, isn't it wrong to search the shadow page tables for a page fault
>> that should instead be checked against the monitor page table? It's
>> possible for a faulting address to have a valid mapping in the shadow
>> tables but not in the monitor table, putting the faulting CPU in a dead
>> loop (fault, check shadow, re-execute, fault again...).
>>
>> The dead loop above was observed while one of my colleagues was fixing a
>> xenoprof issue, where a null pointer was dereferenced without a check in
>> Xen. Yes, the cause could be deduced by dumping the CPU stack, but is it
>> possible to check for such a condition and then report a 'fatal page
>> fault' on the console, which is more informative? Of course this is not
>> a bug per se, just more useful to developers. :-)
>
> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.

Agreed. But it looks like the current sh_page_fault only checks guest_mode
on some paths. For example, a quick search suggests there is no such check
on the path where a shadow fault is fixed up. Maybe an explicit check on
guest mode at entry would state the above guideline more clearly. But I
may well have overlooked some lines...

> I think the issue here is actually that shadow code places some mapping
> of its own at address 0. We've had this issue before, where it stops NULL
> dereferences from crashing...

Yeah, I recall that issue; it came from the shadow linear mapping. :-)

> It is surely something like that, since most guests are (sensibly) not
> going to have a mapping at address 0. So it's unlikely that a mapping has
> actually erroneously been propagated from the guest.
>
> CC'ing Tim and Gianluca. They probably know what this 0-based mapping is,
> and also whether it is feasible to move it.

I'm also now a bit suspicious about why address 0 doesn't cause a crash.
This may need more thought.

Thanks,
Kevin
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, December 17, 2008 5:04 PM
>
> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.
>
> I think the issue here is actually that shadow code places some mapping
> of its own at address 0. We've had this issue before, where it stops NULL
> dereferences from crashing...
>
> It is surely something like that, since most guests are (sensibly) not
> going to have a mapping at address 0. So it's unlikely that a mapping has
> actually erroneously been propagated from the guest.

I'm told by Xiaowei that the bug happens while the guest is still in
hvmloader, where a fake 1:1 page table is used as the guest page table. In
that case sh_page_fault will take the shadow-fault fixup path, since a
mapping for page 0 does exist.

Thanks,
Kevin
At 09:04 +0000 on 17 Dec (1229504644), Keir Fraser wrote:

> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.

Looks like it does, though. :( The check in fixup_page_fault should
probably be more like:

    if ( (paging_mode_external(d) && guest_mode(regs)) ||
         (paging_mode_enabled(d) && !paging_mode_external(d)) )

That is, unconditionally call paging_fault() for PV guests (who shouldn't
be allowed to map the 0 page, right?), and call it for HVM guests only if
the guest caused the fault.

Potentially it can all be tidied up a bit by moving the other HVM-specific
call out of the if ( IN_HYPERVISOR_RANGE ) block at the top.

> I think the issue here is actually that shadow code places some mapping
> of its own at address 0. We've had this issue before, where it stops NULL
> dereferences from crashing...

Yep; that was different, in that there was no page fault at all. But of
course Xen has no bugs, so the code was like that for years before anyone
noticed. :)

Cheers,

Tim

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
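Spliced into the fragment Kevin quoted at the start of the thread, Tim's
condition would make the tail of fixup_page_fault read roughly as follows.
This is a sketch of the proposal for illustration, using only names that
appear in the thread; it is not the change as committed:

    /* Tail of fixup_page_fault() with Tim's proposed condition folded
     * in -- illustrative sketch only. */
    if ( (paging_mode_external(d) && guest_mode(regs)) ||      /* HVM: guest faults only */
         (paging_mode_enabled(d) && !paging_mode_external(d)) ) /* PV: any fault */
    {
        int ret = paging_fault(addr, regs);
        if ( ret == EXCRET_fault_fixed )
            trace_trap_two_addr(TRC_PV_PAGING_FIXUP, regs->eip, addr);
        return ret;
    }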
On 17/12/2008 10:53, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> At 09:04 +0000 on 17 Dec (1229504644), Keir Fraser wrote:
>> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.
>
> Looks like it does, though. :( The check in fixup_page_fault should
> probably be more like:

But wouldn't this just cause an infinite loop? Since shadow code would fix
up the fault in the guest shadow tables, which the hypervisor is not
running on?

 -- Keir

>     if ( (paging_mode_external(d) && guest_mode(regs)) ||
>          (paging_mode_enabled(d) && !paging_mode_external(d)) )
>
> That is, unconditionally call paging_fault() for PV guests (who shouldn't
> be allowed to map the 0 page, right?), and call it for HVM guests only if
> the guest caused the fault.
>
> Potentially it can all be tidied up a bit by moving the other
> HVM-specific call out of the if ( IN_HYPERVISOR_RANGE ) block at the top.
On 17/12/2008 10:58, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> On 17/12/2008 10:53, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>
>> At 09:04 +0000 on 17 Dec (1229504644), Keir Fraser wrote:
>>> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.
>>
>> Looks like it does, though. :( The check in fixup_page_fault should
>> probably be more like:
>
> But wouldn't this just cause an infinite loop? Since shadow code would
> fix up the fault in the guest shadow tables, which the hypervisor is not
> running on?

For clarity: I don't mean your patch would cause this. I mean the existing
code would have this behaviour, if this were the problem.

If you still think a patch is needed here then please send it along.

 -- Keir
At 10:58 +0000 on 17 Dec (1229511534), Keir Fraser wrote:

> On 17/12/2008 10:53, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>
>> At 09:04 +0000 on 17 Dec (1229504644), Keir Fraser wrote:
>>> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.
>>
>> Looks like it does, though. :( The check in fixup_page_fault should
>> probably be more like:
>
> But wouldn't this just cause an infinite loop?

I thought that was what Kevin was reporting.

> Since shadow code would fix up the fault in the guest shadow tables,
> which the hypervisor is not running on?

Yes, exactly. But it would only do that for faulting accesses from Xen
that (a) weren't fixed up by any of the other mechanisms, and (b) happened
to be to addresses that were mapped in the guest tables. And Xen would
never do such a thing. :)

Tim.

> -- Keir
>
>> if ( (paging_mode_external(d) && guest_mode(regs)) ||
>>      (paging_mode_enabled(d) && !paging_mode_external(d)) )
>>
>> That is, unconditionally call paging_fault() for PV guests (who
>> shouldn't be allowed to map the 0 page, right?), and call it for HVM
>> guests only if the guest caused the fault.
>>
>> Potentially it can all be tidied up a bit by moving the other
>> HVM-specific call out of the if ( IN_HYPERVISOR_RANGE ) block at the
>> top.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
At 11:02 +0000 on 17 Dec (1229511773), Keir Fraser wrote:

> If you still think a patch is needed here then please send it along.

I'm just getting started on testing one.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
On 17/12/2008 11:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

>>> At 09:04 +0000 on 17 Dec (1229504644), Keir Fraser wrote:
>>>> A Xen fault shouldn't cause a lookup in guest tables for HVM guests.
>>>
>>> Looks like it does, though. :( The check in fixup_page_fault should
>>> probably be more like:
>>
>> But wouldn't this just cause an infinite loop?
>
> I thought that was what Kevin was reporting.

Oh yes, so he was. Duh. :-)

>> Since shadow code would fix up the fault in the guest shadow tables,
>> which the hypervisor is not running on?
>
> Yes, exactly. But it would only do that for faulting accesses from Xen
> that (a) weren't fixed up by any of the other mechanisms, and (b)
> happened to be to addresses that were mapped in the guest tables. And
> Xen would never do such a thing. :)

I certainly think that something like the patch you proposed would cause a
better failure mode.

Kevin: Can you try out what Tim proposed, and if it indeed makes things
better (at least, you get a proper crash!) we should get that checked in.

 -- Keir
Don't ever call the shadow code to fix a page fault in an external-mode
guest if the fault came from Xen; it would be making changes to the wrong
pagetables, potentially causing a pagefault loop in Xen.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

At 11:08 +0000 on 17 Dec (1229512101), Keir Fraser wrote:

> I certainly think that something like the patch you proposed would cause
> a better failure mode.
>
> Kevin: Can you try out what Tim proposed, and if it indeed makes things
> better (at least, you get a proper crash!) we should get that checked in.
>
> -- Keir

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
Applied as c/s 18927, thanks. Will consider for backport to 3.3.2 if it
holds up okay in xen-unstable.

 -- Keir

On 17/12/2008 11:19, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> Don't ever call the shadow code to fix a page fault in an external-mode
> guest if the fault came from Xen; it would be making changes to the wrong
> pagetables, potentially causing a pagefault loop in Xen.
>
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
>
> At 11:08 +0000 on 17 Dec (1229512101), Keir Fraser wrote:
>> I certainly think that something like the patch you proposed would cause
>> a better failure mode.
>>
>> Kevin: Can you try out what Tim proposed, and if it indeed makes things
>> better (at least, you get a proper crash!) we should get that checked
>> in.
>>
>> -- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, December 17, 2008 7:08 PM
>
> I certainly think that something like the patch you proposed would cause
> a better failure mode.
>
> Kevin: Can you try out what Tim proposed, and if it indeed makes things
> better (at least, you get a proper crash!) we should get that checked in.

Sure. Since you've checked Tim's patch in, we can try against the latest
tree.

Thanks,
Kevin
> From: Tian, Kevin
> Sent: Thursday, December 18, 2008 8:07 AM
>
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Wednesday, December 17, 2008 7:08 PM
>>
>> I certainly think that something like the patch you proposed would cause
>> a better failure mode.
>>
>> Kevin: Can you try out what Tim proposed, and if it indeed makes things
>> better (at least, you get a proper crash!) we should get that checked
>> in.
>
> Sure. Since you've checked Tim's patch in, we can try against the latest
> tree.

Well, it crashes immediately as expected now.

Thanks,
Kevin