Razvan Cojocaru
2013-Dec-03 15:06 UTC
Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
Hello,

here's the setup: a Windows HVM domU and a Linux PV domU. The Linux
domU wants to map pages from the Windows domU. No XSM involved.

The Linux domU is perfectly able to map (using xc_map_foreign_range())
pages from the Windows domU, except for pages below 1M. For pages
below 1M, it returns "invalid argument". The same code, trying to map
the exact same pages, does succeed, however, if the application trying
to map those pages runs from dom0.

Why is this happening, and can anything be done about it so that the
Linux domU becomes able to map those pages from the HVM Windows domU?

Thanks,
Razvan
Ian Campbell
2013-Dec-03 15:51 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Tue, 2013-12-03 at 17:06 +0200, Razvan Cojocaru wrote:
> Hello,
>
> here's the setup: a Windows HVM domU and a Linux PV domU. The Linux
> domU wants to map pages from the Windows domU. No XSM involved.
>
> The Linux domU is perfectly able to map (using xc_map_foreign_range())
> pages from the Windows domU, except for pages below 1M.

With no XSM how does it have the privilege to do this?

> For pages below 1M, it returns "invalid argument". The same code,
> trying to map the exact same pages, does succeed, however, if the
> application trying to map those pages runs from dom0.

For dom0 it works because by default dom0 has the foreign mapping
privilege.

> Why is this happening, and can anything be done about it so that the
> Linux domU becomes able to map those pages from the HVM Windows domU?
>
> Thanks,
> Razvan
Razvan Cojocaru
2013-Dec-03 15:59 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>> pages from the Windows domU, except for pages below 1M.
>
> With no XSM how does it have the privilege to do this?

What I meant to say is that the domU is being allowed to do this sort
of thing, i.e. the problem is definitely not caused by XSM.

>> For pages below 1M, it returns "invalid argument". The same code,
>> trying to map the exact same pages, does succeed, however, if the
>> application trying to map those pages runs from dom0.
>
> For dom0 it works because by default dom0 has the foreign mapping
> privilege.

OK, and can the foreign mapping privilege be extended to the domU so
that it can go about mapping pages under 1M? Is there some way this
can be achieved with xl, or even by hacking the HV source code
somehow?

Thanks,
Razvan
Ian Campbell
2013-Dec-03 16:09 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
> >> The Linux domU is perfectly able to map (using xc_map_foreign_range())
> >> pages from the Windows domU, except for pages below 1M.
> >
> > With no XSM how does it have the privilege to do this?
>
> What I meant to say is that the domU is being allowed to do this sort
> of thing, i.e. the problem is definitely not caused by XSM.

OK, so XSM is involved but you are 101% certain that it is not
preventing the mappings?

> >> For pages below 1M, it returns "invalid argument". The same code,
> >> trying to map the exact same pages, does succeed, however, if the
> >> application trying to map those pages runs from dom0.
> >
> > For dom0 it works because by default dom0 has the foreign mapping
> > privilege.
>
> OK, and can the foreign mapping privilege be extended to the domU so
> that it can go about mapping pages under 1M?

AFAIK the foreign mapping privilege should already allow this. You have
just uncovered a bug somewhere. I'm afraid I don't know where, it might
be in your code, in libxc, in the privcmd ioctl driver or in the
hypervisor.

You probably need to instrument things up and down the call stack to
find out where these attempts are getting rejected.

> Is there some way this can be achieved with xl, or even by hacking
> the HV source code somehow?

You need to diagnose and fix the bug I think.

Ian.
Razvan Cojocaru
2013-Dec-03 16:18 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
> OK, so XSM is involved but you are 101% certain that it is not
> preventing the mappings?

Yes, I really am :)

> AFAIK the foreign mapping privilege should already allow this. You have
> just uncovered a bug somewhere. I'm afraid I don't know where, it might
> be in your code, in libxc, in the privcmd ioctl driver or in the
> hypervisor.
>
> You probably need to instrument things up and down the call stack to
> find out where these attempts are getting rejected.

Right. Thanks!
Tomasz Wroblewski
2013-Dec-03 17:36 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On 12/03/2013 05:09 PM, Ian Campbell wrote:
> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>>>> pages from the Windows domU, except for pages below 1M.
>>>
>>> With no XSM how does it have the privilege to do this?
>>
>> What I meant to say is that the domU is being allowed to do this sort
>> of thing, i.e. the problem is definitely not caused by XSM.
>
> OK, so XSM is involved but you are 101% certain that it is not
> preventing the mappings?

We ran into this issue in xenclient recently too, when we finally
upgraded the stubdomain's kernel to a pvops version. It seems the pvops
kernel contains a safeguard that only allows <1M mappings if it's dom0
(xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:

    static pte_t xen_make_pte(pteval_t pte)
    {
        phys_addr_t addr = (pte & PTE_PFN_MASK);

        ...
        /*
         * Unprivileged domains are allowed to do IOMAPpings for
         * PCI passthrough, but not map ISA space. The ISA
         * mappings are just dummy local mappings to keep other
         * parts of the kernel happy.
         */
        if (unlikely(pte & _PAGE_IOMAP) &&
            (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
            pte = iomap_pte(pte);
        } else {
            pte &= ~_PAGE_IOMAP;
            pte = pte_pfn_to_mfn(pte);
        }

        return native_make_pte(pte);
    }

We patched this out (in a fugly and probably not very correct way) for
our stubdomain kernel, since we needed our stubdomain qemu VMs to be
able to map the Windows guest's <1M range (qemu needs to be able to
write and read data there in order to chat with seabios etc). Maybe
Konrad (CC'ed) knows why the check is there in the guest kernel, and a
good way to solve this.

I think the goal of the check was only to stop <1M mappings of the
domain's own memory, so that the pvops kernel boot doesn't mess with
it, but by ricochet it also prevents mapping of foreign domain <1M
ranges...
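To make the effect of the quoted condition concrete, here is a minimal user-space sketch that mirrors only the branch selection in xen_make_pte(). It is not kernel code: the `MODEL_*` constants are stand-ins chosen for illustration (the 1 MiB limit matches the boundary discussed in this thread), and `model_make_pte_path()` and `model_iomap_pte()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only; the real definitions live in
 * the kernel headers and may differ between kernel versions. */
#define MODEL_PAGE_SHIFT  12
#define MODEL_PAGE_IOMAP  (UINT64_C(1) << 10)           /* stand-in for _PAGE_IOMAP */
#define MODEL_PFN_MASK    UINT64_C(0x000ffffffffff000)  /* stand-in for PTE_PFN_MASK */
#define MODEL_ISA_END     UINT64_C(0x100000)            /* 1 MiB boundary */

enum pte_path {
    PATH_IOMAP,      /* the iomap_pte() branch */
    PATH_LOCAL_P2M   /* the pte_pfn_to_mfn() branch */
};

/* Mirrors only the if/else condition quoted in the message above. */
static inline enum pte_path model_make_pte_path(uint64_t pte, bool initial_domain)
{
    uint64_t addr = pte & MODEL_PFN_MASK;

    if ((pte & MODEL_PAGE_IOMAP) &&
        (initial_domain || addr >= MODEL_ISA_END))
        return PATH_IOMAP;

    return PATH_LOCAL_P2M;
}

/* Build a PTE value that carries the IOMAP flag for a given frame number. */
static inline uint64_t model_iomap_pte(uint64_t frame)
{
    return (frame << MODEL_PAGE_SHIFT) | MODEL_PAGE_IOMAP;
}
```

With these stand-ins, frame 0xff (address 0xff000, just below 1M) takes the local translation branch when the caller is not the initial domain, while the same frame in dom0, or frame 0x100 in a domU, takes the IOMAP branch. That matches the symptom reported at the start of the thread.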
Razvan Cojocaru
2013-Dec-03 18:59 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
> We ran into this issue in xenclient recently too, when we finally
> upgraded the stubdomain's kernel to a pvops version. It seems the
> pvops kernel contains a safeguard that only allows <1M mappings if
> it's dom0 (xen_initial_domain()). This check is placed in
> arch/x86/xen/mmu.c:

Thanks Tomasz! That's a great lead.
Konrad Rzeszutek Wilk
2013-Dec-03 19:07 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
> We ran into this issue in xenclient recently too, when we finally
> upgraded the stubdomain's kernel to a pvops version. It seems the
> pvops kernel contains a safeguard that only allows <1M mappings if
> it's dom0 (xen_initial_domain()). This check is placed in
> arch/x86/xen/mmu.c:
>
>     static pte_t xen_make_pte(pteval_t pte)
>     {
>         phys_addr_t addr = (pte & PTE_PFN_MASK);
>
>         ...
>         /*
>          * Unprivileged domains are allowed to do IOMAPpings for
>          * PCI passthrough, but not map ISA space. The ISA
>          * mappings are just dummy local mappings to keep other
>          * parts of the kernel happy.
>          */
>         if (unlikely(pte & _PAGE_IOMAP) &&
>             (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>             pte = iomap_pte(pte);
>         } else {
>             pte &= ~_PAGE_IOMAP;
>             pte = pte_pfn_to_mfn(pte);
>         }
>
>         return native_make_pte(pte);
>     }
>
> We patched this out (in a fugly and probably not very correct way) for
> our stubdomain kernel, since we needed our stubdomain qemu VMs to be
> able to map the Windows guest's <1M range (qemu needs to be able to
> write and read data there in order to chat with seabios etc). Maybe
> Konrad (CC'ed) knows why the check is there in the guest kernel, and a
> good way to solve this.

For PV domU guests the ISA range is usually RAM, so during early
bootup of a PV guest you don't want it to scan MFNs it does not have
access to. Granted, it does not have access to them, but it would have
the MFNs coded in and any access to that area will result in .. Xen
"fixing" up the PTEs (I can't recall exactly how).

If you boot a PV guest and remove the:

    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {

do you see anything in the Xen console?

> I think the goal of the check was only to stop <1M mappings of the
> domain's own memory, so that the pvops kernel boot doesn't mess with
> it, but by ricochet it also prevents mapping of foreign domain <1M
> ranges...

Duh! That was certainly unintentional.
Tomasz Wroblewski
2013-Dec-04 10:24 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On 12/03/2013 08:07 PM, Konrad Rzeszutek Wilk wrote:
> For PV domU guests the ISA range is usually RAM, so during early
> bootup of a PV guest you don't want it to scan MFNs it does not have
> access to. Granted, it does not have access to them, but it would have
> the MFNs coded in and any access to that area will result in .. Xen
> "fixing" up the PTEs (I can't recall exactly how).
>
> If you boot a PV guest and remove the:
>
>     (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>
> do you see anything in the Xen console?

I recall I wasn't seeing anything, the PV domU was just hanging super
early in the boot then. The way we worked around it is via the attached
patch (applied to the PV domU's kernel, in our case the stubdom hosting
the qemu process). It keeps the <1M safeguard for local mappings but
allows foreign mappings (detected via the _PAGE_SPECIAL flag). Razvan,
you can try the attached patch as well, applied to your PV domU kernel,
to see if it helps you.

>> I think the goal of the check was only to stop <1M mappings of the
>> domain's own memory, so that the pvops kernel boot doesn't mess with
>> it, but by ricochet it also prevents mapping of foreign domain <1M
>> ranges...
>
> Duh! That was certainly unintentional.
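The patch mentioned in this message is not reproduced in the thread, so the following is only a guess at the shape of such a workaround, written as a self-contained user-space sketch. It assumes that a foreign mapping can be recognised by a flag standing in for _PAGE_SPECIAL; the `MODEL_*` constants and `model_use_iomap_path()` are hypothetical names, and the bit positions are illustrative stand-ins rather than values taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only. */
#define MODEL_PAGE_SPECIAL (UINT64_C(1) << 9)            /* stand-in for _PAGE_SPECIAL */
#define MODEL_PAGE_IOMAP   (UINT64_C(1) << 10)           /* stand-in for _PAGE_IOMAP */
#define MODEL_PFN_MASK     UINT64_C(0x000ffffffffff000)  /* stand-in for PTE_PFN_MASK */
#define MODEL_ISA_END      UINT64_C(0x100000)            /* 1 MiB boundary */

/*
 * Returns true when the value should be taken as-is (the IOMAP branch).
 * Assumption: a set MODEL_PAGE_SPECIAL bit marks a foreign mapping, so
 * such mappings are exempted from the <1M restriction, while local
 * mappings below 1M keep the original safeguard.
 */
static inline bool model_use_iomap_path(uint64_t pte, bool initial_domain)
{
    uint64_t addr = pte & MODEL_PFN_MASK;
    bool foreign = (pte & MODEL_PAGE_SPECIAL) != 0;

    return (pte & MODEL_PAGE_IOMAP) &&
           (initial_domain || foreign || addr >= MODEL_ISA_END);
}
```

Under these assumptions a domU mapping of frame 0xff with both flags set takes the IOMAP branch, whereas the same frame without the special flag is still translated locally, which is the behaviour the message describes.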
Jan Beulich
2013-Dec-04 10:31 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>>> On 04.12.13 at 11:24, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> I recall I wasn't seeing anything, the PV domU was just hanging super
> early in the boot then. The way we worked around it is via the
> attached patch (applied to the PV domU's kernel, in our case the
> stubdom hosting the qemu process). It keeps the <1M safeguard for
> local mappings but allows foreign mappings (detected via the
> _PAGE_SPECIAL flag).

I've been following this thread, with each new response making it
less clear what is being talked about here: The original request
was to map the MFN backing a guest's PFN below 1M. That says
nothing about the value of the MFN (and iirc Xen doesn't allocate
MFNs from the first 1M to any guest on x86). Yet the safeguard
ought to be dealing with a specific MFN range only.

Can someone explain what I'm missing here?

Jan
Ian Campbell
2013-Dec-04 10:39 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, 2013-12-04 at 10:31 +0000, Jan Beulich wrote:
> I've been following this thread, with each new response making it
> less clear what is being talked about here: The original request
> was to map the MFN backing a guest's PFN below 1M. That says
> nothing about the value of the MFN (and iirc Xen doesn't allocate
> MFNs from the first 1M to any guest on x86). Yet the safeguard
> ought to be dealing with a specific MFN range only.
>
> Can someone explain what I'm missing here?

I believe the intention is to catch domain 0's 1:1 mapping of the first
1M of host RAM.

Ian.
Jan Beulich
2013-Dec-04 10:42 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>>> On 04.12.13 at 11:39, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2013-12-04 at 10:31 +0000, Jan Beulich wrote:
>> Can someone explain what I'm missing here?
>
> I believe the intention is to catch domain 0's 1:1 mapping of the first
> 1M of host RAM.

But iirc Razvan started out with wanting to map PFNs inside a
Windows guest.

Jan
Ian Campbell
2013-Dec-04 10:45 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, 2013-12-04 at 10:42 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 11:39, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > I believe the intention is to catch domain 0's 1:1 mapping of the first
> > 1M of host RAM.
>
> But iirc Razvan started out with wanting to map PFNs inside a
> Windows guest.

Correct. The check for mapping domain 0's 1:1 map is overly broad I
think, and erroneously prevents a domU from mapping a foreign PFN < 1M.

Ian.
Jan Beulich
2013-Dec-04 10:54 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.

But that's the source of my not understanding: xen_make_pte()
derives addr from the passed in pte, and that pte can - for a
foreign domain's page - hardly hold a PFN. Otherwise how would
the translation to MFN be supposed to happen? Yet, if it's a
machine address that's coming in, it can't point into the low 1Mb.

Jan
Ian Campbell
2013-Dec-04 11:04 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > Correct. The check for mapping domain 0's 1:1 map is overly broad I
> > think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>
> But that's the source of my not understanding: xen_make_pte()
> derives addr from the passed in pte, and that pte can - for a
> foreign domain's page - hardly hold a PFN. Otherwise how would
> the translation to MFN be supposed to happen? Yet, if it's a
> machine address that's coming in, it can't point into the low 1Mb.

Isn't it a foreign gpfn at this point, which for an HVM guest is
actually a PFN not an MFN?

You are making me think I might be talking out my a**e though, because
what is a foreign mapping even doing in xen_make_pte -- those need to be
instantiated in a special way.

Ian.
Tomasz Wroblewski
2013-Dec-04 11:23 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On 12/04/2013 12:04 PM, Ian Campbell wrote:
> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>
>> But that's the source of my not understanding: xen_make_pte()
>> derives addr from the passed in pte, and that pte can - for a
>> foreign domain's page - hardly hold a PFN. Otherwise how would
>> the translation to MFN be supposed to happen? Yet, if it's a
>> machine address that's coming in, it can't point into the low 1Mb.
>
> Isn't it a foreign gpfn at this point, which for an HVM guest is
> actually a PFN not an MFN?
>
> You are making me think I might be talking out my a**e though, because
> what is a foreign mapping even doing in xen_make_pte -- those need to be
> instantiated in a special way.

I believe the callpath for this is

    xen_remap_domain_range()   (mmu.c)
            |
            v
    remap_area_pfn_pte()       (mmu.c)
            |
            v
    pfn_pte()                  (somewhere, one of the pgtable.h hdrs)
            |
            v
    __pte()                    (paravirt.h)
            |
            v
    xen_make_pte()             (mmu.c) via pv_mmu_ops.make_pte

Sorry, can't offer much insight as to why addr in pte holds the HVM's
PFN, but it seems to be the case.
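The last hop of that call path goes through a function pointer, which is why a generic helper such as pfn_pte() ends up in Xen-specific code. The user-space sketch below models only that indirection. `struct model_mmu_ops`, `model_pte()`, `model_pfn_pte()` and the two hooks are hypothetical stand-ins for pv_mmu_ops.make_pte, __pte(), pfn_pte(), native_make_pte() and xen_make_pte(); none of this is the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;

/* Hypothetical stand-in for the make_pte slot of pv_mmu_ops. */
struct model_mmu_ops {
    pte_t (*make_pte)(pteval_t val);
};

static int model_hook_calls;  /* counts how often the Xen-like hook ran */

/* Plays the role of native_make_pte(): wraps the raw value unchanged. */
static inline pte_t model_native_make_pte(pteval_t val)
{
    return (pte_t){ .pte = val };
}

/* Plays the role of xen_make_pte(): sees every value before it becomes
 * a pte_t, which is where the <1M check from this thread would sit. */
static inline pte_t model_xen_make_pte(pteval_t val)
{
    model_hook_calls++;
    return model_native_make_pte(val);
}

static struct model_mmu_ops model_ops = { .make_pte = model_native_make_pte };

/* Plays the role of __pte(): dispatches through the ops table. */
static inline pte_t model_pte(pteval_t val)
{
    return model_ops.make_pte(val);
}

/* Plays the role of pfn_pte(): combines frame number and protection bits. */
static inline pte_t model_pfn_pte(uint64_t pfn, pteval_t prot)
{
    return model_pte((pfn << 12) | prot);
}
```

Once `model_ops.make_pte` points at the Xen-like hook, every `model_pfn_pte()` call passes through it regardless of who the caller is, so the hook cannot tell a foreign-mapping caller from a local one unless the value itself carries that information.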
Jan Beulich
2013-Dec-04 11:36 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>> actually a PFN not an MFN?
>>
>> You are making me think I might be talking out my a**e though, because
>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>> instantiated in a special way.
>
> I believe the callpath for this is
>
>     xen_remap_domain_range()   (mmu.c)
>             |
>             v
>     remap_area_pfn_pte()       (mmu.c)
>             |
>             v
>     pfn_pte()                  (somewhere, one of the pgtable.h hdrs)
>             |
>             v
>     __pte()                    (paravirt.h)
>             |
>             v
>     xen_make_pte()             (mmu.c) via pv_mmu_ops.make_pte
>
> Sorry, can't offer much insight as to why addr in pte holds the HVM's
> PFN, but it seems to be the case.

But that's a fundamental thing to explain. As Ian says - foreign PFNs
shouldn't make it here, or else how do you know how to translate
them to MFNs (as you can't consult the local P2M table to do so)?

Jan
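Jan's objection can be illustrated with a toy model of two per-domain P2M tables. This is a user-space sketch with made-up frame numbers; `struct model_domain` and `model_pfn_to_mfn()` are hypothetical names and do not correspond to any kernel or hypervisor interface.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_P2M_ENTRIES 4

/* Toy per-domain P2M: the index is a pfn, the value is the backing mfn. */
struct model_domain {
    uint64_t p2m[MODEL_P2M_ENTRIES];
};

/* Translate a pfn using the table of the domain that is passed in. */
static inline uint64_t model_pfn_to_mfn(const struct model_domain *d, uint64_t pfn)
{
    assert(pfn < MODEL_P2M_ENTRIES);
    return d->p2m[pfn];
}
```

If the mapping domain holds { 0x5000, 0x5001, 0x5002, 0x5003 } and the target domain holds { 0x9000, 0x9001, 0x9002, 0x9003 }, then pfn 2 resolves to 0x5002 through the local table but to 0x9002 through the target's table. Pushing a foreign PFN through the local table therefore yields a frame belonging to the wrong domain, which is the point of Jan's question.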
Mihai Donțu
2013-Dec-04 11:42 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, 4 Dec 2013 11:24:21 +0100 Tomasz Wroblewski wrote:
> >> We've ran into this issue in xenclient recently too, when we
> >> finally upgraded stubdomain's kernel to pvops version. It seems
> >> pvops kernel contains safeguard to only allow <1M mappings if it's
> >> dom0 (xen_initial_domain()). This check is placed in
> >> arch/x86/xen/mmu.c:
> >>
> >> static pte_t xen_make_pte(pteval_t pte)
> >> {
> >>         phys_addr_t addr = (pte & PTE_PFN_MASK);
> >>
> >>         ...
> >>         /*
> >>          * Unprivileged domains are allowed to do IOMAPpings for
> >>          * PCI passthrough, but not map ISA space. The ISA
> >>          * mappings are just dummy local mappings to keep other
> >>          * parts of the kernel happy.
> >>          */
> >>         if (unlikely(pte & _PAGE_IOMAP) &&
> >>             (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >>                 pte = iomap_pte(pte);
> >>         } else {
> >>                 pte &= ~_PAGE_IOMAP;
> >>                 pte = pte_pfn_to_mfn(pte);
> >>         }
> >>
> >>         return native_make_pte(pte);
> >> }
> >>
> >> We patched this out (in a fugly and probably not very correct way),
> >> for our stubdomain kernel, since we needed our stubdomain qemu vms
> >> to be able to map windows guest <1M range (since qemu needs to be
> >> able to write data and read data there in order to chat with
> >> seabios etc). Maybe Konrad (CC'ed) knows why the check is there in
> >> guest kernel, and a good way to solve this.
> >
> > For PV domU guests the ISA are usually RAM - so you don't want
> > during early bootup of a PV guest for it to scan MFNs it does not
> > have access to. Granted it does not have access to them but it
> > would have the MFNs coded in and any access to that area will
> > result in .. Xen "fixing" up the PTEs (I can't recall exaclty how).
> >
> > If you boot a PV Guest and remove the:
> > (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >
> > do you see anything that in the Xen console?
> >
> I recall I wasn't seeing anything, the pv domU was just hanging super
> early in the boot then. The way we worked around it is via attached
> patch (applied to PV domU's kernel, in our case stubdom hosting qemu
> process). It keeps the <1M safeguard for local mapping but allows
> foreign mappings (detected via _PAGE_SPECIAL flag).
>
> Razvan, you can try attached patch as well applied to your pv domU
> kernel to see if it helps you.
>

Razvan and I are working together to find a solution to this. I took your patch for a spin and while that code path is taken when invoking xc_map_foreign_range(), the call still fails with EINVAL. I haven't yet determined if the call stops in the domU kernel or it reaches xen and gets terminated there. I've tried this on Ubuntu's 3.8. on top of XenServer's xen-4.3.1.

Thanks,

-- 
Mihai Donțu
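For readers following the thread, the branch condition in the xen_make_pte() excerpt quoted above can be modelled in standalone C. This is only a sketch of that one condition, not kernel code: the flag bit, the mask value, and the names make_pte_path, PATH_IOMAP and PATH_P2M are illustrative assumptions, and the real definitions live in the kernel headers. It shows why a domU asking for a frame below 1M falls into the "translate through the local P2M" branch while dom0 does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; not the kernel's own definitions. */
#define FLAG_IOMAP    (UINT64_C(1) << 10)            /* plays the role of _PAGE_IOMAP */
#define ADDR_MASK     UINT64_C(0x000ffffffffff000)   /* plays the role of PTE_PFN_MASK */
#define ISA_END_ADDR  UINT64_C(0x100000)             /* 1 MiB boundary */

enum pte_path {
    PATH_IOMAP,  /* frame number is used as given */
    PATH_P2M     /* IOMAP flag dropped, frame run through the local P2M */
};

/* Mirrors only the if/else condition from the quoted xen_make_pte(). */
static enum pte_path make_pte_path(uint64_t pte, bool initial_domain)
{
    uint64_t addr = pte & ADDR_MASK;

    if ((pte & FLAG_IOMAP) && (initial_domain || addr >= ISA_END_ADDR))
        return PATH_IOMAP;
    return PATH_P2M;
}

/* Small self-check showing the cases discussed in the thread. */
void check_make_pte_path(void)
{
    /* domU, frame below 1M: the guard sends it down the P2M path. */
    assert(make_pte_path(UINT64_C(0x9f000) | FLAG_IOMAP, false) == PATH_P2M);
    /* dom0, same frame: accepted as an I/O mapping. */
    assert(make_pte_path(UINT64_C(0x9f000) | FLAG_IOMAP, true) == PATH_IOMAP);
    /* domU, frame at or above 1M: accepted as well. */
    assert(make_pte_path(UINT64_C(0x100000) | FLAG_IOMAP, false) == PATH_IOMAP);
}
```

In this model the only input that separates the failing case from the working ones is the combination "not the initial domain" and "address below 1M", which matches the symptom reported at the start of the thread.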
Tomasz Wroblewski
2013-Dec-04 12:01 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On 12/04/2013 12:36 PM, Jan Beulich wrote:
>>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>>>
>>>> But that's the source of my not understanding: xen_make_pte()
>>>> derives addr from the passed in pte, and that pte can - for a
>>>> foreign domain's page - hardly hold a PFN. Otherwise how would
>>>> the translation to MFN be supposed to happen? Yet, if it's a
>>>> machine address that's coming in, it can't point into the low 1Mb.
>>>
>>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>>> actually a PFN not an MFN?
>>>
>>> You are making me think I might be talking out my a**e though, because
>>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>>> instantiated in a special way.
>>>
>> I believe the callpath for this is
>>
>> xen_remap_domain_range() (mmu.c)
>>         |
>>         v
>> remap_area_pfn_pte() (mmu.c)
>>         |
>>         v
>> pfn_pte() (somewhere, one of the pgtable.h hdrs)
>>         |
>>         v
>> __pte() (paravirt.h)
>>         |
>>         v
>> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
>>
>> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
>> but it seems the case.
>
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?
>

I was under the impression that the translation is done inside Xen, in HYPERVISOR_mmu_update, which gets called from xen_remap_domain_mfn_range shortly after setting up the ptes via xen_make_pte:

int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
                               unsigned long addr,
                               xen_pfn_t mfn, int nr,
                               pgprot_t prot, unsigned domid,
                               struct page **pages)
...
        err = apply_to_page_range(vma->vm_mm, addr, range,
                                  remap_area_mfn_pte_fn, &rmd);
        ^^^ this calls xen_make_pte via the callpath I quoted in previous post
        if (err)
                goto out;

        err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
        ^^^ this goes into xen and does p2m translation and mmu setup etc
        if (err < 0)
                goto out;
...
Jan Beulich
2013-Dec-04 12:14 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>>> On 04.12.13 at 13:01, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> On 12/04/2013 12:36 PM, Jan Beulich wrote:
>> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> shouldn't make it here, or else how do you know how to translate
>> them to MFNs (as you can't consult the local P2M table to do so)?
>>
> I was under the impression that the translation is done inside in xen inside
> HYPERVISOR_mmu_update,

That hypercall does translation only for auto-translated guests, which a normal PV one clearly isn't.

Jan
Ian Campbell
2013-Dec-04 12:23 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, 2013-12-04 at 12:14 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 13:01, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/04/2013 12:36 PM, Jan Beulich wrote:
> >> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> >> shouldn't make it here, or else how do you know how to translate
> >> them to MFNs (as you can't consult the local P2M table to do so)?
> >>
> > I was under the impression that the translation is done inside in xen inside
> > HYPERVISOR_mmu_update,
>
> That hypercall does translation only for auto-translated guests,
> which a normal PV one clearly isn't.

When mapping a foreign-owned page it is the remote owner's mode which matters though, isn't it?

Ian.
Jan Beulich
2013-Dec-04 12:39 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
>>> On 04.12.13 at 13:23, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2013-12-04 at 12:14 +0000, Jan Beulich wrote:
>> >>> On 04.12.13 at 13:01, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>> > On 12/04/2013 12:36 PM, Jan Beulich wrote:
>> >> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> >> shouldn't make it here, or else how do you know how to translate
>> >> them to MFNs (as you can't consult the local P2M table to do so)?
>> >>
>> > I was under the impression that the translation is done inside in xen inside
>> > HYPERVISOR_mmu_update,
>>
>> That hypercall does translation only for auto-translated guests,
>> which a normal PV one clearly isn't.
>
> When mapping a foreign owned page it is the remote owners mode which
> matters though, isn't it?

Oh, right. Which - for the code at hand - makes it even more difficult to do the right thing (refuse PV DomU mappings of MFNs below 1Mb, but allow translated DomU mappings of PFNs in that range).

I.e. we're back to why execution goes that route in the first place for foreign mappings and doesn't - like on XenoLinux - bypass the normal PTE construction code.

Jan
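The behaviour Jan describes as "the right thing" can be written down as a small predicate. This is one possible reading of that sentence, offered as a sketch: low_1m_rule_permits is a hypothetical helper, not a kernel or Xen function, the dom0 exemption is taken from the xen_initial_domain() check quoted earlier in the thread, and a 4 KiB page size is assumed. The thread does not say where such a check would live, which is exactly the difficulty Jan points out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of 4 KiB frames below the 1 MiB boundary (page size is an assumption). */
#define LOW_1M_FRAMES UINT64_C(0x100)

/*
 * Hypothetical predicate: does the "low 1 MiB" rule object to this mapping?
 * - frames at or above 1 MiB are not covered by the rule at all;
 * - dom0 keeps its existing exemption;
 * - otherwise the frame is acceptable only when its owner is a translated
 *   (e.g. HVM) guest, because then the number is a guest PFN, not an MFN.
 */
static bool low_1m_rule_permits(uint64_t frame, bool mapper_is_dom0,
                                bool owner_is_translated)
{
    if (frame >= LOW_1M_FRAMES)
        return true;
    if (mapper_is_dom0)
        return true;
    return owner_is_translated;
}

/* Self-check covering the cases named in the post above. */
void check_low_1m_rule(void)
{
    assert(!low_1m_rule_permits(0x9f, false, false)); /* PV domU, low MFN: refuse */
    assert(low_1m_rule_permits(0x9f, false, true));   /* translated owner: allow  */
    assert(low_1m_rule_permits(0x9f, true, false));   /* dom0: unchanged          */
}
```

Note that the predicate needs to know the mode of the frame's owner, which is information xen_make_pte() does not receive; it only sees a PTE value.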
Tomasz Wroblewski
2013-Dec-04 14:19 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
> Razvan and I are working together to find a solution to this. I took
> your patch for a spin and while that code path is taken when invoking
> xc_map_foreign_range(), the call still fails with EINVAL. I haven't yet
> determined if the call stops in the domU kernel or it reaches xen and
> gets terminated there. I've tried this on Ubuntu's 3.8. on top of
> XenServer's xen-4.3.1.
>

Not sure why the patch doesn't work for you (you applied it to the domU kernel which tries to map, right?), but before we applied this, the EINVAL was coming from the hypervisor's HYPERVISOR_mmu_update in xen_remap_domain_mfn_range(), since the PTE constructed by xen_make_pte was invalid for the other domain.

> Thanks,
>
Mihai Donțu
2013-Dec-04 16:15 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, 4 Dec 2013 15:19:54 +0100 Tomasz Wroblewski wrote:
> > Razvan and I are working together to find a solution to this. I took
> > your patch for a spin and while that code path is taken when
> > invoking xc_map_foreign_range(), the call still fails with EINVAL.
> > I haven't yet determined if the call stops in the domU kernel or it
> > reaches xen and gets terminated there. I've tried this on Ubuntu's
> > 3.8. on top of XenServer's xen-4.3.1.
> >
>
> Not sure why the patch doesn't work for you (you applied it to domU
> kernel which ties to map, right?), but before we applied this, the
> EINVAL was coming from hypervisor's HYPERVISOR_mmu_update in
> xen_remap_domain_mfn_range(), since the PTE constructed by
> xen_make_pte was invalid for the other domain.
>
> > Thanks,
> >
>

I'm sorry, I take back what I said before. The patch works OK, I just interpreted the results wrong (some pages really _are_ inaccessible, even from dom0).

Thank you for all your help. :-)

-- 
Mihai Donțu
Konrad Rzeszutek Wilk
2013-Dec-04 16:40 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/04/2013 12:04 PM, Ian Campbell wrote:
> >> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> >>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
> >>>
> >>> But that's the source of my not understanding: xen_make_pte()
> >>> derives addr from the passed in pte, and that pte can - for a
> >>> foreign domain's page - hardly hold a PFN. Otherwise how would
> >>> the translation to MFN be supposed to happen? Yet, if it's a
> >>> machine address that's coming in, it can't point into the low 1Mb.
> >>
> >> Isn't it a foreign gpfn at this point, which for an HVM guest is
> >> actually a PFN not an MFN?
> >>
> >> You are making me think I might be talking out my a**e though, because
> >> what is a foreign mapping even doing in xen_make_pte -- those need to be
> >> instantiated in a special way.
> >>
> > I believe the callpath for this is
> >
> > xen_remap_domain_range() (mmu.c)
> >         |
> >         v
> > remap_area_pfn_pte() (mmu.c)
> >         |
> >         v
> > pfn_pte() (somewhere, one of the pgtable.h hdrs)
> >         |
> >         v
> > __pte() (paravirt.h)
> >         |
> >         v
> > xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
> >
> > Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
> > but it seems the case.
>
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?

This is all done via the toolstack which does the /dev/xen ioctl to map some of its user-space memory in the guest memory. It ends up getting the MFNs via some hypercall (forgotten which) and inputs those in the IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with _PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore the P2M and use that specific MFN value.

It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage - but this is the last bastion where it is used.

The check in xen_make_pte for VM_IO on 1:1 pages is not really needed anymore - as we have the 1:1 pages in the P2M (except for the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach there - but that is a different bug).

So the check there could actually be lessened - and we can piggyback on _PAGE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in xen_pte_val - which would only be set by xen_make_pte iff the P2M says the page is 1:1.

Not compile tested:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..98efb65 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
                         if (mfn & IDENTITY_FRAME_BIT) {
                                 mfn &= ~IDENTITY_FRAME_BIT;
                                 flags |= _PAGE_IOMAP;
-                        }
+                        } else
+                                flags &= _PAGE_IOMAP;
                 }
                 val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
         }
@@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
                 pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
         }
 #endif
-        if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
+        if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
                 return pteval;
 
         return pte_mfn_to_pfn(pteval);
@@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
 #endif
         /*
          * Unprivileged domains are allowed to do IOMAPpings for
-         * PCI passthrough, but not map ISA space. The ISA
-         * mappings are just dummy local mappings to keep other
-         * parts of the kernel happy.
+         * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
+         * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
+         * is supplied to use by xen_set_fixmap.
          */
-        if (unlikely(pte & _PAGE_IOMAP) &&
-            (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+        if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
                 pte = iomap_pte(pte);
-        } else {
-                pte &= ~_PAGE_IOMAP;
+        else
                 pte = pte_pfn_to_mfn(pte);
-        }
 
         return native_make_pte(pte);
 }

> Jan
>
>
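A note for anyone trying the untested patch above: two of its expressions may not do what the surrounding comments suggest, purely because of how C evaluates them. The standalone sketch below uses made-up flag values (FLAG_SPECIAL and FLAG_IOMAP are stand-ins, not the kernel's _PAGE_SPECIAL and _PAGE_IOMAP) and hypothetical function names. It relies only on standard C rules: `&` binds tighter than `|`, and `x &= FLAG` keeps just that bit whereas `x &= ~FLAG` clears it. What the author actually intended is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only. */
#define FLAG_SPECIAL (UINT64_C(1) << 9)
#define FLAG_IOMAP   (UINT64_C(1) << 10)

/*
 * `pte & FLAG_SPECIAL | FLAG_IOMAP` is parsed as written below,
 * so the result is non-zero for every pte.
 */
static int cond_as_posted(uint64_t pte)
{
    return ((pte & FLAG_SPECIAL) | FLAG_IOMAP) != 0;
}

/* Presumed intent: true only when at least one of the flags is set. */
static int cond_with_parens(uint64_t pte)
{
    return (pte & (FLAG_SPECIAL | FLAG_IOMAP)) != 0;
}

/* `flags &= FLAG_IOMAP` discards every bit except FLAG_IOMAP. */
static uint64_t and_with_flag(uint64_t flags)
{
    return flags & FLAG_IOMAP;
}

/* Clearing just FLAG_IOMAP needs the complement. */
static uint64_t and_with_complement(uint64_t flags)
{
    return flags & ~FLAG_IOMAP;
}

void check_flag_expressions(void)
{
    assert(cond_as_posted(0) == 1);      /* true even with no flags set */
    assert(cond_with_parens(0) == 0);
    assert(cond_with_parens(FLAG_IOMAP) == 1);
    assert(and_with_flag(UINT64_C(0x67) | FLAG_IOMAP) == FLAG_IOMAP);
    assert(and_with_complement(UINT64_C(0x67) | FLAG_IOMAP) == UINT64_C(0x67));
}
```

If the posted forms were compiled as-is, the first would route every PTE through iomap_pte() and the second would strip the protection bits from non-identity frames, so both are worth checking before testing the patch.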
Tomasz Wroblewski
2013-Dec-04 17:16 UTC
Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
On 12/04/2013 05:40 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
>>>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>>> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>>>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>>>>
>>>>> But that's the source of my not understanding: xen_make_pte()
>>>>> derives addr from the passed in pte, and that pte can - for a
>>>>> foreign domain's page - hardly hold a PFN. Otherwise how would
>>>>> the translation to MFN be supposed to happen? Yet, if it's a
>>>>> machine address that's coming in, it can't point into the low 1Mb.
>>>>
>>>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>>>> actually a PFN not an MFN?
>>>>
>>>> You are making me think I might be talking out my a**e though, because
>>>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>>>> instantiated in a special way.
>>>>
>>> I believe the callpath for this is
>>>
>>> xen_remap_domain_range() (mmu.c)
>>>         |
>>>         v
>>> remap_area_pfn_pte() (mmu.c)
>>>         |
>>>         v
>>> pfn_pte() (somewhere, one of the pgtable.h hdrs)
>>>         |
>>>         v
>>> __pte() (paravirt.h)
>>>         |
>>>         v
>>> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
>>>
>>> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
>>> but it seems the case.
>>
>> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> shouldn't make it here, or else how do you know how to translate
>> them to MFNs (as you can't consult the local P2M table to do so)?
>
> This is all done via the toolstack which does the /dev/xen ioctl to map
> some of its user-space memory in the guest memory. It ends up getting
> the MFNs via some hypercall (forgotten which) and inputs those in the
> IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
> _PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
> the P2M and use that specific MFN value.
>
> It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
> out - but this is the last bastion where it is used.
>
> The check that the xen_make_pte for the VM_IO for 1:1 pages is not
> really needed anymore - as we have the 1:1 pages in the P2M (except for
> the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
> there - but that is different bug).
>
> So the check there could actually be lessen - and we can piggyback on
> the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
> xen_pte_val - which we would only be set by xen_make_pte iff P2M says
> the page is 1:1.
>
>
> Not compile tested:
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..98efb65 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
>                          if (mfn & IDENTITY_FRAME_BIT) {
>                                  mfn &= ~IDENTITY_FRAME_BIT;
>                                  flags |= _PAGE_IOMAP;
> -                        }
> +                        } else
> +                                flags &= _PAGE_IOMAP;
>                  }
>                  val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
>          }
> @@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
>                  pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
>          }
>  #endif
> -        if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
> +        if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
>                  return pteval;
>
>          return pte_mfn_to_pfn(pteval);
> @@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
>  #endif
>          /*
>           * Unprivileged domains are allowed to do IOMAPpings for
> -         * PCI passthrough, but not map ISA space. The ISA
> -         * mappings are just dummy local mappings to keep other
> -         * parts of the kernel happy.
> +         * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
> +         * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
> +         * is supplied to use by xen_set_fixmap.
>           */
> -        if (unlikely(pte & _PAGE_IOMAP) &&
> -            (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> +        if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
>                  pte = iomap_pte(pte);

I think this won't work because _PAGE_SPECIAL is not set at this point yet (inside xen_make_pte). It is only set after xen_make_pte. This is why my patch contained this extra, rather nasty, hunk, which made _PAGE_SPECIAL set a bit earlier:

+static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t pgprot)
+{
+        return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
+                     massage_pgprot(pgprot) | _PAGE_SPECIAL);
+}
+
+
 static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
                                  unsigned long addr, void *data)
 {
         struct remap_data *rmd = data;
-        pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+        pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot);

         rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
         rmd->mmu_update->val = pte_val_ma(pte);

I've basically made a new function, foreign_special_pfn_pte, which is an unrolled pte_mkspecial with the small difference that it sets the _PAGE_SPECIAL bit before calling __pte, not after (because __pte calls into xen_make_pte). Maybe the cleanest way of fixing this would be just to have a separate path for this which doesn't use xen_make_pte at all?
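The ordering problem described in this post can be reproduced outside the kernel with a toy hook. The sketch below is a model under stated assumptions, not the kernel's code: make_pte_hook stands in for whatever __pte() dispatches to (xen_make_pte in the thread), FLAG_SPECIAL and FRAME_SHIFT are illustrative values, and the two builder functions are hypothetical names for "flag OR-ed in after the hook" and "flag OR-ed in before the hook". Protection bits are left out to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_SPECIAL (UINT64_C(1) << 9)  /* stand-in for _PAGE_SPECIAL */
#define FRAME_SHIFT  12                  /* 4 KiB pages assumed */

/* Records whether the hook saw the special flag on its last call. */
static bool hook_saw_special;

/* Stand-in for the make_pte hook; a real hook would branch on the flag. */
static uint64_t make_pte_hook(uint64_t val)
{
    hook_saw_special = (val & FLAG_SPECIAL) != 0;
    return val;
}

/* Shape of pte_mkspecial(pfn_pte(...)): the hook runs first, flag added later. */
static uint64_t special_after_hook(uint64_t frame)
{
    return make_pte_hook(frame << FRAME_SHIFT) | FLAG_SPECIAL;
}

/* Shape of the foreign_special_pfn_pte() hunk: flag is present when the hook runs. */
static uint64_t special_before_hook(uint64_t frame)
{
    return make_pte_hook((frame << FRAME_SHIFT) | FLAG_SPECIAL);
}

void check_hook_ordering(void)
{
    uint64_t expected = (UINT64_C(0x9f) << FRAME_SHIFT) | FLAG_SPECIAL;

    assert(special_after_hook(0x9f) == expected);
    assert(!hook_saw_special);   /* hook could not have tested the flag */

    assert(special_before_hook(0x9f) == expected);
    assert(hook_saw_special);    /* hook can now distinguish foreign mappings */
}
```

Both builders return the same value in this model; the difference is only what the hook is able to observe, which is why a _PAGE_SPECIAL test inside xen_make_pte() depends on the flag being set before __pte() is called.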