Hello everyone,

I am trying to reserve an area of memory using e820's reserve_e820_ram(). Everything happens as expected, but one thing confuses me, so I have some questions.

Why is e820 being used in arch_init_memory() (xen/arch/x86/mm.c in my case) and not boot_e820 (the one we changed)? I am asking because, from what I understand, this will make my use of reserve_e820_ram() useless, but I think I still do not have all the information I need to know how it all connects.

When I try to use reserve_e820_ram() on the actual e820 variable, the boot process fails on an assertion in share_xen_page_with_guest() in xen/arch/x86/mm.c. It must be because the pages I reserved are not part of the Xen heap.

To create a range that I can later use as a guest's RAM, can I use dom_cow instead of dom_io in arch_init_memory()? Or will that create problems when allocating the pages to an unprivileged domain? I don't want that memory range in use by the main memory pool used by the allocators.

From what I understand, the pages have to be assigned to a particular domain: dom_xen, dom_io or dom_cow. I see this as keeping them mapped and usable before assigning them to the domU I want. Is that thought correct? Later I want to be able to get them and assign them to the domU when it is started and gets its memory allocated.

Thank you for your time,
Francisco
At 12:52 +0100 on 29 Mar (1333025578), Francisco Rocha wrote:
> Why is e820 being used in arch_init_memory (xen/arch/x86/mm.c in my
> case) and not boot_e820 (the one we changed)? I am asking because
> from what I understand this will make my use of reserve_e820_ram
> useless, but I think I still do not have all the information I need to
> know how it all connects.

arch_init_memory() is only using e820 to find out which addresses are MMIO regions. If we were to use boot_e820 we would mark all the reserve_e820_ram()'d regions as MMIO, which is probably not what you want.

> To create a range that I can later use as a guest's RAM can I use
> dom_cow instead of dom_io in arch_init_memory?

No! dom_cow is the owner of all copy-on-write shared pages. You don't want to get your reserved area caught up in that lot. :)

> Or will that create problems when allocating the pages to an
> unprivileged domain? I don't want that memory range in use by the
> main memory pool used by the allocators.

AIUI just calling reserve_e820_ram() to exclude the memory from boot_e820 should DTRT. Is that not working for you?

> From what I understand, the pages have to be assigned to a particular
> domain, dom_xen|io|cow.
> I see this as keeping them mapped and usable before assigning them to
> the domU I want. Is that thought correct?

I think it should be OK to leave them with no owner -- as long as they're not in the memory allocator's free pools they won't get given to any other domain. Then once you're ready to use them you can assign them directly to your domU.

Another option would be to assign them to dom_xen and use share_xen_page_with_guest to let domU map them.

Can you give us some details about how your current approach is failing?

Cheers,

Tim.
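To make the e820 / boot_e820 distinction in Tim's reply concrete, here is a minimal sketch in C. It is a toy model, not Xen's implementation: `struct map`, `type_at` and `reserve_ram` are hypothetical names invented for the illustration, and Xen's real reserve_e820_ram() and e820 structures differ in detail. It only shows the idea of reserving a range in a working copy of the memory map while the firmware-reported copy stays untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MEM_RAM = 1, MEM_RESERVED = 2 };            /* simplified region types */

struct region { uint64_t start, end; int type; };  /* half-open: [start, end) */

#define MAX_REGIONS 16
struct map { size_t n; struct region r[MAX_REGIONS]; };

/* Type of the region containing addr, or 0 if addr falls in a hole. */
int type_at(const struct map *m, uint64_t addr)
{
    for (size_t i = 0; i < m->n; i++)
        if (addr >= m->r[i].start && addr < m->r[i].end)
            return m->r[i].type;
    return 0;
}

/*
 * Hypothetical stand-in for a "reserve RAM" helper: mark [s, e) as
 * reserved if it lies entirely inside a single RAM region, splitting
 * that region into up to three pieces. Returns 1 on success, 0 otherwise.
 */
int reserve_ram(struct map *m, uint64_t s, uint64_t e)
{
    assert(m->n <= MAX_REGIONS);
    if (s >= e)
        return 0;
    for (size_t i = 0; i < m->n; i++) {
        struct region old = m->r[i];
        if (old.type != MEM_RAM || s < old.start || e > old.end)
            continue;
        size_t extra = (s > old.start) + (e < old.end);
        if (m->n + extra > MAX_REGIONS)
            return 0;
        /* Shift the entries after i up by 'extra' slots, highest first. */
        for (size_t j = m->n; j > i + 1; j--)
            m->r[j - 1 + extra] = m->r[j - 1];
        m->n += extra;
        size_t k = i;
        if (s > old.start)
            m->r[k++] = (struct region){ old.start, s, MEM_RAM };
        m->r[k++] = (struct region){ s, e, MEM_RESERVED };
        if (e < old.end)
            m->r[k++] = (struct region){ e, old.end, MEM_RAM };
        return 1;
    }
    return 0;
}
```

In this model, copying a map with one RAM region [0, 0x100000) and reserving [0x40000, 0x80000) in the copy leaves three regions in the copy, while the original still reports the whole range as RAM. That mirrors the point above: a scan for MMIO holes over the untouched map does not mistake the reserved range for MMIO.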
________________________________________
From: Tim Deegan [tim@xen.org]
Sent: 05 April 2012 11:37
To: Francisco Rocha
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] reserve e820 ram

> I think it should be OK to leave them with no owner -- as long as they're
> not in the memory allocator's free pools they won't get given to any other
> domain. Then once you're ready to use them you can assign them directly to
> your domU.

How would this "assign" be done? When I remove them from boot_e820 they are not mapped, which confuses me a bit.

> Another option would be to assign them to dom_xen and use
> share_xen_page_with_guest to let domU map them.
>
> Can you give us some details about how your current approach is failing?

So far I have tried this approach:

1. Use reserve_e820_ram() with boot_e820 to avoid the main memory pool.

2. Use reserve_e820_ram() with e820 and change arch_init_memory() to assign the reserved range to a dummy domain I have created, letting it process the I/O areas as normal. One of the problems happens here: some of the mfns are invalid, and I always end up with 65536 pages assigned to my domain when I count them using page_list_for_each. Why is this happening? I am not able to understand it yet. I have been exploring the pdx_groups to see if it is related.

3. I was planning to change populate_physmap to get the memory from the dummy domain. I would have to make the calls to alloc_domheap_pages conditional and use share_xen_page_with_guest, is that it? This part is still not very clear to me. I was hoping to do something like going through the dummy domain's page list and stealing the pages from it. :-)

> Cheers,
>
> Tim.

Thank you for the help, cheers,
Francisco
________________________________________
From: Tim Deegan [tim@xen.org]
Sent: 05 April 2012 11:37
To: Francisco Rocha
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] reserve e820 ram

> I think it should be OK to leave them with no owner -- as long as they're
> not in the memory allocator's free pools they won't get given to any other
> domain. Then once you're ready to use them you can assign them directly to
> your domU.
>
> Another option would be to assign them to dom_xen and use
> share_xen_page_with_guest to let domU map them.
>
> Can you give us some details about how your current approach is failing?
>
> Cheers,
>
> Tim.

This part is working.

I am able to reserve a range of memory and boot an HVM guest that uses pages from that range. The problem comes when I try to restrict dom0 from accessing those pages: it fails in allocating the memory to the guest.

Is get_page_from_l1e always called by dom0?

Can a guest run when dom0 is restricted from accessing its memory? I would only want to restrict access for certain operations.

Cheers,
Francisco
Hi,

At 12:22 +0100 on 11 Apr (1334146973), Francisco Rocha wrote:
> This part is working.
>
> I am able to reserve a range of memory and boot a HVM guest
> that uses pages from that range. The problem is when I try
> to restrict dom0 from accessing those pages, it fails in allocating
> the memory to the guest.

Does it fail in allocating the memory or in populating it? Dom0 has to map the new domain's memory to put the BIOS and firmware in before it boots.

> Is get_page_from_l1e always called by dom0?

get_page_from_l1e is called for any pagetable entry (PV or shadowed HVM) that maps a page of memory. So it will be called when dom0 tries to map the memory.

> Can a guest run when dom0 is restricted from
> accessing its memory? I would only want to restrict access
> for certain operations.

Dom0 maps domU's memory three times:
 - Once (by force) to populate the BIOS &c at build time.
 - In Qemu (again, by force) to emulate domU's hardware.
 - In the PV backend drivers (using the grant tables) for block & net I/O.

You can handle the build-time maps by allowing them and then making sure they all get pulled down before the domain is unpaused for the first time (or by having a separate trusted/privileged builder domain that does nothing but build domains). You can handle the second by using stub domains to run qemu in a different domain, or by only using PV domUs. The third is pretty much a requirement if the domU's going to do any I/O via dom0, but at least with grant tables the ACL is under domU's control. Or if you have an IOMMU you can give the domU direct access to its own network card and disk controller.

Cheers,

Tim.
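The build-time rule Tim suggests (allow the builder's mappings, but make sure they are all gone before the guest first runs) can be sketched as a small state machine. This is only an illustration under assumed names: `struct guest`, `builder_map`, `builder_unmap` and `first_unpause` are hypothetical and do not correspond to existing Xen functions or fields.

```c
#include <assert.h>
#include <stdbool.h>

struct guest {
    bool ever_unpaused;         /* set once the guest has been allowed to run */
    unsigned int builder_maps;  /* build-time mappings still in place */
};

/* The builder may map guest memory only before the guest first runs. */
bool builder_map(struct guest *g)
{
    if (g->ever_unpaused)
        return false;
    g->builder_maps++;
    return true;
}

/* Tear down one build-time mapping; unmapping with none left is a bug. */
void builder_unmap(struct guest *g)
{
    assert(g->builder_maps > 0);
    g->builder_maps--;
}

/* Refuse the first unpause while any build-time mapping is still live. */
bool first_unpause(struct guest *g)
{
    if (g->builder_maps != 0)
        return false;
    g->ever_unpaused = true;
    return true;
}
```

With this shape the policy is enforced at a single point (the first unpause) instead of on every map request, which seems to be the spirit of the suggestion. The Qemu and PV backend mappings are not covered by this sketch.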
________________________________________
From: Tim Deegan [tim@xen.org]
Sent: 11 April 2012 12:58
To: Francisco Rocha
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] reserve e820 ram

> Does it fail in allocating the memory or in populating it? Dom0 has to
> map the new domain's memory to put the BIOS and firmware in before it
> boots.

Sorry, it allocates the memory but fails when trying to populate it. This happened because I changed get_page_from_l1e to restrict access.

> get_page_from_l1e is called for any pagetable entry (PV or shadowed HVM)
> that maps a page of memory. So it will be called when dom0 tries to map
> the memory.

Thank you.

> You can handle the build-time maps by allowing them and then making sure
> they all get pulled down before the domain is unpaused for the first time
> (or by having a separate trusted/privileged builder domain that does
> nothing but build domains).

All right, I will look for this stage in the code.

> You can handle the second by using stub domains to run qemu in a
> different domain, or by only using PV domUs.

If I use the stub domain provided with Xen, dom0 will not perform the second mapping, right?

> The third is pretty much a requirement if the domU's going to do any I/O
> via dom0, but at least with grant tables the ACL is under domU's control.
> Or if you have an IOMMU you can give the domU direct access to its own
> network card and disk controller.

I only have one ethernet card, but I can get an ethernet ExpressCard.

Can I do this on the machine that gives me the output that follows?

(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.

Does "not enabled" mean I should enable them in the BIOS? I have looked everywhere and I can't find any other options related to VT-d.

(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB

> Cheers,
>
> Tim.

Thank you for the help Tim!

Cheers,
Francisco
Hi,

Can you please set up your mail client to indent quoted text? It's not clear which parts of your email are quoted and which are your replies.

At 13:53 +0100 on 11 Apr (1334152395), Francisco Rocha wrote:
> > You can handle the second by using stub domains to run qemu in a
> > different domain, or by only using PV domUs.
>
> If I use the stub domain provided with xen the dom0 will not perform the
> second mapping, right?

Yes; instead, the stub domain will perform it - so you'll need to allow that to happen. (Basically the stub domain's code lives inside the guest's protection boundary, like its BIOS code &c).

> > The third is pretty much a requirement if the domU's going to do
> > any I/O via dom0, but at least with grant tables the ACL is under
> > domU's control. Or if you have an IOMMU you can give the domU direct
> > access to its own network card and disk controller.
>
> I only have one ethernet card but I can get an ethernet expresscard.
>
> Can I do this on the machine that gives me the output that follows?
>
> (XEN) Intel VT-d Snoop Control not enabled.
> (XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
> (XEN) Intel VT-d Queued Invalidation enabled.
> (XEN) Intel VT-d Interrupt Remapping enabled.
> (XEN) Intel VT-d Shared EPT tables not enabled.

Yes; you should be able to do it on this machine without changing any BIOS settings.

Tim.
On 04/18/2012 01:02 PM, Tim Deegan wrote:
> Hi,
>
> Can you please set up your mail client to indent quoted text? It's not
> clear which parts of your email are quoted and which are your replies.

Sorry about that.

> Yes; instead, the stub domain will perform it - so you'll need to allow
> that to happen. (Basically the stub domain's code lives inside the
> guest's protection boundary, like its BIOS code &c).
>
> Yes; you should be able to do it on this machine without changing any
> BIOS settings.
>
> Tim.

Hi Tim,

I was thinking about changing my approach.

I think that for now I will leave those pages off because I am mostly interested in protecting other areas.

Those accesses are, for now, inevitable to get the VM to operate properly. Now, the question is whether it is possible to use page table entries to do what I want to do.

The objective would be to use a bit flag that determines whether the pages are returned when a call to map_foreign_range is made. So, my final objective would be that only pages used for the three operations you describe are accessible to Dom0. Everything that is not BIOS and related, Qemu or PV backend drivers will not be returned.

From what I see in the header files you use 12 bits from a 24-bit flag (x86_64). Can we do it? This would again take us to controlling access at get_page_from_l1e(), right?

Thank you,
Francisco

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
lists.xen.org/xen-devel
At 15:36 +0100 on 18 Apr (1334763404), Francisco Rocha wrote:
> Hi Tim,
>
> I was thinking about changing my approach.
>
> I think that for now I will leave those pages off because I am
> mostly interested in protecting other areas.
>
> Those accesses for now are inevitable to get the VM to properly
> operate. Now, the question is if it is possible to use page table
> entries to do what I want to do.
>
> The objective would be to use a bit flag that would determine if
> the pages are returned when a call to map_foreign_range is made.
> So, my final objective would be that only pages used for the three
> operations you describe are accessible to Dom0.
> Everything that is not BIOS and related, Qemu or PV backend
> drivers will not be returned.
>
> From what I see in the header files you use 12-bits from a 24-bit
> flag (x86_64). Can we do it? This would again take us to controlling
> access at get_page_from_l1e(), right?

Are you talking about the count_info and type_info fields? Yes, I think you can probably put a new flag or two in there. Choosing which pages qemu can map will be interesting, though -- it needs to map anything the VM uses for I/O. But maybe you can just define the things you protect and declare that they can't be used for I/O. That sounds easier. :)

Cheers,

Tim.
On 04/18/2012 05:43 PM, Tim Deegan wrote:
> Are you talking about the count_info and type_info fields? Yes, I think
> you can probably put a new flag or two in there.

I was thinking about the ones used in page table entries (_PAGE_PRESENT|RW, etc). So, if I can do the type of control I want to achieve using type_info, maybe the flags I was thinking about are not the best option for what I want.

> Choosing which pages qemu can map will be interesting, though -- it needs
> to map anything the VM uses for I/O. But maybe you can just define the
> things you protect and declare that they can't be used for I/O. That
> sounds easier. :)

The objective is to protect the kernel and its data structures. That is why I was considering the flags I previously mentioned. There is one called _PAGE_GUEST_KERNEL. I see that we have them all available:

    int get_page_from_l1e(...)
    ...
    struct page_info *page = mfn_to_page(mfn);
    uint32_t l1f = l1e_get_flags(l1e);
    ...

Which flags do you recommend I use to try this out?

> Cheers,
>
> Tim.

Thank you,
Francisco
At 18:10 +0100 on 18 Apr (1334772613), Francisco Rocha wrote:
> On 04/18/2012 05:43 PM, Tim Deegan wrote:
> > Are you talking about the count_info and type_info fields? Yes, I think
> > you can probably put a new flag or two in there.
>
> I was thinking about the ones used in page table entries
> (_PAGE_PRESENT|RW, etc).

Oh. That's probably not so suitable for access control since (a) there may be more than one PTE pointing to the same page, even controlled by different domains, and what if they have different flags? and (b) given a page number you can't easily find a PTE that points to it to look at the bits.

The type_info and count_info fields, on the other hand, exist once per page and are entirely under Xen's control.

> > Choosing which pages qemu can map will be interesting, though -- it
> > needs to map anything the VM uses for I/O. But maybe you can just
> > define the things you protect and declare that they can't be used for
> > I/O. That sounds easier. :)
>
> The objective is to protect the kernel and its data structures.
> That is why I was considering the flags I previously mentioned.
> There is one denominated _PAGE_GUEST_KERNEL.

That's part of the 64-bit PV interface; if the guest is 32-bit or HVM it won't be used like that. I think you'll have to modify the kernel to explicitly tell Xen which pages are kernel ones (with a hypercall) and then remember that with a bit in the count_info/type_info.

Cheers,

Tim.
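A toy model of the scheme Tim outlines (the guest tells the hypervisor which frames to protect, the hypervisor records that in a per-page bit and consults it when a mapping is validated) might look like the sketch below. Everything in it is an assumption made for illustration: `PG_PROTECTED`, `struct page`, `mark_protected` and `may_map` are hypothetical names, not existing Xen PGC_/PGT_ bits, hypercalls or functions, and the model ignores the privilege checks that real foreign mappings go through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PG_PROTECTED (1UL << 0)  /* hypothetical per-page flag bit */

struct page {
    unsigned long flags;  /* stands in for count_info/type_info */
    uint16_t owner;       /* id of the domain that owns the frame */
};

/* Stand-in for the hypercall: only the owning guest may protect a frame. */
bool mark_protected(struct page *frames, size_t nr, uint16_t caller, size_t mfn)
{
    assert(frames != NULL);
    if (mfn >= nr || frames[mfn].owner != caller)
        return false;
    frames[mfn].flags |= PG_PROTECTED;
    return true;
}

/* Stand-in for the check made when an entry mapping 'mfn' is validated. */
bool may_map(const struct page *frames, size_t nr, uint16_t mapper, size_t mfn)
{
    assert(frames != NULL);
    if (mfn >= nr)
        return false;
    if (frames[mfn].owner == mapper)
        return true;                           /* owner always maps its own frames */
    return !(frames[mfn].flags & PG_PROTECTED);  /* others only if unprotected */
}
```

Because the flag is stored once per frame, the answer is the same no matter how many page-table entries, in whichever domains, point at that frame. That is the property per-PTE flags such as _PAGE_PRESENT or _PAGE_RW cannot provide.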
On 04/20/2012 09:16 AM, Tim Deegan wrote:
> That's part of the 64-bit PV interface; if the guest is 32-bit or HVM it
> won't be used like that. I think you'll have to modify the kernel to
> explicitly tell Xen which pages are kernel ones (with a hypercall) and
> then remember that with a bit in the count_info/type_info.

Hi Tim,

I have been changing a Xen kernel driver to test the idea of telling the hypervisor which pages are kernel ones.

From what I have read, guests have the pfn -> mfn table, but I am not able to find a function to make the conversion. I was using the pfn because mfn_valid at the hypervisor level was saying the mfn was valid. :-)

So, the pfn_to_mfn function at guest level gets the mfn, but from the guest's point of view, is that correct?

How can I get the mfn from the pfn/gmfn? I am using a 32-bit guest.

> Cheers,
>
> Tim.

Cheers,
Francisco