Hi All,

I have been inspecting Xen's security properties for a while and I have some questions regarding guest page table isolation in para-virtualized guests.

My understanding is (please correct me if I am wrong) that Xen achieves the isolation by (1) making all page tables non-writable, so that the guest has to ask Xen to do the update through hypercalls, and (2) having Xen validate each page-table update to make sure domain X cannot access domain Y's memory.

Looking inside the code, I cannot see where this happens. I took a thorough look at the do_mmu_update hypercall and observed that the function extracts the new page table entry value directly from the input parameter "req.val". Afterwards, it calls the function mod_lX_entry(), where X refers to the page table level. These functions in turn call the macro UPDATE_ENTRY, which calls the function update_intpte, which probably ends up calling __copy_to_user or cmpxchg_user directly to update the page table entry. As far as I understand, the observed path does not include any security checking.

MY QUESTIONS ARE:

1-Does Xen check that the passed value refers to a physical page that really belongs to the calling domain? If yes, where is the piece of code that does that? If no, then what guarantees that the guest won't map a page belonging to another guest?

2-If the guest is updating a higher-level page table (L2, for example), then the entry points to a lower-level page table. Does Xen check that the new lower-level table cannot be written by the guest? Again, where is the code, or what is the security guarantee?

3-Does Xen keep track of all page tables of a certain guest, or does it just rely on the type_info value stored in the page data structure?

4-How does Xen then guarantee that upon process switching the new cr3 value will point to a page table that is protected by Xen?

One final thing. Can I force all guests (including para-virtualized ones) to use shadow page tables?

Thanks,
Ahmed

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
At 23:16 +0100 on 14 Apr (1239751001), Ahmed Azab wrote:
> In para-virtualized guests. My understanding is (Please correct me if I
> am wrong) that Xen achieves the isolation through (1) making all page
> tables non-writable so that the guest have to ask Xen to do the update
> through hypercalls and (2) having Xen validation each page-table update
> to make sure domain X cannot access domain Y's memory.

Yes. (Guests may also try to write directly to their pagetables, in which case Xen intercepts the page fault, emulates the instruction and performs the implicit hypercall to change the contents.)

> MY QUESTIONS ARE:
> 1-Does Xen check that the passed value refers to a physical page that
> really belongs to the calling domain? If yes, where is the code piece
> that does that? If no, then what guarantees that the guest wont map a
> page belonging to another guest?

mod_l1_entry() calls get_page_from_l1e(), which calls get_page_and_type(), which does reference counting and enforces security restrictions.

> 2-If the guest is updating a higher level page table (l2 for example)
> then the entry point to a lower level page table. Does Xen check that
> the new cannot be rewritten by the guest? again where is the code or
> what is the security guarantee?

Similarly, mod_lX_entry -> get_page_from_lXe -> get_page_and_type.

> 3-Does Xen keep track of all page tables of a certain guest or it just
> relies on the type_info value stored in the page data structure?

It relies on the type_info; only a page with the correct type may be used as a top-level page table. To get that type, its contents must be verified (including recursively checking the types of the pages it points to).

> 4-How does then guarantee that upon process switching the new cr3 value
> will point to a page table that is protected by Xen?

The pagetable types are mutually exclusive with the "writable" type, which a page must have before a validated write-access l1e can point to it.

> One final thing. Can I force all guests (including para-virtualized ones
> )to use shadow page tables?

You can (see the xc_domain_save routines for an example), but for PV guests the shadow pagetables don't enforce these access restrictions, since they can rely on the pagetables already being correct.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
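The type rule in the answers above (a page holds one type at a time, and a pagetable type excludes the "writable" type) can be illustrated with a toy model in C. This is a sketch under stated assumptions, not Xen's code: `struct toy_page`, `toy_get_type()` and `toy_put_type()` are hypothetical names, and the real get_page_and_type() additionally validates table contents, handles general reference counts, and works atomically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page types; Xen's real type_info encodes more states. */
enum toy_type { TOY_NONE, TOY_WRITABLE, TOY_L1_TABLE, TOY_L2_TABLE };

struct toy_page {
    int owner;            /* id of the owning domain */
    enum toy_type type;   /* current type; TOY_NONE while type_count == 0 */
    unsigned type_count;  /* number of references that hold this type */
};

/*
 * Take a typed reference. This succeeds only if the calling domain owns
 * the page and the page is either untyped or already has the wanted type.
 */
static bool toy_get_type(struct toy_page *pg, int domid, enum toy_type want)
{
    assert(pg != NULL && want != TOY_NONE);
    if (pg->owner != domid)
        return false;                 /* page belongs to another domain */
    if (pg->type_count != 0 && pg->type != want)
        return false;                 /* conflicting type is still in use */
    pg->type = want;
    pg->type_count++;
    return true;
}

/* Drop a typed reference; the page becomes untyped when the count hits 0. */
static void toy_put_type(struct toy_page *pg)
{
    assert(pg != NULL && pg->type_count > 0);
    if (--pg->type_count == 0)
        pg->type = TOY_NONE;
}
```

In this model, a page that is in use as an L1 table cannot gain a writable mapping until every pagetable reference to it has been dropped, and a foreign domain can never take a reference at all.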
I'm running an HVM guest using shadow page tables on a 64-bit machine. I'm working on a project where I mark certain pages read-only and capture the writes into these pages. I then try to emulate the write instructions using x86_emulate, as is done in arch/x86/mm/shadow/multi.c.

The instruction I'm trying to emulate is:

    asm("mov %%gs,%0" : "=m" (p->thread.gsindex));

Since the source operand is a segment register, and the x86_emulate_ops structure that is being used does not have an ops->read_segment function defined, the emulation fails.

Is there an easy way to add or activate this functionality? Perhaps a full emulator, since one would expect to see other cases of memory writes that are not handled as well.

Thanks,
John
On 07/05/2009 20:39, "Emre Can Sezer" <ecsezer@ncsu.edu> wrote:

> I'm running an HVM guest using shadow page tables on a 64bit machine.
> I'm working on a project where I mark certain pages read-only and
> capture the writes into these pages. I then try to emulate the write
> instructions using x86_emulate as is done in arch/x86/mm/shadow/multi.c.
>
> The instruction I'm trying to emulate is:
> asm("mov %%gs,%0" : "=m" (p->thread.gsindex));
>
> Since the source operand is a segment register, and the x86_emulate_ops
> structure that is being used does not have a ops->read_segment function
> defined, the emulation fails.
>
> Is there an easy way to add or activate this functionality? Perhaps a
> full emulator, since one would expect to see other cases of memory
> writes that are not handled as well.

Easily implemented -- you pass through to hvm_get_segment_register(). My guess is you'll quickly fault on another instruction which is not so easily fixed up, however.

 -- Keir
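A rough sketch of what such a pass-through hook could look like follows. Everything here is a stand-in so that the example compiles on its own: the type definitions, the stub body of hvm_get_segment_register(), the `v` field in the context, and the names `toy_read_segment` and `toy_emulate_ops` are assumptions for illustration, not copies of the Xen headers. Check the real declarations in the emulator and HVM headers of your tree before adapting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations (assumptions); in Xen these come from its headers. */
enum x86_segment {
    x86_seg_cs, x86_seg_ss, x86_seg_ds, x86_seg_es, x86_seg_fs, x86_seg_gs,
    x86_seg_count
};
struct segment_register { uint16_t sel; uint32_t limit; uint64_t base; };
struct vcpu { struct segment_register seg[x86_seg_count]; };
/* Hypothetical: this toy context carries the vcpu pointer directly. */
struct x86_emulate_ctxt { struct vcpu *v; };

#define X86EMUL_OKAY 0

/* Stub standing in for the real HVM accessor, which reads hardware state. */
static void hvm_get_segment_register(struct vcpu *v, enum x86_segment seg,
                                     struct segment_register *reg)
{
    *reg = v->seg[seg];
}

/* The hook itself: forward the request to the HVM accessor. */
static int toy_read_segment(enum x86_segment seg,
                            struct segment_register *reg,
                            struct x86_emulate_ctxt *ctxt)
{
    assert(reg != NULL && ctxt != NULL);
    hvm_get_segment_register(ctxt->v, seg, reg);
    return X86EMUL_OKAY;
}

/* Toy ops table with only the callback of interest. */
struct toy_emulate_ops {
    int (*read_segment)(enum x86_segment seg, struct segment_register *reg,
                        struct x86_emulate_ctxt *ctxt);
};

static const struct toy_emulate_ops toy_ops = {
    .read_segment = toy_read_segment,
};
```

The point of the sketch is only the shape of the change: the ops table gains a read_segment entry, and that entry does nothing more than delegate.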
Yup. Not only did hvm_get_segment_register() work like a charm, but I also ran into another problem, as you foretold.

The instruction is fxsave, which uses a mask to copy some CPU information to a 512-byte memory area. Any chance of an emulation function for this instruction?

As a side note, I know of quite a few research papers that mention emulating memory writes to pages, some using Xen. This leads me to believe that the problem of emulating most of these instructions should have been solved. I know it's not relevant for Xen production code, but I'm wondering if there is a full emulator (perhaps QEMU?) inside Xen that I can switch to, instead of trying to add this functionality in an ad-hoc manner?

John
On 11/05/2009 23:15, "Emre Can Sezer" <ecsezer@ncsu.edu> wrote:

> Yup. Not only did hvm_get_segment_register() work like a charm, but I
> also ran into another problem as you have foretold.
>
> The instruction is fxsave, which uses a mask to copy some CPU
> information to a 512byte memory. Any chance of an emulation function
> for this instruction?

Go for it. ;-) Define a 512-byte array, fxsave into it, and then write the array to guest memory. Look at how some other FPU ops that write to memory are implemented for further guidance.

 -- Keir
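The recipe above can be tried out in user space with a small C sketch. The assumptions are loud ones: `toy_emulate_fxsave()` is a hypothetical name, a plain memcpy() stands in for whatever guest-memory write callback the emulator provides, and GCC/Clang inline-asm syntax is used. On non-x86 builds the function just reports failure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FXSAVE_SIZE 512

/* FXSAVE needs a 16-byte-aligned memory operand. */
struct fxsave_area {
    uint8_t bytes[FXSAVE_SIZE];
} __attribute__((aligned(16)));

/*
 * Save the FPU/SSE state into a local aligned buffer, then copy it to dst.
 * In an emulator, the memcpy() would be the write to guest memory.
 * Returns 0 on success, -1 when not built for x86.
 */
static int toy_emulate_fxsave(uint8_t *dst)
{
    struct fxsave_area area;

    assert(dst != NULL);
    /* Zero first: FXSAVE leaves some bytes of the area unwritten. */
    memset(&area, 0, sizeof(area));
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("fxsave %0" : "=m" (area));
    memcpy(dst, area.bytes, FXSAVE_SIZE);
    return 0;
#else
    (void)area;
    return -1;
#endif
}
```

Going through a local buffer keeps the alignment requirement off the destination and means the final write is an ordinary 512-byte copy, which is the part an emulator already knows how to do.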
I am trying to figure out how an HVM guest is passed an interrupt. Say that a network packet has arrived and the QEMU driver in dom0 has to notify an HVM guest of the packet's arrival. Could someone please give a brief, high-level description of this process?

I read the Intel Architectures Software Developer's Guide on VM Execution bits and virtual interrupts, and also some Xenwiki material about it, but I'm still not sure what's going on. I don't have any device pass-through, stubdom or PV drivers for my HVM guest. I would also appreciate references to some Xen files/functions related to this process.

Thanks,
John
On 05/06/2009 19:43, "Emre Can Sezer" <ecsezer@ncsu.edu> wrote:

> I am trying to figure out how an HVM guest is passed an interrupt. Say
> that a network packet has arrived and the QEMU driver in dom0 has to
> notify an HVM guest of the packet's arrival. Could someone please give
> a brief, high-level description of this process? I read the Intel
> Architectures Software Developer's Guide on VM Execution bits and
> Virtual interrupts and also some Xenwiki stuff about it but I'm still
> not sure what's going on. I don't have any device pass-through or
> stubdom or PV Drivers for my HVM guest. I would also appreciate
> references to some Xen files/functions related to this process.

Following will give you enough to grep around for the details:

The hypercall is HVMOP_set_pci_intx_level: qemu-dm uses this to assert a PCI INTx virtual interrupt line. It is handled by hvm_pci_intx_assert() -> vioapic_irq_positive_edge() -> vioapic_deliver() -> ioapic_inj_irq() -> vlapic_set_irq() & vcpu_kick(). The final function there wakes the guest vcpu, which on vmentry calls vmx_intr_assist() -> hvm_vcpu_has_pending_irq() -> vlapic_has_pending_irq(), which will return a pending vector. vmx_intr_assist() then delivers that vector via vmx_inject_extint(). Hardware then delivers the interrupt automatically during vmentry.

 -- Keir
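The two halves of that path (record a pending vector, then pick it up at the next vmentry) can be modelled in a few lines of C. This is a deliberately simplified sketch, not the vlapic implementation: the `toy_` names are hypothetical, and the real code also tracks in-service vectors and priorities, and kicks the vcpu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy local APIC: one pending bit per vector, 256 vectors. */
struct toy_vlapic {
    uint32_t irr[8];
};

/* Device side: mark a vector as pending (the role vlapic_set_irq() plays). */
static void toy_set_irq(struct toy_vlapic *lapic, uint8_t vec)
{
    assert(lapic != NULL);
    lapic->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Return the highest pending vector, or -1 if nothing is pending. */
static int toy_has_pending_irq(const struct toy_vlapic *lapic)
{
    assert(lapic != NULL);
    for (int word = 7; word >= 0; word--) {
        if (lapic->irr[word] == 0)
            continue;
        for (int bit = 31; bit >= 0; bit--)
            if (lapic->irr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
    }
    return -1;
}

/*
 * Vmentry side: take the highest pending vector and clear its bit.
 * The returned vector is what would be injected into the guest.
 */
static int toy_intr_assist(struct toy_vlapic *lapic)
{
    int vec = toy_has_pending_irq(lapic);

    if (vec >= 0)
        lapic->irr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
    return vec;
}
```

Calling toy_set_irq() for two vectors and then toy_intr_assist() repeatedly yields the higher vector first, then the lower one, then -1 once nothing is left pending.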
Thanks Keir. I do have another question though.

I am trying to find out whether a page fault occurred during interrupt handling. I implemented two page tables for HVM guests that help me track execution within the guest kernel, so there is a very good chance that an interrupt might also result in a page fault as soon as it is injected. I tried counting these events by checking the IF flag in regs->rflags, and also by looking at the VIF and VIP flags, without success.

Is this a viable method for determining whether a page fault was caused during interrupt handling? If not, is there any VM state I can check?

Thanks,
John
On 08/06/2009 20:24, "Emre Can Sezer" <ecsezer@ncsu.edu> wrote:

> Thanks Keir. I do have another question though. I am trying to find
> out whether a page fault occurred during an interrupt handling. I
> implemented two page tables for HVM guests that help me track execution
> within the guest kernel. So there is a very good chance that an
> interrupt might also result in a page fault as soon as it is injected.
> I tried counting these events by checking the IF flags in regs->rflags
> and also looking at VIF and VIP flags without success. Is this a viable
> method for determining whether a page fault was caused during interrupt
> handling? If not, is there any VM state I can check?

You can easily determine if the page fault happens during interrupt injection, but once the guest OS starts handling the interrupt it will be hard to track. The OS is likely to ACK the interrupt quite early and re-set EFLAGS.IF to 1 before it actually executes the device driver ISR.

 -- Keir
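For reference, the IF test discussed above comes down to a single bit (bit 9 of EFLAGS/RFLAGS). The helper below is a minimal sketch with a hypothetical name, `toy_irqs_masked()`, and the flag macro is defined locally rather than taken from a Xen header. As the reply points out, a clear IF only says that interrupts are masked at that instant; once the guest sets IF again inside its handler, this check can no longer tell that an interrupt is being serviced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interrupt-enable flag: bit 9 of EFLAGS/RFLAGS. */
#define X86_EFLAGS_IF (UINT64_C(1) << 9)

static_assert(X86_EFLAGS_IF == 0x200, "IF is bit 9");

/* True when the saved flags show maskable interrupts disabled. */
static bool toy_irqs_masked(uint64_t rflags)
{
    return (rflags & X86_EFLAGS_IF) == 0;
}
```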