Hi,

As part of my research I am going to play around with page permissions. However, I might have to do this pretty often, so I'm thinking about having two page tables that are synchronized and differ only in their permissions. The idea is to have one set of permissions while code block A is executing and another set while block B is executing. I plan to capture execution jumps by marking the inactive block as non-executable.

I am running an HVM guest on an x86_64 machine. I'm only interested in kernel pages, so I don't need a second page table for user-level pages; their permissions will be the same.

So far I can think of only two ways of doing this. First, I can have two top-level shadow page tables and use one of the unused slots in struct arch_domain to store the second top-level page. Then I modify the propagate_l*e_from_guest functions to ensure that they create and synchronize the second page table. Second, I can have pages that are twice as large as the original page tables. I'm not sure what the implications are for the shadow cache and the linear page table mappings.

Which of these methods would be easier to implement? Is there an easier way of having two sets of page tables? If I had the means, would it be worth switching to AMD for NPT?

Thanks in advance,
John
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
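The A/B scheme described above can be modelled in a few lines of user-space C. This is a toy sketch, not Xen code: `monitor_t`, the block address ranges and the single "active view" are hypothetical, and a real implementation would receive the fault from the shadow page-fault handler rather than from a function call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical kernel code region, [start, end). */
typedef struct {
    uint64_t start;
    uint64_t end;
} region_t;

typedef enum { VIEW_A = 0, VIEW_B = 1 } view_t;

typedef struct {
    region_t block[2];   /* block[VIEW_A] and block[VIEW_B] */
    view_t active;       /* permission set currently loaded */
    unsigned switches;   /* number of view switches so far */
} monitor_t;

static bool in_region(const region_t *r, uint64_t va)
{
    return va >= r->start && va < r->end;
}

static view_t other(view_t v)
{
    return v == VIEW_A ? VIEW_B : VIEW_A;
}

/* In each view only the inactive block is non-executable; everything
 * else (including user pages) keeps the same permissions. */
bool is_executable(const monitor_t *m, uint64_t va)
{
    return !in_region(&m->block[other(m->active)], va);
}

/* Model an instruction fetch at va.  A fetch from the inactive block
 * "faults"; the handler loads the other view and the fetch is retried.
 * Returns true if a view switch was needed. */
bool fetch(monitor_t *m, uint64_t va)
{
    if (is_executable(m, va))
        return false;
    m->active = other(m->active);
    m->switches++;
    return true;
}
```

Every switch in this model corresponds to one NX fault, which is why the cost of changing views matters so much when jumps between the blocks are frequent.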
Not sure it will help whatsoever, but if you look up any posts I've made to this list and some of the responses to them, you'll get "some" information related to this topic. I'd love to hear answers to this post.

Take care,
Sina

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Emre Can Sezer
Sent: Wednesday, December 17, 2008 6:06 PM
To: Xen Devel
Subject: [Xen-devel] Two shadow page tables for HVM
Hi,

At 18:06 -0500 on 17 Dec (1229537167), Emre Can Sezer wrote:
> So far I can think of only two ways of doing this. First, I can have two
> top level shadow page tables and use one of the unused slots in struct
> arch_domain to store this page. Then I modify propagate_l*e_from_guest
> functions to ensure that they create and synchronize the second page table.

You could double up the shadow pagetable types, so that as well as having a 32-bit l1 shadow there would also be a 32-bit alternate-mode shadow. Then by doubling the number of times multi.c is built, you could hopefully do what you want without _too_ much extra hacking. Switching back and forth would involve changing the paging mode and calling shadow_update_paging_modes() to cause the right set of shadows to be loaded.

> Second, I can have pages that are twice as large as original page tables.
> I'm not sure what the implications are concerning shadow cache and the
> linear page table mappings.

I think that would involve a lot more hacking around in the code that builds the tables, and probably many more infuriating bugs. :)

> Which one of these methods would be easier to implement? Is there an
> easier way of having two sets of page tables? If I had the means, would it
> be worth switching to AMD for the NPT?

Probably -- duplicating the p2m table with appropriate changes would be simpler than duplicating all shadows everywhere, and the switchover would be trivial.

One thing to consider in either case is how to choose which frames are accessible: if you modify the shadows you will at least be able to see the virtual addresses, so you can decide what's kernel and what isn't; with NPT you deal only in guest-physical addresses. But then again, in the NPT case you don't have to worry about aliased mappings of the frame.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
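Doubling the shadow types, as Tim suggests, means shadows are found by the pair (guest frame, shadow type), so the same guest frame can carry a normal and an alternate shadow side by side. The sketch below models only that lookup. The type names, hash function and table size are illustrative assumptions, not Xen's actual shadow-type constants or hash code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow types: each normal type has an alternate twin. */
typedef enum {
    ST_L1, ST_L2, ST_L3, ST_L4,                 /* normal mode    */
    ST_L1_ALT, ST_L2_ALT, ST_L3_ALT, ST_L4_ALT  /* alternate mode */
} shadow_type_t;

/* Map a normal-mode type to its alternate-mode twin. */
shadow_type_t alt_type(shadow_type_t t)
{
    return t < ST_L1_ALT ? (shadow_type_t)(t + ST_L1_ALT) : t;
}

#define NBUCKETS 64

typedef struct {
    uint64_t gfn;        /* guest frame being shadowed */
    shadow_type_t type;  /* which kind of shadow it is */
    uint64_t smfn;       /* frame holding the shadow */
    bool used;
} shadow_entry_t;

typedef struct {
    shadow_entry_t slot[NBUCKETS];
} shadow_hash_t;

static size_t bucket(uint64_t gfn, shadow_type_t t)
{
    return (size_t)((gfn * 31u + (uint64_t)t) % NBUCKETS);
}

/* Insert or update the shadow for (gfn, type), using linear probing. */
bool shadow_insert(shadow_hash_t *h, uint64_t gfn, shadow_type_t t, uint64_t smfn)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        shadow_entry_t *e = &h->slot[(bucket(gfn, t) + i) % NBUCKETS];
        if (!e->used || (e->gfn == gfn && e->type == t)) {
            e->gfn = gfn;
            e->type = t;
            e->smfn = smfn;
            e->used = true;
            return true;
        }
    }
    return false;  /* table full */
}

/* Look up the shadow for (gfn, type); returns false if there is none. */
bool shadow_lookup(const shadow_hash_t *h, uint64_t gfn, shadow_type_t t, uint64_t *smfn)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        const shadow_entry_t *e = &h->slot[(bucket(gfn, t) + i) % NBUCKETS];
        if (!e->used)
            return false;
        if (e->gfn == gfn && e->type == t) {
            *smfn = e->smfn;
            return true;
        }
    }
    return false;
}
```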
Tim Deegan wrote:
> You could double up the shadow pagetable types, so that as well as
> having a 32-bit l1 shadow there would also be a 32-bit alternate-mode
> shadow. Then by doubling the number of times multi.c is built, you
> could hopefully do what you want without _too_ much extra hacking.
> Switching back and forth would involve changing the paging mode and
> calling shadow_update_paging_modes() to cause the right set of shadows
> to be loaded.

Wouldn't this mean that the two page tables are NOT synchronized? When we switch paging modes, wouldn't we have to rebuild the entire set of shadow page tables from the guest?

The reason I was thinking of synchronized page tables is that I will have to switch between them quite often, several times during a system call, so I want to minimize TLB flushes and make the switch as fast as possible. With synced PTs, my plan was to set the guest CR3 to point to the new top-level page table and flush only the kernel pages.

When considering the performance penalty of flushing the kernel page tables from the TLB, how significant is the cost of traversing all the shadow page tables for the guest kernel and updating their permissions? If there isn't an order-of-magnitude difference, it might be reasonable to take the shortcut in implementation.
Emre Can Sezer wrote:
> Wouldn't this mean that the two page tables are NOT synchronized? When
> we switch paging modes, wouldn't we have to rebuild the entire shadow
> page tables from guest?

No. When updating shadows from the guest, the shadow code propagates the changes to every existing shadow of the page.

> When considering the performance penalties of flushing the kernel page
> tables from the TLB, how significant is traversing all the shadow page
> tables for the guest kernel and updating their permissions? If there
> isn't an order of magnitude of difference, it might be reasonable to
> take the short cut in implementation.

It depends on which permission you're updating and how widely you want it applied. If a single bit in an upper-level entry sets the permission for the whole part of the tree it maps (e.g. the NX bit), then you can just change permissions in the top-level shadows. Be sure, though, to handle the resulting fault correctly.

Gianluca
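Gianluca's point about upper-level bits can be checked with a small model of a 4-level x86-64 walk. Assuming EFER.NXE = 1 (as in John's setup) and ignoring large pages and the other permission bits, an instruction fetch is refused if the NX bit (bit 63) is set in any entry on the path, so setting it in one top-level shadow entry makes everything below that entry non-executable. The function names are our own.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)
#define PTE_NX      (1ULL << 63)

/* entry[0] is the L4 entry used by the walk, entry[3] the L1 entry.
 * Assumes EFER.NXE = 1; large pages and U/S, R/W checks are ignored. */
bool fetch_allowed(const uint64_t entry[4])
{
    for (int level = 0; level < 4; level++) {
        if (!(entry[level] & PTE_PRESENT))
            return false;   /* not-present fault at this level */
        if (entry[level] & PTE_NX)
            return false;   /* NX here covers everything this entry maps */
    }
    return true;
}

/* Make the whole region mapped by one top-level entry non-executable. */
uint64_t set_nx(uint64_t l4e)
{
    return l4e | PTE_NX;
}
```

The flip side is the one Gianluca mentions: the fault that results must be recognised as one you caused and handled accordingly.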
Tim Deegan wrote:
> Hi,
>
> At 12:05 -0500 on 22 Dec (1229947511), Emre Can Sezer wrote:
>> Wouldn't this mean that the two page tables are NOT synchronized? When we
>> switch paging modes, wouldn't we have to rebuild the entire shadow page
>> tables from guest?
>
> We maintain shadow pagetables for pagetables that are not in use, and
> even for modes that aren't in use. We only get rid of shadows when we're
> running out of memory, or when the guest uses the page for something else.
> If we didn't do that our context-switch costs would be enormous.

I've been trying to understand the shadow code for a while now, and I have one last question about this approach. In my case, the guest OS will have only a single set of page tables, and I will have two sets of shadows for them. I understand that once you change the shadow mode, the shadow pages are still kept and the mapping is stored in the shadow cache. However, if a page table is updated in one mode, how does the other mode learn of the change? As far as I understand, the same gfn will be inserted into the hash twice, with two types. But how does Xen determine that the guest page's contents have changed? That change must somehow propagate to the second shadow mode's page tables, with the appropriate permission changes. How is this done?

Thanks for all the input,
John

>> The reason I was thinking of synchronized page tables is because I will
>> have to switch between them quite often - several times during a system
>> call. So I want to minimize the tlb flushes and make the switch as fast as
>> possible. With synced PT's, my plan was to set the guest CR3 to point to
>> the new top level page table and only flush the kernel pages.
>
> That might be just as expensive -- ISTR Keir measured the cost of invlpg
> vs TLB flush a while ago and found that invlpg'ing more than one or two
> PTEs was slower than just flushing the whole TLB.
>
>> When considering the performance penalties of flushing the kernel page
>> tables from the TLB, how significant is traversing all the shadow page
>> tables for the guest kernel and updating their permissions? If there isn't
>> an order of magnitude of difference, it might be reasonable to take the
>> short cut in implementation.
>
> I don't have any measurements for doing walks of the whole set of
> shadows, but in general we've found it's worth doing almost any trick
> that will avoid that.
>
> Cheers,
>
> Tim.
I finally got around to implementing two paging modes. Everything works fine until I swap modes :)

I get a shadow page fault with error_code 0. This happens right after I swap the paging mode. Any clues as to what might be the cause?

I walked through the code that updates paging modes. It appears that we simply make an *empty* top-level shadow and install it as the top-level shadow page table. If this is the case, shouldn't the first fault have a non-zero error code?

Thanks,
John

Tim Deegan wrote:
> Hi,
>
> At 11:16 -0500 on 29 Dec (1230549392), Emre Can Sezer wrote:
>> [...] However, if a page table is updated in one mode, how does the other
>> mode know of this change? [...]
>
> When making PTEs in the shadow tables, we never put in a writeable
> mapping of a page that has any shadows. Then, in the pagefault handler,
> we spot that the guest is writing to a shadowed page and call the
> emulator so we can figure out what it's trying to write. The emulator
> calls us back when it wants to write to memory, and in the callback we
> call the propagation code for _every_ kind of shadow the page has.
>
> We need to do that even within a single paging mode, because if the
> guest uses linear page tables a single page can be shadowed as l4, l3,
> l2 and l1 at the same time.
>
> The "out of sync" optimization relaxes the rule of never allowing a
> writeable mapping of a shadowed page, but it doesn't apply to pages that
> are shadowed more than once anyway. Might be best to turn it off
> while you're working on this, anyway. :)
>
> Cheers,
>
> Tim.
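The mechanism Tim describes (shadowed guest pagetables are never mapped writeable, guest writes are emulated, and each write is re-propagated into every shadow of the page) can be modelled as follows. This is a simplified sketch: the `shadowed_page_t` layout and the per-mode `force_nx` policy are hypothetical, and the gfn-to-mfn translation that real shadow entries need is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPTES  8
#define PTE_P  (1ULL << 0)
#define PTE_RW (1ULL << 1)
#define PTE_NX (1ULL << 63)

enum { MODE_NORMAL = 0, MODE_ALT = 1, NMODES = 2 };

/* One guest pagetable page plus one shadow of it per paging mode. */
typedef struct {
    uint64_t guest[NPTES];
    uint64_t shadow[NMODES][NPTES];
    uint8_t force_nx[NMODES];   /* per-mode bitmask of slots forced NX */
} shadowed_page_t;

/* Build one shadow entry from the guest entry under a mode's policy. */
static uint64_t derive(const shadowed_page_t *p, int mode, size_t i)
{
    uint64_t e = p->guest[i];
    if ((e & PTE_P) && (p->force_nx[mode] & (1u << i)))
        e |= PTE_NX;
    return e;
}

/* Rule 1: a shadow entry that maps a shadowed pagetable page is never
 * writeable, so every guest write to that page traps. */
uint64_t map_pt_page(uint64_t e, bool target_has_shadows)
{
    return target_has_shadows ? (e & ~PTE_RW) : e;
}

/* Rule 2: the emulated write updates the guest entry, then propagates
 * it into every shadow of the page, not only the one in use. */
void emulated_guest_write(shadowed_page_t *p, size_t i, uint64_t val)
{
    p->guest[i] = val;
    for (int mode = 0; mode < NMODES; mode++)
        p->shadow[mode][i] = derive(p, mode, i);
}
```

The out-of-sync optimization Tim mentions relaxes rule 1 for some pages, which is one reason to keep it disabled while experimenting with two modes.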
At 17:08 -0500 on 09 Jan (1231520939), Emre Can Sezer wrote:
> I finally got around to implementing two paging modes. Everything works
> fine until I swap modes :)
>
> I get a shadow page fault with error_code 0. This happens right after I
> swap the paging mode. Any clues as to what might be the cause?
>
> I walked through the code that updates paging modes. It appears that we
> simply make an *empty* top level shadow and install it as top level shadow
> page table. If this is the case, shouldn't the first fault have a non-zero
> error code?

The TLB will be empty when you return, so the first fault will be an instruction fetch, presumably from kernel space (since that's when you want to switch modes). If the guest has PAE or 64-bit pagetables and EFER.NXE turned on, it should have error code 0x10; otherwise 0 is correct.

Cheers,

Tim.
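Tim's rule of thumb can be written down using the architectural page-fault error-code bits (P = bit 0, W/R = bit 1, U/S = bit 2, RSVD = bit 3, I/D = bit 4). The sketch covers only instruction fetches; the helper name is our own, and SMEP, which postdates this discussion, is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define PFEC_PRESENT    0x01u  /* 0 = not present, 1 = protection violation */
#define PFEC_WRITE      0x02u
#define PFEC_USER       0x04u
#define PFEC_RSVD       0x08u
#define PFEC_INSN_FETCH 0x10u

/* Error code expected for a page fault raised by an instruction fetch.
 * The I/D bit is only reported when NX is usable, i.e. PAE or 64-bit
 * paging with EFER.NXE = 1; otherwise a fetch looks like a plain read. */
uint32_t expected_fetch_pfec(bool present, bool user_mode,
                             bool pae_or_long, bool efer_nxe)
{
    uint32_t ec = 0;
    if (present)
        ec |= PFEC_PRESENT;
    if (user_mode)
        ec |= PFEC_USER;
    if (pae_or_long && efer_nxe)
        ec |= PFEC_INSN_FETCH;
    return ec;
}
```

For a 64-bit guest with NXE on, a kernel-mode fetch through an empty top-level shadow gives 0x10 here, which is why an error code of 0 is surprising in that configuration.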
Tim Deegan wrote:
> The TLB will be empty when you return so the first fault will be an
> instruction fetch, presumably from kernel space (since that's when you
> want to switch modes). If the guest has PAE or 64-bit pagetables and
> EFER.NXE turned on, it should have error code 0x10; otherwise 0 is correct.

Unfortunately I'm still stuck with the same problem. In normal mode, I observe the instruction-fetch fault when execution jumps to a module; the va and rip are the same. I then switch to the "alternate" paging mode. Since the TLB is empty, I expect the guest to try to fetch the instruction again. At this point the root shadow page table is empty (this is the first time we have ever switched to this mode), so I expect only a page-not-present fault, since the NX bit is not set. However, I see neither an instruction-fetch fault nor a not-present fault of that kind. Instead it faults with error code 0 and a va that is different from the rip (the rip is the same as before). I'm using 64-bit PTs and, as far as I can tell, EFER.NXE is turned on: at least cpu_has_nx returns true, and I get page faults with the PFEC_instr_fetch error in both paging modes.

Here is the summary of page fault errors:
...
(XEN) sh_page_fault: d:v=1:0 va=0xffffffffa000f050 err=17, rip=ffffffffa000f050
(XEN) <ECS> Switching to ALTERNATE paging mode
(XEN) <ECS-alt> sh_page_fault: d:v=1:0 va=0xffffffff8062cef0 err=0, rip=ffffffffa000f050
(XEN) <ECS-alt> sh_page_fault: d:v=1:0 va=0xffffffff805d8010 err=0, rip=ffffffffa000f050
(XEN) <ECS-alt> sh_page_fault: d:v=1:0 va=0xffffffff8020cea0 err=10, rip=ffffffff8020cea0
(XEN) <ECS> Switching to NORMAL paging mode
(XEN) <ECS> Done
...

I'm also confused about the last page fault. No earlier page fault propagated this page's PTE from the guest (I turned off prefetching). I'm inclined to think that I have some artifacts left over from the initial paging mode.

The only thing I haven't fully ported to the alternate paging mode is the superpage handling, but I'm not sure that has anything to do with the error code.

Any thoughts? Am I correct in thinking that when I first switch the paging mode, the top-level page table is empty, and that we should at least get a page-not-present fault for ANY instruction?

Thanks,
John
At 19:39 -0500 on 26 Jan (1232998748), Emre Can Sezer wrote:
> Unfortunately I'm still stuck with the same problem. When in normal
> mode, I observe the instruction fetch error when execution is jumping to
> a module. The va and rip are the same. I switch to "alternate" paging
> mode. Since the TLB is empty, I expect the guest to try to fetch the
> instruction again. At this point the root shadow page table is empty
> (first time we ever switched to this mode), so I only expect to get a
> page not present error, since the NX bit is not set. Well, I don't see
> either. It faults with error code 0 and a va that is different from the
> rip (rip is the same as before).

Mysterious! Does this address line up with any of the other register or descriptor state?

> I'm using 64-bit PT's and as far as I
> can tell EFER.NXE is turned on. At least cpu_has_nx returns true and
> that I get page faults with PFEC_instr_fetch error with both paging modes.
>
> Here is the summary of page fault errors:
> [...]
>
> I'm also confused about the last page fault. No page fault occurred
> that propagated this page's pte from the guest (I turned off
> prefetching). I'm inclined to think that I have some artifacts from the
> initial paging mode.

Seems like a fair explanation.

> The only thing I haven't fully ported to the alternate paging mode is
> the super page handling. But I'm not sure if that has anything to do
> with the error code.

I can't see why it should have.

> Any thoughts? Am I correct in thinking that when I first switch the
> paging mode, the top level page table is empty and that we should at
> least get a page not present error for ANY instruction?

That is what I would expect. If you're not seeing that, then either the TLB isn't being flushed or your shadows are leaking from one mode to another. Obviously, on subsequent switches to the alternate mode, you'll have partially filled shadows, and patterns like the one above would be quite reasonable.

Cheers,

Tim.
> That is what I would expect. If you're not seeing that then either the
> TLB's not being flushed or your shadows are leaking from one mode to
> another.

Doesn't changing paging modes set and update the guest CR3, resulting in a guest TLB flush? I would like to flush the guest TLBs manually if there is a way, but I'm hopelessly confused by the functions in tlbflush.h. Which one do I call to flush guest TLBs?
At 14:07 -0500 on 27 Jan (1233065276), Emre Can Sezer wrote:
>> That is what I would expect. If you're not seeing that then either the
>> TLB's not being flushed or your shadows are leaking from one mode to
>> another.
> Doesn't changing paging modes set and update the guest cr3 resulting in a
> guest TLB flush?

It's certainly supposed to. :)

> I would like to manually flush them if there is a way but
> I'm hopelessly confused about the tlbflush functions in tlbflush.h. Which
> one do I call to flush guest TLBs?

local_flush_tlb() (or is it tlb_flush_local() these days?) will flush the Xen TLB and, as a side-effect, flush all guest tags if you have a tagged TLB. If you _just_ want to make sure the guest TLB is flushed on the current pCPU, call hvm_flush_guest_tlbs(), which will throw away all the guest tags. (If you don't have a tagged TLB, then each vmexit flushes the guest TLB entries.)

Cheers,

Tim.
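To see why throwing away the guest tags is enough, here is a toy model of a tagged (ASID/VPID-style) TLB in which "flush all guest tags" is done by bumping a per-pCPU generation counter: each vCPU gets a fresh tag on its next VM entry, so its old entries can no longer match. The types and functions are hypothetical; this illustrates the general technique and is not a transcription of hvm_flush_guest_tlbs().

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS 16

typedef struct {
    uint64_t vpn, pfn;
    uint32_t tag;      /* tag of the guest context that filled the entry */
    bool valid;
} tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_SLOTS];
    uint64_t generation;   /* bumped to retire every guest tag at once */
    uint32_t next_tag;     /* next tag to hand out on this pCPU */
} pcpu_tlb_t;

typedef struct {
    uint32_t tag;
    uint64_t generation;   /* generation in which tag was assigned */
} vcpu_tlb_t;

/* "Flush all guest tags": O(1), no walk over the TLB. */
void flush_guest_tags(pcpu_tlb_t *t)
{
    t->generation++;
}

/* On VM entry, a vCPU holding a tag from an older generation gets a new
 * one.  (Tag wrap-around, which would need a real flush, is ignored.) */
void on_vmentry(pcpu_tlb_t *t, vcpu_tlb_t *v)
{
    if (v->generation != t->generation) {
        v->tag = t->next_tag++;
        v->generation = t->generation;
    }
}

void tlb_fill(pcpu_tlb_t *t, const vcpu_tlb_t *v, uint64_t vpn, uint64_t pfn)
{
    tlb_entry_t *e = &t->e[vpn % TLB_SLOTS];
    e->vpn = vpn;
    e->pfn = pfn;
    e->tag = v->tag;
    e->valid = true;
}

bool tlb_lookup(const pcpu_tlb_t *t, const vcpu_tlb_t *v, uint64_t vpn, uint64_t *pfn)
{
    const tlb_entry_t *e = &t->e[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn && e->tag == v->tag) {
        *pfn = e->pfn;
        return true;
    }
    return false;
}
```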
>> I'm also confused about the last page fault. No page fault occurred
>> that propagated this page's pte from the guest (I turned off
>> prefetching). I'm inclined to think that I have some artifacts from the
>> initial paging mode.
>
> Seems like a fair explanation.

The Intel Software Developer's Manual states:
  P flag = 0: the fault was caused by a non-present page.
  P flag = 1: the fault was caused by a page-level protection violation.

If the hardware sets this flag as documented, that would explain the error code being 0. I'm looking into why there isn't another instruction fetch.

John
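A small decoder for the architectural page-fault error code can help when reading logs like the one above. With EFER.NXE enabled, as in this setup, an error code of 0 decodes as a supervisor-mode read of a not-present page, i.e. a data access rather than an instruction fetch. The function name and output strings are our own.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Describe an x86 #PF error code: P (bit 0), W/R (bit 1), U/S (bit 2),
 * RSVD (bit 3) and I/D (bit 4). */
void describe_pfec(uint32_t ec, char *buf, size_t len)
{
    const char *cause  = (ec & 0x01) ? "protection violation" : "not present";
    const char *access = (ec & 0x10) ? "instruction fetch"
                       : (ec & 0x02) ? "write" : "read";
    const char *mode   = (ec & 0x04) ? "user" : "supervisor";
    const char *rsvd   = (ec & 0x08) ? ", reserved bit set" : "";

    snprintf(buf, len, "%s, %s, %s%s", cause, access, mode, rsvd);
}
```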