Roger Cruz
2007-Jul-02 20:25 UTC
[Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
Hello,

I'm new to Xen and especially to the hypervisor code. I'm working off a 3.0.4.1 base and have the following questions regarding the memory management code for an x86, 32-bit platform (capable of supporting PAE). I'm doing some research into providing grant table hypercall support from a Windows 2003 HVM. I have made all the necessary changes to allow the hypercall to make it into the hypervisor and execute the correct grant table ops.

I'm now testing GNTTABOP_map_grant_ref with GNTMAP_host_map, and it correctly obtains the MFN from the grantor domain. It then attempts to take the HVM host VA (a Windows kernel VA from the non-paged pool) and walk the guest's page table to obtain the PFN. I am building the hypervisor by simply typing "make xen" without any other configuration changes from a default source installation.

The first problem I encountered is that the code appears to assume the guest is in PAE mode. In particular, guest_walk_tables() in xen/arch/x86/mm/shadow/multi.c, line 252, has this code snippet:

    #else /* PAE only... */
        /* Get l3e from the cache of the guest's top level table */
        gw->l3e = (guest_l3e_t *)&v->arch.shadow.gl3e[guest_l3_table_offset(va)];
    #endif /* PAE or 64... */

which accesses the L3 entries from the shadow page tables. When I instrument this code, I get l3e to be 0, as shown below (the line numbers won't match because of the instrumentation).

    (XEN) multi.c:236:d1 guest_walk_tables: va: 0x81699000.
    (XEN) multi.c:257:d1 guest_walk_tables: get l3e from cache: 0xff1a6ed0.
    (XEN) multi.c:263:d1 guest_walk_tables: l3e not present: 0x0.
    (XEN) multi.c:574:d1 sh_guest_map_l1e: va:81699000

If I add the /PAE switch to the boot.ini file, then I can get past this problem. Hence my statement that the hypervisor appears to assume guests are running with at least PAE mode enabled, which may not be the case. Could someone please guide me here?
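For reference, here is how the VA from the log above splits into table indices under the two 32-bit paging layouts. This is a standalone sketch of the standard x86 index arithmetic, not Xen code; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: table indices for one virtual address. */
struct va_indices {
    unsigned l3;  /* PAE only: index into the 4-entry top-level table */
    unsigned l2;  /* page-directory index */
    unsigned l1;  /* page-table index */
};

/* Non-PAE 32-bit paging: 10-bit L2 index, 10-bit L1 index, 12-bit offset. */
static inline struct va_indices split_2level(uint32_t va)
{
    struct va_indices ix = { 0, va >> 22, (va >> 12) & 0x3ff };
    return ix;
}

/* PAE paging: 2-bit L3 index, 9-bit L2 index, 9-bit L1 index, 12-bit offset. */
static inline struct va_indices split_pae(uint32_t va)
{
    struct va_indices ix = { va >> 30, (va >> 21) & 0x1ff, (va >> 12) & 0x1ff };
    return ix;
}
```

For va 0x81699000, split_pae() gives l3 = 2, l2 = 0x0b, l1 = 0x99, while split_2level() gives l2 = 0x205, l1 = 0x299. A walker built for one layout therefore looks at different entries than a guest using the other layout has populated.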
The second problem I encountered also has to do with walking the shadow page tables to obtain the MFN underlying the Windows VA. sh_guest_map_l1e(), line 520 in the same file, executes this code after it walks the guest page tables to fill in the walk_t gw variable:

    if ( gw.l2e &&
         (guest_l2e_get_flags(*gw.l2e) & _PAGE_PRESENT) &&
         !(guest_supports_superpages(v) &&
           (guest_l2e_get_flags(*gw.l2e) & _PAGE_PSE)) )

    (XEN) mm.c:2573:d1 grant host mapping: va:81696000 frame:0x15f140
    (XEN) mm.c:2507:d1 grant va mapping: va:81696000
    (XEN) multi.c:236:d1 guest_walk_tables: va: 0x81696000.
    (XEN) multi.c:257:d1 guest_walk_tables: get l3e from cache: 0xff1a6ed0.
    (XEN) multi.c:270:d1 guest_walk_tables: l3e flags: 0x1, pfn:0xe9a
    (XEN) , mfn:0x9e13d<G><1>multi.c:285:d1 hypervisor l2e mapped address 0xfec8b058
    (XEN) multi.c:315:d1 large pages. 0x1e3
    (XEN) multi.c:574:d1 sh_guest_map_l1e: va:81696000
    (XEN) multi.c:579:d1 sh_guest_map_l1e: gw.l2e flags:0x1e3, supports large 1
    (XEN) multi.c:596:d1 pl1e :0x0,
    (XEN) mm.c:2512:d1 Could not find L1 PTE for address 81696000

It looks like it specifically avoids mapping a superpage found in a Windows PDE into the hypervisor's virtual space, which I assume uses 4KB pages. What puzzles me is that, for a hypercall to read its arguments from the caller's guest space, it uses __hvm_copy, which calls shadow_gva_to_gfn() to walk the guest's shadow page tables and get to the underlying MFN. Couldn't this code do the same?

Thanks in advance for any insight into this area.

Roger Cruz
Principal SW Engineer
Marathon Technologies Corp.
978-489-1153
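To make the superpage case concrete: the l2e flags value 0x1e3 in the log has _PAGE_PSE set, so in PAE mode the entry maps a whole 2MB region and there is no L1 table to return. The frame for a given VA then follows arithmetically from the superpage's base frame. A minimal sketch with hypothetical names and the standard x86 flag bit values; the base frame in the usage note is an assumption, since the log does not print it.

```c
#include <stdint.h>

/* Standard x86 page-table flag bits (low bits of an entry). */
#define PTE_PRESENT 0x001u
#define PTE_PSE     0x080u  /* in an L2 entry: maps a superpage, no L1 table */

/* Returns 1 if the L2 entry flags describe a present superpage mapping. */
static inline int l2e_is_superpage(uint32_t flags)
{
    return (flags & PTE_PRESENT) && (flags & PTE_PSE);
}

/*
 * For a PAE 2MB superpage, the 4KB frame backing `va` is the superpage's
 * base frame plus the 9-bit index that would otherwise select an L1 entry.
 */
static inline uint64_t superpage_frame(uint64_t base_pfn, uint32_t va)
{
    return base_pfn + ((va >> 12) & 0x1ff);
}
```

With the logged flags, l2e_is_superpage(0x1e3) is true. If the superpage base frame were 0x1600 (assumed here purely for illustration), superpage_frame(0x1600, 0x81696000) would yield 0x1696.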
Keir Fraser
2007-Jul-03 09:04 UTC
Re: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
You are barking up the wrong tree by attempting to poke the mapping into a guest pte. After all, what would you poke? Guest PTEs address the guest-pseudo-physical space, in which the foreign page is not present.

You actually want to follow ia64's lead here. When running in 'auto-translate' mode (i.e., on shadow page tables), the guest address for a host mapping should not be interpreted as a virtual address but instead as a pseudo-physical address. So you will be mapping a grant reference into the pseudo-physical space, and then a guest PTE can map the appropriate pseudo-physical frame number in the usual way.

The slightly tricky bit is working out how to encode a grant-mapping in the p2m table. My advice would be to use a page-not-present encoding (p2m table entries are the same format as page-table entries), as this then lets you define special encodings of your choice with most of the remaining bits. Tim Deegan may be able to give more advice.

 -- Keir
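One way to picture the page-not-present encoding suggested above: keep bit 0 (present) clear so that the slot looks empty to anything that only tests the present bit, and use the remaining bits for a private tag plus the grant details. The bit layout below is entirely hypothetical, chosen only for illustration; it is not an existing Xen format, and all names are invented.

```c
#include <stdint.h>

/* Hypothetical layout of a 64-bit p2m entry that records a grant mapping:
 *   bit 0       : present, always 0 for these entries
 *   bit 1       : tag, 1 = "this not-present entry is a grant mapping"
 *   bits 2..17  : domain id of the granting domain (16 bits)
 *   bits 18..63 : machine frame number of the granted page
 */
#define P2M_PRESENT   0x1ull
#define P2M_GRANT_TAG 0x2ull

/* Build a not-present entry describing a page granted by `domid`. */
static inline uint64_t p2m_encode_grant(uint16_t domid, uint64_t mfn)
{
    return P2M_GRANT_TAG | ((uint64_t)domid << 2) | (mfn << 18);
}

/* True only for not-present entries that carry the grant tag. */
static inline int p2m_is_grant(uint64_t e)
{
    return !(e & P2M_PRESENT) && (e & P2M_GRANT_TAG) != 0;
}

static inline uint16_t p2m_grant_domid(uint64_t e) { return (uint16_t)(e >> 2); }
static inline uint64_t p2m_grant_mfn(uint64_t e)   { return e >> 18; }
```

An ordinary present entry is never mistaken for a grant entry because p2m_is_grant() checks the present bit first; an all-zero entry still means "nothing mapped".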
Roger Cruz
2007-Jul-06 19:09 UTC
RE: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
I took your suggestion and passed in from the Windows code the PFN (indirectly) to be associated with the MFN found in the referenced grant table entry. I modified create_grant_host_mapping() in mm.c to have an additional IF statement based on a new flag I added:

    // RRC: new code.
    if ( flags & GNTMAP_contains_physaddr )
        return create_pfn_to_mfn_mapping(addr, frame, current);

The new routine does something similar to the XENMEM_add_to_physmap code. It appears to do what I wanted, which is to replace the mapping from GPFN to MFN. When I run it, it does appear to change the mapping correctly, as you can see from my debug statements:

    // RRC: new routine to use with grant table map ref.
    static int create_pfn_to_mfn_mapping(
        uint64_t addr, mfn_t mfn, struct vcpu *v)
    {
        unsigned long prev_mfn, gpfn;
        struct domain *d = v->domain;

        // This call is only valid for translated domains.
        // The MFN specified must not be 0.
        if ( !shadow_mode_translate(d) || (mfn == 0) )
        {
            return -EINVAL;
        }

        // Get the frame number.
        gpfn = (unsigned long)(addr >> PAGE_SHIFT);

        LOCK_BIGLOCK(d);

        /* Remove previously mapped page if it was present. */
        prev_mfn = gmfn_to_mfn(d, gpfn);

        // RRC: debug
        gdprintk(XENLOG_WARNING,
                 "create_pfn_to_mfn_mapping: pfn:%"PRIx64" prev MFN:0x%x\n",
                 (u64)gpfn, (u32)prev_mfn);

        if ( mfn_valid(prev_mfn) )
        {
            if ( IS_XEN_HEAP_FRAME(mfn_to_page(prev_mfn)) )
            {
                gdprintk(XENLOG_WARNING, "removing xen heap frame\n");
                /* Xen heap frames are simply unhooked from this phys slot. */
                guest_physmap_remove_page(d, gpfn, prev_mfn);
            }
            else
            {
                gdprintk(XENLOG_WARNING, "removing normal domain frame\n");
                /* Normal domain memory is freed, to avoid leaking memory. */
                guest_remove_page(d, gpfn);
            }
        }

        /* Map at new location. */
        guest_physmap_add_page(d, gpfn, mfn);

        UNLOCK_BIGLOCK(d);

        return 0;
    }

    (XEN) mm.c:2643:d3 grant host mapping: pa:1697000 frame:0x14f0d4
    (XEN) mm.c:2608:d3 create_pfn_to_mfn_mapping: pfn:1697 prev MFN:0x7e5f0
    (XEN) mm.c:2621:d3 removing normal domain frame
    (XEN) memory.c:164:d3 guest_remove_page GMFN:0x1697, MFN:0x7e5f0
    (XEN) memory.c:181:d3 guest_remove_page type_info:0xe8000001, count_info:0x80000004
    (XEN) memory.c:192:d3 guest_remove_page page_is_removable:0x0
    (XEN) common.c:2194:d3 sh_remove_all_mappings gmfn:0x7e5f0
    (XEN) common.c:3098:d3 sh_p2m_remove_page removing gfn=0x1697 mfn=0x7e5f0
    (XEN) common.c:2194:d3 sh_remove_all_mappings gmfn:0x7e5f0
    (XEN) common.c:3137:d3 shadow_guest_physmap_add_page: gfn 0x1697, mfn 0x14f0d4
    (XEN) common.c:3147:d3 shadow_guest_physmap_add_page: gfn 0x1697, omfn 0xffffffff
    (XEN) common.c:3172:d3 shadow_guest_physmap_add_page: sh_mfn_to_gfn ogfn 0x1697, mfn 0x14f0d4
    (XEN) common.c:3204:d3 shadow_guest_physmap_add_page: shadow_set_p2m_entry gfn=0x1697 -> mfn 0x14f0d4
    (XEN) common.c:3209:d3 shadow_guest_physmap_add_page: set_gpfn_from_mfn gfn=0x1697 -> mfn 0x14f0d4

I then attempt to read the newly re-mapped page from Windows, and I get an error message from get_page_from_l1e() in mm.c:

    /* Foreign mappings into guests in shadow external mode don't
     * contribute to writeable mapping refcounts. (This allows the
     * qemu-dm helper process in dom0 to map the domain's memory without
     * messing up the count of "real" writable mappings.) */
    okay = (((l1e_get_flags(l1e) & _PAGE_RW) &&
             !(unlikely(shadow_mode_external(d) && (d != current->domain))))
            ? get_page_and_type(page, d, PGT_writable_page)
            : get_page(page, d));
    if ( !okay )
    {
        MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
                " for dom%d",
                mfn, get_gpfn_from_mfn(mfn),
                l1e_get_intpte(l1e), d->domain_id);
    }

It looks like get_page_and_type is returning 0.

    (XEN) mm.c:633:d3 l1e_get_flags(l1e) =0x63, shadow_mode_external(d)=0x4000, current->domain=0x3,get_page_and_type=0x0, get_page(page, d)=0x0
    (XEN) mm.c:639:d3 Error getting mfn 14f0d4 (pfn 1697) from L1 entry 000000014f0d4063 for dom3

Domain 2 is the grantor and Domain 3 is the grantee in this example. It appears to me that it is failing because dom3 is not the owner of the shared page:

    if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
         unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
         unlikely(d != _domain) )                /* Wrong owner? */

Any suggestions?
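The numbers in the log above are consistent with each other, which can be checked with a few lines of arithmetic: the pseudo-physical address 0x1697000 shifted by the page size gives gfn 0x1697, and the L1 entry 0x000000014f0d4063 splits into mfn 0x14f0d4 and flags 0x63 (present, writable, accessed, dirty). A standalone sketch, not Xen code; the helper names are made up and a 4KB page size is assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define FLAG_MASK  0xfffull               /* low 12 bits of a PTE hold flags */
#define FRAME_MASK 0x000ffffffffff000ull  /* bits 12..51 hold the frame number */

/* Standard x86 PTE flag bits. */
#define F_PRESENT  0x01u
#define F_RW       0x02u
#define F_ACCESSED 0x20u
#define F_DIRTY    0x40u

/* Frame number containing a (pseudo-)physical address. */
static inline uint64_t addr_to_frame(uint64_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Frame number stored in a 64-bit (PAE-format) page-table entry. */
static inline uint64_t pte_frame(uint64_t pte)
{
    return (pte & FRAME_MASK) >> PAGE_SHIFT;
}

/* Low flag bits of a page-table entry. */
static inline unsigned pte_flags(uint64_t pte)
{
    return (unsigned)(pte & FLAG_MASK);
}
```

So the shadow entry itself looks well formed: it points at the granted frame with a writable, present mapping. The failure comes from the reference-taking step, not from the entry's contents.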
Keir Fraser
2007-Jul-07 08:42 UTC
Re: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
The phys-to-machine mechanism needs to be taught about foreign mappings. Currently it will always assume that the pages mapped into the p2m belong to the local domain. For encoding, I would suggest the p2m entries have PAGE_PRESENT clear and then use some special encoding of the N-1 other bits to indicate a foreign page. I expect the shadow code may then need modifying to pass around a foreign domain pointer in some contexts.

 -- Keir
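A toy model of the ownership check may help show why passing a foreign domain pointer matters. This is a deliberately simplified sketch, not the real get_page(): the struct layouts and the function name are invented, and the real code updates the count atomically.

```c
/* Invented, minimal stand-ins for the hypervisor's structures. */
struct domain { int domain_id; };
struct page   { unsigned count; struct domain *owner; };

/*
 * Take a reference on `pg`, expecting it to belong to `expected_owner`.
 * Mirrors the three failure cases quoted in the thread: page not
 * allocated, count overflow, wrong owner. Returns 1 on success, 0 on
 * failure, and leaves the count untouched on failure.
 */
static inline int get_page_model(struct page *pg, struct domain *expected_owner)
{
    unsigned nx = pg->count + 1;

    if (pg->count == 0 ||              /* not allocated? */
        nx == 0 ||                     /* count overflow? */
        pg->owner != expected_owner)   /* wrong owner? */
        return 0;

    pg->count = nx;
    return 1;
}
```

With a page owned by dom2 (the grantor), a call that expects dom3 as owner fails, matching the log. A call that expects dom2 succeeds, which is roughly what a p2m entry marked as foreign, together with a foreign domain pointer, would let the shadow code do.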