Roger Cruz
2007-Jul-02 20:25 UTC
[Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
Hello,
I'm new to Xen and especially to the hypervisor code. I'm working off a
3.0.4.1 base and have the following questions regarding the memory
management code for an x86, 32-bit platform (capable of supporting PAE).
I'm doing some research into providing grant table hypercall support
from a Windows 2003 HVM. I have made all the necessary changes to allow
the hypercall to make it into the hypervisor and execute the correct
grant table ops.
I'm now testing GNTTABOP_map_grant_ref with GNTMAP_host_map and
it correctly obtains the MFN from the grantor domain. It then attempts
to take the HVM host VA address (a Windows kernel VA from the non-paged
pool) and walk the guest's page table to obtain the PFN. I am building
the hypervisor by simply typing "make xen" without any other
configuration changes from a default source installation.
The first problem I encountered is that it appears the code assumes the
guest to be in PAE mode. In particular, guest_walk_tables() in
xen/arch/x86/mm/shadow/multi.c, line 252 has this code snippet:
    #else /* PAE only... */
        /* Get l3e from the cache of the guest's top level table */
        gw->l3e = (guest_l3e_t *)&v->arch.shadow.gl3e[guest_l3_table_offset(va)];
    #endif /* PAE or 64... */
This accesses the L3 entries from the shadow page tables. When I
instrument this code, I get l3e to be 0 as shown below (the line #s
won't match because of the instrumentation).
(XEN) multi.c:236:d1 guest_walk_tables: va: 0x81699000.
(XEN) multi.c:257:d1 guest_walk_tables: get l3e from cache: 0xff1a6ed0.
(XEN) multi.c:263:d1 guest_walk_tables: l3e not present: 0x0.
(XEN) multi.c:574:d1 sh_guest_map_l1e: va:81699000
If I add the /PAE switch to the boot.ini file, then I can get past this
problem. Hence my statement that it appears the hypervisor is assuming
guests are running with at least PAE mode enabled, which may not be the
case. Could someone please guide me here?
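For readers following the first problem, it may help to see why a PAE-style walk and a non-PAE walk disagree about the same address. The sketch below is not Xen code; it only applies the architectural x86 bit layout (10+10+12 bits for two-level paging, 2+9+9+12 bits for PAE) to the VA from the log. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Indices for a classic two-level (non-PAE) 32-bit walk. */
struct walk2 { unsigned l2, l1; };

/* Indices for a three-level PAE walk. */
struct walk3 { unsigned l3, l2, l1; };

/* Split a VA the way a non-PAE guest's page tables are indexed. */
static struct walk2 split_2level(uint32_t va)
{
    struct walk2 w;
    w.l2 = va >> 22;            /* page-directory index, 1024 entries */
    w.l1 = (va >> 12) & 0x3ff;  /* page-table index, 1024 entries */
    return w;
}

/* Split a VA the way a PAE guest's page tables are indexed. */
static struct walk3 split_pae(uint32_t va)
{
    struct walk3 w;
    w.l3 = va >> 30;            /* top-level index, only 4 entries */
    w.l2 = (va >> 21) & 0x1ff;  /* 512 entries */
    w.l1 = (va >> 12) & 0x1ff;  /* 512 entries */
    assert(w.l3 < 4);
    return w;
}
```

For va 0x81699000, a PAE walk starts at L3 slot 2, whereas a two-level walk has no L3 step at all and starts directly at page-directory slot 0x205.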
The 2nd problem I encountered also has to do with walking the shadow
page tables to obtain the MFN of the underlying Windows VA address.
sh_guest_map_l1e(), Line 520 in the same file, has this code executed
after it walks the guest page tables to obtain the walk_t gw variable.
    if ( gw.l2e &&
         (guest_l2e_get_flags(*gw.l2e) & _PAGE_PRESENT) &&
         !(guest_supports_superpages(v) &&
           (guest_l2e_get_flags(*gw.l2e) & _PAGE_PSE)) )
(XEN) mm.c:2573:d1 grant host mapping: va:81696000 frame:0x15f140
(XEN) mm.c:2507:d1 grant va mapping: va:81696000
(XEN) multi.c:236:d1 guest_walk_tables: va: 0x81696000.
(XEN) multi.c:257:d1 guest_walk_tables: get l3e from cache: 0xff1a6ed0.
(XEN) multi.c:270:d1 guest_walk_tables: l3e flags: 0x1, pfn:0xe9a, mfn:0x9e13d
(XEN) multi.c:285:d1 hypervisor l2e mapped address 0xfec8b058
(XEN) multi.c:315:d1 large pages. 0x1e3
(XEN) multi.c:574:d1 sh_guest_map_l1e: va:81696000
(XEN) multi.c:579:d1 sh_guest_map_l1e: gw.l2e flags:0x1e3, supports large 1
(XEN) multi.c:596:d1 pl1e :0x0,
(XEN) mm.c:2512:d1 Could not find L1 PTE for address 81696000
It looks like it specifically avoids mapping a superpage found in the
Windows PDE into the hypervisor's virtual space, which I assume uses
4 KB pages. What puzzles me is that for a hypercall to read the
arguments from the caller's guest space, it uses __hvm_copy, which calls
shadow_gva_to_gfn() to walk the guest's shadow page tables to get to the
underlying MFN. Couldn't this code here also do the same?
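The flags value 0x1e3 in the log can be decoded with the architectural x86 page-table bits. The following is an illustration only, not Xen code: the F_* macros are local stand-ins for the _PAGE_* flags, and the 2 MB superpage size assumes a PAE guest as in this run. It shows that a PSE entry maps memory directly, so there is no L1 table for a caller to be handed a pointer into.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 page-table flag bits (local names). */
#define F_PRESENT  0x001u
#define F_RW       0x002u
#define F_ACCESSED 0x020u
#define F_DIRTY    0x040u
#define F_PSE      0x080u
#define F_GLOBAL   0x100u

#define PAGE_SHIFT      12
#define SUPERPAGE_SHIFT 21  /* 2 MB superpages under PAE */

/* Nonzero when the L2 entry maps a superpage directly (no L1 table). */
static int l2e_is_superpage(uint32_t flags)
{
    return (flags & F_PRESENT) && (flags & F_PSE);
}

/* Frame number of the 4 KB page that contains va, given the first
 * frame of the superpage that covers it. */
static uint64_t superpage_gfn(uint64_t base_gfn, uint32_t va)
{
    uint32_t offset = va & ((1u << SUPERPAGE_SHIFT) - 1);

    /* A superpage base must be aligned to the superpage size. */
    assert((base_gfn & ((1u << (SUPERPAGE_SHIFT - PAGE_SHIFT)) - 1)) == 0);
    return base_gfn + (offset >> PAGE_SHIFT);
}
```

With flags 0x1e3 the entry is present, writable, accessed, dirty, global and PSE. A translation that only needs the frame number can still be computed from the superpage base plus the offset, which is different from returning a pointer to an L1 entry.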
Thanks in advance for any insight into this area.
Roger Cruz
Principal SW Engineer
Marathon Technologies Corp.
978-489-1153
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2007-Jul-03 09:04 UTC
Re: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
You are barking up the wrong tree by attempting to poke the mapping into
a guest PTE. After all, what would you poke? Guest PTEs address the
guest-pseudo-physical space, in which the foreign page is not present.
You actually want to follow ia64's lead here. When running in
'auto-translate' mode (i.e., on shadow page tables), the guest
address for a host mapping should not be interpreted as a virtual
address but instead as a pseudo-physical address.
So you will be mapping a grant reference into the pseudo-physical space,
and then a guest PTE can map the appropriate pseudo-physical frame
number in the usual way. The slightly tricky bit is working out how to
encode a grant mapping in the p2m table. My advice would be to use a
page-not-present encoding (p2m table entries are the same format as
page-table entries), as this then lets you define special encodings of
your choice with most of the remaining bits.
Tim Deegan may be able to give more advice.
-- Keir
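One possible reading of the page-not-present suggestion is sketched here. The bit layout is purely hypothetical (a 3-bit type tag, a 16-bit domain id, and the MFN in the upper bits of a 64-bit entry), and none of the P2M_* names or helper functions exist in Xen. The only property taken from the advice above is that bit 0, the present bit, stays clear.

```c
#include <assert.h>
#include <stdint.h>

#define P2M_PRESENT     0x1ull   /* same position as the PTE present bit */
#define P2M_TYPE_SHIFT  1
#define P2M_TYPE_MASK   0x7ull
#define P2M_TYPE_GRANT  1ull     /* hypothetical tag for a grant mapping */
#define P2M_DOM_SHIFT   4
#define P2M_DOM_MASK    0xffffull
#define P2M_MFN_SHIFT   20

/* Build a not-present entry recording the granting domain and MFN. */
static uint64_t p2m_encode_grant(uint16_t domid, uint64_t mfn)
{
    assert(mfn < (1ull << (64 - P2M_MFN_SHIFT)));
    return (mfn << P2M_MFN_SHIFT)
         | ((uint64_t)domid << P2M_DOM_SHIFT)
         | (P2M_TYPE_GRANT << P2M_TYPE_SHIFT);  /* present bit left clear */
}

/* Nonzero if the entry is not present and carries the grant tag. */
static int p2m_is_grant(uint64_t e)
{
    return !(e & P2M_PRESENT) &&
           ((e >> P2M_TYPE_SHIFT) & P2M_TYPE_MASK) == P2M_TYPE_GRANT;
}

/* Extract the granting domain id from a grant entry. */
static uint16_t p2m_grant_domid(uint64_t e)
{
    return (uint16_t)((e >> P2M_DOM_SHIFT) & P2M_DOM_MASK);
}

/* Extract the machine frame number from a grant entry. */
static uint64_t p2m_grant_mfn(uint64_t e)
{
    return e >> P2M_MFN_SHIFT;
}
```

Because the present bit is clear, code that treats the entry as an ordinary PTE sees it as unmapped, while code that knows the tag can recover both the frame and the domain it belongs to. An all-zero entry still means "nothing mapped" since its type tag is 0.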
Roger Cruz
2007-Jul-06 19:09 UTC
RE: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
I took your suggestion and now pass in from the Windows code
(indirectly) the PFN to be associated with the MFN found in the
referenced grant table entry.
I modified create_grant_host_mapping() in mm.c to have an additional IF
statement based on a new flag I added:
    // RRC: new code.
    if ( flags & GNTMAP_contains_physaddr )
        return create_pfn_to_mfn_mapping(addr, frame, current);
The new routine does something similar to the XENMEM_add_to_physmap
code. It appears to do what I wanted, which is to replace the
GPFN-to-MFN mapping. When I run it, the mapping does appear to change
correctly, as you can see from my debug statements:
    // RRC: new routine to use with grant table map ref.
    static int create_pfn_to_mfn_mapping(
        uint64_t addr, mfn_t mfn, struct vcpu *v)
    {
        unsigned long prev_mfn, gpfn;
        struct domain *d = v->domain;

        // This call is only valid for translated domains.
        // The MFN specified must not be 0.
        if ( !shadow_mode_translate(d) || (mfn == 0) )
        {
            return -EINVAL;
        }

        // Get the frame number.
        gpfn = (unsigned long)(addr >> PAGE_SHIFT);

        LOCK_BIGLOCK(d);

        /* Remove previously mapped page if it was present. */
        prev_mfn = gmfn_to_mfn(d, gpfn);

        // RRC: debug
        gdprintk(XENLOG_WARNING,
                 "create_pfn_to_mfn_mapping: pfn:%"PRIx64" prev MFN:0x%x\n",
                 (u64)gpfn, (u32)prev_mfn);

        if ( mfn_valid(prev_mfn) )
        {
            if ( IS_XEN_HEAP_FRAME(mfn_to_page(prev_mfn)) )
            {
                gdprintk(XENLOG_WARNING, "removing xen heap frame\n");
                /* Xen heap frames are simply unhooked from this phys slot. */
                guest_physmap_remove_page(d, gpfn, prev_mfn);
            }
            else
            {
                gdprintk(XENLOG_WARNING, "removing normal domain frame\n");
                /* Normal domain memory is freed, to avoid leaking memory. */
                guest_remove_page(d, gpfn);
            }
        }

        /* Map at new location. */
        guest_physmap_add_page(d, gpfn, mfn);

        UNLOCK_BIGLOCK(d);
        return 0;
    }
(XEN) mm.c:2643:d3 grant host mapping: pa:1697000 frame:0x14f0d4
(XEN) mm.c:2608:d3 create_pfn_to_mfn_mapping: pfn:1697 prev MFN:0x7e5f0
(XEN) mm.c:2621:d3 removing normal domain frame
(XEN) memory.c:164:d3 guest_remove_page GMFN:0x1697, MFN:0x7e5f0
(XEN) memory.c:181:d3 guest_remove_page type_info:0xe8000001, count_info:0x80000004
(XEN) memory.c:192:d3 guest_remove_page page_is_removable:0x0
(XEN) common.c:2194:d3 sh_remove_all_mappings gmfn:0x7e5f0
(XEN) common.c:3098:d3 sh_p2m_remove_page removing gfn=0x1697 mfn=0x7e5f0
(XEN) common.c:2194:d3 sh_remove_all_mappings gmfn:0x7e5f0
(XEN) common.c:3137:d3 shadow_guest_physmap_add_page: gfn 0x1697, mfn 0x14f0d4
(XEN) common.c:3147:d3 shadow_guest_physmap_add_page: gfn 0x1697, omfn 0xffffffff
(XEN) common.c:3172:d3 shadow_guest_physmap_add_page: sh_mfn_to_gfn ogfn 0x1697, mfn 0x14f0d4
(XEN) common.c:3204:d3 shadow_guest_physmap_add_page: shadow_set_p2m_entry gfn=0x1697 -> mfn 0x14f0d4
(XEN) common.c:3209:d3 shadow_guest_physmap_add_page: set_gpfn_from_mfn gfn=0x1697 -> mfn 0x14f0d4
I then attempt to read the newly re-mapped page from Windows and I get
an error message from get_page_from_l1e() in mm.c:
    /* Foreign mappings into guests in shadow external mode don't
     * contribute to writeable mapping refcounts. (This allows the
     * qemu-dm helper process in dom0 to map the domain's memory without
     * messing up the count of "real" writable mappings.) */
    okay = (((l1e_get_flags(l1e) & _PAGE_RW) &&
             !(unlikely(shadow_mode_external(d) && (d != current->domain))))
            ? get_page_and_type(page, d, PGT_writable_page)
            : get_page(page, d));
    if ( !okay )
    {
        MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
                " for dom%d",
                mfn, get_gpfn_from_mfn(mfn),
                l1e_get_intpte(l1e), d->domain_id);
    }
It looks like get_page_and_type is returning 0.
(XEN) mm.c:633:d3 l1e_get_flags(l1e) =0x63, shadow_mode_external(d)=0x4000, current->domain=0x3,get_page_and_type=0x0, get_page(page, d)=0x0
(XEN) mm.c:639:d3 Error getting mfn 14f0d4 (pfn 1697) from L1 entry 000000014f0d4063 for dom3
Domain 2 is the grantor and Domain 3 is the grantee in this example. It
appears to me that it is failing because dom3 is not the owner of the
shared page, so the last condition of this check fails:
    if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
         unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
         unlikely(d != _domain) )                /* Wrong owner? */
Any suggestions?
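The three conditions quoted above can be modelled in a few lines. This is a simplified, single-threaded model and not Xen's get_page(); the real function performs the update atomically. The struct, the COUNT_MASK width and the function name are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define COUNT_MASK 0x03ffffffu  /* hypothetical width of the reference count */

struct toy_page {
    uint32_t count;  /* reference count in the low bits, flags above */
    int      owner;  /* id of the domain that owns the page */
};

/* Take a reference on behalf of domain d.
 * Returns 1 on success, 0 if any of the three checks fails. */
static int toy_get_page(struct toy_page *pg, int d)
{
    uint32_t x, nx;

    assert(pg);
    x  = pg->count;
    nx = x + 1;

    if ( (x & COUNT_MASK) == 0 ||   /* not allocated */
         (nx & COUNT_MASK) == 0 ||  /* count overflow */
         pg->owner != d )           /* wrong owner */
        return 0;

    pg->count = nx;
    return 1;
}
```

With a page owned by domain 2 and a count of 4, a request made on behalf of domain 3 is refused and the count is left unchanged, while the same request for domain 2 succeeds. That matches the symptom in the log, where the reference is taken against the mapping domain rather than the granting one.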
Keir Fraser
2007-Jul-07 08:42 UTC
Re: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.
The phys-to-machine mechanism needs to be taught about foreign mappings.
Currently it will always assume that the pages mapped into the p2m
belong to the local domain. For encoding, I would suggest the p2m
entries have PAGE_PRESENT clear and then use some special encoding of
the N-1 other bits to indicate a foreign page. I expect the shadow code
may then need modifying to pass around a foreign domain pointer in some
contexts.
-- Keir