Hollis Blanchard
2005-Nov-16 23:24 UTC
[Xen-devel] FYI: userland <-> hypervisor parameter passing
This is the changeset I just committed to the PowerPC tree to pass parameters between userland and Xen on PPC. If you'll recall, the problem is that the userland tools (libxc) pass virtual addresses all the way down to the hypervisor. These pointers are worthless when userspace and the hypervisor run in separate address spaces.

The kernel could easily perform the virtual->machine address translation using the pfn2mfn arrays, except that these data structures contain virtual pointers to still *other* structures. It is not feasible for the kernel to understand and munge all the data structures passed in to perform this translation, then translate back on the return path.

Another possibility I have not investigated would be to replace the nested pointers with nested structures. This would be a more invasive change, and currently we're trying to get by on PPC without any/many architecture-neutral changes.

From previous conversations, I understand that the x86-64 hypervisor runs in a separate virtual address space. However, due to the layout of the x86 page tables, it's relatively straightforward (though not ideal) to walk the page tables and perform the translation in software. The approach taken in this patch may speed up the translation currently done on x86-64, or it may not.

On PowerPC, Xen runs in real mode (untranslated mode), which is also a separate address space. However, PowerPC's page tables are not easily walkable in software, so that option is not available to us.

Instead, this patch has userspace allocate all structures from the same page, with some translation information stuffed at the base of the page. Userland records the virtual base of the page, and the kernel records the physical base. (We have not yet implemented the pfn2mfn macros, but the hypervisor can easily convert physical->machine addresses.) The Xen copy_to/from_user() functions operate on virtual addresses, masking with PAGE_SIZE to find the needed translation information. This translation is currently only done for dom0_op and memory_op hcalls, which are the offenders I've run into so far.

Multi-page data structures are not yet supported. I think we will need to allocate the needed pages in userspace, then have kernel and hypervisor cooperation to rearrange them to be machine-contiguous. I'm not really looking forward to that.

The x86 implementation should be much simpler than PowerPC. xencomm_alloc() would be malloc+mlock, and xencomm_free() would be munlock. I have not yet implemented this; if there is interest in this approach I may.

[Side note: "domctrl" is a custom domain builder we're using because libxc needs some portability love that nobody seems interested in at the moment (probably justifiably so). domctrl directly shares the bottom parts of libxc anyways, which is why you see libxc patches in this changeset.]

PowerPC Xen tree: http://xenbits.xensource.com/ext/xenppc-unstable.hg
PowerPC Xen Linux tree: http://xenbits.xensource.com/ext/linux-ppc-2.6.hg

--
Hollis Blanchard
IBM Linux Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
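[Editorial sketch, not part of the original message or the committed changeset.] The single-page scheme described in the message above can be illustrated with a small user-space simulation in C. Everything named here is hypothetical: the header layout `struct xc_page_hdr`, the helpers `xc_page_new()`, `xc_hdr_of()` and `xc_virt_to_phys()`, the 4 KiB page size, and the made-up "physical" base passed in by the caller. In the real design the kernel fills in the physical base and Xen, running in real mode, cannot simply dereference the virtual base as this one-process simulation does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed page size; the real value is architecture-specific. */
#define XC_PAGE_SIZE ((uintptr_t)4096)
#define XC_PAGE_MASK (~(XC_PAGE_SIZE - 1))

/* Hypothetical translation record stored at the base of the shared page. */
struct xc_page_hdr {
    uintptr_t virt_base;  /* recorded by userland */
    uint64_t  phys_base;  /* recorded by the kernel (simulated here) */
};

/* Allocate one page-aligned page and fill in both bases.
 * phys_base is an arbitrary stand-in for what the kernel would record. */
static struct xc_page_hdr *xc_page_new(uint64_t phys_base)
{
    struct xc_page_hdr *hdr = aligned_alloc(XC_PAGE_SIZE, XC_PAGE_SIZE);
    if (hdr == NULL)
        return NULL;
    hdr->virt_base = (uintptr_t)hdr;
    hdr->phys_base = phys_base;
    return hdr;
}

/* Find the translation record by masking off the in-page offset. */
static struct xc_page_hdr *xc_hdr_of(const void *p)
{
    return (struct xc_page_hdr *)((uintptr_t)p & XC_PAGE_MASK);
}

/* Translate a virtual address inside the page to a "physical" address. */
static uint64_t xc_virt_to_phys(const void *p)
{
    const struct xc_page_hdr *hdr = xc_hdr_of(p);
    uintptr_t off = (uintptr_t)p - hdr->virt_base;

    assert(off < XC_PAGE_SIZE);  /* pointer must lie within the page */
    return hdr->phys_base + off;
}
```

Because every nested pointer points into the same page, one record at the page base is enough to translate all of them, which is also why multi-page structures do not fit this sketch.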
Ian Pratt
2005-Nov-16 23:50 UTC
RE: [Xen-devel] FYI: userland <-> hypervisor parameter passing
> From previous conversations, I understand that the x86-64
> hypervisor runs in a separate virtual address space. However,
> due to the layout of the x86 page tables, it's relatively
> straightforward (though not ideal) to walk the page tables
> and perform the translation in software. The approach taken
> in this patch may speed up the translation currently done on
> x86-64, or it may not.

That's not actually the case -- the hypervisor shares an address space with the kernel.

However, for VT/Pacifica (hvm) domains the hypervisor is in a separate address space. Thus, there are similar issues with paravirtualized device drivers calling into the hypervisor within hvm domains.

> This translation is currently only done for
> dom0_op and memory_op hcalls, which are the offenders I've
> run into so far.

In the hvm paravirt driver case it's memory and grant table ops that we need to call from the kernel. This is handled by pre-registering the memory and avoiding external references. Even so, current patches aren't pretty yet, though I think they can be made so.

It's dom0 ops that really cause the problems because of the wide use of pass-by-ptr. Since they're typically non performance critical, it makes me wonder whether we should treat them differently. Just how slow is looking up a virtual address on power?

> [Side note: "domctrl" is a custom domain builder we're using
> because libxc needs some portability love that nobody seems
> interested in at the moment (probably justifiably so).

Yeah, this isn't the time to crack things open -- that's why we've been pushing back on the paravirt driver patches. Early in 3.1 we'd better have figured this out...

Ian
Keir Fraser
2005-Nov-17 10:45 UTC
Re: [Xen-devel] FYI: userland <-> hypervisor parameter passing
On 16 Nov 2005, at 23:24, Hollis Blanchard wrote:

> Instead, this patch has userspace allocate all structures from the
> same page, with some translation information stuffed at the base of
> the page. Userland records the virtual base of the page, and the
> kernel records the physical base. (We have not yet implemented the
> pfn2mfn macros, but the hypervisor can easily convert
> physical->machine addresses.) The Xen copy_to/from_user() functions
> operate on virtual addresses, masking with PAGE_SIZE to find the
> needed translation information. This translation is currently only
> done for dom0_op and memory_op hcalls, which are the offenders I've
> run into so far.

I think it'd be cleaner to have a completely separate 'xencomm' communications address space. Then xencomm_alloc() would return two pointers:
 1. Local virtual address that caller can use to read/write the allocated block.
 2. xencomm pointer that gets passed to Xen during hypercall

Your new_xencomm() allocates a new page of memory, then passes the MFN to Xen as part of a new 'allocate xencomm addr space' hypercall. Xen returns the allocated xencomm address for that MFN (or batch of MFNs), thus establishing the xencomm<->machine mapping in both Xen and in the guest.

copy_to_user/copy_from_user and friends all expect the user pointer to be in xencomm address space, and will do a xencomm->xen-virtual-address translation (the xen-virtual-address mapping is created during the establish hypercall described above).

The above is what I intend to implement for x86 after 3.0 (it's easy to maintain backward compatibility), so as long as any proposed generic interface allows the above implementation then I'm happy.

 -- Keir
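[Editorial sketch, not part of the original message.] The two-pointer design proposed above can be modelled in a single process, assuming a simple slot table on the "Xen side". All names are hypothetical (`xencomm_addr_t`, `struct xencomm_buf`, `fake_establish_hypercall()`, `xencomm_alloc_sketch()`, `xen_copy_from_user()`), as are the 4 KiB page size and the 16-page limit. The fake hypercall takes a plain pointer where the real one would take an MFN, and the slot table stands in for whatever xencomm->xen-virtual mapping Xen would actually keep.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XCOMM_PAGE_SIZE 4096u   /* assumed page size */
#define XCOMM_MAX_PAGES 16u     /* arbitrary limit for the sketch */

typedef uint64_t xencomm_addr_t;  /* hypothetical xencomm-space address */

/* "Xen side": xencomm page number -> Xen-visible mapping of that page.
 * Slot 0 is reserved so that 0 is never a valid xencomm address. */
static void *xen_map[XCOMM_MAX_PAGES];
static unsigned xen_next_slot = 1;

/* Stand-in for the 'allocate xencomm addr space' hypercall.
 * The real call would receive an MFN rather than a pointer. */
static xencomm_addr_t fake_establish_hypercall(void *page)
{
    if (xen_next_slot >= XCOMM_MAX_PAGES)
        return 0;
    xen_map[xen_next_slot] = page;
    return (xencomm_addr_t)xen_next_slot++ * XCOMM_PAGE_SIZE;
}

/* The two values the caller gets back from an allocation. */
struct xencomm_buf {
    void          *local;  /* 1. local virtual address for read/write */
    xencomm_addr_t xc;     /* 2. xencomm pointer handed to Xen */
};

static int xencomm_alloc_sketch(struct xencomm_buf *out)
{
    void *page;
    xencomm_addr_t xc;

    assert(out != NULL);
    page = calloc(1, XCOMM_PAGE_SIZE);
    if (page == NULL)
        return -1;
    xc = fake_establish_hypercall(page);
    if (xc == 0) {
        free(page);
        return -1;
    }
    out->local = page;
    out->xc = xc;
    return 0;
}

/* "Xen side" copy: translate xencomm -> Xen-visible address, then copy.
 * Rejects unknown pages and ranges that cross the page boundary. */
static int xen_copy_from_user(void *dst, xencomm_addr_t src, size_t len)
{
    uint64_t slot = src / XCOMM_PAGE_SIZE;
    uint64_t off = src % XCOMM_PAGE_SIZE;

    if (slot == 0 || slot >= XCOMM_MAX_PAGES || xen_map[slot] == NULL)
        return -1;
    if (len > XCOMM_PAGE_SIZE - off)
        return -1;
    memcpy(dst, (const char *)xen_map[slot] + off, len);
    return 0;
}
```

In this model nested pointers stored inside a buffer would have to be xencomm addresses too, so caller-side code can only dereference them through the `local` pointer it kept for the block they refer to.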
Hollis Blanchard
2005-Nov-17 18:53 UTC
Re: [Xen-devel] FYI: userland <-> hypervisor parameter passing
On Thursday 17 November 2005 04:45, Keir Fraser wrote:

> I think it'd be cleaner to have a completely separate 'xencomm'
> communications address space. Then xencomm_alloc() would return two
> pointers:
>  1. Local virtual address that caller can use to read/write the
>     allocated block.
>  2. xencomm pointer that gets passed to Xen during hypercall
>
> Your new_xencomm() allocates a new page of memory, then passes the MFN
> to Xen as part of a new 'allocate xencomm addr space' hypercall. Xen
> returns the allocated xencomm address for that MFN (or batch of MFNs),
> thus establishing the xencomm<->machine mapping in both Xen and in the
> guest.

Ok, so all the nested pointers will be in the machine address space. I think that will require some rework of the layering in xc_private.c, since functions like xc_add_mmu_update() dereference the pointers they're passed.

> copy_to_user/copy_from_user and friends all expect the user pointer to
> be in xencomm address space, and will do a xencomm->xen-virtual-address
> translation (the xen-virtual-address mapping is created during the
> establish hypercall described above).
>
> The above is what I intend to implement for x86 after 3.0 (it's easy to
> maintain backward compatibility), so as long as any proposed generic
> interface allows the above implementation then I'm happy.

Perfectly reasonable. If it didn't require so much out-of-tree libxc hacking, I would volunteer to implement this design right now.

So I think I should leave the PPC implementation as-is for now, but I will be looking forward to post-3.0 (as I suppose we all will, for various reasons ;) .

--
Hollis Blanchard
IBM Linux Technology Center
Keir Fraser
2005-Nov-18 11:35 UTC
Re: [Xen-devel] FYI: userland <-> hypervisor parameter passing
On 17 Nov 2005, at 18:53, Hollis Blanchard wrote:

> Ok, so all the nested pointers will be in the machine address space. I
> think that will require some rework of the layering in xc_private.c,
> since functions like xc_add_mmu_update() dereference the pointers
> they're passed.

Yes, that's true. Hopefully we can remove that, or worst case will need a xencomm->virtual reverse translation function....

> Perfectly reasonable. If it didn't require so much out-of-tree libxc
> hacking, I would volunteer to implement this design right now.

Yeah, it's going to cause some short-term pain.

> So I think I should leave the PPC implementation as-is for now, but I
> will be looking forward to post-3.0 (as I suppose we all will, for
> various reasons ;) .

:-)

 -- Keir