Hollis Blanchard
2005-Aug-17 19:51 UTC
[Xen-devel] passing hypercall parameters by pointer
Many Xen hypercalls pass mlocked pointers as parameters for both input
and output. For example, xc_get_pfn_list() is a nice one with multiple
levels of structures/mlocking.

Considering just the tools for the moment, those pointers are userspace
addresses. Ultimately the hypervisor ends up with that userspace
address, from which it reads and writes data. This is OK for x86, since
userspace, kernel, and hypervisor all share the same virtual address
space (and userspace has carefully mlocked the relevant memory).

On PowerPC though, the hypervisor runs in real mode (no MMU
translation). Unlike x86, PowerPC exceptions arrive in real mode, and
also PowerPC does not force a TLB flush when switching between real and
virtual modes. So a virtual address is pretty much worthless as a
hypervisor parameter; performing the MMU translation in software is
infeasible.

Although it rarely passes parameters by pointer, the pSeries hypervisor
handles this by having the kernel always pass a "pseudo-physical"
address (to borrow Xen terminology), which is trivially translatable to
a "machine" address in the hypervisor. The processor has some notion of
a large (e.g. 64MB) chunk of contiguous machine memory, so the
hypervisor keeps a table of chunks which can be used to translate
pseudo-physical addresses.

Of course, userspace doesn't know pseudo-physical addresses; only the
kernel does. So one way or another, to pass parameters by pointer to the
PPC hypervisor, the kernel is going to need to translate them. That also
means userspace memory areas will be limited to one page (since
virtually consecutive pages may not be representable by a single
pseudo-physical address).

If we're stuck with structure addresses in hypercalls, one possible
solution is to modify libxc so that all parameter addresses are physical
pointers within the same page, then pass that page's physical address
into the hypercall. Something like this:

ulong magicpage_vaddr;
ulong magicpage_paddr;

libxc_init() {
#ifdef __powerpc__
    posix_memalign((void **)&magicpage_vaddr, PAGE_SIZE, PAGE_SIZE);
    mlock((void *)magicpage_vaddr, PAGE_SIZE);
    magicpage_paddr = new_translate_syscall(magicpage_vaddr);
#endif
    ...
}

xc_get_pfn_list() {
    dom0_op_t *op;
    ulong op_paddr;

    magicalloc((ulong *)&op, &op_paddr, sizeof(dom0_op_t));
    ...
}

#ifdef __powerpc__
static ulong offset;

magicalloc(ulong *usable_addr, ulong *hcall_addr, int bytes) {
    *usable_addr = magicpage_vaddr + offset;
    *hcall_addr = magicpage_paddr + offset;
    offset += bytes;
}

do_xen_hypercall(ptr) {
    ptr -= magicpage_vaddr - magicpage_paddr;
    do_privcmd(..., ptr);
}
#endif

(Note that this is for discussion only, not a proposed interface.)

Each architecture would provide its own magicalloc and do_xen_hypercall;
for x86, magicalloc would be malloc+mlock and both pointers are the
same, and x86 do_xen_hypercall would remain unchanged. Basically, any
current use of mlock in libxc would be replaced with calls to
magicalloc.

For example, if we're willing to change the embedded pointers in
dom0_ops to offsets, we do not need to invent a new "translate" system
call.

Other suggestions are welcome.

-- 
Hollis Blanchard
IBM Linux Technology Center
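(For illustration only: the x86 arm of the interface sketched above
could stay trivial. This is a guess at its shape, not existing libxc
code; magicalloc is the hypothetical allocator named in the mail, with
error handling added here for concreteness.)

    #ifndef __powerpc__
    #include <stdlib.h>
    #include <sys/mman.h>

    typedef unsigned long ulong;

    /* x86: a userspace virtual address is directly usable by the
     * hypervisor, so "allocate" is just malloc+mlock and both returned
     * addresses are the same pointer. */
    int magicalloc(ulong *usable_addr, ulong *hcall_addr, int bytes)
    {
        void *p = malloc(bytes);

        if (p == NULL || mlock(p, bytes) != 0)
            return -1;
        *usable_addr = (ulong)p;
        *hcall_addr  = (ulong)p;
        return 0;
    }
    #endif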
> Many Xen hypercalls pass mlocked pointers as parameters for
> both input and output. For example, xc_get_pfn_list() is a
> nice one with multiple levels of structures/mlocking.
>
> Considering just the tools for the moment, those pointers are
> userspace addresses. Ultimately the hypervisor ends up with
> that userspace address, from which it reads and writes data.
> This is OK for x86, since userspace, kernel, and hypervisor
> all share the same virtual address space (and userspace has
> carefully mlocked the relevant memory).
>
> On PowerPC though, the hypervisor runs in real mode (no MMU
> translation).
> Unlike x86, PowerPC exceptions arrive in real mode, and also
> PowerPC does not force a TLB flush when switching between
> real and virtual modes. So a virtual address is pretty much
> worthless as a hypervisor parameter; performing the MMU
> translation in software is infeasible.

I think I'd prefer to hide all of this by co-operation between the
kernel and the hypervisor's copy to/from user.

The kernel can easily translate a virtual address and length into a list
of pseudo-physical frame numbers and initial offset. Xen's copy from
user function can then use this list when doing its work.

Ian
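(For illustration only: roughly what the kernel-side translation Ian
describes could look like. build_param_desc() and va_to_pfn() are
invented names, not existing kernel interfaces, and the fixed-size array
is only there to keep the sketch short.)

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)
    #define PAGE_MASK  (~(PAGE_SIZE - 1))

    /* Initial offset plus the pseudo-physical frame numbers covering a
     * user buffer; this is what would be handed to Xen's copy routines. */
    struct param_desc {
        unsigned long offset;
        unsigned long nr_frames;
        unsigned long pfns[16];
    };

    /* Hypothetical: the kernel's virtual -> pseudo-physical page lookup. */
    extern unsigned long va_to_pfn(unsigned long va);

    static int build_param_desc(unsigned long va, unsigned long len,
                                struct param_desc *d)
    {
        unsigned long cur;

        d->offset = va & ~PAGE_MASK;
        d->nr_frames = 0;
        for (cur = va & PAGE_MASK; cur < va + len; cur += PAGE_SIZE) {
            if (d->nr_frames >= 16)
                return -1;          /* buffer too large for this sketch */
            d->pfns[d->nr_frames++] = va_to_pfn(cur);
        }
        return 0;
    }

Xen's copy_from_user would then walk d->offset and d->pfns frame by
frame instead of dereferencing the guest virtual address itself.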
Ian Pratt wrote:
>> Many Xen hypercalls pass mlocked pointers as parameters for
>> both input and output. For example, xc_get_pfn_list() is a
>> nice one with multiple levels of structures/mlocking.
>>
>> Considering just the tools for the moment, those pointers are
>> userspace addresses. Ultimately the hypervisor ends up with
>> that userspace address, from which it reads and writes data.
>> This is OK for x86, since userspace, kernel, and hypervisor
>> all share the same virtual address space (and userspace has
>> carefully mlocked the relevant memory).

This is a problem even on x86 for VMX domains which execute hypercalls
because of paravirtualized device drivers.

>> On PowerPC though, the hypervisor runs in real mode (no MMU
>> translation).
>> Unlike x86, PowerPC exceptions arrive in real mode, and also
>> PowerPC does not force a TLB flush when switching between
>> real and virtual modes. So a virtual address is pretty much
>> worthless as a hypervisor parameter; performing the MMU
>> translation in software is infeasible.
>
> I think I'd prefer to hide all of this by co-operation between the
> kernel and the hypervisor's copy to/from user.

This is basically what Xiaofeng attempted to do in this patch:

http://article.gmane.org/gmane.comp.emulators.xen.devel/11107

although the virtual -> pseudo-physical translation is also done in the
hypervisor. Please let us know if the patch is acceptable in light of
your email.

> The kernel can easily translate a virtual address and length into a list
> of pseudo-physical frame numbers and initial offset. Xen's copy from
> user function can then use this list when doing its work.

The other alternative (which we talked about at OLS) is to use a couple
of pinned pages for parameter passing - but it doesn't work very well for:

a) Multiple levels of structures/pointers
b) Arguments which may be bigger than a couple of pages
   (xc_get_pfn_list() for a bigmem domain for example).

-Arun
Hollis Blanchard
2005-Aug-17 22:04 UTC
Re: [Xen-devel] passing hypercall parameters by pointer
On Wednesday 17 August 2005 15:44, Ian Pratt wrote:
> > On PowerPC though, the hypervisor runs in real mode (no MMU
> > translation).
> > Unlike x86, PowerPC exceptions arrive in real mode, and also
> > PowerPC does not force a TLB flush when switching between
> > real and virtual modes. So a virtual address is pretty much
> > worthless as a hypervisor parameter; performing the MMU
> > translation in software is infeasible.
>
> I think I'd prefer to hide all of this by co-operation between the
> kernel and the hypervisor's copy to/from user.
>
> The kernel can easily translate a virtual address and length into a list
> of pseudo-physical frame numbers and initial offset. Xen's copy from
> user function can then use this list when doing its work.

Could you elaborate a little? Consider this structure:

typedef struct {
    /* IN variables. */
    domid_t  domain;
    memory_t max_pfns;
    void    *buffer;
    /* OUT variables. */
    memory_t num_pfns;
} dom0_getmemlist_t;

libxc creates this struct and passes it to the kernel, and the kernel
doesn't know anything about the internals. Are you saying that
privcmd_ioctl() should look like this?

    switch ( cmd )
    {
    case IOCTL_PRIVCMD_HYPERCALL:
    {
        privcmd_hypercall_t hypercall;
        dom0_op_t *op = (dom0_op_t *)&hypercall;

        if ( copy_from_user(&hypercall, (void *)data, sizeof(hypercall)) )
            return -EFAULT;

        /* NEW switch statement: */
        switch (op->cmd) {
        case DOM0_GETMEMLIST:
            op->u.getmemlist.buffer =
                virt_to_phys(op->u.getmemlist.buffer);
            break;
        case DOM0_SETDOMAININFO:
            ...
        case DOM0_READCONSOLE:
            ...
        }
    }
    break;
    }

Right now the kernel doesn't peer inside the hypercall structures at all.

-- 
Hollis Blanchard
IBM Linux Technology Center
Hollis Blanchard
2005-Aug-17 22:11 UTC
Re: [Xen-devel] passing hypercall parameters by pointer
On Wednesday 17 August 2005 16:07, Arun Sharma wrote:
> > I think I'd prefer to hide all of this by co-operation between the
> > kernel and the hypervisor's copy to/from user.
>
> This is basically what Xiaofeng attempted to do in this patch:
>
> http://article.gmane.org/gmane.comp.emulators.xen.devel/11107
>
> although the virtual -> pseudo-physical translation is also done in the
> hypervisor. Please let us know if the patch is acceptable in light of
> your email.

This patch performs MMU translation in software. Even if you like that
on x86, trying to do that on PowerPC is considerably more expensive.
Just the page table lookup could be 16 loads and compares, and that's
not counting segmentation.

> > The kernel can easily translate a virtual address and length into a list
> > of pseudo-physical frame numbers and initial offset. Xen's copy from
> > user function can then use this list when doing its work.
>
> The other alternative (which we talked about at OLS) is to use a couple
> of pinned pages for parameter passing - but it doesn't work very well for:
>
> a) Multiple levels of structures/pointers
> b) Arguments which may be bigger than a couple of pages
>    (xc_get_pfn_list() for a bigmem domain for example).

This is pretty much the proposal I sent earlier. The multiple levels of
pointers can be handled as I showed, by creating an allocator that
manages the couple of pages.

I have no answer for parameters that are very large, but I wonder how
many cases there are. For example, DOM0_READCONSOLE could just be
limited to 4KB reads, and if there's more data than that, call it again.
Perhaps there is some case-specific solution to xc_get_pfn_list() as
well.

-- 
Hollis Blanchard
IBM Linux Technology Center
Ling, Xiaofeng
2005-Aug-18 00:47 UTC
RE: [Xen-devel] passing hypercall parameters by pointer
Arun Sharma <mailto:arun.sharma@intel.com> wrote:
> Ian Pratt wrote:
> The other alternative (which we talked about at OLS) is to use a
> couple of pinned pages for parameter passing - but it doesn't work
> very well for:
>
> a) Multiple levels of structures/pointers

A good example is do_multicall. A complete implementation needs to
enumerate all the hypercalls and deal with each one that uses pointers.

> b) Arguments which may be bigger than a couple of pages
>    (xc_get_pfn_list() for a bigmem domain for example).
>
> -Arun
> From: Ian Pratt
> Sent: Thursday, August 18, 2005 4:44 AM
>
>> On PowerPC though, the hypervisor runs in real mode (no MMU
>> translation).
>> Unlike x86, PowerPC exceptions arrive in real mode, and also
>> PowerPC does not force a TLB flush when switching between
>> real and virtual modes. So a virtual address is pretty much
>> worthless as a hypervisor parameter; performing the MMU
>> translation in software is infeasible.
>
> I think I'd prefer to hide all of this by co-operation between the
> kernel and the hypervisor's copy to/from user.
>
> The kernel can easily translate a virtual address and length into a list
> of pseudo-physical frame numbers and initial offset. Xen's copy from
> user function can then use this list when doing its work.
>
> Ian

So this is a common concern for a hypervisor residing in a different
address space than the guest. For PowerPC, it's real mode (hypervisor)
vs. virtual mode (guest). For a VMX domain, the hypervisor has its own
monitor page table separate from the shadow page table. Expect the final
solution to be uniform too. ;-)

See if I understand your suggestion correctly here. Xiaofeng's previous
patch has the following flow when accessing guest address space:

---hypervisor---
- Search gva in guest page table to get pfn
- Get mfn by pfn
- Map mfn into hypervisor's space
- Then directly access the new va'

Your suggestion is to make the gva->pfn search happen in the guest. The
hypervisor still performs the remaining steps: manipulate the monitor
page table first and then access the new va'. (PowerPC would access the
mfn directly.) Finally, in either option, copy_from/to_user becomes a
memcpy to a new va' without an exception happening.

Now the question comes out. The pseudo-physical frame number list itself
is also a parameter to the hypervisor, and there's no promise that this
list will be confined to a single page. You also need extra info in this
list if multiple parameters are pointers. How to access this scalable
list efficiently seems to be the same puzzle as the subject. For x86,
people may set a maximum limitation, but how about 64-bit platforms? A
good example is get_pfn_list, which always breaks assumptions about
parameter size. ;-)

Thanks,
Kevin
> From: Hollis Blanchard
> Sent: Thursday, August 18, 2005 6:11 AM
>
> I have no answer for parameters that are very large, but I wonder how
> many cases there are. For example, DOM0_READCONSOLE could just be
> limited to 4KB reads, and if there's more data than that, call it
> again. Perhaps there is some case-specific solution to
> xc_get_pfn_list() as well.

If one hypercall wants to grab a specific context at a single point in
time atomically, "call it again" several times actually returns mixed
contexts belonging to different points in time. That's not desired. Even
if people want to add atomic protection for that kind of case,
performance will be affected a lot and there is more risk of deadlock.

Thanks,
Kevin
> From: Hollis Blanchard
> Sent: Thursday, August 18, 2005 6:05 AM
>
>         case DOM0_GETMEMLIST:
>             op->u.getmemlist.buffer =
>                 virt_to_phys(op->u.getmemlist.buffer);
>             break;

If following Ian's suggestion, you have to create a list of pfns here
instead of only converting the start address. There's no guarantee that
the buffer is limited to one page. ;-)

Thanks,
Kevin
Hollis Blanchard
2005-Aug-18 15:58 UTC
Re: [Xen-devel] passing hypercall parameters by pointer
On Aug 18, 2005, at 1:56 AM, Tian, Kevin wrote:
>> From: Hollis Blanchard
>> Sent: Thursday, August 18, 2005 6:05 AM
>>
>>         case DOM0_GETMEMLIST:
>>             op->u.getmemlist.buffer =
>>                 virt_to_phys(op->u.getmemlist.buffer);
>>             break;
>
> If following Ian's suggestion, you have to create a list of pfns here
> instead of only converting the start address. There's no guarantee that
> the buffer is limited to one page. ;-)

Actually that was an explicitly stated limitation. But I think I like
this scatterlist idea.

So for every pointer ("buffer" in the above example), the
pseudo-physical address of a scatterlist would be passed to the
hypervisor instead, and then copy_to/from_user expects a scatterlist
address instead of a plain pointer.

I think the copy_to/from_user and get/put_user API would need to change
though: you'd need the value, the scatterlist pointer, and an offset
into the scatterlist. So x86 would need a slight API change, but could
continue without dealing with any scatterlists, i.e. no ABI change.

The PowerPC kernel would need knowledge of every hypercall structure to
create and translate the scatterlist. I know that's an idea Jimi isn't
fond of, but it really seems like the best solution here.

-- 
Hollis Blanchard
IBM Linux Technology Center
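(For illustration only: one possible shape for the scatterlist and the
changed copy interface sketched above. The names sg_ent, xen_sg, and
copy_from_guest_sg are invented here, not existing Xen APIs, and
map_pfn() stands in for whatever pseudo-physical-to-machine mapping the
hypervisor already does; x86 could keep passing plain pointers through
the same entry point.)

    #include <string.h>

    /* One physically contiguous piece of a guest buffer. */
    struct sg_ent {
        unsigned long pfn;      /* pseudo-physical frame number */
        unsigned int  offset;   /* offset into that frame */
        unsigned int  len;      /* bytes covered by this entry */
    };

    struct xen_sg {
        unsigned int  nr_ents;
        struct sg_ent ent[8];   /* fixed size only to keep the sketch simple */
    };

    /* Hypothetical: map a pseudo-physical frame into Xen's own space. */
    extern void *map_pfn(unsigned long pfn);

    /* copy_from_user variant taking (destination, scatterlist, offset
     * into the scatterlist data, length) instead of a guest VA. */
    static int copy_from_guest_sg(void *dst, const struct xen_sg *sg,
                                  unsigned long off, unsigned long len)
    {
        unsigned int i;
        unsigned long n;
        char *d = dst;

        for (i = 0; i < sg->nr_ents && len != 0; i++) {
            const struct sg_ent *e = &sg->ent[i];

            if (off >= e->len) {        /* requested data starts later */
                off -= e->len;
                continue;
            }
            n = e->len - off;
            if (n > len)
                n = len;
            memcpy(d, (char *)map_pfn(e->pfn) + e->offset + off, n);
            d += n;
            len -= n;
            off = 0;
        }
        return len ? -1 : 0;            /* -1: scatterlist too short */
    }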
Jimi Xenidis
2005-Aug-19 02:00 UTC
Re: [Xen-devel] passing hypercall parameters by pointer
>>>>> "HB" == Hollis Blanchard <hollisb@us.ibm.com> writes:hmm let me bubble up my intro :) HB> I know that''s an idea Jimi isn''t fond of, but it really seems HB> like the best solution here. Why I dislike this solution. 1. Currently, the kernel has no intimate knowledge of the managment calls. This is goodness since this gives the freedom to "innovate" in the management area without impacting the kernel, we now would require kernel updates that grok management structures, creating more opportunity for versioning chaos and bloating of the kernel patch. 2. We are complicating the kernel and the hypervisor in order to keep a user app simple. Does anyone care that a user app suffer a little performace impact? Frankly, I''m much more worried about unecessarily impacting the hypervisor. I believe a negotiated managment area that the application serializes all arguements into to be a far better solution, the area can be of arbitrary size and it the added complexity to the application is trivial. Am I missing something? -JX -- "I got an idea, an idea so smart my head would explode if I even began to know what I was talking about." -- Peter Griffin (Family Guy) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 19 Aug 2005, at 03:00, Jimi Xenidis wrote:

> I believe a negotiated management area that the application serializes
> all arguments into would be a far better solution: the area can be of
> arbitrary size, and the added complexity to the application is trivial.
>
> Am I missing something?

This is the correct answer imo. get_pfn_list() needs to die anyway:
there are better ways to get the list of mfns belonging to a guest (you
can get the list back from increase_reservation, or you can map the
guest's pfn->mfn map).

The current mlock() scheme in libxc is screwed anyway -- we
mlock/munlock regions that may overlap at page granularity. Fixing this
would lead naturally to a preallocation scheme.

 -- Keir
> The current mlock() scheme in libxc is screwed anyway -- we
> mlock/munlock regions that may overlap at page granularity.
> Fixing this would lead naturally to a preallocation scheme.

That's a very good point. For the moment, we should remove all the
munlock() calls for safety. The amount of unnecessary memory we'll end
up pinning will be tiny, so we shouldn't worry about it.

Post 3.0 we can completely redo the dom0 op interface, but the rest of
the hypercall interface will have to remain backward compatible, at
least for x86_*. Since passing by VA is so convenient on the
architectures that support it, we may not want to do anything different
on these anyhow.

For VT paravirt drivers I think pre-registration will work fine. The set
of hypercalls we need to support is small anyhow.

Ian
Jimi Xenidis
2005-Aug-19 11:52 UTC
RE: [Xen-devel] passing hypercall parameters by pointer
>>>>> "IP" == Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> writes:IP> Post 3.0 we can completely redo the dom0 op interface, but the rest of IP> the hypercall interface will have to remain backward compatible, at IP> least for x86_*. Just to clarify, "the rest" refers to hypercalls made from the kernel, correct? Any hypercall using VAs made from user space are at issue here. IP> Since passing by VA is so convenient on the architectures that IP> support it we may not want to do anything different on these IP> anyhow. I agree, why create a new mapping when a usable one exists. At least for common kernel code, we will need to wrap such VAs in a macro so that the "psuedo-physical" is passed in for PPC. I assume this is reasonable? -JX -- "I got an idea, an idea so smart my head would explode if I even began to know what I was talking about." -- Peter Griffin (Family Guy) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 19 Aug 2005, at 12:52, Jimi Xenidis wrote:

> IP> Since passing by VA is so convenient on the architectures that
> IP> support it we may not want to do anything different on these
> IP> anyhow.
>
> I agree: why create a new mapping when a usable one exists?
>
> At least for common kernel code, we will need to wrap such VAs in a
> macro so that the "pseudo-physical" address is passed in for PPC. I
> assume this is reasonable?

This is all potentially fixable before 3.0 final. Paravirt x86 can
continue to use guest virtual addresses. The idea would be that the
registration scheme would essentially create a parameter-passing
'address space' into which you hook pages of memory. On x86 we would map
the address space onto regions of kernel va space. On other arches we
would map the address space onto physical addresses that get mapped into
Xen's va space. get_user/put_user/copy_from_user/copy_to_user will take
guest addresses that point into this parameter-passing address space.

At least we can scope it out by doing a few hypercalls to start with --
probably dom0_ops first and see how it pans out. I think it will work
quite well...

 -- Keir
On 19 Aug 2005, at 12:34, Ian Pratt wrote:

> That's a very good point. For the moment, we should remove all the
> munlock() calls for safety. The amount of unnecessary memory we'll end
> up pinning will be tiny, so we shouldn't worry about it.

The munlock()s indicate where we should deallocate bounce buffers back
to the pre-reservation pool. We should at least mark those places so we
don't have to search for them again later.

 -- Keir
> This is all potentially fixable before 3.0 final. Paravirt
> x86 can continue to use guest virtual addresses. The idea
> would be that the registration scheme would essentially
> create a parameter-passing 'address space' into which you
> hook pages of memory. On x86 we would map the address space
> onto regions of kernel va space. On other arches we would map
> the address space onto physical addresses that get mapped
> into Xen's va space.
> get_user/put_user/copy_from_user/copy_to_user will take guest
> addresses that point into this parameter-passing address space.
>
> At least we can scope it out by doing a few hypercalls to
> start with -- probably dom0_ops first and see how it pans
> out. I think it will work quite well...

I'd be inclined to first go after the ops that are needed for the
paravirtualized drivers (mem_op, grantab_op).

Perhaps people could post a few patch examples for discussion?

NB: This in no way represents a commitment to get this into 3.0-final.
Let's have a look at the patches and decide. [Right now, anything that
isn't fixing bugs or sorting out xenbus/tools is actually a distraction]

Ian
Hollis Blanchard
2005-Aug-19 13:57 UTC
Re: [Xen-devel] passing hypercall parameters by pointer
On Aug 19, 2005, at 7:17 AM, Keir Fraser wrote:

> This is all potentially fixable before 3.0 final. Paravirt x86 can
> continue to use guest virtual addresses. The idea would be that the
> registration scheme would essentially create a parameter-passing
> 'address space' into which you hook pages of memory. On x86 we would
> map the address space onto regions of kernel va space. On other arches
> we would map the address space onto physical addresses that get mapped
> into Xen's va space. get_user/put_user/copy_from_user/copy_to_user
> will take guest addresses that point into this parameter-passing
> address space.

Could you flesh this out a little more? I *think* what you're saying is
this (on PowerPC):
- at boot, the kernel notifies Xen of a parameter page
- replace libxc calls to mlock() with register_this_address() (which
  could be a privcmd ioctl)
- register_this_address() stuffs the userspace pointer and corresponding
  pseudo-physical pointer into a table in the parameter page
- libxc ignorantly creates its structures with userspace addresses
- once the hypercall arrives in Xen, copy_from_user() is passed the
  userspace address
- copy_from_user() consults the table in the parameter page to translate
  userspace -> pseudo-physical, then translates pseudo-physical ->
  machine

Is that right?

-- 
Hollis Blanchard
IBM Linux Technology Center
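(For illustration only: roughly what the table lookup in the last step
above could look like inside Xen's copy_from_user. struct
param_map_entry, param_table, and pseudophys_to_xen_va() are invented
for this sketch, and it assumes each registered region is physically
contiguous, as in the one-page discussion earlier in the thread.)

    #include <string.h>

    #define PARAM_TABLE_ENTRIES 64

    /* One registered user buffer: the userspace address libxc uses and
     * the pseudo-physical address the kernel recorded for it. */
    struct param_map_entry {
        unsigned long user_addr;
        unsigned long pseudophys_addr;
        unsigned long len;
    };

    /* Table living in the registered parameter page. */
    extern struct param_map_entry param_table[PARAM_TABLE_ENTRIES];

    /* Hypothetical: pseudo-physical -> machine translation plus a
     * mapping into Xen's own address space. */
    extern void *pseudophys_to_xen_va(unsigned long paddr);

    /* Given the userspace address found in a hypercall argument, locate
     * the registered region containing it and copy through the machine
     * mapping. */
    static int copy_from_registered_user(void *dst, unsigned long uaddr,
                                         unsigned long len)
    {
        unsigned int i;

        for (i = 0; i < PARAM_TABLE_ENTRIES; i++) {
            struct param_map_entry *e = &param_table[i];

            if (uaddr >= e->user_addr &&
                uaddr + len <= e->user_addr + e->len) {
                unsigned long off = uaddr - e->user_addr;
                memcpy(dst,
                       (char *)pseudophys_to_xen_va(e->pseudophys_addr) + off,
                       len);
                return 0;
            }
        }
        return -1;      /* address was never registered */
    }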
On 19 Aug 2005, at 14:57, Hollis Blanchard wrote:

> Could you flesh this out a little more? I *think* what you're saying is
> this (on PowerPC):
> - at boot, the kernel notifies Xen of a parameter page

It can be multiple pages, and the mappings can change over time. Think
of something like

    set_parameter_page(parameter_address_space_frame,
                       physical_address_space_frame)

establishing a mapping from parameter address space to physical address
space.

> - replace libxc calls to mlock() with register_this_address() (which
>   could be a privcmd ioctl)

Yep. I think libxc would request via a privcmd ioctl. The kernel can
extend the parameter-passing region, or allocate a subsection of the
existing region, and mmap it into user space. It would also return to
libxc the range of parameter-passing addresses that have been allocated
to it.

> - libxc ignorantly creates its structures with userspace addresses

libxc would create structs with parameter-passing addresses.

 -- Keir
Hollis Blanchard
2005-Aug-19 15:18 UTC
Re: [Xen-devel] passing hypercall parameters by pointer
On Aug 19, 2005, at 9:35 AM, Keir Fraser wrote:
> On 19 Aug 2005, at 14:57, Hollis Blanchard wrote:
>
>> - replace libxc calls to mlock() with register_this_address() (which
>>   could be a privcmd ioctl)
>
> Yep. I think libxc would request via a privcmd ioctl. The kernel can
> extend the parameter-passing region, or allocate a subsection of the
> existing region, and mmap it into user space. It would also return to
> libxc the range of parameter-passing addresses that have been allocated
> to it.
>
>> - libxc ignorantly creates its structures with userspace addresses
>
> libxc would create structs with parameter-passing addresses.

Does "parameter-passing addresses" mean offsets inside the parameter
passing space? I think pseudocode is going to be more effective than
English here. Let's take DOM0_PERFCCONTROL as an example:

main() {
    xc_perfc_desc_t *desc = malloc(sizeof(*desc));
    mlock(desc, sizeof(*desc));      // <------------- [1]
    xc_perfc_control(desc);
}

xc_perfc_control(xc_perfc_desc_t *desc) {
    dom0_op_t dop;

    dop.cmd = DOM0_PERFCCONTROL;
    dop.u.perfccontrol.desc = desc;  // <------------ [2]
    do_dom0_op(&dop);
}

Even if you replace malloc/mlock at [1] with a call that maps "parameter
passing" space into this process, what address will you put in the
struct at [2]? That would have to be an offset within the parameter
passing space, right?

-- 
Hollis Blanchard
IBM Linux Technology Center
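(For illustration only: how the libxc side might look if [1] and [2]
switch over to the parameter-passing space. param_alloc() and its two
outputs are hypothetical, echoing the earlier magicalloc idea and
Hollis's pseudocode above rather than any existing interface.)

    /* Hypothetical allocator over the mmap'd parameter-passing region:
     * returns a pointer libxc can write through, plus the offset within
     * the parameter-passing space to put into the hypercall struct. */
    extern int param_alloc(int bytes, void **usable, unsigned long *pp_offset);

    int xc_perfc_control_sketch(void)
    {
        xc_perfc_desc_t *desc;
        unsigned long desc_off;
        dom0_op_t dop;

        if (param_alloc(sizeof(*desc), (void **)&desc, &desc_off))
            return -1;
        /* ... fill in *desc through the usable pointer ... */

        dop.cmd = DOM0_PERFCCONTROL;
        /* [2] now carries the parameter-space offset, not a user VA. */
        dop.u.perfccontrol.desc = (xc_perfc_desc_t *)desc_off;
        return do_dom0_op(&dop);
    }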
On 19 Aug 2005, at 16:18, Hollis Blanchard wrote:

> Even if you replace malloc/mlock at [1] with a call that maps "parameter
> passing" space into this process, what address will you put in the
> struct at [2]? That would have to be an offset within the parameter
> passing space, right?

Yes.

 -- Keir