On Tue, Dec 18, 2012 at 01:51:49PM +0000, Mats Petersson wrote:
> On 18/12/12 11:40, Ian Campbell wrote:
> > On Tue, 2012-12-18 at 11:28 +0000, Mats Petersson wrote:
> >> On 18/12/12 11:17, Ian Campbell wrote:
> >>> On Mon, 2012-12-17 at 17:38 +0000, Mats Petersson wrote:
> >>>> On 17/12/12 16:57, Ian Campbell wrote:
> >>>>> On Fri, 2012-12-14 at 17:00 +0000, Mats Petersson wrote:
> >>>>>> Ian, Konrad:
> >>>>>> I took Konrad's latest version [I think] and applied my patch (which needed some adjustments, as there are other "post 3.7" changes to the same source code - including losing the xen_flush_tlb_all() ??)
> >>>>>>
> >>>>>> Attached are the patches:
> >>>>>> arm-enlighten.patch, which updates the ARM code.
> >>>>>> improve-pricmd.patch, which updates the privcmd code.
> >>>>>>
> >>>>>> Ian, can you have a look at the ARM code - which I quickly hacked together; I haven't compiled it, and I certainly haven't tested it,
> >>>>> There are a lot of build errors, as you might expect (patch below; a few warnings remain). You can find a cross compiler at
> >>>>> http://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/
> >>>>> or you can use
> >>>>> drall:/home/ianc/devel/cross/x86_64-gcc-4.6.0-nolibc_arm-unknown-linux-gnueabi.tar.bz2
> >>>>> which is an older version from the same place.
> >>>>>
> >>>>> Anyway, the patch...
> >>>>>> and it needs further changes to make my changes actually make it more efficient.
> >>>>> Right, the benefit on PVH or ARM would be in batching the XENMEM_add_to_physmap_range calls. The batching of the apply_to_page_range which this patch adds isn't useful, because there is no HYPERVISOR_mmu_update call to batch in this case. So basically this patch as it stands does a lot of needless work for no gain, I'm afraid.
> >>>> So, basically, what is an improvement on x86 isn't anything useful on ARM, and you'd prefer to loop around in privcmd.c calling into xen_remap_domain_mfn_range() a lot of times?
> >>> Not at all. ARM (and PVH) still benefit from the interface change, but the implementation of the benefit is totally different.
> >>>
> >>> For normal x86 PV you want to batch the HYPERVISOR_mmu_update.
> >>>
> >>> For both x86 PVH and ARM this hypercall doesn't exist; instead there is a call to HYPERVISOR_memory_op XENMEM_add_to_physmap_range, which is something that would benefit from batching.
> >> So, you want me to fix that up?
> > If you want to, sure, yes please.
> >
> > But feel free to just make the existing code work with the interface, without adding any batching. That should be a much smaller change than what you proposed.
> >
> > (aside: I do wonder how much of this x86/arm code could be made generic)
> I think, once it goes to PVH everywhere, quite a bit (as I believe the hypercalls should be the same by then, right?)
>
> In the PVOPS kernel, it's probably a bit more of a job. I'm sure it can be done, but with a bit more work.
>
> I think I'll do the minimal patch first, then, if I find some spare time, work on the "batching" variant.

OK. The batching is IMHO just using the multicall variant.

> >> To make xentrace not work until it is fixed wouldn't be a terrible thing, would it?
> > On ARM I think it is fine (I doubt this is the only thing stopping xentrace from working). I suspect people would be less impressed with breaking xentrace on x86. For PVH it probably is a requirement for it to keep working; I'm not sure though.
> Ok, ENOSYS it is for remap_range() then.
>
> >> Then we can remove that old gunk from x86 as well (eventually).
> >> Thanks. I was starting to wonder if I'd been teleported back to the time when I struggled with pointers...
> >> Maybe it needs a better comment.
> > The other thing I had missed was that this was a pure increment and not taking the value at the same time, which also confused me.
> >
> > Splitting the increment out from the dereference usually makes these things clearer; I was obviously just being a bit hard of thinking yesterday!
> No worries. I will see about making a more readable comment (and for ARM, I can remove the whole if/else and just do the one increment, based on the above discussion), so that should make the code better.

You can use the v3.8 tree as your base - it has the required PVH and ARM patches. There is one bug (where dom0 crashes) - and I just sent a git pull for that to Linus's tree:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-linus-3.8
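To make "the multicall variant" concrete: below is a rough, untested sketch of stacking several XENMEM_add_to_physmap_range operations into a single HYPERVISOR_multicall. The helper name and the fixed batch size are made up for illustration, the xatp arguments are assumed to be populated by the caller, and it assumes HYPERVISOR_multicall is actually wired up on the architecture in question - Ian questions below whether this alone is enough.

    #include <xen/interface/xen.h>
    #include <xen/interface/memory.h>
    #include <asm/xen/hypercall.h>

    /* Hypothetical helper: issue 'n' already-populated
     * xen_add_to_physmap_range arguments as one multicall. */
    static int add_to_physmap_multicall(struct xen_add_to_physmap_range *xatp,
                                        unsigned int n)
    {
            struct multicall_entry mc[16];  /* arbitrary cap for the sketch */
            unsigned int i;
            int rc;

            BUG_ON(n > ARRAY_SIZE(mc));
            for (i = 0; i < n; i++) {
                    mc[i].op = __HYPERVISOR_memory_op;
                    mc[i].args[0] = XENMEM_add_to_physmap_range;
                    mc[i].args[1] = (unsigned long)&xatp[i];
            }
            rc = HYPERVISOR_multicall(mc, n);
            if (rc < 0)
                    return rc;
            for (i = 0; i < n; i++) {
                    if (mc[i].result)               /* first per-call failure */
                            return (int)mc[i].result;
            }
            return 0;
    }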
On 18/12/12 16:07, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 18, 2012 at 01:51:49PM +0000, Mats Petersson wrote:
> [...]
> You can use the v3.8 tree as your base - it has the required PVH and ARM patches. There is one bug (where dom0 crashes) - and I just sent a git pull for that to Linus's tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-linus-3.8

Thanks. I have attached a patch for x86 (+ generic changes) and one for ARM. Both compile, and the x86 one has been tested with localhost migration.

Ian: Can you review the code - I'm fairly sure it does the right thing, without too much "extra" code, and achieves "what we want". It's not optimized, but if the hypercalls are batched by multicall, it shouldn't be horrible. In fact I expect it will be marginally better than before, because it saves one level of calls with 7 or so arguments per mapped page, since we've pushed the loop down a level into do_remap_mfn(). I've kept the structure similar to x86, should we decide to merge the code into something common in the future. No doubt it will diverge, but starting with something similar gives it a little chance... ;)

Note: the purpose of this post is mainly to confirm that the ARM solution is "on track".

-- Mats
On Tue, 2012-12-18 at 19:34 +0000, Mats Petersson wrote:
> >> I think I'll do the minimal patch first, then, if I find some spare time, work on the "batching" variant.
> > OK. The batching is IMHO just using the multicall variant.

The XENMEM_add_to_physmap_range call is itself batched (it contains ranges of mfns etc), so we don't just want to batch the actual hypercalls; we really want to make sure each hypercall processes a batch, as this will let us optimise the flushes in the hypervisor.

I don't know if the multicall infrastructure allows for that, but it would be pretty trivial to do it explicitly.

I expect these patches will need to be folded into one to avoid a bisectability hazard?

xen-privcmd-remap-array-arm.patch:

> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 7a32976..dc50a53 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -73,7 +73,7 @@ static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
>  }
>
>  struct remap_data {
> -	xen_pfn_t fgmfn; /* foreign domain's gmfn */
> +	xen_pfn_t *fgmfn; /* foreign domain's gmfn */
>  	pgprot_t prot;
>  	domid_t domid;
>  	struct vm_area_struct *vma;
> @@ -90,38 +90,120 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>  	unsigned long pfn = page_to_pfn(page);
>  	pte_t pte = pfn_pte(pfn, info->prot);
>
> -	if (map_foreign_page(pfn, info->fgmfn, info->domid))
> +	// TODO: We should really batch these updates
> +	if (map_foreign_page(pfn, *info->fgmfn, info->domid))
>  		return -EFAULT;
>  	set_pte_at(info->vma->vm_mm, addr, ptep, pte);
> +	info->fgmfn++;

Looks good.

>  	return 0;
>  }
>
> -int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
> -			       unsigned long addr,
> -			       xen_pfn_t mfn, int nr,
> -			       pgprot_t prot, unsigned domid,
> -			       struct page **pages)
> +/* do_remap_mfn() - helper function to map foreign pages
> + * @vma: the vma for the pages to be mapped into
> + * @addr: the address at which to map the pages
> + * @mfn: pointer to array of MFNs to map
> + * @nr: the number of entries in the MFN array
> + * @prot: page protection mask
> + * @domid: id of the domain that we are mapping from
> + * @pages: page information (for PVH, not used currently)
> + *
> + * This function takes an array of mfns and maps nr pages from that into
> + * this kernel's memory. The owner of the pages is defined by domid. Where the
> + * pages are mapped is determined by addr, and vma is used for "accounting" of
> + * the pages.
> + *
> + * Return value is zero for success, negative for failure.
> + */
> +static int do_remap_mfn(struct vm_area_struct *vma,
> +			unsigned long addr,
> +			xen_pfn_t *mfn, int nr,
> +			pgprot_t prot, unsigned domid,
> +			struct page **pages)

Since xen_remap_domain_mfn_range isn't implemented on ARM, the only caller is xen_remap_domain_mfn_array, so you might as well just call this function ..._array.

>  {
>  	int err;
>  	struct remap_data data;
>
> -	/* TBD: Batching, current sole caller only does page at a time */
> -	if (nr > 1)
> -		return -EINVAL;
> +	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));

Where does this restriction come from?

I think it is true of the x86 PV MMU (which has certain requirements about how and when PTEs are changed), but I don't think ARM or PVH need it, because they use two-level paging, so the PTEs are just normal native ones with no extra restrictions.

Maybe it is useful to enforce similarity between PV and PVH/ARM though?

>  	data.fgmfn = mfn;
> -	data.prot = prot;
> +	data.prot  = prot;
>  	data.domid = domid;
> -	data.vma = vma;
> -	data.index = 0;
> +	data.vma   = vma;
>  	data.pages = pages;
> -	err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
> -				  remap_pte_fn, &data);
> -	return err;
> +	data.index = 0;
> +
> +	while (nr) {
> +		unsigned long range = 1 << PAGE_SHIFT;
> +
> +		err = apply_to_page_range(vma->vm_mm, addr, range,
> +					  remap_pte_fn, &data);
> +		/* Warning: We do probably need to care about what error we
> +		   get here. However, currently, the remap_area_mfn_pte_fn is

^ this isn't the name of the fn

> +		   only likely to return EFAULT or some other "things are very
> +		   bad" error code, which the rest of the calling code won't
> +		   be able to fix up. So we just exit with the error we got.

I expect it is more important to accumulate the individual errors from remap_pte_fn into err_ptr.

> +		 */
> +		if (err)
> +			return err;
> +
> +		nr--;
> +		addr += range;
> +	}

This while loop (and therefore this change) is unnecessary. The single call to apply_to_page_range is sufficient, and as your TODO notes, any batching should happen in remap_pte_fn (which can handle both accumulation and flushing when the batch is large enough).

> +	return 0;
> +}
> +
> +/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
> + * @vma: the vma for the pages to be mapped into
> + * @addr: the address at which to map the pages

physical address, right?

> + * @mfn: pointer to array of MFNs to map
> + * @nr: the number of entries in the MFN array
> + * @err_ptr: pointer to array of integers, one per MFN, for an error
> + *           value for each page. The err_ptr must not be NULL.

Nothing seems to be filling this in?

> + * @prot: page protection mask
> + * @domid: id of the domain that we are mapping from
> + * @pages: page information (for PVH, not used currently)

No such thing as PVH on ARM. Also, pages is used by this code.

> + *
> + * This function takes an array of mfns and maps nr pages from that into this
> + * kernel's memory. The owner of the pages is defined by domid. Where the pages
> + * are mapped is determined by addr, and vma is used for "accounting" of the
> + * pages. The err_ptr array is filled in on any page that is not sucessfully

successfully

> + * mapped in.
> + *
> + * Return value is zero for success, negative ERRNO value for failure.
> + * Note that the error value -ENOENT is considered a "retry", so when this
> + * error code is seen, another call should be made with the list of pages that
> + * are marked as -ENOENT in the err_ptr array.
> + */
> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
> +			       unsigned long addr,
> +			       xen_pfn_t *mfn, int nr,
> +			       int *err_ptr, pgprot_t prot,
> +			       unsigned domid,
> +			       struct page **pages)
> +{
> +	/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
> +	 * and the consequences later is quite hard to detect what the actual
> +	 * cause of "wrong memory was mapped in".
> +	 * Note: This variant doesn't actually use err_ptr at the moment.

True ;-)

> +	 */
> +	BUG_ON(err_ptr == NULL);
> +	return do_remap_mfn(vma, addr, mfn, nr, prot, domid, pages);
> +}
> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
> +
> +/* Not used in ARM. Use xen_remap_domain_mfn_array(). */
> +int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
> +			       unsigned long addr,
> +			       xen_pfn_t mfn, int nr,
> +			       pgprot_t prot, unsigned domid,
> +			       struct page **pages)
> +{
> +	return -ENOSYS;
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
>
> +
>  int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
>  			       int nr, struct page **pages)
>  {
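For reference, here is a rough, untested sketch of the accumulate-and-flush batching described above: remap_pte_fn would add each page to a batch and issue one XENMEM_add_to_physmap_range per full batch. The remap_batch structure, the batch size and both helper names are invented for illustration; only the hypercall and its argument structure are taken from the tree.

    #include <xen/interface/xen.h>
    #include <xen/interface/memory.h>
    #include <asm/xen/hypercall.h>
    #include <asm/xen/interface.h>

    #define REMAP_BATCH_SIZE 16             /* arbitrary for this sketch */

    struct remap_batch {
            xen_ulong_t idxs[REMAP_BATCH_SIZE];     /* foreign gmfns */
            xen_pfn_t gpfns[REMAP_BATCH_SIZE];      /* local gpfns */
            int nr;
            domid_t fdomid;
    };

    /* Map everything accumulated so far with a single hypercall. */
    static int remap_flush_batch(struct remap_batch *b)
    {
            struct xen_add_to_physmap_range xatp = {
                    .domid = DOMID_SELF,
                    .space = XENMAPSPACE_gmfn_foreign,
                    .size = b->nr,
                    .foreign_domid = b->fdomid,
            };
            int rc;

            if (!b->nr)
                    return 0;
            set_xen_guest_handle(xatp.idxs, b->idxs);
            set_xen_guest_handle(xatp.gpfns, b->gpfns);
            rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
            b->nr = 0;
            return rc;
    }

    /* Called from remap_pte_fn in place of the per-page map_foreign_page. */
    static int remap_batch_page(struct remap_batch *b,
                                unsigned long pfn, xen_pfn_t fgmfn)
    {
            b->idxs[b->nr] = fgmfn;
            b->gpfns[b->nr] = pfn;
            if (++b->nr == REMAP_BATCH_SIZE)
                    return remap_flush_batch(b);
            return 0;
    }

do_remap_mfn() would then need one final remap_flush_batch() after apply_to_page_range() returns, to cover a partial last batch; per-page error reporting also gets slightly more involved, since a whole batch succeeds or fails per hypercall unless the hypervisor reports per-entry errors.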
New patch attached. I haven't done the relevant spelling fixes etc, as I had a little mishap with git and need to fix up a few things. Thought you may want to have a look over the ARM side meanwhile; if this is OK, I will post an "official" V5 patch to the list. Further comments below.

-- Mats

On 19/12/12 10:59, Ian Campbell wrote:
> On Tue, 2012-12-18 at 19:34 +0000, Mats Petersson wrote:
> >>> I think I'll do the minimal patch first, then, if I find some spare time, work on the "batching" variant.
> >> OK. The batching is IMHO just using the multicall variant.
> The XENMEM_add_to_physmap_range call is itself batched (it contains ranges of mfns etc), so we don't just want to batch the actual hypercalls; we really want to make sure each hypercall processes a batch, as this will let us optimise the flushes in the hypervisor.
>
> I don't know if the multicall infrastructure allows for that, but it would be pretty trivial to do it explicitly.

Yes, I'm sure it is. I would prefer to do that AFTER my x86 patch has gone in, if that's possible. (Or that someone who actually understands the ARM architecture and can test it on actual ARM does it...)

> I expect these patches will need to be folded into one to avoid a bisectability hazard?

Yes, that is certainly my plan. I just made two patches for ease of "what is ARM and what isn't" - but the final submission should be as one patch.

> xen-privcmd-remap-array-arm.patch:
>> [...]
> Since xen_remap_domain_mfn_range isn't implemented on ARM, the only caller is xen_remap_domain_mfn_array, so you might as well just call this function ..._array.

Yes, could do. As I stated in the commentary text, I tried to keep the code similar in structure to x86. [Actually, one iteration of my code had two API functions, one of which called the other, but it was considered a better solution to make one common function and have the two x86 functions call that one.]

>> +	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
> Where does this restriction come from?
>
> I think it is true of the x86 PV MMU (which has certain requirements about how and when PTEs are changed), but I don't think ARM or PVH need it, because they use two-level paging, so the PTEs are just normal native ones with no extra restrictions.
>
> Maybe it is useful to enforce similarity between PV and PVH/ARM though?

I don't know if it's useful or not. I'm sure removing it will make little difference, but since the VM flags are set by the calling code, the privcmd.c or higher-level code would have to be correct for whatever architecture it is on. Not sure if it is "helpful" to allow certain combinations in one arch when it's not allowed in another. My choice would be to keep the restriction until there is a good reason to remove it, but if you feel it is beneficial to remove it, feel free to say so. [Perhaps the ARM code should have a comment to the effect of "not needed on PVH or ARM, kept for compatibility with PVOPS on x86" - so when PVOPS is "retired" in some years' time, it can be removed.]

>> +		/* Warning: We do probably need to care about what error we
>> +		   get here. However, currently, the remap_area_mfn_pte_fn is
> ^ this isn't the name of the fn

Fixed.

>> +		   only likely to return EFAULT or some other "things are very
>> +		   bad" error code, which the rest of the calling code won't
>> +		   be able to fix up. So we just exit with the error we got.
> I expect it is more important to accumulate the individual errors from remap_pte_fn into err_ptr.

Yes, but since that exits on error with EFAULT, the calling code won't "accept" the errors, and thus the whole house of cards falls apart at this point.

There should probably be a task to fix this up properly, hence the comment. But right now, any error besides ENOENT is "unacceptable" to the callers of this code. If you want me to add this to the comment, I'm happy to. But as long as remap_pte_fn returns EFAULT on error, nothing will work after an error.

> This while loop (and therefore this change) is unnecessary. The single call to apply_to_page_range is sufficient, and as your TODO notes, any batching should happen in remap_pte_fn (which can handle both accumulation and flushing when the batch is large enough).

Ah, I see what you mean. I have updated the code accordingly.

>> +/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
>> + * @vma: the vma for the pages to be mapped into
>> + * @addr: the address at which to map the pages
> physical address, right?

Virtual - at least if we assume that in "st->va & PAGE_MASK", 'va' actually means virtual address; it would be rather devious to use the name va for a physical address, although I have seen such things in the past. I shall amend the comments to say "virtual address" to be more clear. [Not done in the attached patch; I just realized this when re-reading my comments.]

>> + * @err_ptr: pointer to array of integers, one per MFN, for an error
>> + *           value for each page. The err_ptr must not be NULL.
> Nothing seems to be filling this in?

As discussed above (and below).

>> + * @pages: page information (for PVH, not used currently)
> No such thing as PVH on ARM. Also, pages is used by this code.

Removed the part in ().

>> + * pages. The err_ptr array is filled in on any page that is not sucessfully
> successfully

Thanks...

>> +	/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
>> +	 * and the consequences later is quite hard to detect what the actual
>> +	 * cause of "wrong memory was mapped in".
>> +	 * Note: This variant doesn't actually use err_ptr at the moment.
> True ;-)

Do you prefer the "not used" comment moved up a bit?

-- Mats
On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:
> > I expect it is more important to accumulate the individual errors from remap_pte_fn into err_ptr.
> Yes, but since that exits on error with EFAULT, the calling code won't "accept" the errors, and thus the whole house of cards falls apart at this point.
>
> There should probably be a task to fix this up properly, hence the comment. But right now, any error besides ENOENT is "unacceptable" to the callers of this code. If you want me to add this to the comment, I'm happy to. But as long as remap_pte_fn returns EFAULT on error, nothing will work after an error.

Are you sure? privcmd.c has some special casing for ENOENT, but it looks like it should just pass other errors through back to userspace.

In any case, surely this needs fixing?

On the x86 side, err_ptr is the result of the mmu_update hypercall, which can already be other than ENOENT, including EFAULT, ESRCH etc.

Ian.
On 19/12/12 12:22, Ian Campbell wrote:
> Are you sure? privcmd.c has some special casing for ENOENT, but it looks like it should just pass other errors through back to userspace.
>
> In any case, surely this needs fixing?
>
> On the x86 side, err_ptr is the result of the mmu_update hypercall, which can already be other than ENOENT, including EFAULT, ESRCH etc.

Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the calling code revoking the mapping and returning an error) is ENOENT.

Or at least, that's how I believe it SHOULD be - since only ENOENT is a "success" error code, anything else pretty much means that the operation requested didn't work properly. If you are aware of any use-case where EFAULT, ESRCH or other error codes would still result in a valid, usable memory mapping, do let me know - I have a fair understanding of the xc_* code, and although I may not know every piece of it, I'm fairly certain that is the expected behaviour.

-- Mats
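To spell out that convention from the caller's side, here is a simplified illustration (plain C, not the actual xc_* code): a per-page error of 0 means the page is mapped, -ENOENT means "paged out, retry this page", and anything else means the mapping as a whole has failed and must be torn down.

    #include <errno.h>

    /* Illustration of the err[] convention: returns the number of pages
     * to retry (those marked -ENOENT), or a negative errno if any page
     * failed in a non-retryable way, in which case the caller unmaps
     * everything and reports failure. */
    static int scan_map_errors(const int *err, int nr)
    {
            int i, to_retry = 0;

            for (i = 0; i < nr; i++) {
                    if (err[i] == 0)
                            continue;               /* page mapped fine */
                    if (err[i] == -ENOENT) {
                            to_retry++;             /* paged out: try again */
                            continue;
                    }
                    return err[i];                  /* EFAULT, ESRCH, ...: fatal */
            }
            return to_retry;
    }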
On Wed, 2012-12-19 at 15:08 +0000, Mats Petersson wrote:
> [...]
> Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the calling code revoking the mapping and returning an error) is ENOENT.

Hrm. Probably the right thing is for map_foreign_page to propagate the result of XENMEM_add_to_physmap_range, and for remap_pte_fn to store it in err_ptr. That -EFAULT thing just looks wrong to me.
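A rough, untested sketch of what Ian is suggesting: map_foreign_page() is assumed to be changed to return the raw XENMEM_add_to_physmap_range result instead of collapsing it to -EFAULT, and struct remap_data is assumed to gain an 'int *err_ptr' field. Each page's result is recorded and the walk continues, so -ENOENT pages can be retried by the caller.

    static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
                            void *data)
    {
            struct remap_data *info = data;
            struct page *page = info->pages[info->index++];
            unsigned long pfn = page_to_pfn(page);
            pte_t pte = pfn_pte(pfn, info->prot);
            int rc;

            /* Advance fgmfn unconditionally so later pages stay aligned
             * with the caller's mfn array even after a failure. */
            rc = map_foreign_page(pfn, *info->fgmfn++, info->domid);
            if (info->err_ptr)
                    *info->err_ptr++ = rc;  /* record the per-page result */
            if (rc)
                    return 0;               /* recorded; skip PTE, keep walking */

            set_pte_at(info->vma->vm_mm, addr, ptep, pte);
            return 0;
    }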
On 19/12/12 15:45, Ian Campbell wrote:
> [...]
> Hrm. Probably the right thing is for map_foreign_page to propagate the result of XENMEM_add_to_physmap_range, and for remap_pte_fn to store it in err_ptr. That -EFAULT thing just looks wrong to me.

Ok, so you want me to fix that up, I suppose? I mean, I just copied the behaviour that was already there - just massaged the code around a bit...

-- Mats
On Wed, 2012-12-19 at 15:47 +0000, Mats Petersson wrote:
> [...]
> Ok, so you want me to fix that up, I suppose? I mean, I just copied the behaviour that was already there - just massaged the code around a bit...

Yes please; it didn't really matter before, but I think it matters after your changes.
On 19/12/12 15:51, Ian Campbell wrote:
> [...]
> Yes please; it didn't really matter before, but I think it matters after your changes.

Ok, I will try to fix it. But since I can't test it, it will still be "does it compile" testing... {It would be nice to understand what has changed, though - as far as I can see, the old code was just as broken as the new code.}

-- Mats