Adin Scannell
2011-Dec-17 03:22 UTC
[PATCH] Add necessary bits to pvops Linux for mapping paged-out pages
I''ve ported a couple of patches to the Linux pvops kernel that are necessary for correctly running domains with paging. In a nutshell: in the case of a foreign attempt to map a paged-out page, the correct error code will now be propogated up to libxc, which will already handle it correctly. This required an implementation of mmap_batch_v2. I''ve tested it using a highly-paged domain and everything looks okay (qemu will receive the appropriate error via the mmap_batch_v2 and retry). (My apologies if anyone/everyone receives this set of patches more than once, I''ve had some trouble with both my connection dying and guilt freaking out while I''m sending, leaving things in a bit of an unknown state.)
Adin Scannell
2011-Dec-17 03:22 UTC
[PATCH 1/3] Make xen_remap_domain_mfn_range return value meaningful in case of error
Original patch from Olaf Hering <ohering@novell.com> This change fixes the xc_map_foreign_bulk interface, which would otherwise cause SIGBUS when pages are gone because -ENOENT is not returned as expected by the IOCTL_PRIVCMD_MMAPBATCH_V2 ioctl. Signed-off-by: Jan Beulich <jbeulich@novell.com> This is a port to the mainline Linux tree. Functions were refactored and renamed. I believe that this is the only required change to match the semantics of the original patch. Signed-off-by: Adin Scannell <adin@scannell.ca> --- arch/x86/xen/mmu.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 87f6673..288a7fc 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -2350,8 +2350,8 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma, if (err) goto out; - err = -EFAULT; - if (HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid) < 0) + err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid); + if (err < 0) goto out; nr -= batch; -- 1.6.2.5
Handle GNTST_eagain status from GNTTABOP_map_grant_ref and GNTTABOP_copy operations properly to allow usage of xenpaging without causing crashes or data corruption. Replace all relevant HYPERVISOR_grant_table_op() calls with a retry loop. This loop is implemented as a macro to allow different GNTTABOP_* args. It will sleep up to 33 seconds and wait for the page to be paged in again. All ->status checks were updated to use the GNTST_* namespace. All return values are converted from GNTST_* namespace to 0/-EINVAL, since all callers did not use the actual return value. Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Patrick Colp <pjcolp@cs.ubc.ca> This is a port to the mainline Linux tree. This required dropping many backend drivers which have not yet made it in. Additionally, several of the referenced functions have moved and/or been refactored. I attempted to minimize changes while keeping the same semantics. Signed-off-by: Adin Scannell <adin@scannell.ca> Index: linux/drivers/block/xen-blkback/blkback.c ==================================================================--- drivers/block/xen-blkback/blkback.c | 4 ++- drivers/xen/grant-table.c | 7 ++++- drivers/xen/xenbus/xenbus_client.c | 4 +++ include/xen/grant_table.h | 39 +++++++++++++++++++++++++++++++++++ include/xen/interface/grant_table.h | 6 ++++- 5 files changed, 56 insertions(+), 4 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 15ec4db..d3fb290 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -390,7 +390,9 @@ static int xen_blkbk_map(struct blkif_request *req, * the page from the other domain. */ for (i = 0; i < nseg; i++) { - if (unlikely(map[i].status != 0)) { + if (unlikely(map[i].status == GNTST_eagain)) + gnttab_check_GNTST_eagain_do_while(GNTTABOP_map_grant_ref, &map[i]); + if (unlikely(map[i].status != GNTST_okay)) { pr_debug(DRV_PFX "invalid buffer -- could not remap it\n"); map[i].handle = BLKBACK_INVALID_HANDLE; ret |= 1; diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index bf1c094..48826c3 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -464,9 +464,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, for (i = 0; i < count; i++) { /* Do not add to override if the map failed. */ - if (map_ops[i].status) + if (map_ops[i].status != GNTST_okay && map_ops[i].status != GNTST_eagain) continue; + if (map_ops[i].status == GNTST_eagain) + return -EAGAIN; + if (map_ops[i].flags & GNTMAP_contains_pte) { pte = (pte_t *) (mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) + (map_ops[i].host_addr & ~PAGE_MASK)); @@ -565,7 +568,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx) return -ENOSYS; } - BUG_ON(rc || setup.status); + BUG_ON(rc || (setup.status != GNTST_okay)); rc = arch_gnttab_map_shared(frames, nr_gframes, gnttab_max_grant_frames(), &shared); diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index 1906125..d123c78 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -455,6 +455,8 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) BUG(); + if (op.status == GNTST_eagain) + gnttab_check_GNTST_eagain_do_while(GNTTABOP_map_grant_ref, (&op)); if (op.status != GNTST_okay) { free_vm_area(area); xenbus_dev_fatal(dev, op.status, @@ -499,6 +501,8 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref, if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) BUG(); + if (op.status == GNTST_eagain) + gnttab_check_GNTST_eagain_do_while(GNTTABOP_map_grant_ref, (&op)); if (op.status != GNTST_okay) { xenbus_dev_fatal(dev, op.status, "mapping in shared page %d from domain %d", diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h index 11e2dfc..46fac85 100644 --- a/include/xen/grant_table.h +++ b/include/xen/grant_table.h @@ -37,6 +37,8 @@ #ifndef __ASM_GNTTAB_H__ #define __ASM_GNTTAB_H__ +#include <linux/delay.h> + #include <asm/page.h> #include <xen/interface/xen.h> @@ -160,4 +162,41 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, struct page **pages, unsigned int count); +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) \ +do { \ + u8 __hc_delay = 1; \ + int __ret; \ + while (unlikely((__HCarg_p)->status == GNTST_eagain && __hc_delay)) { \ + msleep(__hc_delay++); \ + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ + BUG_ON(__ret); \ + } \ + if (__hc_delay == 0) { \ + printk(KERN_ERR "%s: gnt busy\n", __func__,); \ + (__HCarg_p)->status = GNTST_bad_page; \ + } \ + if ((__HCarg_p)->status != GNTST_okay) \ + printk(KERN_ERR "%s: gnt status %x\n", \ + __func__, (__HCarg_p)->status); \ +} while(0) + +#define gnttab_check_GNTST_eagain_do_while(__HCop, __HCarg_p) \ +do { \ + u8 __hc_delay = 1; \ + int __ret; \ + do { \ + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ + BUG_ON(__ret); \ + if ((__HCarg_p)->status == GNTST_eagain) \ + msleep(__hc_delay++); \ + } while ((__HCarg_p)->status == GNTST_eagain && __hc_delay); \ + if (__hc_delay == 0) { \ + printk(KERN_ERR "%s: gnt busy\n", __func__); \ + (__HCarg_p)->status = GNTST_bad_page; \ + } \ + if ((__HCarg_p)->status != GNTST_okay) \ + printk(KERN_ERR "%s: gnt status %x\n", \ + __func__, (__HCarg_p)->status); \ +} while(0) + #endif /* __ASM_GNTTAB_H__ */ diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h index 39e5717..ba04080 100644 --- a/include/xen/interface/grant_table.h +++ b/include/xen/interface/grant_table.h @@ -363,6 +363,8 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_query_size); #define GNTST_permission_denied (-8) /* Not enough privilege for operation. */ #define GNTST_bad_page (-9) /* Specified page was invalid for op. */ #define GNTST_bad_copy_arg (-10) /* copy arguments cross page boundary */ +#define GNTST_address_too_big (-11) /* transfer page address too large. */ +#define GNTST_eagain (-12) /* Could not map at the moment. Retry. */ #define GNTTABOP_error_msgs { \ "okay", \ @@ -375,7 +377,9 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_query_size); "no spare translation slot in the I/O MMU", \ "permission denied", \ "bad page", \ - "copy arguments cross page boundary" \ + "copy arguments cross page boundary", \ + "page address size too large", \ + "could not map at the moment, retry" \ } #endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */ -- 1.6.2.5
Adin Scannell
2011-Dec-17 03:22 UTC
[PATCH 3/3] Port of mmap_batch_v2 to support paging in Xen
This wasn''t ported from any patch, but was rewritten based on the XCP 2.6.32 tree. The code structure is significantly different and this patch mirrors the existing Linux code. The primary reason for need the V2 interface is to support foreign mappings (i.e. qemu) of paged-out pages. The libxc code will already retry mappings when an ENOENT is returned. The V2 interface provides a richer error value, so the user-space code is capable of handling these errors specifically. Signed-off-by: Adin Scannell <adin@scannell.ca> Index: linux/drivers/xen/xenfs/privcmd.c ==================================================================--- drivers/xen/xenfs/privcmd.c | 90 ++++++++++++++++++++++++++++++++++++++++++- include/xen/privcmd.h | 10 +++++ 2 files changed, 99 insertions(+), 1 deletions(-) diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c index dbd3b16..21cbb5a 100644 --- a/drivers/xen/xenfs/privcmd.c +++ b/drivers/xen/xenfs/privcmd.c @@ -70,7 +70,7 @@ static void free_page_list(struct list_head *pages) */ static int gather_array(struct list_head *pagelist, unsigned nelem, size_t size, - void __user *data) + const void __user *data) { unsigned pageidx; void *pagedata; @@ -245,6 +245,15 @@ struct mmap_batch_state { xen_pfn_t __user *user; }; +struct mmap_batch_v2_state { + domid_t domain; + unsigned long va; + struct vm_area_struct *vma; + int paged_out; + + int __user *err; +}; + static int mmap_batch_fn(void *data, void *state) { xen_pfn_t *mfnp = data; @@ -260,6 +269,20 @@ static int mmap_batch_fn(void *data, void *state) return 0; } +static int mmap_batch_v2_fn(void *data, void *state) +{ + xen_pfn_t *mfnp = data; + struct mmap_batch_v2_state *st = state; + + int rc = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1, + st->vma->vm_page_prot, st->domain); + if ( rc == -ENOENT ) + st->paged_out++; + st->va += PAGE_SIZE; + + return put_user(rc, st->err++); +} + static int mmap_return_errors(void *data, void *state) { xen_pfn_t *mfnp = data; @@ -332,6 +355,67 @@ out: return ret; } +static long privcmd_ioctl_mmap_batch_v2(void __user *udata) +{ + int ret; + struct privcmd_mmapbatch_v2 m; + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma = NULL; + unsigned long nr_pages; + LIST_HEAD(pagelist); + struct mmap_batch_v2_state state; + + if (!xen_initial_domain()) + return -EPERM; + + if (copy_from_user(&m, udata, sizeof(m))) + return -EFAULT; + + nr_pages = m.num; + if ((m.num <= 0) || (nr_pages > (ULONG_MAX >> PAGE_SHIFT))) + return -EINVAL; + + ret = gather_array(&pagelist, m.num, sizeof(xen_pfn_t), + m.arr); + + if (ret || list_empty(&pagelist)) + goto out; + + down_write(&mm->mmap_sem); + + vma = find_vma(mm, m.addr); + ret = -EINVAL; + /* We allow multiple shots here, because this interface + * is used by libxc and mappings for specific pages will + * be retried when pages are paged-out (ENOENT). */ + if (!vma || + vma->vm_ops != &privcmd_vm_ops || + (m.addr < vma->vm_start) || + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end)) { + up_write(&mm->mmap_sem); + goto out; + } + + state.domain = m.dom; + state.vma = vma; + state.va = m.addr; + state.err = m.err; + state.paged_out = 0; + + up_write(&mm->mmap_sem); + + ret = traverse_pages(m.num, sizeof(xen_pfn_t), + &pagelist, mmap_batch_v2_fn, &state); + +out: + free_page_list(&pagelist); + + if ( (ret == 0) && (state.paged_out > 0) ) + return -ENOENT; + else + return ret; +} + static long privcmd_ioctl(struct file *file, unsigned int cmd, unsigned long data) { @@ -351,6 +435,10 @@ static long privcmd_ioctl(struct file *file, ret = privcmd_ioctl_mmap_batch(udata); break; + case IOCTL_PRIVCMD_MMAPBATCH_V2: + ret = privcmd_ioctl_mmap_batch_v2(udata); + break; + default: ret = -EINVAL; break; diff --git a/include/xen/privcmd.h b/include/xen/privcmd.h index 17857fb..39b92b1 100644 --- a/include/xen/privcmd.h +++ b/include/xen/privcmd.h @@ -62,6 +62,14 @@ struct privcmd_mmapbatch { xen_pfn_t __user *arr; /* array of mfns - top nibble set on err */ }; +struct privcmd_mmapbatch_v2 { + int num; /* number of pages to populate */ + domid_t dom; /* target domain */ + __u64 addr; /* virtual address */ + const xen_pfn_t __user *arr; /* array of mfns */ + int __user *err; /* array of error codes */ +}; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -73,5 +81,7 @@ struct privcmd_mmapbatch { _IOC(_IOC_NONE, ''P'', 2, sizeof(struct privcmd_mmap)) #define IOCTL_PRIVCMD_MMAPBATCH \ _IOC(_IOC_NONE, ''P'', 3, sizeof(struct privcmd_mmapbatch)) +#define IOCTL_PRIVCMD_MMAPBATCH_V2 \ + _IOC(_IOC_NONE, ''P'', 4, sizeof(struct privcmd_mmapbatch_v2)) #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */ -- 1.6.2.5
Adin Scannell
2011-Dec-17 03:49 UTC
Re: [PATCH] Add necessary bits to pvops Linux for mapping paged-out pages
> (My apologies if anyone/everyone receives this set of patches more than once, > I''ve had some trouble with both my connection dying and guilt freaking out > while I''m sending, leaving things in a bit of an unknown state.)In my frustration, things got manual. This e-mail should be titled [PATCH 0 of 3].
Konrad Rzeszutek Wilk
2011-Dec-17 14:16 UTC
Re: [PATCH] Add necessary bits to pvops Linux for mapping paged-out pages
On Fri, Dec 16, 2011 at 10:22:18PM -0500, Adin Scannell wrote:> I''ve ported a couple of patches to the Linux pvops kernel that are necessary > for correctly running domains with paging. In a nutshell: in the case of a > foreign attempt to map a paged-out page, the correct error code will now be > propogated up to libxc, which will already handle it correctly. > > This required an implementation of mmap_batch_v2. > > I''ve tested it using a highly-paged domain and everything looks okay (qemu will > receive the appropriate error via the mmap_batch_v2 and retry).Please re-run these patches against scripts/checkpath.pl Also do re-base them on my #linux-next branch (git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #linux-next). I am a bit worried how they are going to work with Daniel''s HVM patches which I was planning to put in my branch shorlty: http://old-list-archives.xen.org/archives/html/xen-devel/2011-10/msg01487.html> > (My apologies if anyone/everyone receives this set of patches more than once, > I''ve had some trouble with both my connection dying and guilt freaking out > while I''m sending, leaving things in a bit of an unknown state.)Next time please also send them to konrad.wilk@oracle.com. Thanks!>
Konrad Rzeszutek Wilk
2011-Dec-17 14:16 UTC
Re: [PATCH] Add necessary bits to pvops Linux for mapping paged-out pages
On Fri, Dec 16, 2011 at 10:22:18PM -0500, Adin Scannell wrote:> I''ve ported a couple of patches to the Linux pvops kernel that are necessary > for correctly running domains with paging. In a nutshell: in the case of a > foreign attempt to map a paged-out page, the correct error code will now be > propogated up to libxc, which will already handle it correctly. > > This required an implementation of mmap_batch_v2. > > I''ve tested it using a highly-paged domain and everything looks okay (qemu will > receive the appropriate error via the mmap_batch_v2 and retry). > > (My apologies if anyone/everyone receives this set of patches more than once, > I''ve had some trouble with both my connection dying and guilt freaking out > while I''m sending, leaving things in a bit of an unknown state.)Ah, and also please CC LKML mailing list. Thanks.>
Konrad Rzeszutek Wilk
2011-Dec-17 14:30 UTC
Re: [PATCH 2/3] Handle GNTST_eagain in kernel drivers
On Fri, Dec 16, 2011 at 10:22:20PM -0500, Adin Scannell wrote:> Handle GNTST_eagain status from GNTTABOP_map_grant_ref and > GNTTABOP_copy operations properly to allow usage of xenpaging without > causing crashes or data corruption. > > Replace all relevant HYPERVISOR_grant_table_op() calls with a retry > loop. This loop is implemented as a macro to allow different > GNTTABOP_* args. It will sleep up to 33 seconds and wait for the > page to be paged in again. > > All ->status checks were updated to use the GNTST_* namespace. All > return values are converted from GNTST_* namespace to 0/-EINVAL, since > all callers did not use the actual return value. > > Signed-off-by: Olaf Hering <olaf@aepfle.de> > Acked-by: Patrick Colp <pjcolp@cs.ubc.ca> > > This is a port to the mainline Linux tree. This required dropping many backend > drivers which have not yet made it in. Additionally, several of the referenced > functions have moved and/or been refactored. I attempted to minimize changes > while keeping the same semantics. > > Signed-off-by: Adin Scannell <adin@scannell.ca> > > Index: linux/drivers/block/xen-blkback/blkback.c > ==================================================================> --- > drivers/block/xen-blkback/blkback.c | 4 ++- > drivers/xen/grant-table.c | 7 ++++- > drivers/xen/xenbus/xenbus_client.c | 4 +++ > include/xen/grant_table.h | 39 +++++++++++++++++++++++++++++++++++ > include/xen/interface/grant_table.h | 6 ++++-> 5 files changed, 56 insertions(+), 4 deletions(-) > > diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c > index 15ec4db..d3fb290 100644 > --- a/drivers/block/xen-blkback/blkback.c > +++ b/drivers/block/xen-blkback/blkback.c > @@ -390,7 +390,9 @@ static int xen_blkbk_map(struct blkif_request *req, > * the page from the other domain. > */ > for (i = 0; i < nseg; i++) { > - if (unlikely(map[i].status != 0)) { > + if (unlikely(map[i].status == GNTST_eagain)) > + gnttab_check_GNTST_eagain_do_while(GNTTABOP_map_grant_ref, &map[i]); > + if (unlikely(map[i].status != GNTST_okay)) { > pr_debug(DRV_PFX "invalid buffer -- could not remap it\n"); > map[i].handle = BLKBACK_INVALID_HANDLE; > ret |= 1;> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c > index bf1c094..48826c3 100644 > --- a/drivers/xen/grant-table.c > +++ b/drivers/xen/grant-table.c > @@ -464,9 +464,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, > > for (i = 0; i < count; i++) { > /* Do not add to override if the map failed. */ > - if (map_ops[i].status) > + if (map_ops[i].status != GNTST_okay && map_ops[i].status != GNTST_eagain) > continue; > > + if (map_ops[i].status == GNTST_eagain) > + return -EAGAIN; > + > if (map_ops[i].flags & GNTMAP_contains_pte) { > pte = (pte_t *) (mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) + > (map_ops[i].host_addr & ~PAGE_MASK)); > @@ -565,7 +568,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx) > return -ENOSYS; > } > > - BUG_ON(rc || setup.status); > + BUG_ON(rc || (setup.status != GNTST_okay)); > > rc = arch_gnttab_map_shared(frames, nr_gframes, gnttab_max_grant_frames(), > &shared); > diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c > index 1906125..d123c78 100644 > --- a/drivers/xen/xenbus/xenbus_client.c > +++ b/drivers/xen/xenbus/xenbus_client.c > @@ -455,6 +455,8 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) > if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) > BUG(); > > + if (op.status == GNTST_eagain) > + gnttab_check_GNTST_eagain_do_while(GNTTABOP_map_grant_ref, (&op)); > if (op.status != GNTST_okay) { > free_vm_area(area); > xenbus_dev_fatal(dev, op.status, > @@ -499,6 +501,8 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref, > if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) > BUG(); > > + if (op.status == GNTST_eagain) > + gnttab_check_GNTST_eagain_do_while(GNTTABOP_map_grant_ref, (&op)); > if (op.status != GNTST_okay) { > xenbus_dev_fatal(dev, op.status, > "mapping in shared page %d from domain %d", > diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h > index 11e2dfc..46fac85 100644 > --- a/include/xen/grant_table.h > +++ b/include/xen/grant_table.h > @@ -37,6 +37,8 @@ > #ifndef __ASM_GNTTAB_H__ > #define __ASM_GNTTAB_H__ > > +#include <linux/delay.h> > + > #include <asm/page.h> > > #include <xen/interface/xen.h> > @@ -160,4 +162,41 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, > int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, > struct page **pages, unsigned int count); > > +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) \So why does this have to be a macro? What is the advantage of that versus making this a function?> +do { \ > + u8 __hc_delay = 1; \ > + int __ret; \ > + while (unlikely((__HCarg_p)->status == GNTST_eagain && __hc_delay)) { \ > + msleep(__hc_delay++); \Ugh. Not sure what happend to this, but there are tons of ''\'' at the end. So why msleep? Why not go for a proper yield? Call the safe_halt() function?> + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ > + BUG_ON(__ret); \ > + } \ > + if (__hc_delay == 0) { \So this would happen if we rolled over __hc_delay, so did this more than 255 times? Presumarily this can happen if the swapper in dom0 crashes.. I would recommend you use ''WARN'' here and include tons of details. This is a pretty serious issues, is it not?> + printk(KERN_ERR "%s: gnt busy\n", __func__,); \ > + (__HCarg_p)->status = GNTST_bad_page; \ > + } \ > + if ((__HCarg_p)->status != GNTST_okay) \ > + printk(KERN_ERR "%s: gnt status %x\n", \ > + __func__, (__HCarg_p)->status); \Use GNTTABOP_error_msgs. Also should we continue? What is the impact if we do continue? The times this is hit is if the status is not GNTS_okay and if it is not GNTS_eagain - so what are the situations in which this happens and what can the domain do to recover? Should there be some helpfull message instead of just "gnt status: X" ?> +} while(0) > + > +#define gnttab_check_GNTST_eagain_do_while(__HCop, __HCarg_p) \ > +do { \ > + u8 __hc_delay = 1; \ > + int __ret; \ > + do { \ > + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ > + BUG_ON(__ret); \ > + if ((__HCarg_p)->status == GNTST_eagain) \ > + msleep(__hc_delay++); \ > + } while ((__HCarg_p)->status == GNTST_eagain && __hc_delay); \ > + if (__hc_delay == 0) { \ > + printk(KERN_ERR "%s: gnt busy\n", __func__); \ > + (__HCarg_p)->status = GNTST_bad_page; \ > + } \ > + if ((__HCarg_p)->status != GNTST_okay) \ > + printk(KERN_ERR "%s: gnt status %x\n", \ > + __func__, (__HCarg_p)->status); \ > +} while(0) > + > #endif /* __ASM_GNTTAB_H__ */ > diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h > index 39e5717..ba04080 100644 > --- a/include/xen/interface/grant_table.h > +++ b/include/xen/interface/grant_table.h > @@ -363,6 +363,8 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_query_size); > #define GNTST_permission_denied (-8) /* Not enough privilege for operation. */ > #define GNTST_bad_page (-9) /* Specified page was invalid for op. */ > #define GNTST_bad_copy_arg (-10) /* copy arguments cross page boundary */ > +#define GNTST_address_too_big (-11) /* transfer page address too large. */ > +#define GNTST_eagain (-12) /* Could not map at the moment. Retry. */ > > #define GNTTABOP_error_msgs { \ > "okay", \ > @@ -375,7 +377,9 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_query_size); > "no spare translation slot in the I/O MMU", \ > "permission denied", \ > "bad page", \ > - "copy arguments cross page boundary" \ > + "copy arguments cross page boundary", \ > + "page address size too large", \ > + "could not map at the moment, retry" \ > } > > #endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */ > -- > 1.6.2.5 >
Konrad Rzeszutek Wilk
2011-Dec-17 14:40 UTC
Re: [PATCH 3/3] Port of mmap_batch_v2 to support paging in Xen
On Fri, Dec 16, 2011 at 10:22:21PM -0500, Adin Scannell wrote:> This wasn''t ported from any patch, but was rewritten based on the XCP 2.6.32 > tree. The code structure is significantly different and this patch mirrors the > existing Linux code. > > The primary reason for need the V2 interface is to support foreign mappings > (i.e. qemu) of paged-out pages. The libxc code will already retry mappings > when an ENOENT is returned. The V2 interface provides a richer error value, > so the user-space code is capable of handling these errors specifically.Can you give more details on how to use paged-out pages. Perhaps a pointer to the xen''s docs?> > Signed-off-by: Adin Scannell <adin@scannell.ca> > > Index: linux/drivers/xen/xenfs/privcmd.c > ==================================================================> --- > drivers/xen/xenfs/privcmd.c | 90 ++++++++++++++++++++++++++++++++++++++++++-So that file just moved to drivers/xen/privcmd.c> include/xen/privcmd.h | 10 +++++ > 2 files changed, 99 insertions(+), 1 deletions(-) > > diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c > index dbd3b16..21cbb5a 100644 > --- a/drivers/xen/xenfs/privcmd.c > +++ b/drivers/xen/xenfs/privcmd.c > @@ -70,7 +70,7 @@ static void free_page_list(struct list_head *pages) > */ > static int gather_array(struct list_head *pagelist, > unsigned nelem, size_t size, > - void __user *data) > + const void __user *data) > { > unsigned pageidx; > void *pagedata; > @@ -245,6 +245,15 @@ struct mmap_batch_state { > xen_pfn_t __user *user; > }; > > +struct mmap_batch_v2_state { > + domid_t domain; > + unsigned long va; > + struct vm_area_struct *vma; > + int paged_out;Should this be unsigned int?> + > + int __user *err; > +}; > + > static int mmap_batch_fn(void *data, void *state) > { > xen_pfn_t *mfnp = data; > @@ -260,6 +269,20 @@ static int mmap_batch_fn(void *data, void *state) > return 0; > } > > +static int mmap_batch_v2_fn(void *data, void *state) > +{ > + xen_pfn_t *mfnp = data; > + struct mmap_batch_v2_state *st = state; > + > + int rc = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1, > + st->vma->vm_page_prot, st->domain);You don''t want to check that st is not NULL?> + if ( rc == -ENOENT )This is the wrong style. Please fix.> + st->paged_out++;Is it possible that this ends overflowing and hitting 0?> + st->va += PAGE_SIZE; > + > + return put_user(rc, st->err++); > +} > + > static int mmap_return_errors(void *data, void *state) > { > xen_pfn_t *mfnp = data; > @@ -332,6 +355,67 @@ out: > return ret; > } > > +static long privcmd_ioctl_mmap_batch_v2(void __user *udata) > +{ > + int ret; > + struct privcmd_mmapbatch_v2 m; > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *vma = NULL; > + unsigned long nr_pages; > + LIST_HEAD(pagelist); > + struct mmap_batch_v2_state state; > + > + if (!xen_initial_domain()) > + return -EPERM; > + > + if (copy_from_user(&m, udata, sizeof(m))) > + return -EFAULT; > + > + nr_pages = m.num; > + if ((m.num <= 0) || (nr_pages > (ULONG_MAX >> PAGE_SHIFT)))Just make it nr_pages instead of m.num.> + return -EINVAL; > + > + ret = gather_array(&pagelist, m.num, sizeof(xen_pfn_t),nr_pages.> + m.arr); > + > + if (ret || list_empty(&pagelist)) > + goto out; > + > + down_write(&mm->mmap_sem); > + > + vma = find_vma(mm, m.addr); > + ret = -EINVAL; > + /* We allow multiple shots here, because this interface > + * is used by libxc and mappings for specific pages will > + * be retried when pages are paged-out (ENOENT). */ > + if (!vma || > + vma->vm_ops != &privcmd_vm_ops || > + (m.addr < vma->vm_start) || > + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end)) { > + up_write(&mm->mmap_sem); > + goto out; > + } > + > + state.domain = m.dom;Should you check the m.dom for incorrect ones? Like -1? or DOMID_IO?> + state.vma = vma; > + state.va = m.addr; > + state.err = m.err; > + state.paged_out = 0; > + > + up_write(&mm->mmap_sem); > + > + ret = traverse_pages(m.num, sizeof(xen_pfn_t), > + &pagelist, mmap_batch_v2_fn, &state); > + > +out: > + free_page_list(&pagelist); > + > + if ( (ret == 0) && (state.paged_out > 0) ) > + return -ENOENT; > + else > + return ret; > +} > + > static long privcmd_ioctl(struct file *file, > unsigned int cmd, unsigned long data) > { > @@ -351,6 +435,10 @@ static long privcmd_ioctl(struct file *file, > ret = privcmd_ioctl_mmap_batch(udata); > break; > > + case IOCTL_PRIVCMD_MMAPBATCH_V2: > + ret = privcmd_ioctl_mmap_batch_v2(udata); > + break; > + > default: > ret = -EINVAL; > break; > diff --git a/include/xen/privcmd.h b/include/xen/privcmd.h > index 17857fb..39b92b1 100644 > --- a/include/xen/privcmd.h > +++ b/include/xen/privcmd.h > @@ -62,6 +62,14 @@ struct privcmd_mmapbatch { > xen_pfn_t __user *arr; /* array of mfns - top nibble set on err */ > }; > > +struct privcmd_mmapbatch_v2 { > + int num; /* number of pages to populate */ > + domid_t dom; /* target domain */ > + __u64 addr; /* virtual address */ > + const xen_pfn_t __user *arr; /* array of mfns */ > + int __user *err; /* array of error codes */ > +}; > + > /* > * @cmd: IOCTL_PRIVCMD_HYPERCALL > * @arg: &privcmd_hypercall_t > @@ -73,5 +81,7 @@ struct privcmd_mmapbatch { > _IOC(_IOC_NONE, ''P'', 2, sizeof(struct privcmd_mmap)) > #define IOCTL_PRIVCMD_MMAPBATCH \ > _IOC(_IOC_NONE, ''P'', 3, sizeof(struct privcmd_mmapbatch)) > +#define IOCTL_PRIVCMD_MMAPBATCH_V2 \ > + _IOC(_IOC_NONE, ''P'', 4, sizeof(struct privcmd_mmapbatch_v2)) > > #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */ > -- > 1.6.2.5 >
Adin Scannell
2011-Dec-17 16:51 UTC
Re: [PATCH 3/3] Port of mmap_batch_v2 to support paging in Xen
Hi Konrad, Thanks for the quick turnaround. I''ll incorporate your feedback and resend. Some NOTES are inline below. On Sat, Dec 17, 2011 at 9:40 AM, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:> On Fri, Dec 16, 2011 at 10:22:21PM -0500, Adin Scannell wrote: >> This wasn''t ported from any patch, but was rewritten based on the XCP 2.6.32 >> tree. The code structure is significantly different and this patch mirrors the >> existing Linux code. >> >> The primary reason for need the V2 interface is to support foreign mappings >> (i.e. qemu) of paged-out pages. The libxc code will already retry mappings >> when an ENOENT is returned. The V2 interface provides a richer error value, >> so the user-space code is capable of handling these errors specifically. > > Can you give more details on how to use paged-out pages. Perhaps a > pointer to the xen''s docs?Hrm, in usual fashion the docs are a bit lacking. Simply: the kernel has to do nothing to support paging (other than the backend drivers which need to handle the grant EAGAIN case -- other patch). The only reason the V2 interface is needed here is to pass back full error codes from the mmu_update()''s. Xen and libxc have a mutual understanding that when you receive an ENOENT error code, you back off and try again.>> >> Signed-off-by: Adin Scannell <adin@scannell.ca> >> >> Index: linux/drivers/xen/xenfs/privcmd.c >> ==================================================================>> --- >> drivers/xen/xenfs/privcmd.c | 90 ++++++++++++++++++++++++++++++++++++++++++- > > So that file just moved to drivers/xen/privcmd.cOf course it has :)>> include/xen/privcmd.h | 10 +++++ >> 2 files changed, 99 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c >> index dbd3b16..21cbb5a 100644 >> --- a/drivers/xen/xenfs/privcmd.c >> +++ b/drivers/xen/xenfs/privcmd.c >> @@ -70,7 +70,7 @@ static void free_page_list(struct list_head *pages) >> */ >> static int gather_array(struct list_head *pagelist, >> unsigned nelem, size_t size, >> - void __user *data) >> + const void __user *data) >> { >> unsigned pageidx; >> void *pagedata; >> @@ -245,6 +245,15 @@ struct mmap_batch_state { >> xen_pfn_t __user *user; >> }; >> >> +struct mmap_batch_v2_state { >> + domid_t domain; >> + unsigned long va; >> + struct vm_area_struct *vma; >> + int paged_out; > > Should this be unsigned int?Noted below.>> + >> + int __user *err; >> +}; >> + >> static int mmap_batch_fn(void *data, void *state) >> { >> xen_pfn_t *mfnp = data; >> @@ -260,6 +269,20 @@ static int mmap_batch_fn(void *data, void *state) >> return 0; >> } >> >> +static int mmap_batch_v2_fn(void *data, void *state) >> +{ >> + xen_pfn_t *mfnp = data; >> + struct mmap_batch_v2_state *st = state; >> + >> + int rc = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1, >> + st->vma->vm_page_prot, st->domain); > > You don''t want to check that st is not NULL?This could be an assertion as it''s only used in the ioctl_mmap_batch_v2 function where it''s guaranteed to pass in a non-null state. I''ll add an additional patch to the queue that adds these assertions to both mmap_batch_fn and mmap_batch_fn_v2.>> + st->paged_out++; > Is it possible that this ends overflowing and hitting 0?I don''t think this is an issue practically, as the only way to trigger this would be to be on a 64bit machine and map an ungodly number of pages with a single mmap_batch (MAX_INT). For correctness though, I can update this variable and the err variable in mmap_batch_state which suffers from the same problem -- turn them into unsigned long. This will be included in the additional patch.>> + m.arr); >> + >> + if (ret || list_empty(&pagelist)) >> + goto out; >> + >> + down_write(&mm->mmap_sem); >> + >> + vma = find_vma(mm, m.addr); >> + ret = -EINVAL; >> + /* We allow multiple shots here, because this interface >> + * is used by libxc and mappings for specific pages will >> + * be retried when pages are paged-out (ENOENT). */ >> + if (!vma || >> + vma->vm_ops != &privcmd_vm_ops || >> + (m.addr < vma->vm_start) || >> + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end)) { >> + up_write(&mm->mmap_sem); >> + goto out; >> + } >> + >> + state.domain = m.dom; > > Should you check the m.dom for incorrect ones? Like -1? or DOMID_IO?I followed the example for mmap_batch, which just tries the call and returns whatever error Xen gives. I think that''s the right approach for these. Cheers! -Adin
On Sat, Dec 17, 2011 at 9:30 AM, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:>> +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) > > So why does this have to be a macro? What is the advantage of that > versus making this a function?I just wanted to minimize changes in the patch from the known XCP version. I''m personally not a fan of big macros like this.> So why msleep? Why not go for a proper yield? Call the safe_halt() > function?Yes, this is something that should be revisited. Since it looks like Daniel''s HVM patches are going to conflict with this anyways, I''ll revisit this patch independently from the other two and see what I can do about making it nicer and addressing the other bits of feedback you''ve given.> Use GNTTABOP_error_msgs. Also should we continue? What is the > impact if we do continue? The times this is hit is if the status > is not GNTS_okay and if it is not GNTS_eagain - so what are the > situations in which this happens and what can the domain do > to recover? Should there be some helpfull message instead of > just "gnt status: X" ?In all the cases this macro is called, the caller still needs to check op.status and handle any errors appropriately. The point of the macro is to reasonably get you from eagain => !eagain, whatever the result may be. If I turn this into a function, those semantics will still apply.
Konrad Rzeszutek Wilk
2011-Dec-17 21:29 UTC
Re: [PATCH 3/3] Port of mmap_batch_v2 to support paging in Xen
On Sat, Dec 17, 2011 at 11:51:11AM -0500, Adin Scannell wrote:> Hi Konrad, > > Thanks for the quick turnaround. I''ll incorporate your feedback and > resend. Some NOTES are inline below. > > On Sat, Dec 17, 2011 at 9:40 AM, Konrad Rzeszutek Wilk > <konrad@darnok.org> wrote: > > On Fri, Dec 16, 2011 at 10:22:21PM -0500, Adin Scannell wrote: > >> This wasn''t ported from any patch, but was rewritten based on the XCP 2.6.32 > >> tree. The code structure is significantly different and this patch mirrors the > >> existing Linux code. > >> > >> The primary reason for need the V2 interface is to support foreign mappings > >> (i.e. qemu) of paged-out pages. The libxc code will already retry mappings > >> when an ENOENT is returned. The V2 interface provides a richer error value, > >> so the user-space code is capable of handling these errors specifically. > > > > Can you give more details on how to use paged-out pages. Perhaps a > > pointer to the xen''s docs? > > Hrm, in usual fashion the docs are a bit lacking. > > Simply: the kernel has to do nothing to support paging (other than the > backend drivers which need to handle the grant EAGAIN case -- other > patch). The only reason the V2 interface is needed here is to passHm I did not see the netback one? Should that also incorporate this?> back full error codes from the mmu_update()''s. Xen and libxc have a > mutual understanding that when you receive an ENOENT error code, you > back off and try again.What you described is pretty good. Please do include it in the git description. Thx> > >> > >> Signed-off-by: Adin Scannell <adin@scannell.ca> > >> > >> Index: linux/drivers/xen/xenfs/privcmd.c > >> ==================================================================> >> --- > >> drivers/xen/xenfs/privcmd.c | 90 ++++++++++++++++++++++++++++++++++++++++++- > > > > So that file just moved to drivers/xen/privcmd.c > > Of course it has :)Well, we can''t have it easy can we! :-)> > >> include/xen/privcmd.h | 10 +++++ > >> 2 files changed, 99 insertions(+), 1 deletions(-) > >> > >> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c > >> index dbd3b16..21cbb5a 100644 > >> --- a/drivers/xen/xenfs/privcmd.c > >> +++ b/drivers/xen/xenfs/privcmd.c > >> @@ -70,7 +70,7 @@ static void free_page_list(struct list_head *pages) > >> */ > >> static int gather_array(struct list_head *pagelist, > >> unsigned nelem, size_t size, > >> - void __user *data) > >> + const void __user *data) > >> { > >> unsigned pageidx; > >> void *pagedata; > >> @@ -245,6 +245,15 @@ struct mmap_batch_state { > >> xen_pfn_t __user *user; > >> }; > >> > >> +struct mmap_batch_v2_state { > >> + domid_t domain; > >> + unsigned long va; > >> + struct vm_area_struct *vma; > >> + int paged_out; > > > > Should this be unsigned int? > > Noted below. > > >> + > >> + int __user *err; > >> +}; > >> + > >> static int mmap_batch_fn(void *data, void *state) > >> { > >> xen_pfn_t *mfnp = data; > >> @@ -260,6 +269,20 @@ static int mmap_batch_fn(void *data, void *state) > >> return 0; > >> } > >> > >> +static int mmap_batch_v2_fn(void *data, void *state) > >> +{ > >> + xen_pfn_t *mfnp = data; > >> + struct mmap_batch_v2_state *st = state; > >> + > >> + int rc = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1, > >> + st->vma->vm_page_prot, st->domain); > > > > You don''t want to check that st is not NULL? > > This could be an assertion as it''s only used in the > ioctl_mmap_batch_v2 function where it''s guaranteed to pass in a > non-null state. > > I''ll add an additional patch to the queue that adds these assertions > to both mmap_batch_fn and mmap_batch_fn_v2. > > >> + st->paged_out++; > > Is it possible that this ends overflowing and hitting 0? > > I don''t think this is an issue practically, as the only way to trigger > this would be to be on a 64bit machine and map an ungodly number of > pages with a single mmap_batch (MAX_INT). For correctness though, I > can update this variable and the err variable in mmap_batch_state > which suffers from the same problem -- turn them into unsigned long. > This will be included in the additional patch. > >> + m.arr); > >> + > >> + if (ret || list_empty(&pagelist)) > >> + goto out; > >> + > >> + down_write(&mm->mmap_sem); > >> + > >> + vma = find_vma(mm, m.addr); > >> + ret = -EINVAL; > >> + /* We allow multiple shots here, because this interface > >> + * is used by libxc and mappings for specific pages will > >> + * be retried when pages are paged-out (ENOENT). */ > >> + if (!vma || > >> + vma->vm_ops != &privcmd_vm_ops || > >> + (m.addr < vma->vm_start) || > >> + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end)) { > >> + up_write(&mm->mmap_sem); > >> + goto out; > >> + } > >> + > >> + state.domain = m.dom; > > > > Should you check the m.dom for incorrect ones? Like -1? or DOMID_IO? > > I followed the example for mmap_batch, which just tries the call and > returns whatever error Xen gives. I think that''s the right approach > for these.OK. I think a another patch to actually check for that in every other ioclt in this code might be worth it.> > Cheers! > -Adin
Konrad Rzeszutek Wilk
2011-Dec-17 21:31 UTC
Re: [PATCH 2/3] Handle GNTST_eagain in kernel drivers
On Sat, Dec 17, 2011 at 11:53:40AM -0500, Adin Scannell wrote:> On Sat, Dec 17, 2011 at 9:30 AM, Konrad Rzeszutek Wilk > <konrad@darnok.org> wrote: > >> +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) > > > So why does this have to be a macro? What is the advantage of that > > versus making this a function? > > I just wanted to minimize changes in the patch from the known XCP > version. I''m personally not a fan of big macros like this. > > > So why msleep? Why not go for a proper yield? Call the safe_halt() > > function? > > Yes, this is something that should be revisited. > > Since it looks like Daniel''s HVM patches are going to conflict with > this anyways, I''ll revisit this patch independently from the other two > and see what I can do about making it nicer and addressing the other > bits of feedback you''ve given.OK. Thanks.> > > Use GNTTABOP_error_msgs. Also should we continue? What is the > > impact if we do continue? The times this is hit is if the status > > is not GNTS_okay and if it is not GNTS_eagain - so what are the > > situations in which this happens and what can the domain do > > to recover? Should there be some helpfull message instead of > > just "gnt status: X" ? > > In all the cases this macro is called, the caller still needs to check > op.status and handle any errors appropriately. The point of the macro > is to reasonably get you from eagain => !eagain, whatever the result > may be. If I turn this into a function, those semantics will still > apply.OK, it sounds as the ''printk'' is not really neccessary as it will be the responsibility of the caller to figure out what to do (which might be very well doing a printk, or maybe not?)
On Sat, Dec 17, Konrad Rzeszutek Wilk wrote:> > +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) \ > > So why does this have to be a macro? What is the advantage of that > versus making this a function?I dont remember why I turned this into a macro instead of a function.> > +do { \ > > + u8 __hc_delay = 1; \ > > + int __ret; \ > > + while (unlikely((__HCarg_p)->status == GNTST_eagain && __hc_delay)) { \ > > + msleep(__hc_delay++); \ > > Ugh. Not sure what happend to this, but there are tons of ''\'' at the > end.A multiline macro needs backslashes at the end.> So why msleep? Why not go for a proper yield? Call the safe_halt() > function?It needs some interuptible sleep, whatever is best in this context.> > + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ > > + BUG_ON(__ret); \ > > + } \ > > + if (__hc_delay == 0) { \ > > So this would happen if we rolled over __hc_delay, so did this more than > 255 times? Presumarily this can happen if the swapper in dom0 crashes..Or if something in the paging paths goes wrong.> I would recommend you use ''WARN'' here and include tons of details. > This is a pretty serious issues, is it not?Either the host is really busy and cant page-in quick enough even after so-many seconds. Or something in the pager/hypervisor does not work right. In either case, a backtrace wont help much as it does only cause noise. The printk below prints the function name (I think thats the reason why it is a macro) to give some hint.> > + printk(KERN_ERR "%s: gnt busy\n", __func__,); \ > > + (__HCarg_p)->status = GNTST_bad_page; \ > > + } \ > > + if ((__HCarg_p)->status != GNTST_okay) \ > > + printk(KERN_ERR "%s: gnt status %x\n", \ > > + __func__, (__HCarg_p)->status); \ > > Use GNTTABOP_error_msgs. Also should we continue? What is the > impact if we do continue? The times this is hit is if the status > is not GNTS_okay and if it is not GNTS_eagain - so what are the > situations in which this happens and what can the domain do > to recover? Should there be some helpfull message instead of > just "gnt status: X" ?The caller has to deal with the various !GNTST_okay states anyway, this patch wont change that fact. Olaf
Olaf Hering
2012-Jan-02 16:06 UTC
Re: [PATCH] Add necessary bits to pvops Linux for mapping paged-out pages
On Fri, Dec 16, Adin Scannell wrote:> I''ve ported a couple of patches to the Linux pvops kernel that are necessary > for correctly running domains with paging. In a nutshell: in the case of a > foreign attempt to map a paged-out page, the correct error code will now be > propogated up to libxc, which will already handle it correctly.Adin, thanks for doing this work. Olaf
Konrad Rzeszutek Wilk
2012-Jan-03 18:19 UTC
Re: [PATCH 2/3] Handle GNTST_eagain in kernel drivers
On Mon, Jan 02, 2012 at 05:06:12PM +0100, Olaf Hering wrote:> On Sat, Dec 17, Konrad Rzeszutek Wilk wrote: > > > > +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) \ > > > > So why does this have to be a macro? What is the advantage of that > > versus making this a function? > > I dont remember why I turned this into a macro instead of a function. > > > > +do { \ > > > + u8 __hc_delay = 1; \ > > > + int __ret; \ > > > + while (unlikely((__HCarg_p)->status == GNTST_eagain && __hc_delay)) { \ > > > + msleep(__hc_delay++); \ > > > > Ugh. Not sure what happend to this, but there are tons of ''\'' at the > > end. > > A multiline macro needs backslashes at the end.Yes. I should have been more specific. The ''\'' are out of aligment.> > > So why msleep? Why not go for a proper yield? Call the safe_halt() > > function? > > It needs some interuptible sleep, whatever is best in this context. > > > > + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ > > > + BUG_ON(__ret); \ > > > + } \ > > > + if (__hc_delay == 0) { \ > > > > So this would happen if we rolled over __hc_delay, so did this more than > > 255 times? Presumarily this can happen if the swapper in dom0 crashes.. > > Or if something in the paging paths goes wrong. > > > I would recommend you use ''WARN'' here and include tons of details. > > This is a pretty serious issues, is it not? > > Either the host is really busy and cant page-in quick enough even after > so-many seconds. Or something in the pager/hypervisor does not work > right. In either case, a backtrace wont help much as it does only cause > noise. The printk below prints the function name (I think thats the > reason why it is a macro) to give some hint.OK, we can do this differently. Make a function that does the majority of this, and one of the arguments is a ''const char *name'' and use a macro that does: #define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) real_function(__func__, __Hcop, __HCarg_p,...) or such. If this problem does occur (the swapper died in dom0) should the printk at least use printk_ratelimited so that we don''t cause too much noise?> > > > + printk(KERN_ERR "%s: gnt busy\n", __func__,); \ > > > + (__HCarg_p)->status = GNTST_bad_page; \ > > > + } \ > > > + if ((__HCarg_p)->status != GNTST_okay) \ > > > + printk(KERN_ERR "%s: gnt status %x\n", \ > > > + __func__, (__HCarg_p)->status); \ > > > > Use GNTTABOP_error_msgs. Also should we continue? What is the > > impact if we do continue? The times this is hit is if the status > > is not GNTS_okay and if it is not GNTS_eagain - so what are the > > situations in which this happens and what can the domain do > > to recover? Should there be some helpfull message instead of > > just "gnt status: X" ? > > The caller has to deal with the various !GNTST_okay states anyway, this > patch wont change that fact.Ok, so then we don''t really need the printk right? As the caller would presumarily do the right thing and also print the error?> > Olaf > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
On Tue, Jan 03, Konrad Rzeszutek Wilk wrote:> If this problem does occur (the swapper died in dom0) should the > printk at least use printk_ratelimited so that we don''t cause too much > noise?I remember there was no flood because the guest was stuck anyway. But see below.> > The caller has to deal with the various !GNTST_okay states anyway, this > > patch wont change that fact. > > Ok, so then we don''t really need the printk right? As the caller > would presumarily do the right thing and also print the error?I think its more a debug thing, so that I knew something bad happend. And at that time it was just helpful to get me some understanding of the code flow. Since now that part of the paging code is reasonable debugged, the printk is not really needed anymore. Instead the code who uses these new functionality should have proper error handling and print reasonable diagnostic messages. Olaf
Konrad Rzeszutek Wilk
2012-Jan-03 18:48 UTC
Re: [PATCH 2/3] Handle GNTST_eagain in kernel drivers
On Tue, Jan 03, 2012 at 07:40:45PM +0100, Olaf Hering wrote:> On Tue, Jan 03, Konrad Rzeszutek Wilk wrote: > > > If this problem does occur (the swapper died in dom0) should the > > printk at least use printk_ratelimited so that we don''t cause too much > > noise? > > I remember there was no flood because the guest was stuck anyway. But > see below. > > > > The caller has to deal with the various !GNTST_okay states anyway, this > > > patch wont change that fact. > > > > Ok, so then we don''t really need the printk right? As the caller > > would presumarily do the right thing and also print the error? > > I think its more a debug thing, so that I knew something bad happend. > And at that time it was just helpful to get me some understanding of the > code flow. Since now that part of the paging code is reasonable > debugged, the printk is not really needed anymore.OK.> Instead the code who uses these new functionality should have proper > error handling and print reasonable diagnostic messages.Excellent. So lets remove that printk(KERN_ERR). Thanks!> > Olaf > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel