Hi Konrad, David and everybody else, I find a privcmd bug that no more than 160 pages can be mmap by IOCTL_PRIVCMD_MMAPBATCH and IOCTL_PRIVCMD_MMAPBATCH_V2. I have started a thread (see below) at the xen-user list, and then discussed with Mr. Ian Campbell, who finally recommended talking to the xen-devel list and CC'ing you. I appreciate your comments. Thanks, Guanglin ---------- Forwarded message ---------- From: Ian Campbell <Ian.Campbell@xxxxxxxxx> Date: 2013/7/30 Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages To: Guanglin Xu <mzguanglin@xxxxxxxxx> 抄送: "xen-users@lists.xen.org" <xen-users@lists.xen.org> On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote:> 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>: > > On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote: > >> Hi all, > >> > >> I just find a xc_map_foreign_range() problem in Xen. > >> > >> xc_map_foreign_range(), an API of libxc backed by privcmd - a xen > >> kernel module, can be used to mmap guest VM memory into dom0. However, > >> if we mmap more than 160 pages, we'll fail. > >> > >> Inside xc_map_foreign_range(), it uses ioctrl to communicate with > >> privcmd. There are 2 ioctl commands, the one is > >> IOCTL_PRIVCMD_MMAPBATCH (legacy), another one is > >> IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them constantly return 0 > >> (success), but the mmapings are fail after mmaping 160 pages. > >> > >> Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH > >> was the only one ioctl command. rc = ioctl(fd, > >> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, the > >> subsequent invoking of ioctl would > >> set ioctlx.arr[] items to 140737344202616, overwriting the original > >> pfn number. > >> > >> And then, I updated my Linux kernel to 3.8 so as to test > >> IOCTL_PRIVCMD_MMAPBATCH_V2. rc = ioctl(fd, > >> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 invoking > >> of ioctl would set ioctls.err[] items to EINVAL. > >> > >> Although I have inserted printk() in privcmd.c to track its execution > >> flow, the result showed a look-like complete path, which is quite > >> weird. I have no idea what happened in privcmd. > >> > >> Can anyone figure out this problem? > > > > I wouldn't be all that surprised if there was a hardcoded batch size > > limit somewhere in either libxc or the privcmd driver. > > Hi Ian, > > Thank you very much for your reply. > > I can confirm that it's the problem of privcmd, because I have debug > libxc and narrowed the problem to ioctl(), where privcmd would set > ioctlx.arr[] ( IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[] ( > IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation. > > > > > If you map you memory in batch of e.g. 128 pages does it all work OK? > > Yes. For example, [0-127] succeeds. However, the subsequent [128-255] > would fail because the size of the whole region has exceeded 160 > pages.That's quite interesting.> > > > If you want to get to the bottom of the 160 page limit you'll probably > > have to trace through the code looking for hardcoded sizes or limits on > > sizes (e.g. of arrays) or perhaps integer indexes etc which are too > > small and are overflowing. > > > > 160 * the size of the various structs involved doesn't look to be all > > that interesting (i.e. just over a page size boundary or something), but > > that's the sort of direction I would start by looking in. > > > > If you can't spot it by eye then you'll likely have to instrument the > > code paths with prints to try and track the progress of the initially > > supplied buffer through to the hypercall etc. > > Yes. I have been debuging libxc and instrumenting privcmd, but I > couldn't find the "hardcode" or other limitation codes. > > Codes in libxc is easizer to trace, but in privcmd it is hard for me > who is in lack of kernel dev experience. I couldn't even find where > privcmd copy_to_user() or put_user() the ioctlx.err[] or ioctlx.arr[] > items while the execution path of privcmd_ioctl() seems quite complete > by use of printk(). > > Do you have idea how privcmd can set ioctlx.err[] in another way?I'm not familiar with this code. It sounds like this is most probably a kernel bug, I'd recommend taking it to the xen-devel list, perhaps CC Konrad Wilk and David Vrabel. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
> Hi Konrad, David and everybody else, > > I find a privcmd bug that no more than 160 pages can be mmap by > IOCTL_PRIVCMD_MMAPBATCH and IOCTL_PRIVCMD_MMAPBATCH_V2. > > I have started a thread (see below) at the xen-user list, and then > discussed with Mr. Ian Campbell, who finally recommended talking to > the xen-devel list and CC''ing you. > > I appreciate your comments.You know, this might have been me. Or in the process of being fixed by a patch I just posted. As you may know by now, xc_map_foreign* first sets up a region of virtual memory with an mmap call, and then issues the mapping ioctl to populate the region. Let''s say for the sake of argument that the mapping operation is broken up into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH* prevent you from populating the memory region piece-meal. So: can you try the patch I pasted below? If that doesn''t fix it, then you''ll need to give us the output of those printks. Thanks Andres [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. When a foreign mapper attempts to map guest frames that are paged out, the mapper receives an ENOENT response and will have to try again while a helper process pages the target frame back in. Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries of mapping calls. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> --- drivers/xen/privcmd.c | 32 +++++++++++++++++++++++++++----- 1 files changed, 27 insertions(+), 5 deletions(-) diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index f8e5dd7..44a26c6 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -43,9 +43,12 @@ MODULE_LICENSE("GPL"); #define PRIV_VMA_LOCKED ((void *)1) -#ifndef HAVE_ARCH_PRIVCMD_MMAP static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); -#endif + +static int privcmd_enforce_singleshot_mapping_granular( + struct vm_area_struct *vma, + unsigned long addr, + unsigned long nr_pages); static long privcmd_ioctl_hypercall(void __user *udata) { @@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version) vma = find_vma(mm, m.addr); if (!vma || vma->vm_ops != &privcmd_vm_ops || - (m.addr != vma->vm_start) || - ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) || - !privcmd_enforce_singleshot_mapping(vma)) { + (m.addr < vma->vm_start) || + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) || + !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) { up_write(&mm->mmap_sem); ret = -EINVAL; goto out; @@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct vm_area_struct *vma) return 0; } +/* For MMAP */ static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma) { return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED); } +/* For MMAPBATCH*. This allows asserting the singleshot mapping + * on a per pfn/pte basis. Mapping calls that fail with ENOENT + * can be then retried until success. */ +static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page, + unsigned long addr, void *data) +{ + return pte_none(*pte) ? 0 : -EBUSY; +} + +static int privcmd_enforce_singleshot_mapping_granular( + struct vm_area_struct *vma, + unsigned long addr, + unsigned long nr_pages) +{ + return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT, + enforce_singleshot_mapping_fn, NULL) == 0; +} + const struct file_operations xen_privcmd_fops = { .owner = THIS_MODULE, .unlocked_ioctl = privcmd_ioctl, -- 1.7.1> > Thanks, > > Guanglin > > > ---------- Forwarded message ---------- > From: Ian Campbell <Ian.Campbell@xxxxxxxxx> > Date: 2013/7/30 > Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages > To: Guanglin Xu <mzguanglin@xxxxxxxxx> > ??? "xen-users@lists.xen.org" <xen-users@lists.xen.org> > > > On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote: >> 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>: >>> On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote: >>>> Hi all, >>>> >>>> I just find a xc_map_foreign_range() problem in Xen. >>>> >>>> xc_map_foreign_range(), an API of libxc backed by privcmd - a xen >>>> kernel module, can be used to mmap guest VM memory into dom0. However, >>>> if we mmap more than 160 pages, we''ll fail. >>>> >>>> Inside xc_map_foreign_range(), it uses ioctrl to communicate with >>>> privcmd. There are 2 ioctl commands, the one is >>>> IOCTL_PRIVCMD_MMAPBATCH (legacy), another one is >>>> IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them constantly return 0 >>>> (success), but the mmapings are fail after mmaping 160 pages. >>>> >>>> Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH >>>> was the only one ioctl command. rc = ioctl(fd, >>>> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, the >>>> subsequent invoking of ioctl would >>>> set ioctlx.arr[] items to 140737344202616, overwriting the original >>>> pfn number. >>>> >>>> And then, I updated my Linux kernel to 3.8 so as to test >>>> IOCTL_PRIVCMD_MMAPBATCH_V2. rc = ioctl(fd, >>>> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 invoking >>>> of ioctl would set ioctls.err[] items to EINVAL. >>>> >>>> Although I have inserted printk() in privcmd.c to track its execution >>>> flow, the result showed a look-like complete path, which is quite >>>> weird. I have no idea what happened in privcmd. >>>> >>>> Can anyone figure out this problem? >>> >>> I wouldn''t be all that surprised if there was a hardcoded batch size >>> limit somewhere in either libxc or the privcmd driver. >> >> Hi Ian, >> >> Thank you very much for your reply. >> >> I can confirm that it''s the problem of privcmd, because I have debug >> libxc and narrowed the problem to ioctl(), where privcmd would set >> ioctlx.arr[] ( IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[] ( >> IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation. >> >>> >>> If you map you memory in batch of e.g. 128 pages does it all work OK? >> >> Yes. For example, [0-127] succeeds. However, the subsequent [128-255] >> would fail because the size of the whole region has exceeded 160 >> pages. > > That''s quite interesting. > >>> >>> If you want to get to the bottom of the 160 page limit you''ll probably >>> have to trace through the code looking for hardcoded sizes or limits on >>> sizes (e.g. of arrays) or perhaps integer indexes etc which are too >>> small and are overflowing. >>> >>> 160 * the size of the various structs involved doesn''t look to be all >>> that interesting (i.e. just over a page size boundary or something), but >>> that''s the sort of direction I would start by looking in. >>> >>> If you can''t spot it by eye then you''ll likely have to instrument the >>> code paths with prints to try and track the progress of the initially >>> supplied buffer through to the hypercall etc. >> >> Yes. I have been debuging libxc and instrumenting privcmd, but I >> couldn''t find the "hardcode" or other limitation codes. >> >> Codes in libxc is easizer to trace, but in privcmd it is hard for me >> who is in lack of kernel dev experience. I couldn''t even find where >> privcmd copy_to_user() or put_user() the ioctlx.err[] or ioctlx.arr[] >> items while the execution path of privcmd_ioctl() seems quite complete >> by use of printk(). >> >> Do you have idea how privcmd can set ioctlx.err[] in another way? > > > I''m not familiar with this code. It sounds like this is most probably a > kernel bug, I''d recommend taking it to the xen-devel list, perhaps CC > Konrad Wilk and David Vrabel. > > Ian._______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Hi Andres, Thanks for your reply. However, I encounter some problems in patching your diff as well as compiling the resulted privcmd.c (after manually patching). Which kernel version are you using? Can you provide complete privcmd.c file? Thanks, Guanglin 2013/7/31 Andres Lagar-Cavilla <andreslc@gridcentric.ca>:> Hi Konrad, David and everybody else, > > I find a privcmd bug that no more than 160 pages can be mmap by > IOCTL_PRIVCMD_MMAPBATCH and IOCTL_PRIVCMD_MMAPBATCH_V2. > > I have started a thread (see below) at the xen-user list, and then > discussed with Mr. Ian Campbell, who finally recommended talking to > the xen-devel list and CC''ing you. > > I appreciate your comments. > > > You know, this might have been me. Or in the process of being fixed by a > patch I just posted. > > As you may know by now, xc_map_foreign* first sets up a region of virtual > memory with an mmap call, and then issues the mapping ioctl to populate the > region. > > Let''s say for the sake of argument that the mapping operation is broken up > into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH* > prevent you from populating the memory region piece-meal. > > So: can you try the patch I pasted below? > > If that doesn''t fix it, then you''ll need to give us the output of those > printks. > > Thanks > Andres > > [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. > > When a foreign mapper attempts to map guest frames that are paged out, > the mapper receives an ENOENT response and will have to try again > while a helper process pages the target frame back in. > > Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries > of mapping calls. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > --- > drivers/xen/privcmd.c | 32 +++++++++++++++++++++++++++----- > 1 files changed, 27 insertions(+), 5 deletions(-) > > diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c > index f8e5dd7..44a26c6 100644 > --- a/drivers/xen/privcmd.c > +++ b/drivers/xen/privcmd.c > @@ -43,9 +43,12 @@ MODULE_LICENSE("GPL"); > > #define PRIV_VMA_LOCKED ((void *)1) > > -#ifndef HAVE_ARCH_PRIVCMD_MMAP > static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); > -#endif > + > +static int privcmd_enforce_singleshot_mapping_granular( > + struct vm_area_struct *vma, > + unsigned long addr, > + unsigned long nr_pages); > > static long privcmd_ioctl_hypercall(void __user *udata) > { > @@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, > int version) > vma = find_vma(mm, m.addr); > if (!vma || > vma->vm_ops != &privcmd_vm_ops || > - (m.addr != vma->vm_start) || > - ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) || > - !privcmd_enforce_singleshot_mapping(vma)) { > + (m.addr < vma->vm_start) || > + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) || > + !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) { > up_write(&mm->mmap_sem); > ret = -EINVAL; > goto out; > @@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct > vm_area_struct *vma) > return 0; > } > > +/* For MMAP */ > static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma) > { > return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED); > } > > +/* For MMAPBATCH*. This allows asserting the singleshot mapping > + * on a per pfn/pte basis. Mapping calls that fail with ENOENT > + * can be then retried until success. */ > +static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page, > + unsigned long addr, void *data) > +{ > + return pte_none(*pte) ? 0 : -EBUSY; > +} > + > +static int privcmd_enforce_singleshot_mapping_granular( > + struct vm_area_struct *vma, > + unsigned long addr, > + unsigned long nr_pages) > +{ > + return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT, > + enforce_singleshot_mapping_fn, NULL) == 0; > +} > + > const struct file_operations xen_privcmd_fops = { > .owner = THIS_MODULE, > .unlocked_ioctl = privcmd_ioctl, > -- > 1.7.1 > > > Thanks, > > Guanglin > > > ---------- Forwarded message ---------- > From: Ian Campbell <Ian.Campbell@xxxxxxxxx> > Date: 2013/7/30 > Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages > To: Guanglin Xu <mzguanglin@xxxxxxxxx> > ??? "xen-users@lists.xen.org" <xen-users@lists.xen.org> > > > On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote: > > 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>: > > On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote: > > Hi all, > > I just find a xc_map_foreign_range() problem in Xen. > > xc_map_foreign_range(), an API of libxc backed by privcmd - a xen > kernel module, can be used to mmap guest VM memory into dom0. However, > if we mmap more than 160 pages, we''ll fail. > > Inside xc_map_foreign_range(), it uses ioctrl to communicate with > privcmd. There are 2 ioctl commands, the one is > IOCTL_PRIVCMD_MMAPBATCH (legacy), another one is > IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them constantly return 0 > (success), but the mmapings are fail after mmaping 160 pages. > > Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH > was the only one ioctl command. rc = ioctl(fd, > IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, the > subsequent invoking of ioctl would > set ioctlx.arr[] items to 140737344202616, overwriting the original > pfn number. > > And then, I updated my Linux kernel to 3.8 so as to test > IOCTL_PRIVCMD_MMAPBATCH_V2. rc = ioctl(fd, > IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 invoking > of ioctl would set ioctls.err[] items to EINVAL. > > Although I have inserted printk() in privcmd.c to track its execution > flow, the result showed a look-like complete path, which is quite > weird. I have no idea what happened in privcmd. > > Can anyone figure out this problem? > > > I wouldn''t be all that surprised if there was a hardcoded batch size > limit somewhere in either libxc or the privcmd driver. > > > Hi Ian, > > Thank you very much for your reply. > > I can confirm that it''s the problem of privcmd, because I have debug > libxc and narrowed the problem to ioctl(), where privcmd would set > ioctlx.arr[] ( IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[] ( > IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation. > > > If you map you memory in batch of e.g. 128 pages does it all work OK? > > > Yes. For example, [0-127] succeeds. However, the subsequent [128-255] > would fail because the size of the whole region has exceeded 160 > pages. > > > That''s quite interesting. > > > If you want to get to the bottom of the 160 page limit you''ll probably > have to trace through the code looking for hardcoded sizes or limits on > sizes (e.g. of arrays) or perhaps integer indexes etc which are too > small and are overflowing. > > 160 * the size of the various structs involved doesn''t look to be all > that interesting (i.e. just over a page size boundary or something), but > that''s the sort of direction I would start by looking in. > > If you can''t spot it by eye then you''ll likely have to instrument the > code paths with prints to try and track the progress of the initially > supplied buffer through to the hypercall etc. > > > Yes. I have been debuging libxc and instrumenting privcmd, but I > couldn''t find the "hardcode" or other limitation codes. > > Codes in libxc is easizer to trace, but in privcmd it is hard for me > who is in lack of kernel dev experience. I couldn''t even find where > privcmd copy_to_user() or put_user() the ioctlx.err[] or ioctlx.arr[] > items while the execution path of privcmd_ioctl() seems quite complete > by use of printk(). > > Do you have idea how privcmd can set ioctlx.err[] in another way? > > > > I''m not familiar with this code. It sounds like this is most probably a > kernel bug, I''d recommend taking it to the xen-devel list, perhaps CC > Konrad Wilk and David Vrabel. > > Ian. > >
On Jul 31, 2013, at 8:39 PM, Guanglin Xu <mzguanglin@gmail.com> wrote:> Hi Andres, > > Thanks for your reply. However, I encounter some problems in patching > your diff as well as compiling the resulted privcmd.c (after manually > patching). > > Which kernel version are you using? Can you provide complete privcmd.c file?I posted this against linux-next. Please be less vague when posting. For example, how did the final diff look like and what is your compile error. We can probably fix the compile easily. Posting the privcmd.c from linux-next would not really help. Thanks, Andres> > Thanks, > Guanglin > > 2013/7/31 Andres Lagar-Cavilla <andreslc@gridcentric.ca>: >> Hi Konrad, David and everybody else, >> >> I find a privcmd bug that no more than 160 pages can be mmap by >> IOCTL_PRIVCMD_MMAPBATCH and IOCTL_PRIVCMD_MMAPBATCH_V2. >> >> I have started a thread (see below) at the xen-user list, and then >> discussed with Mr. Ian Campbell, who finally recommended talking to >> the xen-devel list and CC''ing you. >> >> I appreciate your comments. >> >> >> You know, this might have been me. Or in the process of being fixed by a >> patch I just posted. >> >> As you may know by now, xc_map_foreign* first sets up a region of virtual >> memory with an mmap call, and then issues the mapping ioctl to populate the >> region. >> >> Let''s say for the sake of argument that the mapping operation is broken up >> into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH* >> prevent you from populating the memory region piece-meal. >> >> So: can you try the patch I pasted below? >> >> If that doesn''t fix it, then you''ll need to give us the output of those >> printks. >> >> Thanks >> Andres >> >> [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. >> >> When a foreign mapper attempts to map guest frames that are paged out, >> the mapper receives an ENOENT response and will have to try again >> while a helper process pages the target frame back in. >> >> Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries >> of mapping calls. >> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> >> --- >> drivers/xen/privcmd.c | 32 +++++++++++++++++++++++++++----- >> 1 files changed, 27 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c >> index f8e5dd7..44a26c6 100644 >> --- a/drivers/xen/privcmd.c >> +++ b/drivers/xen/privcmd.c >> @@ -43,9 +43,12 @@ MODULE_LICENSE("GPL"); >> >> #define PRIV_VMA_LOCKED ((void *)1) >> >> -#ifndef HAVE_ARCH_PRIVCMD_MMAP >> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); >> -#endif >> + >> +static int privcmd_enforce_singleshot_mapping_granular( >> + struct vm_area_struct *vma, >> + unsigned long addr, >> + unsigned long nr_pages); >> >> static long privcmd_ioctl_hypercall(void __user *udata) >> { >> @@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, >> int version) >> vma = find_vma(mm, m.addr); >> if (!vma || >> vma->vm_ops != &privcmd_vm_ops || >> - (m.addr != vma->vm_start) || >> - ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) || >> - !privcmd_enforce_singleshot_mapping(vma)) { >> + (m.addr < vma->vm_start) || >> + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) || >> + !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) { >> up_write(&mm->mmap_sem); >> ret = -EINVAL; >> goto out; >> @@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct >> vm_area_struct *vma) >> return 0; >> } >> >> +/* For MMAP */ >> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma) >> { >> return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED); >> } >> >> +/* For MMAPBATCH*. This allows asserting the singleshot mapping >> + * on a per pfn/pte basis. Mapping calls that fail with ENOENT >> + * can be then retried until success. */ >> +static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page, >> + unsigned long addr, void *data) >> +{ >> + return pte_none(*pte) ? 0 : -EBUSY; >> +} >> + >> +static int privcmd_enforce_singleshot_mapping_granular( >> + struct vm_area_struct *vma, >> + unsigned long addr, >> + unsigned long nr_pages) >> +{ >> + return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT, >> + enforce_singleshot_mapping_fn, NULL) == 0; >> +} >> + >> const struct file_operations xen_privcmd_fops = { >> .owner = THIS_MODULE, >> .unlocked_ioctl = privcmd_ioctl, >> -- >> 1.7.1 >> >> >> Thanks, >> >> Guanglin >> >> >> ---------- Forwarded message ---------- >> From: Ian Campbell <Ian.Campbell@xxxxxxxxx> >> Date: 2013/7/30 >> Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages >> To: Guanglin Xu <mzguanglin@xxxxxxxxx> >> ??? "xen-users@lists.xen.org" <xen-users@lists.xen.org> >> >> >> On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote: >> >> 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>: >> >> On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote: >> >> Hi all, >> >> I just find a xc_map_foreign_range() problem in Xen. >> >> xc_map_foreign_range(), an API of libxc backed by privcmd - a xen >> kernel module, can be used to mmap guest VM memory into dom0. However, >> if we mmap more than 160 pages, we''ll fail. >> >> Inside xc_map_foreign_range(), it uses ioctrl to communicate with >> privcmd. There are 2 ioctl commands, the one is >> IOCTL_PRIVCMD_MMAPBATCH (legacy), another one is >> IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them constantly return 0 >> (success), but the mmapings are fail after mmaping 160 pages. >> >> Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH >> was the only one ioctl command. rc = ioctl(fd, >> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, the >> subsequent invoking of ioctl would >> set ioctlx.arr[] items to 140737344202616, overwriting the original >> pfn number. >> >> And then, I updated my Linux kernel to 3.8 so as to test >> IOCTL_PRIVCMD_MMAPBATCH_V2. rc = ioctl(fd, >> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 invoking >> of ioctl would set ioctls.err[] items to EINVAL. >> >> Although I have inserted printk() in privcmd.c to track its execution >> flow, the result showed a look-like complete path, which is quite >> weird. I have no idea what happened in privcmd. >> >> Can anyone figure out this problem? >> >> >> I wouldn''t be all that surprised if there was a hardcoded batch size >> limit somewhere in either libxc or the privcmd driver. >> >> >> Hi Ian, >> >> Thank you very much for your reply. >> >> I can confirm that it''s the problem of privcmd, because I have debug >> libxc and narrowed the problem to ioctl(), where privcmd would set >> ioctlx.arr[] ( IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[] ( >> IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation. >> >> >> If you map you memory in batch of e.g. 128 pages does it all work OK? >> >> >> Yes. For example, [0-127] succeeds. However, the subsequent [128-255] >> would fail because the size of the whole region has exceeded 160 >> pages. >> >> >> That''s quite interesting. >> >> >> If you want to get to the bottom of the 160 page limit you''ll probably >> have to trace through the code looking for hardcoded sizes or limits on >> sizes (e.g. of arrays) or perhaps integer indexes etc which are too >> small and are overflowing. >> >> 160 * the size of the various structs involved doesn''t look to be all >> that interesting (i.e. just over a page size boundary or something), but >> that''s the sort of direction I would start by looking in. >> >> If you can''t spot it by eye then you''ll likely have to instrument the >> code paths with prints to try and track the progress of the initially >> supplied buffer through to the hypercall etc. >> >> >> Yes. I have been debuging libxc and instrumenting privcmd, but I >> couldn''t find the "hardcode" or other limitation codes. >> >> Codes in libxc is easizer to trace, but in privcmd it is hard for me >> who is in lack of kernel dev experience. I couldn''t even find where >> privcmd copy_to_user() or put_user() the ioctlx.err[] or ioctlx.arr[] >> items while the execution path of privcmd_ioctl() seems quite complete >> by use of printk(). >> >> Do you have idea how privcmd can set ioctlx.err[] in another way? >> >> >> >> I''m not familiar with this code. It sounds like this is most probably a >> kernel bug, I''d recommend taking it to the xen-devel list, perhaps CC >> Konrad Wilk and David Vrabel. >> >> Ian. >> >>
2013/7/31 Andres Lagar-Cavilla <andreslc@gridcentric.ca>:> On Jul 31, 2013, at 8:39 PM, Guanglin Xu <mzguanglin@gmail.com> wrote: > >> Hi Andres, >> >> Thanks for your reply. However, I encounter some problems in patching >> your diff as well as compiling the resulted privcmd.c (after manually >> patching). >> >> Which kernel version are you using? Can you provide complete privcmd.c file? > I posted this against linux-next.Hi Andres, I have successfully patched your codes into linux-next too, but it doesn''t solve the problem of mmap 160+ pages. My configuration: Linux-next: 3.110-rc3; Xen: 4.1 Do you think if I should try Xen newer than 4.1?> > Please be less vague when posting. For example, how did the final diff look like and what is your compile error. We can probably fix the compile easily.Apologize for confusing. It''s because of my lack of experience in building kernel. 1. the diff didn''t work just because some blanks were missing. For example, -#ifndef HAVE_ARCH_PRIVCMD_MMAP <!!a blank should be here!!>static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); -#endif 2. compiling error is because I didn''t specify kernel headers for KDIR environment variable. Correct: KDIR := /usr/src/linux-headers-3.11.0-rc3-xen-next-20130801 Incorrect: KDIR := /root/linux-next Incorrect: KDIR := /lib/modules/$(shell uname -r)/build Thanks, Guanglin> > Posting the privcmd.c from linux-next would not really help. > > Thanks, > Andres > > >> >> Thanks, >> Guanglin >> >> 2013/7/31 Andres Lagar-Cavilla <andreslc@gridcentric.ca>: >>> Hi Konrad, David and everybody else, >>> >>> I find a privcmd bug that no more than 160 pages can be mmap by >>> IOCTL_PRIVCMD_MMAPBATCH and IOCTL_PRIVCMD_MMAPBATCH_V2. >>> >>> I have started a thread (see below) at the xen-user list, and then >>> discussed with Mr. Ian Campbell, who finally recommended talking to >>> the xen-devel list and CC''ing you. >>> >>> I appreciate your comments. >>> >>> >>> You know, this might have been me. Or in the process of being fixed by a >>> patch I just posted. >>> >>> As you may know by now, xc_map_foreign* first sets up a region of virtual >>> memory with an mmap call, and then issues the mapping ioctl to populate the >>> region. >>> >>> Let''s say for the sake of argument that the mapping operation is broken up >>> into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH* >>> prevent you from populating the memory region piece-meal. >>> >>> So: can you try the patch I pasted below? >>> >>> If that doesn''t fix it, then you''ll need to give us the output of those >>> printks. >>> >>> Thanks >>> Andres >>> >>> [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. >>> >>> When a foreign mapper attempts to map guest frames that are paged out, >>> the mapper receives an ENOENT response and will have to try again >>> while a helper process pages the target frame back in. >>> >>> Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries >>> of mapping calls. >>> >>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> >>> --- >>> drivers/xen/privcmd.c | 32 +++++++++++++++++++++++++++----- >>> 1 files changed, 27 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c >>> index f8e5dd7..44a26c6 100644 >>> --- a/drivers/xen/privcmd.c >>> +++ b/drivers/xen/privcmd.c >>> @@ -43,9 +43,12 @@ MODULE_LICENSE("GPL"); >>> >>> #define PRIV_VMA_LOCKED ((void *)1) >>> >>> -#ifndef HAVE_ARCH_PRIVCMD_MMAP >>> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); >>> -#endif >>> + >>> +static int privcmd_enforce_singleshot_mapping_granular( >>> + struct vm_area_struct *vma, >>> + unsigned long addr, >>> + unsigned long nr_pages); >>> >>> static long privcmd_ioctl_hypercall(void __user *udata) >>> { >>> @@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, >>> int version) >>> vma = find_vma(mm, m.addr); >>> if (!vma || >>> vma->vm_ops != &privcmd_vm_ops || >>> - (m.addr != vma->vm_start) || >>> - ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) || >>> - !privcmd_enforce_singleshot_mapping(vma)) { >>> + (m.addr < vma->vm_start) || >>> + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) || >>> + !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) { >>> up_write(&mm->mmap_sem); >>> ret = -EINVAL; >>> goto out; >>> @@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct >>> vm_area_struct *vma) >>> return 0; >>> } >>> >>> +/* For MMAP */ >>> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma) >>> { >>> return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED); >>> } >>> >>> +/* For MMAPBATCH*. This allows asserting the singleshot mapping >>> + * on a per pfn/pte basis. Mapping calls that fail with ENOENT >>> + * can be then retried until success. */ >>> +static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page, >>> + unsigned long addr, void *data) >>> +{ >>> + return pte_none(*pte) ? 0 : -EBUSY; >>> +} >>> + >>> +static int privcmd_enforce_singleshot_mapping_granular( >>> + struct vm_area_struct *vma, >>> + unsigned long addr, >>> + unsigned long nr_pages) >>> +{ >>> + return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT, >>> + enforce_singleshot_mapping_fn, NULL) == 0; >>> +} >>> + >>> const struct file_operations xen_privcmd_fops = { >>> .owner = THIS_MODULE, >>> .unlocked_ioctl = privcmd_ioctl, >>> -- >>> 1.7.1 >>> >>> >>> Thanks, >>> >>> Guanglin >>> >>> >>> ---------- Forwarded message ---------- >>> From: Ian Campbell <Ian.Campbell@xxxxxxxxx> >>> Date: 2013/7/30 >>> Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages >>> To: Guanglin Xu <mzguanglin@xxxxxxxxx> >>> ??? "xen-users@lists.xen.org" <xen-users@lists.xen.org> >>> >>> >>> On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote: >>> >>> 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>: >>> >>> On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote: >>> >>> Hi all, >>> >>> I just find a xc_map_foreign_range() problem in Xen. >>> >>> xc_map_foreign_range(), an API of libxc backed by privcmd - a xen >>> kernel module, can be used to mmap guest VM memory into dom0. However, >>> if we mmap more than 160 pages, we''ll fail. >>> >>> Inside xc_map_foreign_range(), it uses ioctrl to communicate with >>> privcmd. There are 2 ioctl commands, the one is >>> IOCTL_PRIVCMD_MMAPBATCH (legacy), another one is >>> IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them constantly return 0 >>> (success), but the mmapings are fail after mmaping 160 pages. >>> >>> Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH >>> was the only one ioctl command. rc = ioctl(fd, >>> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, the >>> subsequent invoking of ioctl would >>> set ioctlx.arr[] items to 140737344202616, overwriting the original >>> pfn number. >>> >>> And then, I updated my Linux kernel to 3.8 so as to test >>> IOCTL_PRIVCMD_MMAPBATCH_V2. rc = ioctl(fd, >>> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 invoking >>> of ioctl would set ioctls.err[] items to EINVAL. >>> >>> Although I have inserted printk() in privcmd.c to track its execution >>> flow, the result showed a look-like complete path, which is quite >>> weird. I have no idea what happened in privcmd. >>> >>> Can anyone figure out this problem? >>> >>> >>> I wouldn''t be all that surprised if there was a hardcoded batch size >>> limit somewhere in either libxc or the privcmd driver. >>> >>> >>> Hi Ian, >>> >>> Thank you very much for your reply. >>> >>> I can confirm that it''s the problem of privcmd, because I have debug >>> libxc and narrowed the problem to ioctl(), where privcmd would set >>> ioctlx.arr[] ( IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[] ( >>> IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation. >>> >>> >>> If you map you memory in batch of e.g. 128 pages does it all work OK? >>> >>> >>> Yes. For example, [0-127] succeeds. However, the subsequent [128-255] >>> would fail because the size of the whole region has exceeded 160 >>> pages. >>> >>> >>> That''s quite interesting. >>> >>> >>> If you want to get to the bottom of the 160 page limit you''ll probably >>> have to trace through the code looking for hardcoded sizes or limits on >>> sizes (e.g. of arrays) or perhaps integer indexes etc which are too >>> small and are overflowing. >>> >>> 160 * the size of the various structs involved doesn''t look to be all >>> that interesting (i.e. just over a page size boundary or something), but >>> that''s the sort of direction I would start by looking in. >>> >>> If you can''t spot it by eye then you''ll likely have to instrument the >>> code paths with prints to try and track the progress of the initially >>> supplied buffer through to the hypercall etc. >>> >>> >>> Yes. I have been debuging libxc and instrumenting privcmd, but I >>> couldn''t find the "hardcode" or other limitation codes. >>> >>> Codes in libxc is easizer to trace, but in privcmd it is hard for me >>> who is in lack of kernel dev experience. I couldn''t even find where >>> privcmd copy_to_user() or put_user() the ioctlx.err[] or ioctlx.arr[] >>> items while the execution path of privcmd_ioctl() seems quite complete >>> by use of printk(). >>> >>> Do you have idea how privcmd can set ioctlx.err[] in another way? >>> >>> >>> >>> I''m not familiar with this code. It sounds like this is most probably a >>> kernel bug, I''d recommend taking it to the xen-devel list, perhaps CC >>> Konrad Wilk and David Vrabel. >>> >>> Ian. >>> >>> >
2013/8/5 Guanglin Xu <mzguanglin@gmail.com>:> 2013/7/31 Andres Lagar-Cavilla <andreslc@gridcentric.ca>: >> On Jul 31, 2013, at 8:39 PM, Guanglin Xu <mzguanglin@gmail.com> wrote: >> >>> Hi Andres, >>> >>> Thanks for your reply. However, I encounter some problems in patching >>> your diff as well as compiling the resulted privcmd.c (after manually >>> patching). >>> >>> Which kernel version are you using? Can you provide complete privcmd.c file? >> I posted this against linux-next. > > Hi Andres, > > I have successfully patched your codes into linux-next too, but it > doesn''t solve the problem of mmap 160+ pages. > > My configuration: Linux-next: 3.110-rc3; Xen: 4.1 > > Do you think if I should try Xen newer than 4.1? > >> >> Please be less vague when posting. For example, how did the final diff look like and what is your compile error. We can probably fix the compile easily. > > Apologize for confusing. It''s because of my lack of experience in > building kernel. > > 1. the diff didn''t work just because some blanks were missing. For example, > > -#ifndef HAVE_ARCH_PRIVCMD_MMAP > <!!a blank should be here!!>static int > privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); > -#endif > > 2. compiling error is because I didn''t specify kernel headers for KDIR > environment variable. > > Correct: KDIR := /usr/src/linux-headers-3.11.0-rc3-xen-next-20130801 > Incorrect: KDIR := /root/linux-next > Incorrect: KDIR := /lib/modules/$(shell uname -r)/build > > Thanks, > > Guanglin > >> >> Posting the privcmd.c from linux-next would not really help. >> >> Thanks, >> Andres >> >> >>> >>> Thanks, >>> Guanglin >>> >>> 2013/7/31 Andres Lagar-Cavilla <andreslc@gridcentric.ca>: >>>> Hi Konrad, David and everybody else, >>>> >>>> I find a privcmd bug that no more than 160 pages can be mmap by >>>> IOCTL_PRIVCMD_MMAPBATCH and IOCTL_PRIVCMD_MMAPBATCH_V2. >>>> >>>> I have started a thread (see below) at the xen-user list, and then >>>> discussed with Mr. Ian Campbell, who finally recommended talking to >>>> the xen-devel list and CC''ing you. >>>> >>>> I appreciate your comments. >>>> >>>> >>>> You know, this might have been me. Or in the process of being fixed by a >>>> patch I just posted. >>>> >>>> As you may know by now, xc_map_foreign* first sets up a region of virtual >>>> memory with an mmap call, and then issues the mapping ioctl to populate the >>>> region. >>>> >>>> Let''s say for the sake of argument that the mapping operation is broken up >>>> into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH* >>>> prevent you from populating the memory region piece-meal. >>>> >>>> So: can you try the patch I pasted below? >>>> >>>> If that doesn''t fix it, then you''ll need to give us the output of those >>>> printks. >>>> >>>> Thanks >>>> Andres >>>> >>>> [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. >>>> >>>> When a foreign mapper attempts to map guest frames that are paged out, >>>> the mapper receives an ENOENT response and will have to try again >>>> while a helper process pages the target frame back in. >>>> >>>> Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries >>>> of mapping calls. >>>> >>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> >>>> --- >>>> drivers/xen/privcmd.c | 32 +++++++++++++++++++++++++++----- >>>> 1 files changed, 27 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c >>>> index f8e5dd7..44a26c6 100644 >>>> --- a/drivers/xen/privcmd.c >>>> +++ b/drivers/xen/privcmd.c >>>> @@ -43,9 +43,12 @@ MODULE_LICENSE("GPL"); >>>> >>>> #define PRIV_VMA_LOCKED ((void *)1) >>>> >>>> -#ifndef HAVE_ARCH_PRIVCMD_MMAP >>>> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma); >>>> -#endif >>>> + >>>> +static int privcmd_enforce_singleshot_mapping_granular( >>>> + struct vm_area_struct *vma, >>>> + unsigned long addr, >>>> + unsigned long nr_pages); >>>> >>>> static long privcmd_ioctl_hypercall(void __user *udata) >>>> { >>>> @@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, >>>> int version) >>>> vma = find_vma(mm, m.addr); >>>> if (!vma || >>>> vma->vm_ops != &privcmd_vm_ops || >>>> - (m.addr != vma->vm_start) || >>>> - ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) || >>>> - !privcmd_enforce_singleshot_mapping(vma)) { >>>> + (m.addr < vma->vm_start) || >>>> + ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) || >>>> + !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) { >>>> up_write(&mm->mmap_sem); >>>> ret = -EINVAL; >>>> goto out; >>>> @@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct >>>> vm_area_struct *vma) >>>> return 0; >>>> } >>>> >>>> +/* For MMAP */ >>>> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma) >>>> { >>>> return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED); >>>> } >>>> >>>> +/* For MMAPBATCH*. This allows asserting the singleshot mapping >>>> + * on a per pfn/pte basis. Mapping calls that fail with ENOENT >>>> + * can be then retried until success. */ >>>> +static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page, >>>> + unsigned long addr, void *data) >>>> +{ >>>> + return pte_none(*pte) ? 0 : -EBUSY; >>>> +} >>>> + >>>> +static int privcmd_enforce_singleshot_mapping_granular( >>>> + struct vm_area_struct *vma, >>>> + unsigned long addr, >>>> + unsigned long nr_pages) >>>> +{ >>>> + return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT, >>>> + enforce_singleshot_mapping_fn, NULL) == 0; >>>> +} >>>> + >>>> const struct file_operations xen_privcmd_fops = { >>>> .owner = THIS_MODULE, >>>> .unlocked_ioctl = privcmd_ioctl, >>>> -- >>>> 1.7.1 >>>> >>>> >>>> Thanks, >>>> >>>> Guanglin >>>> >>>> >>>> ---------- Forwarded message ---------- >>>> From: Ian Campbell <Ian.Campbell@xxxxxxxxx> >>>> Date: 2013/7/30 >>>> Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages >>>> To: Guanglin Xu <mzguanglin@xxxxxxxxx> >>>> ??? "xen-users@lists.xen.org" <xen-users@lists.xen.org> >>>> >>>> >>>> On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote: >>>> >>>> 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>: >>>> >>>> On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote: >>>> >>>> Hi all, >>>> >>>> I just find a xc_map_foreign_range() problem in Xen. >>>> >>>> xc_map_foreign_range(), an API of libxc backed by privcmd - a xen >>>> kernel module, can be used to mmap guest VM memory into dom0. However, >>>> if we mmap more than 160 pages, we''ll fail. >>>> >>>> Inside xc_map_foreign_range(), it uses ioctrl to communicate with >>>> privcmd. There are 2 ioctl commands, the one is >>>> IOCTL_PRIVCMD_MMAPBATCH (legacy), another one is >>>> IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them constantly return 0 >>>> (success), but the mmapings are fail after mmaping 160 pages. >>>> >>>> Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH >>>> was the only one ioctl command. rc = ioctl(fd, >>>> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, the >>>> subsequent invoking of ioctl would >>>> set ioctlx.arr[] items to 140737344202616, overwriting the original >>>> pfn number. >>>> >>>> And then, I updated my Linux kernel to 3.8 so as to test >>>> IOCTL_PRIVCMD_MMAPBATCH_V2. rc = ioctl(fd, >>>> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 invoking >>>> of ioctl would set ioctls.err[] items to EINVAL. >>>> >>>> Although I have inserted printk() in privcmd.c to track its execution >>>> flow, the result showed a look-like complete path, which is quite >>>> weird. I have no idea what happened in privcmd. >>>> >>>> Can anyone figure out this problem? >>>> >>>> >>>> I wouldn''t be all that surprised if there was a hardcoded batch size >>>> limit somewhere in either libxc or the privcmd driver. >>>> >>>> >>>> Hi Ian, >>>> >>>> Thank you very much for your reply. >>>> >>>> I can confirm that it''s the problem of privcmd, because I have debug >>>> libxc and narrowed the problem to ioctl(), where privcmd would set >>>> ioctlx.arr[] ( IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[] ( >>>> IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation. >>>> >>>> >>>> If you map you memory in batch of e.g. 128 pages does it all work OK? >>>> >>>> >>>> Yes. For example, [0-127] succeeds. However, the subsequent [128-255] >>>> would fail because the size of the whole region has exceeded 160 >>>> pages. >>>> >>>> >>>> That''s quite interesting. >>>> >>>> >>>> If you want to get to the bottom of the 160 page limit you''ll probably >>>> have to trace through the code looking for hardcoded sizes or limits on >>>> sizes (e.g. of arrays) or perhaps integer indexes etc which are too >>>> small and are overflowing. >>>> >>>> 160 * the size of the various structs involved doesn''t look to be all >>>> that interesting (i.e. just over a page size boundary or something), but >>>> that''s the sort of direction I would start by looking in. >>>> >>>> If you can''t spot it by eye then you''ll likely have to instrument the >>>> code paths with prints to try and track the progress of the initially >>>> supplied buffer through to the hypercall etc. >>>> >>>> >>>> Yes. I have been debuging libxc and instrumenting privcmd, but I >>>> couldn''t find the "hardcode" or other limitation codes. >>>> >>>> Codes in libxc is easizer to trace, but in privcmd it is hard for me >>>> who is in lack of kernel dev experience. I couldn''t even find where >>>> privcmd copy_to_user() or put_user() the ioctlx.err[] or ioctlx.arr[] >>>> items while the execution path of privcmd_ioctl() seems quite complete >>>> by use of printk(). >>>> >>>> Do you have idea how privcmd can set ioctlx.err[] in another way? >>>> >>>> >>>> >>>> I''m not familiar with this code. It sounds like this is most probably a >>>> kernel bug, I''d recommend taking it to the xen-devel list, perhaps CC >>>> Konrad Wilk and David Vrabel. >>>> >>>> Ian. >>>> >>>> >>Dear Andres and all, After a few days exploration, I realize that the problem I found is actually about "memory hole". I must clarify that privcmd can definitely map more than 160 pages, but it just can''t map some memory holes. For example, in a 1 GB guest memory settings where 262144 pages (4096 bytes per page) should be available, privcmd denies to map pages 160 ~ 191 and 260096 ~ 262143. According to www.intel.la/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-desktop-vol-2-datasheet.pdf , pages 160 ~ 191 are legacy video buffer; and in my settings, pages 260096 ~ 262143 are 8MB preserved memory for VGA. However, this result is not satisfying at all. I would be happy if you can response to the two further topics: 1. privcmd adds dummy paddings to the above memory holes rather than denying mapping. http://lists.xen.org/archives/html/xen-devel/2013-08/msg01052.html 2.fix Xen that reports wrong memory size of domU. http://lists.xen.org/archives/html/xen-users/2013-08/msg00098.html Thanks, Guanglin