Hi All,

This is the test report of xen-unstable tree on some Intel platforms. We found 1 new bug about live migration.

Version Info:
================================================================
xen changeset: 26323:64b36dde26bc
Dom0: Linux 3.7.1
================================================================

New issue(1)
=============
1. sometimes live migration failed and reported call trace in dom0
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841

Fixed issue(0)
=============

Old issues(11)
=============
1. [ACPI] Dom0 can't resume from S3 sleep
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
2. [XL] "xl vcpu-set" causes dom0 crash or panic
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
  -- Dom0 will not panic with latest linux.git tree 3.8-rc..
  -- We'll have more detailed update on that bugzilla.
3. [VT-D] VNC console broken after detach NIC from guest
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1736
4. Sometimes Xen panic on ia32pae Sandybridge when restore guest
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1747
5. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
6. Dom0 cannot be shutdown before PCI device detachment from guest
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1826
7. xl pci-list shows one PCI device (PF or VF) could be assigned to two different guests
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1834
8. [upstream qemu] Guest free memory with upstream qemu is 14MB lower than that with qemu-xen-unstable.git
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1836
9. [upstream qemu] 'maxvcpus=NUM' item is not supported in upstream QEMU
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1837
10. [upstream qemu] Guest console hangs after save/restore or live-migration when setting 'hpet=0' in guest config file
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1838
11. [upstream qemu] 'xen_platform_pci=0' setting cannot make the guest use emulated PCI devices by default
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1839

Best Regards,
    Yongjie (Jay)
>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
> New issue(1)
> =============
> 1. sometimes live migration failed and reported call trace in dom0
> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841

For the failed allocation, the only obvious candidate appears to be

	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);

which quite obviously can be of (almost) arbitrary size, because

	nr_pages = m.num;
	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
		return -EINVAL;

really only checks for completely insane values.

This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl" and is becoming worse with Mukesh's recent "xen: privcmd: support autotranslated physmap guests", which added another similar (twice as large) allocation in alloc_empty_pages().

I'd like to note that the forward-ported kernels don't appear to have a similar issue, as they never allocate more than a page at a time. Was that code consulted at all when that addition was done?

Jan
On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>> New issue(1)
>> =============
>> 1. sometimes live migration failed and reported call trace in dom0
>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>
> For the failed allocation, the only obvious candidate appears to be
>
> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>
> which quite obviously can be of (almost) arbitrary size because
>
> 	nr_pages = m.num;
> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
> 		return -EINVAL;
>
> really only checks for completely insane values.
>
> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
> support autotranslated physmap guests", which added another
> similar (twice as large) allocation in alloc_empty_pages().

Perhaps the err_array in this case, since alloc_empty_pages only happens for auto-translated dom0s.

I'm not familiar with whether libxl changes (or is even capable of changing) the parameters of the migration code. But right now in libxc, mapping is done in MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024 ints, which is *one* page.

So is the kernel really incapable of allocating one measly page?

This leads me to think that it might be gather_array, but that one would allocate a grand total of two pages.

In any case, both functions allocate an arbitrary number of pages, and that is the fundamental problem.

What is the approach in the forward-ported kernel wrt gather_array?

The cleanest alternative I can think of is to refactor the body of mmap_batch to allocate one page for each array, and iteratively call traverse_pages, recycling the local arrays and advancing the pointers into the source user-space arrays.

Having said that, that would always allocate two pages, while the code right now allocates at most three (for libxc-driven migrations). So maybe the problem is elsewhere...

Thanks,
Andres

> I'd like to note that the forward-ported kernels don't appear to
> have a similar issue, as they never allocate more than a page at
> a time. Was that code consulted at all when that addition was
> done?
>
> Jan
>
On 10/01/13 17:10, Andres Lagar-Cavilla wrote:
> On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>>> New issue(1)
>>> =============
>>> 1. sometimes live migration failed and reported call trace in dom0
>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>> For the failed allocation, the only obvious candidate appears to be
>>
>> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>
>> which quite obviously can be of (almost) arbitrary size because
>>
>> 	nr_pages = m.num;
>> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>> 		return -EINVAL;
>>
>> really only checks for completely insane values.
>>
>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>> support autotranslated physmap guests", which added another
>> similar (twice as large) allocation in alloc_empty_pages().
> Perhaps the err_array in this case, since alloc_empty_pages only happens for auto-translated dom0s.
>
> Not familiar with whether libxl changes (or is even capable of changing) parameters of the migration code. But right now in libxc, mapping is done in MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024 ints, which is *one* page.
>
> So is the kernel really incapable of allocating one measly page?
>
> This leads me to think that it might be gather_array, but that one would allocate a grand total of two pages.
>
> In any case, both functions allocate an arbitrary number of pages, and that is the fundamental problem.
>
> What is the approach in the forward-ported kernel wrt gather_array?
>
> The cleanest alternative I can think of is to refactor the body of mmap_batch to allocate one page for each array, and iteratively call traverse_pages, recycling the local arrays and advancing the pointers into the source user-space arrays.
>
> Having said that, that would always allocate two pages, while the code right now allocates at most three (for libxc-driven migrations). So maybe the problem is elsewhere...
>
> Thanks,
> Andres

Whilst this may not add much to the discussion: while working on the improved privcmd.c, I have been using 3.7.0rc5 and 3.8.0. Both of these seem to work fine for migration using the libxc interface (since I've been using the Xenserver build, the migration is not done through libxl).

I have not had a single failure to allocate pages during migration - I have a script that loops around migrating the guest to the same host as quickly as it can, and I have used guests of up to 64GB (and left that running overnight - a migration takes about 3 minutes, so a night gives several hundred iterations).

So I'm wondering what is different between my setup and this one...

-- Mats

>
>> I'd like to note that the forward-ported kernels don't appear to
>> have a similar issue, as they never allocate more than a page at
>> a time. Was that code consulted at all when that addition was
>> done?
>>
>> Jan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
On 10/01/13 08:57, Jan Beulich wrote:
>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>> New issue(1)
>> =============
>> 1. sometimes live migration failed and reported call trace in dom0
>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>
> For the failed allocation, the only obvious candidate appears to be
>
> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>
> which quite obviously can be of (almost) arbitrary size because
>
> 	nr_pages = m.num;
> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
> 		return -EINVAL;
>
> really only checks for completely insane values.
>
> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
> support autotranslated physmap guests", which added another
> similar (twice as large) allocation in alloc_empty_pages().
>
> I'd like to note that the forward-ported kernels don't appear to
> have a similar issue, as they never allocate more than a page at
> a time. Was that code consulted at all when that addition was
> done?

I did highlight this at the time[1].

See [2] for how I avoided this allocation.

David

[1] http://lists.xen.org/archives/html/xen-devel/2012-08/msg02208.html
[2] http://lists.xen.org/archives/html/xen-devel/2012-08/msg02092.html
>>> On 10.01.13 at 18:10, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote:
> On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>>> New issue(1)
>>> =============
>>> 1. sometimes live migration failed and reported call trace in dom0
>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>>
>> For the failed allocation, the only obvious candidate appears to be
>>
>> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>
>> which quite obviously can be of (almost) arbitrary size because
>>
>> 	nr_pages = m.num;
>> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>> 		return -EINVAL;
>>
>> really only checks for completely insane values.
>>
>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>> support autotranslated physmap guests", which added another
>> similar (twice as large) allocation in alloc_empty_pages().
>
> Perhaps the err_array in this case, since alloc_empty_pages only happens for
> auto-translated dom0s.
>
> Not familiar with whether libxl changes (or is even capable of changing)
> parameters of the migration code. But right now in libxc, mapping is done in
> MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024
> ints, which is *one* page.
>
> So is the kernel really incapable of allocating one measly page?
>
> This leads me to think that it might be gather_array, but that one would
> allocate a grand total of two pages.

The log indicated a failed order-4 allocation, though. So maybe not that allocation after all (but the basically unbounded size here is a problem in any case).

> In any case, both functions allocate an arbitrary number of pages, and that is
> the fundamental problem.
>
> What is the approach in the forward-ported kernel wrt gather_array?

There's no gather_array there. It simply sets up things one page at a time. (And you can always go and check linux-2.6.18-xen.hg).

Jan
On Jan 10, 2013, at 2:23 PM, David Vrabel <david.vrabel@citrix.com> wrote:

> On 10/01/13 08:57, Jan Beulich wrote:
>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>>> New issue(1)
>>> =============
>>> 1. sometimes live migration failed and reported call trace in dom0
>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>>
>> For the failed allocation, the only obvious candidate appears to be
>>
>> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>
>> which quite obviously can be of (almost) arbitrary size because
>>
>> 	nr_pages = m.num;
>> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>> 		return -EINVAL;
>>
>> really only checks for completely insane values.
>>
>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>> support autotranslated physmap guests", which added another
>> similar (twice as large) allocation in alloc_empty_pages().
>>
>> I'd like to note that the forward-ported kernels don't appear to
>> have a similar issue, as they never allocate more than a page at
>> a time. Was that code consulted at all when that addition was
>> done?
>
> I did highlight this at the time[1].
>
> See [2] for how I avoided this allocation.

Well, you should have pushed more forcefully back then! I remember convincing you otherwise ;)

In any case, see the follow-up to Jan.

Andres

>
> David
>
> [1] http://lists.xen.org/archives/html/xen-devel/2012-08/msg02208.html
> [2] http://lists.xen.org/archives/html/xen-devel/2012-08/msg02092.html
On Jan 11, 2013, at 3:34 AM, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 10.01.13 at 18:10, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote:
>> On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>>>> New issue(1)
>>>> =============
>>>> 1. sometimes live migration failed and reported call trace in dom0
>>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>>>
>>> For the failed allocation, the only obvious candidate appears to be
>>>
>>> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>>
>>> which quite obviously can be of (almost) arbitrary size because
>>>
>>> 	nr_pages = m.num;
>>> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>>> 		return -EINVAL;
>>>
>>> really only checks for completely insane values.
>>>
>>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>>> support autotranslated physmap guests", which added another
>>> similar (twice as large) allocation in alloc_empty_pages().
>>
>> Perhaps the err_array in this case, since alloc_empty_pages only happens for
>> auto-translated dom0s.
>>
>> Not familiar with whether libxl changes (or is even capable of changing)
>> parameters of the migration code. But right now in libxc, mapping is done in
>> MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024
>> ints, which is *one* page.
>>
>> So is the kernel really incapable of allocating one measly page?
>>
>> This leads me to think that it might be gather_array, but that one would
>> allocate a grand total of two pages.
>
> The log indicated a failed order-4 allocation though. So maybe not
> that allocation after all (but the basically unbounded size here is a
> problem in any case).

Unless the batch size somehow got changed to 16384, this most definitely is not err_array. It would be good to ascertain that.

I do see how gather_array avoids allocations larger than order 0. Let me look into cooking a similar solution for err_array.

Andres

>
>> In any case, both functions allocate an arbitrary number of pages, and that is
>> the fundamental problem.
>>
>> What is the approach in the forward-ported kernel wrt gather_array?
>
> There's no gather_array there. It simply sets up things one page
> at a time. (And you can always go and check linux-2.6.18-xen.hg).
>
> Jan
>
On Jan 11, 2013, at 3:34 AM, Jan Beulich <jbeulich@suse.com> wrote:

>>>> On 10.01.13 at 18:10, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote:
>> On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>>>> New issue(1)
>>>> =============
>>>> 1. sometimes live migration failed and reported call trace in dom0
>>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>>>
>>> For the failed allocation, the only obvious candidate appears to be
>>>
>>> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>>
>>> which quite obviously can be of (almost) arbitrary size because
>>>
>>> 	nr_pages = m.num;
>>> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>>> 		return -EINVAL;
>>>
>>> really only checks for completely insane values.
>>>
>>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>>> support autotranslated physmap guests", which added another
>>> similar (twice as large) allocation in alloc_empty_pages().
>>
>> Perhaps the err_array in this case, since alloc_empty_pages only happens for
>> auto-translated dom0s.
>>
>> Not familiar with whether libxl changes (or is even capable of changing)
>> parameters of the migration code. But right now in libxc, mapping is done in
>> MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024
>> ints, which is *one* page.
>>
>> So is the kernel really incapable of allocating one measly page?
>>
>> This leads me to think that it might be gather_array, but that one would
>> allocate a grand total of two pages.
>
> The log indicated a failed order-4 allocation though. So maybe not
> that allocation after all (but the basically unbounded size here is a
> problem in any case).

In my mind this casts serious doubt on whether err_array is the culprit of this test result. I would look into other variables.
Having said that, a kcalloc of a potentially multi-page-sized array is a problem.

Below you'll find pasted an RFC patch to fix this. I've expanded the cc line to add Mats Peterson, who is also looking into some improvements to privcmd (and IanC for general feedback).

The RFC patch cuts down code overall and cleans up the logic too. I did change the behavior wrt the classic implementations when it comes to handling errors & EFAULT. Instead of doing all the mapping work and then copying back to user space, I copy back each individual mapping error as soon as it arises, and short-circuit and quit the whole operation as soon as the first EFAULT arises.

Short-circuiting on the first EFAULT is the right thing to do IMHO. There is no point in continuing to work if we can't tell the caller about the result. Further, libxc will just munmap the vma and undo all the work upon receiving EFAULT, so any further work is wasted work.

Please advise, thanks
Andres

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 3421f0d..9433396 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -254,18 +254,16 @@ struct mmap_batch_state {
 	unsigned long va;
 	struct vm_area_struct *vma;
 	int index;
-	/* A tristate:
-	 * 0 for no errors
-	 * 1 if at least one error has happened (and no
-	 *    -ENOENT errors have happened)
-	 * -ENOENT if at least 1 -ENOENT has happened.
-	 */
-	int global_error;
+	/* Whether at least one enoent has happened. */
+	int enoent;
 	/* An array for individual errors */
 	int *err;

-	/* User-space mfn array to store errors in the second pass for V1. */
+	int version;
+	/* User-space mfn array to store errors for V1. */
 	xen_pfn_t __user *user_mfn;
+	/* User space int array to store errors for V2. */
+	int __user *user_err;
 };

 /* auto translated dom0 note: if domU being created is PV, then mfn is
@@ -287,40 +285,35 @@ static int mmap_batch_fn(void *data, void *state)
 					  st->vma->vm_page_prot, st->domain,
 					  &cur_page);

-	/* Store error code for second pass. */
-	*(st->err++) = ret;
-
-	/* And see if it affects the global_error. */
-	if (ret < 0) {
-		if (ret == -ENOENT)
-			st->global_error = -ENOENT;
-		else {
-			/* Record that at least one error has happened. */
-			if (st->global_error == 0)
-				st->global_error = 1;
+	/* See if error affects the global enoent flag. */
+	if (ret == -ENOENT) {
+		st->enoent = 1;
+	}
+
+	/* And, if error, store in user space (version dependent). */
+	if (ret) {
+		int efault = 0;
+		if (st->version == 1) {
+			xen_pfn_t mfn_err = *mfnp;
+			mfn_err |= (ret == -ENOENT) ?
+				PRIVCMD_MMAPBATCH_PAGED_ERROR :
+				PRIVCMD_MMAPBATCH_MFN_ERROR;
+			efault = __put_user(mfn_err, st->user_mfn++);
+		} else { /* st->version == 2 */
+			efault = __put_user(ret, st->user_err++);
 		}
+		/* If copy to user failed, short circuit now. */
+		if (efault)
+			return efault;
+	} else {
+		st->user_mfn++;
+		st->user_err++;
 	}
-	st->va += PAGE_SIZE;

+	st->va += PAGE_SIZE;
 	return 0;
 }

-static int mmap_return_errors_v1(void *data, void *state)
-{
-	xen_pfn_t *mfnp = data;
-	struct mmap_batch_state *st = state;
-	int err = *(st->err++);
-
-	/*
-	 * V1 encodes the error codes in the 32bit top nibble of the
-	 * mfn (with its known limitations vis-a-vis 64 bit callers).
-	 */
-	*mfnp |= (err == -ENOENT) ?
-		PRIVCMD_MMAPBATCH_PAGED_ERROR :
-		PRIVCMD_MMAPBATCH_MFN_ERROR;
-	return __put_user(*mfnp, st->user_mfn++);
-}
-
 /* Allocate pfns that are then mapped with gmfns from foreign domid. Update
  * the vma with the page info to use later.
  * Returns: 0 if success, otherwise -errno
@@ -357,7 +350,6 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 	struct vm_area_struct *vma;
 	unsigned long nr_pages;
 	LIST_HEAD(pagelist);
-	int *err_array = NULL;
 	struct mmap_batch_state state;

 	if (!xen_initial_domain())
@@ -396,10 +388,12 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 		goto out;
 	}

-	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
-	if (err_array == NULL) {
-		ret = -ENOMEM;
-		goto out;
+	/* Zero the V2 array of errors, so we only write return codes on error. */
+	if (version == 2) {
+		if (clear_user(m.err, sizeof(int) * m.num)) {
+			ret = -EFAULT;
+			goto out;
+		}
 	}

 	down_write(&mm->mmap_sem);
@@ -426,38 +420,22 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 	state.vma = vma;
 	state.va = m.addr;
 	state.index = 0;
-	state.global_error = 0;
-	state.err = err_array;
+	state.enoent = 0;
+	state.user_mfn = (xen_pfn_t *)m.arr;
+	state.user_err = m.err;
+	state.version = version;

-	/* mmap_batch_fn guarantees ret == 0 */
-	BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
-			      &pagelist, mmap_batch_fn, &state));
+	ret = traverse_pages(m.num, sizeof(xen_pfn_t),
+			     &pagelist, mmap_batch_fn, &state);

 	up_write(&mm->mmap_sem);

-	if (version == 1) {
-		if (state.global_error) {
-			/* Write back errors in second pass. */
-			state.user_mfn = (xen_pfn_t *)m.arr;
-			state.err = err_array;
-			ret = traverse_pages(m.num, sizeof(xen_pfn_t),
-					     &pagelist, mmap_return_errors_v1, &state);
-		} else
-			ret = 0;
-
-	} else if (version == 2) {
-		ret = __copy_to_user(m.err, err_array, m.num * sizeof(int));
-		if (ret)
-			ret = -EFAULT;
-	}
-
 	/* If we have not had any EFAULT-like global errors then set the global
 	 * error to -ENOENT if necessary. */
-	if ((ret == 0) && (state.global_error == -ENOENT))
+	if ((ret == 0) && (state.enoent))
 		ret = -ENOENT;

 out:
-	kfree(err_array);
 	free_page_list(&pagelist);

 	return ret;

>
>> In any case, both functions allocate an arbitrary number of pages, and that is
>> the fundamental problem.
>>
>> What is the approach in the forward-ported kernel wrt gather_array?
>
> There's no gather_array there. It simply sets up things one page
> at a time. (And you can always go and check linux-2.6.18-xen.hg).
>
> Jan
>
On Jan 13, 2013, at 11:29 PM, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote:

> On Jan 11, 2013, at 3:34 AM, Jan Beulich <jbeulich@suse.com> wrote:
>
>>>>> On 10.01.13 at 18:10, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote:
>>> On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>
>>>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@intel.com> wrote:
>>>>> New issue(1)
>>>>> =============
>>>>> 1. sometimes live migration failed and reported call trace in dom0
>>>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>>>>
>>>> For the failed allocation, the only obvious candidate appears to be
>>>>
>>>> 	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>>>
>>>> which quite obviously can be of (almost) arbitrary size because
>>>>
>>>> 	nr_pages = m.num;
>>>> 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>>>> 		return -EINVAL;
>>>>
>>>> really only checks for completely insane values.
>>>>
>>>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>>>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>>>> support autotranslated physmap guests", which added another
>>>> similar (twice as large) allocation in alloc_empty_pages().
>>>
>>> Perhaps the err_array in this case, since alloc_empty_pages only happens for
>>> auto-translated dom0s.
>>>
>>> Not familiar with whether libxl changes (or is even capable of changing)
>>> parameters of the migration code. But right now in libxc, mapping is done in
>>> MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024
>>> ints, which is *one* page.
>>>
>>> So is the kernel really incapable of allocating one measly page?
>>>
>>> This leads me to think that it might be gather_array, but that one would
>>> allocate a grand total of two pages.
>>
>> The log indicated a failed order-4 allocation though. So maybe not
>> that allocation after all (but the basically unbounded size here is a
>> problem in any case).
>
> In my mind this casts serious doubt on whether err_array is the culprit of this test result. I would look into other variables.
>
> Having said that, a kcalloc of a potentially multi-page-sized array is a problem.
>
> Below you'll find pasted an RFC patch to fix this. I've expanded the cc line to add Mats Peterson, who is also looking into some improvements to privcmd (and IanC for general feedback).
>
> The RFC patch cuts down code overall and cleans up the logic too. I did change the behavior wrt the classic implementations when it comes to handling errors & EFAULT. Instead of doing all the mapping work and then copying back to user space, I copy back each individual mapping error as soon as it arises, and short-circuit and quit the whole operation as soon as the first EFAULT arises.
>
> Short-circuiting on the first EFAULT is the right thing to do IMHO. There is no point in continuing to work if we can't tell the caller about the result. Further, libxc will just munmap the vma and undo all the work upon receiving EFAULT, so any further work is wasted work.

Copying back one error at a time is what the current code does for the V1 ioctl in any case.

I've noticed a glaring logic error in the V1 case: if there is any mapping error, a second pass to copy back errors is done, and that second pass always sets the top nibble of each mfn to some error, whether one has happened for that mapping or not. Ugh. The patch I just posted fixes this. Some version of a fix has to go in soon.

Andres

> Please advise, thanks
> Andres
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 3421f0d..9433396 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -254,18 +254,16 @@ struct mmap_batch_state {
>  	unsigned long va;
>  	struct vm_area_struct *vma;
>  	int index;
> -	/* A tristate:
> -	 * 0 for no errors
> -	 * 1 if at least one error has happened (and no
> -	 *    -ENOENT errors have happened)
> -	 * -ENOENT if at least 1 -ENOENT has happened.
> -	 */
> -	int global_error;
> +	/* Whether at least one enoent has happened. */
> +	int enoent;
>  	/* An array for individual errors */
>  	int *err;
> 
> -	/* User-space mfn array to store errors in the second pass for V1. */
> +	int version;
> +	/* User-space mfn array to store errors for V1. */
>  	xen_pfn_t __user *user_mfn;
> +	/* User space int array to store errors for V2. */
> +	int __user *user_err;
>  };
> 
>  /* auto translated dom0 note: if domU being created is PV, then mfn is
> @@ -287,40 +285,35 @@ static int mmap_batch_fn(void *data, void *state)
>  					  st->vma->vm_page_prot, st->domain,
>  					  &cur_page);
> 
> -	/* Store error code for second pass. */
> -	*(st->err++) = ret;
> -
> -	/* And see if it affects the global_error. */
> -	if (ret < 0) {
> -		if (ret == -ENOENT)
> -			st->global_error = -ENOENT;
> -		else {
> -			/* Record that at least one error has happened. */
> -			if (st->global_error == 0)
> -				st->global_error = 1;
> +	/* See if error affects the global enoent flag. */
> +	if (ret == -ENOENT) {
> +		st->enoent = 1;
> +	}
> +
> +	/* And, if error, store in user space (version dependent). */
> +	if (ret) {
> +		int efault = 0;
> +		if (st->version == 1) {
> +			xen_pfn_t mfn_err = *mfnp;
> +			mfn_err |= (ret == -ENOENT) ?
> +				PRIVCMD_MMAPBATCH_PAGED_ERROR :
> +				PRIVCMD_MMAPBATCH_MFN_ERROR;
> +			efault = __put_user(mfn_err, st->user_mfn++);
> +		} else { /* st->version == 2 */
> +			efault = __put_user(ret, st->user_err++);
>  		}
> +		/* If copy to user failed, short circuit now. */
> +		if (efault)
> +			return efault;
> +	} else {
> +		st->user_mfn++;
> +		st->user_err++;
>  	}
> -	st->va += PAGE_SIZE;
> 
> +	st->va += PAGE_SIZE;
>  	return 0;
>  }
> 
> -static int mmap_return_errors_v1(void *data, void *state)
> -{
> -	xen_pfn_t *mfnp = data;
> -	struct mmap_batch_state *st = state;
> -	int err = *(st->err++);
> -
> -	/*
> -	 * V1 encodes the error codes in the 32bit top nibble of the
> -	 * mfn (with its known limitations vis-a-vis 64 bit callers).
> -	 */
> -	*mfnp |= (err == -ENOENT) ?
> -		PRIVCMD_MMAPBATCH_PAGED_ERROR :
> -		PRIVCMD_MMAPBATCH_MFN_ERROR;
> -	return __put_user(*mfnp, st->user_mfn++);
> -}
> -
>  /* Allocate pfns that are then mapped with gmfns from foreign domid. Update
>   * the vma with the page info to use later.
>   * Returns: 0 if success, otherwise -errno
> @@ -357,7 +350,6 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  	struct vm_area_struct *vma;
>  	unsigned long nr_pages;
>  	LIST_HEAD(pagelist);
> -	int *err_array = NULL;
>  	struct mmap_batch_state state;
> 
>  	if (!xen_initial_domain())
> @@ -396,10 +388,12 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  		goto out;
>  	}
> 
> -	err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
> -	if (err_array == NULL) {
> -		ret = -ENOMEM;
> -		goto out;
> +	/* Zero the V2 array of errors, so we only write return codes on error. */
> +	if (version == 2) {
> +		if (clear_user(m.err, sizeof(int) * m.num)) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
>  	}
> 
>  	down_write(&mm->mmap_sem);
> @@ -426,38 +420,22 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  	state.vma = vma;
>  	state.va = m.addr;
>  	state.index = 0;
> -	state.global_error = 0;
> -	state.err = err_array;
> +	state.enoent = 0;
> +	state.user_mfn = (xen_pfn_t *)m.arr;
> +	state.user_err = m.err;
> +	state.version = version;
> 
> -	/* mmap_batch_fn guarantees ret == 0 */
> -	BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
> -			      &pagelist, mmap_batch_fn, &state));
> +	ret = traverse_pages(m.num, sizeof(xen_pfn_t),
> +			     &pagelist, mmap_batch_fn, &state);
> 
>  	up_write(&mm->mmap_sem);
> 
> -	if (version == 1) {
> -		if (state.global_error) {
> -			/* Write back errors in second pass. */
> -			state.user_mfn = (xen_pfn_t *)m.arr;
> -			state.err = err_array;
> -			ret = traverse_pages(m.num, sizeof(xen_pfn_t),
> -					     &pagelist, mmap_return_errors_v1, &state);
> -		} else
> -			ret = 0;
> -
> -	} else if (version == 2) {
> -		ret = __copy_to_user(m.err, err_array, m.num * sizeof(int));
> -		if (ret)
> -			ret = -EFAULT;
> -	}
> -
>  	/* If we have not had any EFAULT-like global errors then set the global
>  	 * error to -ENOENT if necessary. */
> -	if ((ret == 0) && (state.global_error == -ENOENT))
> +	if ((ret == 0) && (state.enoent))
>  		ret = -ENOENT;
> 
>  out:
> -	kfree(err_array);
>  	free_page_list(&pagelist);
> 
>  	return ret;
>
>
>>
>>> In any case, both functions allocate an arbitrary number of pages, and that is
>>> the fundamental problem.
>>>
>>> What is the approach in the forward-ported kernel wrt gather_array?
>>
>> There's no gather_array there. It simply sets up things one page
>> at a time. (And you can always go and check linux-2.6.18-xen.hg).
>>
>> Jan
>>
>
On 14/01/13 04:29, Andres Lagar-Cavilla wrote:
>
> Below you'll find pasted an RFC patch to fix this. I've expanded the
> cc line to add Mats Peterson, who is also looking into some improvements
> to privcmd (and IanC for general feedback).
>
> The RFC patch cuts down code overall and cleans up logic too. I did
> change the behavior wrt classic implementations when it comes to
> handling errors & EFAULT. Instead of doing all the mapping work and then
> copying back to user, I copy back each individual mapping error as soon
> as it arises. And short-circuit and quit the whole operation as soon as
> the first EFAULT arises.

Which is broken. Please just look at my v3 patch and implement that
method.

> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 3421f0d..9433396 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
[...]
> @@ -287,40 +285,35 @@ static int mmap_batch_fn(void *data, void *state)
[...]
> + efault = __put_user(mfn_err, st->user_mfn++);
> + } else { /* st->version == 2 */
> + efault = __put_user(ret, st->user_err++);

You can't use __put_user() or any other function accessing user memory
while holding mmap_sem or you will occasionally deadlock in the page
fault handler (depending on whether the user page is currently present
or not).

David
On Jan 14, 2013, at 8:59 AM, David Vrabel <david.vrabel@citrix.com> wrote:

> On 14/01/13 04:29, Andres Lagar-Cavilla wrote:
>>
>> Below you'll find pasted an RFC patch to fix this. I've expanded the
>> cc line to add Mats Peterson, who is also looking into some improvements
>> to privcmd (and IanC for general feedback).
>>
>> The RFC patch cuts down code overall and cleans up logic too. I did
>> change the behavior wrt classic implementations when it comes to
>> handling errors & EFAULT. Instead of doing all the mapping work and then
>> copying back to user, I copy back each individual mapping error as soon
>> as it arises. And short-circuit and quit the whole operation as soon as
>> the first EFAULT arises.
>
> Which is broken.

Certainly due to copy_on_write within the mmap semaphore. Unfortunately
I didn't have time last night to post the fix, pardon for the noise.

> Please just look at my v3 patch and implement that method.

The one nit I have about that is that it does an unnecessary get_user
of the mfn on the second pass for V1. How about this?

Andres

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 3421f0d..fc4952d 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -261,11 +261,12 @@ struct mmap_batch_state {
 * -ENOENT if at least 1 -ENOENT has happened.
 */
 int global_error;
- /* An array for individual errors */
- int *err;
+ int version;

 /* User-space mfn array to store errors in the second pass for V1. */
 xen_pfn_t __user *user_mfn;
+ /* User-space int array to store errors in the second pass for V2. */
+ int __user *user_err;
 };

 /* auto translated dom0 note: if domU being created is PV, then mfn is
@@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
 &cur_page);

 /* Store error code for second pass. */
- *(st->err++) = ret;
+ if (st->version == 1) {
+ if (ret < 0) {
+ /*
+ * V1 encodes the error codes in the 32bit top nibble of the
+ * mfn (with its known limitations vis-a-vis 64 bit callers).
+ */
+ *mfnp |= (ret == -ENOENT) ?
+ PRIVCMD_MMAPBATCH_PAGED_ERROR :
+ PRIVCMD_MMAPBATCH_MFN_ERROR;
+ }
+ } else { /* st->version == 2 */
+ *((int *) mfnp) = ret;
+ }

 /* And see if it affects the global_error. */
 if (ret < 0) {
@@ -305,20 +318,25 @@ static int mmap_batch_fn(void *data, void *state)
 return 0;
 }

-static int mmap_return_errors_v1(void *data, void *state)
+static int mmap_return_errors(void *data, void *state)
 {
- xen_pfn_t *mfnp = data;
 struct mmap_batch_state *st = state;
- int err = *(st->err++);
-
- /*
- * V1 encodes the error codes in the 32bit top nibble of the
- * mfn (with its known limitations vis-a-vis 64 bit callers).
- */
- *mfnp |= (err == -ENOENT) ?
- PRIVCMD_MMAPBATCH_PAGED_ERROR :
- PRIVCMD_MMAPBATCH_MFN_ERROR;
- return __put_user(*mfnp, st->user_mfn++);
+ if (st->version == 1) {
+ xen_pfn_t mfnp = *((xen_pfn_t *) data);
+ if (mfnp & PRIVCMD_MMAPBATCH_MFN_ERROR)
+ return __put_user(mfnp, st->user_mfn++);
+ else
+ st->user_mfn++;
+ } else { /* st->version == 2 */
+ int err = *((int *) data);
+ if (err)
+ return __put_user(err, st->user_err++);
+ else
+ st->user_err++;
+ }
+
+ return 0;
 }

 /* Allocate pfns that are then mapped with gmfns from foreign domid. Update
@@ -357,7 +375,6 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 struct vm_area_struct *vma;
 unsigned long nr_pages;
 LIST_HEAD(pagelist);
- int *err_array = NULL;
 struct mmap_batch_state state;

 if (!xen_initial_domain())
@@ -396,10 +413,12 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 goto out;
 }

- err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
- if (err_array == NULL) {
- ret = -ENOMEM;
- goto out;
+ if (version == 2) {
+ /* Zero error array now to only copy back actual errors. */
+ if (clear_user(m.err, sizeof(int) * m.num)) {
+ ret = -EFAULT;
+ goto out;
+ }
 }

 down_write(&mm->mmap_sem);
@@ -427,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 state.va = m.addr;
 state.index = 0;
 state.global_error = 0;
- state.err = err_array;
+ state.version = version;

 /* mmap_batch_fn guarantees ret == 0 */
 BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
@@ -435,21 +454,14 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)

 up_write(&mm->mmap_sem);

- if (version == 1) {
- if (state.global_error) {
- /* Write back errors in second pass. */
- state.user_mfn = (xen_pfn_t *)m.arr;
- state.err = err_array;
- ret = traverse_pages(m.num, sizeof(xen_pfn_t),
- &pagelist, mmap_return_errors_v1, &state);
- } else
- ret = 0;
-
- } else if (version == 2) {
- ret = __copy_to_user(m.err, err_array, m.num * sizeof(int));
- if (ret)
- ret = -EFAULT;
- }
+ if (state.global_error) {
+ /* Write back errors in second pass. */
+ state.user_mfn = (xen_pfn_t *)m.arr;
+ state.user_err = m.err;
+ ret = traverse_pages(m.num, sizeof(xen_pfn_t),
+ &pagelist, mmap_return_errors, &state);
+ } else
+ ret = 0;

 /* If we have not had any EFAULT-like global errors then set the global
 * error to -ENOENT if necessary. */
@@ -457,7 +469,6 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 ret = -ENOENT;

 out:
- kfree(err_array);
 free_page_list(&pagelist);

 return ret;

>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 3421f0d..9433396 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
> [...]
>> @@ -287,40 +285,35 @@ static int mmap_batch_fn(void *data, void *state)
> [...]
>> + efault = __put_user(mfn_err, st->user_mfn++);
>> + } else { /* st->version == 2 */
>> + efault = __put_user(ret, st->user_err++);
>
> You can't use __put_user() or any other function accessing user memory
> while holding mmap_sem or you will occasionally deadlock in the page
> fault handler (depending on whether the user page is currently present
> or not).
>
> David
On 14/01/13 15:06, Andres Lagar-Cavilla wrote:
> On Jan 14, 2013, at 8:59 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>
>> On 14/01/13 04:29, Andres Lagar-Cavilla wrote:
>>>
>>> Below you'll find pasted an RFC patch to fix this. I've expanded the
>>> cc line to add Mats Peterson, who is also looking into some improvements
>>> to privcmd (and IanC for general feedback).
>>>
>>> The RFC patch cuts down code overall and cleans up logic too. I did
>>> change the behavior wrt classic implementations when it comes to
>>> handling errors & EFAULT. Instead of doing all the mapping work and then
>>> copying back to user, I copy back each individual mapping error as soon
>>> as it arises. And short-circuit and quit the whole operation as soon as
>>> the first EFAULT arises.
>>
>> Which is broken.
>
> Certainly due to copy_on_write within the mmap semaphore. Unfortunately
> I didn't have time last night to post the fix, pardon for the noise.
>
>> Please just look at my v3 patch and implement that method.

... but be aware that I messed up mmap_return_errors() for V1 and set
all MFNs as having errors. Oops.

> The one nit I have about that is that it does an unnecessary get_user
> of the mfn on the second pass for V1. How about this?

__get_user() and __put_user() are actually cheap (provided they don't
fault).

This looks ok except for one thing.

> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 3421f0d..fc4952d 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
[...]
> @@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
> &cur_page);
>
> /* Store error code for second pass. */
> - *(st->err++) = ret;
> + if (st->version == 1) {
> + if (ret < 0) {
> + /*
> + * V1 encodes the error codes in the 32bit top nibble of the
> + * mfn (with its known limitations vis-a-vis 64 bit callers).
> + */
> + *mfnp |= (ret == -ENOENT) ?
> + PRIVCMD_MMAPBATCH_PAGED_ERROR :
> + PRIVCMD_MMAPBATCH_MFN_ERROR;

You also need to clear the top nibble on success (ret >= 0) so large
PFNs with the top nibble already set don't give false positives of
errors.

David
>>> On 14.01.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
> On 14/01/13 15:06, Andres Lagar-Cavilla wrote:
>> @@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
>> &cur_page);
>>
>> /* Store error code for second pass. */
>> - *(st->err++) = ret;
>> + if (st->version == 1) {
>> + if (ret < 0) {
>> + /*
>> + * V1 encodes the error codes in the 32bit top nibble of the
>> + * mfn (with its known limitations vis-a-vis 64 bit callers).
>> + */
>> + *mfnp |= (ret == -ENOENT) ?
>> + PRIVCMD_MMAPBATCH_PAGED_ERROR :
>> + PRIVCMD_MMAPBATCH_MFN_ERROR;
>
> You also need to clear the top nibble on success (ret >= 0) so large
> PFNs with the top nibble already set don't give false positives of errors.

Not really - that's what v2 was added for (the caller, unless
keeping a second array with the original MFNs, wouldn't be able
to match things up in that case).

Jan
On 14/01/13 16:14, Jan Beulich wrote:
>>>> On 14.01.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 14/01/13 15:06, Andres Lagar-Cavilla wrote:
>>> @@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
>>> &cur_page);
>>>
>>> /* Store error code for second pass. */
>>> - *(st->err++) = ret;
>>> + if (st->version == 1) {
>>> + if (ret < 0) {
>>> + /*
>>> + * V1 encodes the error codes in the 32bit top nibble of the
>>> + * mfn (with its known limitations vis-a-vis 64 bit callers).
>>> + */
>>> + *mfnp |= (ret == -ENOENT) ?
>>> + PRIVCMD_MMAPBATCH_PAGED_ERROR :
>>> + PRIVCMD_MMAPBATCH_MFN_ERROR;
>>
>> You also need to clear the top nibble on success (ret >= 0) so large
>> PFNs with the top nibble already set don't give false positives of errors.
>
> Not really - that's what v2 was added for (the caller, unless
> keeping a second array with the original MFNs, wouldn't be able
> to match things up in that case).

Ok, I can agree with that.

David
On Jan 14, 2013, at 11:03 AM, David Vrabel <david.vrabel@citrix.com> wrote:

> On 14/01/13 15:06, Andres Lagar-Cavilla wrote:
>> On Jan 14, 2013, at 8:59 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>
>>> On 14/01/13 04:29, Andres Lagar-Cavilla wrote:
>>>>
>>>> Below you'll find pasted an RFC patch to fix this. I've expanded the
>>>> cc line to add Mats Peterson, who is also looking into some improvements
>>>> to privcmd (and IanC for general feedback).
>>>>
>>>> The RFC patch cuts down code overall and cleans up logic too. I did
>>>> change the behavior wrt classic implementations when it comes to
>>>> handling errors & EFAULT. Instead of doing all the mapping work and then
>>>> copying back to user, I copy back each individual mapping error as soon
>>>> as it arises. And short-circuit and quit the whole operation as soon as
>>>> the first EFAULT arises.
>>>
>>> Which is broken.
>>
>> Certainly due to copy_on_write within the mmap semaphore. Unfortunately
>> I didn't have time last night to post the fix, pardon for the noise.
>>
>>> Please just look at my v3 patch and implement that method.
>
> ... but be aware that I messed up mmap_return_errors() for V1 and set
> all MFNs as having errors. Oops.
>
>> The one nit I have about that is that it does an unnecessary get_user
>> of the mfn on the second pass for V1. How about this?
>
> __get_user() and __put_user() are actually cheap (provided they don't
> fault).
>
> This looks ok except for one thing.
>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 3421f0d..fc4952d 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
> [...]
>> @@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
>> &cur_page);
>>
>> /* Store error code for second pass. */
>> - *(st->err++) = ret;
>> + if (st->version == 1) {
>> + if (ret < 0) {
>> + /*
>> + * V1 encodes the error codes in the 32bit top nibble of the
>> + * mfn (with its known limitations vis-a-vis 64 bit callers).
>> + */
>> + *mfnp |= (ret == -ENOENT) ?
>> + PRIVCMD_MMAPBATCH_PAGED_ERROR :
>> + PRIVCMD_MMAPBATCH_MFN_ERROR;
>
> You also need to clear the top nibble on success (ret >= 0) so large
> PFNs with the top nibble already set don't give false positives of errors.

But classic kernels don't do this either, afaict (e.g.
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/file/c340a22a3a63/drivers/xen/privcmd/privcmd.c#l282
or XenServer's 6.1 kernel). The key reason for V2 is to get rid of all
these limitations and stop trying to fix them in V1.

I'm open to whichever fix, though. It'd be just one line for the else
case. I'd just like some feedback before "officially" submitting a
patch.

Thanks!
Andres

>
> David
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On Jan 14, 2013, at 11:15 AM, David Vrabel <david.vrabel@citrix.com> wrote:

> On 14/01/13 16:14, Jan Beulich wrote:
>>>>> On 14.01.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>> On 14/01/13 15:06, Andres Lagar-Cavilla wrote:
>>>> @@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
>>>> &cur_page);
>>>>
>>>> /* Store error code for second pass. */
>>>> - *(st->err++) = ret;
>>>> + if (st->version == 1) {
>>>> + if (ret < 0) {
>>>> + /*
>>>> + * V1 encodes the error codes in the 32bit top nibble of the
>>>> + * mfn (with its known limitations vis-a-vis 64 bit callers).
>>>> + */
>>>> + *mfnp |= (ret == -ENOENT) ?
>>>> + PRIVCMD_MMAPBATCH_PAGED_ERROR :
>>>> + PRIVCMD_MMAPBATCH_MFN_ERROR;
>>>
>>> You also need to clear the top nibble on success (ret >= 0) so large
>>> PFNs with the top nibble already set don't give false positives of errors.
>>
>> Not really - that's what v2 was added for (the caller, unless
>> keeping a second array with the original MFNs, wouldn't be able
>> to match things up in that case).
>
> Ok, I can agree with that.

Ok, cool, thanks David. Jan, Konrad, is the last patch sent this (EST)
morning decent enough?

Andres

>
> David
On Mon, Jan 14, 2013 at 11:19:18AM -0500, Andres Lagar-Cavilla wrote:
> On Jan 14, 2013, at 11:15 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>
> > On 14/01/13 16:14, Jan Beulich wrote:
> >>>>> On 14.01.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
> >>> On 14/01/13 15:06, Andres Lagar-Cavilla wrote:
> >>>> @@ -288,7 +289,19 @@ static int mmap_batch_fn(void *data, void *state)
> >>>> &cur_page);
> >>>>
> >>>> /* Store error code for second pass. */
> >>>> - *(st->err++) = ret;
> >>>> + if (st->version == 1) {
> >>>> + if (ret < 0) {
> >>>> + /*
> >>>> + * V1 encodes the error codes in the 32bit top nibble of the
> >>>> + * mfn (with its known limitations vis-a-vis 64 bit callers).
> >>>> + */
> >>>> + *mfnp |= (ret == -ENOENT) ?
> >>>> + PRIVCMD_MMAPBATCH_PAGED_ERROR :
> >>>> + PRIVCMD_MMAPBATCH_MFN_ERROR;
> >>>
> >>> You also need to clear the top nibble on success (ret >= 0) so large
> >>> PFNs with the top nibble already set don't give false positives of errors.
> >>
> >> Not really - that's what v2 was added for (the caller, unless
> >> keeping a second array with the original MFNs, wouldn't be able
> >> to match things up in that case).
> >
> > Ok, I can agree with that.
>
> Ok, cool, thanks David. Jan, Konrad, is the last patch sent this (EST)
> morning decent enough?

Hm, I am not seeing it in my mailbox. Was I on the 'To' or 'CC' list?