Hello,

While attempting to teach a hypercall-aware valgrind about enough hypercalls to allow it to introspect HVM domain migration, I came across some systemic problems with certain hypercalls, particularly the ones used by migration. Here is the example of XENMEM_maximum_ram_page, but it is not alone as far as this goes.

In Xen, it is defined as

  /*
   * Returns the maximum machine frame number of mapped RAM in this system.
   * This command always succeeds (it never returns an error code).
   * arg == NULL.
   */

In memory.c, there is a possible unsigned->signed conversion error from max_pages to rc.

In compat/memory.c, there is a long->int truncation error for compat hypercalls, although newer versions of Xen cap this at INT_{MIN,MAX}.

The privcmd driver passes the hypercall rc through as the return value of the ioctl handler, containing a possible long->int truncation error.

In libxc, do_memory_op() is expected to use -errno style error handling, but does not enforce it. There is, however, a possible int->long sign-extension issue with xc_maximum_ram_page(). The value from this is then stuffed into unsigned long minfo->max_mfn and immediately used to try to map the M2P table.

From the work with XSA-55, we have already identified that the error handling and propagation in libxc leaves a lot to be desired. However, the hypervisor side of things is just as problematic.

What policy do we have about deprecating hypercall interfaces and introducing newer ones? At a minimum, all hypercalls should be using -errno style errors, with a possibility of returning 0 to LONG_MAX as well.

I realise that simply changing the hypercalls in place is not possible. Would it be acceptable to have a step change across a Xen version (say early in 4.4) where consumers of the public interface would have to make use of -DXEN_LEGACY_UNSAFE_HYPERCALLS (or equivalent) in an attempt to move them forward with the API?

~Andrew
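The chain of conversions described above can be reproduced in a few lines. This is a minimal sketch, not the real Xen, privcmd or libxc code: the helper functions are hypothetical stand-ins for the hypervisor's long return value, an ioctl handler that returns int, and a libxc caller that widens the result back to long before storing it in an unsigned long such as minfo->max_mfn. It assumes an LP64 build (64-bit long, 32-bit int) and the modulo-2^32 narrowing that gcc and clang use for out-of-range long->int conversions; the C standard leaves that conversion implementation-defined.

```c
/* Hypothetical sketch of the rc conversion chain; assumes LP64 and gcc/clang
 * modulo-2^32 narrowing for out-of-range long -> int conversions. */

/* Stage 1: an ioctl handler that returns int truncates the hypercall's long. */
static int privcmd_like_ioctl_rc(long hypercall_rc)
{
    return (int)hypercall_rc;
}

/* Stage 2: the library widens the int back to long (sign extension). */
static long libxc_like_rc(int ioctl_rc)
{
    return ioctl_rc;
}

/* Stage 3: the result is stored in an unsigned long, e.g. a max_mfn field. */
static unsigned long as_max_mfn(long rc)
{
    return (unsigned long)rc;
}

/* Full round trip from hypervisor return value to the stored frame number. */
static unsigned long round_trip(long hypercall_rc)
{
    return as_max_mfn(libxc_like_rc(privcmd_like_ioctl_rc(hypercall_rc)));
}
```

With these assumptions, small frame numbers and -errno values survive unchanged, but a frame number of 0x180000000 comes out the other end as 0xFFFFFFFF80000000, a value that is neither the real answer nor a recognisable -errno code.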
>>> On 19.06.13 at 17:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> In memory.c, there is a possible unsigned->signed conversion error from
> max_pages to rc.

That's of no concern as long as the maximum possible value can't result in the value being negative. Plus it's problematic only when the hypervisor is 32-bit (as otherwise it's a conversion from "unsigned int" to "signed long").

And for the list of items to be complete: there's a similar conversion for d->tot_pages.

> In compat/memory.c, there is a long->int truncation error for compat
> hypercalls, although newer versions of Xen cap this at INT_{MIN,MAX}.

That was added precisely to avoid uncontrolled truncation.

> The privcmd driver passes the hypercall rc through as the return value of
> the ioctl handler, containing a possible long->int truncation error.

That's an outright bug, introduced by improper code transformations when porting the XenoLinux code to the upstream kernel, or, if the porting was done long enough ago, by not noticing linux-2.6.18-xen.hg c/s 984.

> From the work with XSA-55, we have already identified that the error
> handling and propagation in libxc leaves a lot to be desired. However,
> the hypervisor side of things is just as problematic.

Given the above, I'm not clear what problem you see.

> What policy do we have about deprecating hypercall interfaces and
> introducing newer ones? At a minimum, all hypercalls should be using
> -errno style errors, with a possibility of returning 0 to LONG_MAX as well.
>
> I realise that simply changing the hypercalls in place is not possible.
> Would it be acceptable to have a step change across a Xen version (say
> early in 4.4) where consumers of the public interface would have to make
> use of -DXEN_LEGACY_UNSAFE_HYPERCALLS (or equivalent) in an attempt to
> move them forward with the API?

That's what we have __XEN_INTERFACE_VERSION__ for: just guard the stuff you don't want up-to-date consumers to use anymore with a respective #if __XEN_INTERFACE_VERSION__ < 0x040400. Of course pv-ops is lacking any such version handling so far, apparently with the original hope of only using up-to-date bits.

Jan
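A guard of the kind Jan describes might look like the sketch below. The SKETCH_MAXRAM_USES_ERRNO macro and the helper function are hypothetical and exist only to make the selected branch observable; the default of 0 mirrors Xen's public xen-compat.h, where a consumer that does not define __XEN_INTERFACE_VERSION__ gets the legacy interface. 0x00040400 is the same value as Jan's 0x040400.

```c
/* Consumers that do not ask for a version get the legacy interface
 * (mirroring the default in Xen's public xen-compat.h). */
#ifndef __XEN_INTERFACE_VERSION__
#define __XEN_INTERFACE_VERSION__ 0x00000000
#endif

#if __XEN_INTERFACE_VERSION__ < 0x00040400
/* Legacy semantics (hypothetical marker): the result is always a frame
 * number and is documented never to be an error code. */
#define SKETCH_MAXRAM_USES_ERRNO 0
#else
/* Semantics offered to up-to-date consumers (hypothetical marker):
 * -errno on failure, 0..LONG_MAX on success. */
#define SKETCH_MAXRAM_USES_ERRNO 1
#endif

/* Reports which definition the version guard selected. */
static int sketch_maxram_uses_errno(void)
{
    return SKETCH_MAXRAM_USES_ERRNO;
}
```

Built as-is, the legacy branch is selected; building with -D__XEN_INTERFACE_VERSION__=0x00040400 selects the -errno definition instead. Because the default is the legacy interface, existing consumers keep compiling unchanged, while new consumers opt in explicitly.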
On 20/06/13 10:01, Jan Beulich wrote:
>>>> On 19.06.13 at 17:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> In memory.c, there is a possible unsigned->signed conversion error from
>> max_pages to rc.
> That's of no concern as long as the maximum possible value can't
> result in the value being negative. Plus it's problematic only when
> the hypervisor is 32-bit (as otherwise it's a conversion from
> "unsigned int" to "signed long").
>
> And for the list of items to be complete: there's a similar conversion
> for d->tot_pages.

In this case, a 64-bit domain on 64-bit Xen is fine. This hypercall is OK, as it really shouldn't be returning more than ((~0ULL)>>PAGE_SHIFT).

I guess the question boils down to this:

Is it OK to retroactively apply -errno semantics to hypercalls which were previously defined to never return an error? Already for the compat layer a wrong value is being returned. All we would be doing is changing from INT_MAX to -ERANGE, which is differently wrong but more consistent.

~Andrew
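Andrew's bound can be checked with a little arithmetic. Assuming x86's 4 KiB pages (PAGE_SHIFT of 12) and an LP64 build, (~0ULL)>>PAGE_SHIFT is 2^52 - 1, far below LONG_MAX (2^63 - 1), so the native rc can never go negative. The compat int, by contrast, tops out at a frame number of 2^31 - 1; 2^31 frames of 4 KiB is 2^43 bytes, so any mapped RAM above 8 TiB already yields an unrepresentable compat result. The helper names below are hypothetical.

```c
#include <limits.h>

/* Assumption: x86's 4 KiB pages. */
#ifndef PAGE_SHIFT
#define PAGE_SHIFT 12
#endif

/* Upper bound on any frame number: all 64 address bits minus the
 * in-page offset bits. */
static unsigned long long max_possible_mfn(void)
{
    return (~0ULL) >> PAGE_SHIFT;
}

/* Native path: does the value survive conversion to a signed long rc? */
static int fits_in_long(unsigned long long v)
{
    return v <= (unsigned long long)LONG_MAX;
}

/* Compat path: does the value survive narrowing to a 32-bit int? */
static int fits_in_int(unsigned long long v)
{
    return v <= (unsigned long long)INT_MAX;
}
```

On an LP64 host the bound fits in a long but not in an int, which is why only the compat path needs either capping or an error indicator.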
>>> On 25.06.13 at 15:10, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> I guess the question boils down to this:
>
> Is it OK to retroactively apply -errno semantics to hypercalls which
> were previously defined to never return an error? Already for the
> compat layer a wrong value is being returned. All we would be doing is
> changing from INT_MAX to -ERANGE, which is differently wrong but more
> consistent.

I think it is okay if the change is, like here, from a de facto random value (due to having been truncated) to a predictable error indicator. The capping to INT_MAX was trying to do almost the same (with the goal of not converting a success return into an error one).

Jan
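The difference between the current capping and the proposed -ERANGE can be sketched as two narrowing helpers. Both functions are hypothetical illustrations of the behaviour discussed in this thread, not the code in compat/memory.c, and the sketch assumes an LP64 build so that a long can hold values beyond INT_MAX.

```c
#include <errno.h>
#include <limits.h>

/* Capping (hypothetical): saturate at INT_{MIN,MAX}. An oversized success
 * value stays a success, but the number itself is wrong. */
static int compat_rc_capped(long rc)
{
    if (rc > INT_MAX)
        return INT_MAX;
    if (rc < INT_MIN)
        return INT_MIN;
    return (int)rc;
}

/* Proposed (hypothetical): report an unrepresentable result as -ERANGE so
 * the caller sees a predictable -errno style error. */
static int compat_rc_erange(long rc)
{
    if (rc > INT_MAX || rc < INT_MIN)
        return -ERANGE;
    return (int)rc;
}
```

For an oversized result, the capped version still looks like success (INT_MAX), whereas the -ERANGE version gives a caller doing an ordinary rc < 0 check a predictable error to report; in-range results and existing -errno values pass through both helpers unchanged.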