Hello,
While attempting to teach a hypercall-aware valgrind about enough
hypercalls to allow it to introspect HVM domain migration I came across
some systemic problems with certain hypercalls, particularly with migrate.
Take XENMEM_maximum_ram_page as an example, although it is not the only
hypercall affected.
In Xen, it is defined as
/*
 * Returns the maximum machine frame number of mapped RAM in this system.
 * This command always succeeds (it never returns an error code).
 * arg == NULL.
 */
In memory.c, there is a possible unsigned->signed conversion error from
max_pages to rc.
In compat/memory.c, there is a long->int truncation error for compat
hypercalls, although newer versions of Xen cap this at INT_{MIN,MAX}.
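To illustrate the difference between uncontrolled truncation and the
INT_{MIN,MAX} cap, here is a minimal standalone sketch. It is not the code
from compat/memory.c: truncate_rc(), clamp_rc() and the frame number are
hypothetical, and fixed-width types stand in for long and int so the
behaviour does not depend on the build.

```c
#include <assert.h>
#include <stdint.h>

/* Uncontrolled truncation: only the low 32 bits of the result survive.
 * (For values with bit 31 set, the final conversion to int32_t is
 * implementation-defined in C.) */
int32_t truncate_rc(int64_t rc)
{
    return (int32_t)(uint32_t)rc;
}

/* Controlled narrowing: saturate at INT32_MIN / INT32_MAX instead. */
int32_t clamp_rc(int64_t rc)
{
    if (rc > INT32_MAX)
        return INT32_MAX;
    if (rc < INT32_MIN)
        return INT32_MIN;
    return (int32_t)rc;
}

/* Hypothetical frame number just above 2^32, chosen for illustration. */
void demo_compat_narrowing(void)
{
    int64_t max_mfn = INT64_C(0x100000005);

    assert(truncate_rc(max_mfn) == 5);        /* plausible, but wrong */
    assert(clamp_rc(max_mfn) == INT32_MAX);   /* wrong, but recognisable */
}
```

The capped value is still not the real answer, but a caller can at least
tell that the result was out of range, which the truncated value hides.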
The privcmd driver passes the hypercall rc through as the return value of
the ioctl handler, which allows a possible long->int truncation error.
In libxc, do_memory_op() is expected to use -errno style error handling,
but does not enforce it. There is, however, a possible int->long
extension issue with xc_maximum_ram_page().
The resulting value is then stuffed into unsigned long minfo->max_mfn
and immediately used in an attempt to map the M2P table.
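A rough sketch of why the unchecked store matters, assuming the wrapper can
hand back a negative -errno value. store_unchecked() and store_checked()
are hypothetical helpers, not libxc functions, and uint64_t stands in for
unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Unchecked: sign-extend the int result and store it in an unsigned
 * field, as happens when no error check sits in between. */
uint64_t store_unchecked(int rc)
{
    return (uint64_t)(int64_t)rc;
}

/* Checked: refuse negative values instead of treating them as frames.
 * Returns 0 and fills *max_mfn on success, or the negative error. */
int store_checked(int rc, uint64_t *max_mfn)
{
    if (rc < 0)
        return rc;
    *max_mfn = (uint64_t)rc;
    return 0;
}

void demo_unchecked_store(void)
{
    uint64_t max_mfn = 0;

    /* -EFAULT turns into a frame number just below 2^64. */
    assert(store_unchecked(-EFAULT) == UINT64_MAX - (uint64_t)EFAULT + 1);

    /* With the check, the error is reported and the field is untouched. */
    assert(store_checked(-EFAULT, &max_mfn) == -EFAULT);
    assert(max_mfn == 0);
}
```

Without the check, the next step would size the M2P mapping from a value
close to 2^64 rather than failing cleanly.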
From the work with XSA-55, we have already identified that the error
handling and propagation in libxc leaves a lot to be desired. However,
the hypervisor side of things is just as problematic.
What policy do we have about deprecating hypercall interfaces and
introducing newer ones? At a minimum, all hypercalls should be using
-errno style errors, with a possibility of returning 0 to LONG_MAX as well.
I realise that simply changing the hypercalls in place is not possible.
Would it be acceptable to have a step change across a Xen version (say
early in 4.4) where consumers of the public interface would have to make
use of -DXEN_LEGACY_UNSAFE_HYPERCALLS (or equivalent) in an attempt to
move them forward with the API?
~Andrew
>>> On 19.06.13 at 17:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> In memory.c, there is a possible unsigned->signed conversion error from
> max_pages to rc.

That's of no concern as long as the maximum possible value can't
result in the value being negative. Plus it's problematic only when
the hypervisor is 32-bit (as otherwise it's a conversion from
"unsigned int" to "signed long").

And for the list of items to be complete - there's a similar conversion
for d->tot_pages.

> In compat/memory.c, there is a long->int truncation error for compat
> hypercalls, although newer versions of Xen cap this at INT_{MIN,MAX}

That was added precisely to avoid uncontrolled truncation.

> In the privcmd driver passes the hypercall rc through as the return from
> the ioctl handler, containing a possible long->int truncation error.

That's an outright bug, introduced by improper code transformations
when porting the XenoLinux code to the upstream kernel, or - if the
porting was done long enough ago - lack of noticing
linux-2.6.18-xen.hg c/s 984.

> From the work with XSA-55, we have already identified that the error
> handling and propagation in libxc leaves a lot to be desired. However,
> the hypervisor side of things is just as problematic.

Given the above I'm not clear what problematic point you see.

> What policy do we have about deprecating hypercall interfaces and
> introducing newer ones? At a minimum, all hypercalls should be using
> -errno style errors, with a possibility of returning 0 to LONG_MAX as well.
>
> I realise that simply changing the hypercalls in place is not possible.
> Would it be acceptable to have a step change across a Xen version (say
> early in 4.4) where consumers of the public interface would have to make
> use of -DXEN_LEGACY_UNSAFE_HYPERCALLS (or equivalent) in an attempt to
> move them forward with the API ?

That's what we have __XEN_INTERFACE_VERSION__ for - just guard stuff
you don't want up-to-date consumers to use anymore with a respective
#if __XEN_INTERFACE_VERSION__ < 0x040400.

Of course pv-ops is lacking any such version handling so far,
apparently with the original hope of only using up-to-date bits.

Jan
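For anyone unfamiliar with that kind of guard, a minimal standalone sketch
follows. DEMO_OP_LEGACY and DEMO_OP_NEW are hypothetical names, not real
Xen definitions, and the version macro is defaulted here only so that the
fragment compiles on its own; a real consumer would set it before including
the public headers.

```c
#include <assert.h>

/* Defaulted for this sketch only; normally chosen by the consumer. */
#ifndef __XEN_INTERFACE_VERSION__
#define __XEN_INTERFACE_VERSION__ 0x040400
#endif

/* Hypothetical replacement interface, always visible. */
#define DEMO_OP_NEW 2

/* Hypothetical legacy interface, hidden from up-to-date consumers. */
#if __XEN_INTERFACE_VERSION__ < 0x040400
#define DEMO_OP_LEGACY 1
#endif

/* Reports whether the legacy definition is visible in this build. */
int demo_legacy_visible(void)
{
#ifdef DEMO_OP_LEGACY
    return 1;
#else
    return 0;
#endif
}

/* The legacy name is visible exactly when the requested version is old. */
void demo_check_guard(void)
{
    assert(demo_legacy_visible() == (__XEN_INTERFACE_VERSION__ < 0x040400));
}
```

A consumer building with an older interface version still sees the legacy
name, while one asking for 0x040400 or later gets a compile error if it
keeps using it.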
On 20/06/13 10:01, Jan Beulich wrote:
>>>> On 19.06.13 at 17:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> In memory.c, there is a possible unsigned->signed conversion error from
>> max_pages to rc.
> That's of no concern as long as the maximum possible value can't
> result in the value being negative. Plus it's problematic only when
> the hypervisor is 32-bit (as otherwise it's a conversion from
> "unsigned int" to "signed long".
>
> And for the list of items to be complete - there's a similar conversion
> for d->tot_pages.

In this case, 64bit domain on 64bit Xen is fine. This hypercall is ok
as it really shouldn't be returning more than ((~0ULL)>>PAGE_SHIFT)

I guess the question boils down to this:

Is it ok to retroactively apply -error semantics to hypercalls which
were previously defined to never return an error? Already for the
compat layer a wrong value is being returned. All we would be doing is
changing from INT_MAX to -ERANGE which is differently wrong but more
consistent.

~Andrew
>>> On 25.06.13 at 15:10, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 20/06/13 10:01, Jan Beulich wrote:
>>>>> On 19.06.13 at 17:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> In memory.c, there is a possible unsigned->signed conversion error from
>>> max_pages to rc.
>> That's of no concern as long as the maximum possible value can't
>> result in the value being negative. Plus it's problematic only when
>> the hypervisor is 32-bit (as otherwise it's a conversion from
>> "unsigned int" to "signed long".
>>
>> And for the list of items to be complete - there's a similar conversion
>> for d->tot_pages.
>
> In this case, 64bit domain on 64bit Xen is fine. This hypercall is ok
> as it really shouldn't be returning more than ((~0ULL)>>PAGE_SHIFT)
>
> I guess the question boils down this:
>
> Is it ok to retroactively apply -error semantics to hypercalls which
> were previously defined to never return an error? Already for the
> compat layer a wrong value is being returned. All we would be doing is
> changing from INT_MAX to -ERANGE which is differently wrong but more
> consistent.

I think it is okay if the change is, like here, from a de facto random
value (due to having got truncated) to a predictable error indicator.
The capping to INT_MAX was trying to do almost the same (with the goal
of not converting a success return to an error one).

Jan
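As a sketch of the -ERANGE variant being discussed, and not a proposed
patch: narrow_or_erange() is a hypothetical helper, and fixed-width types
again stand in for long and int.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Narrow a 64-bit result for a 32-bit caller. Anything that does not
 * fit is reported as -ERANGE rather than truncated or capped. */
int32_t narrow_or_erange(int64_t rc)
{
    if (rc > INT32_MAX || rc < INT32_MIN)
        return -ERANGE;
    return (int32_t)rc;
}

void demo_erange(void)
{
    /* Values that fit, including existing -errno codes, pass through. */
    assert(narrow_or_erange(42) == 42);
    assert(narrow_or_erange(-EFAULT) == -EFAULT);

    /* A hypothetical frame number above 2^32 becomes a predictable error. */
    assert(narrow_or_erange(INT64_C(0x100000005)) == -ERANGE);
}
```

The trade-off is the one noted above: capping keeps a success return
looking like success, whereas this turns it into an explicit error.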