Eric Blake
2021-Nov-18 18:12 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
On Thu, Nov 18, 2021 at 06:24:09PM +0100, Laszlo Ersek wrote:> > Your analysis proceeds with all of "a", "b", and "max" of the same > > unsigned types. I understand why you skipped signed types (the > > analysis is harder); but it it worth supporting cases where "a" and > > "b" are possibly different widths? The gnulib file intprops.h does > > this, but of course, due to the incompatible licensing, me repeating > > how it does it here is not in our best interest (however, I will point > > out that it results in a MUCH larger preprocessor expansion than your > > simpler solution that enforces same size inputs). > > I imagine we could take each input (i.e., "a", "b") as uintmax_t, > perform the addition / multiplication in uintmax_t (with the safeguards > I proposed earlier adapted to UINTMAX_MAX), and then explicitly compare > the result against the maximum of the desired result type, such as > UINT64_MAX, SIZE_MAX, and so on. > > I didn't think of this because I'm acutely aware of types, promotions > etc when I write C code, so "these two types must be the same" is not > something I'd be bothered by.It's also a tradeoff on how generic we happen to be, vs. whether all our callers actually do use the same size. If the code is easier to maintain by insisting on callers matching certain patterns, that's okay as long as we can easily detect callers that fail those patterns. But if the cost of detecting out-of-spec callers is worse than just biting the bullet and writing the code to support mixed types, then being fully generic wins out.> > > > > >> (9) And so the C-language functions are just: > >> > >> bool > >> add_size_t_overflow (size_t a, size_t b, size_t *result) > >> { > >> if (a <= SIZE_MAX - b) { > >> *result = a + b; > >> return false; > >> } > >> return true; > > > > The compiler builtins ALSO assign the low-order bits of the > > infinite-precision operation into *result on overflow, behaving more > > like: > > > > *result = a + b; > > return ! (a <= SIZE_MAX - b); > > Ah, interesting. I didn't notice that. I figured "output is > indeterminate on error" is generally acceptable. Is there a particular > use for the low-order bits, on overflow?I suspect there may be some backwards compatibility at play, and most hardware is already able to give us that for free. Even though you're right that most sane users of these macros won't use the result on overflow, I'm more worried about the subtle bugs that would occur on less-tested platforms if we don't actually provide a deterministic value (particularly if most of our testing occurs on platforms where the compiler builtins do have deterministic behavior). ...> Right, I didn't consider types with lesser integer conversion rank than > that of "unsigned int" -- I don't think we'd use "short" for memory > allocation, sizes, indexing, file offsets etc.While it sounds impractical, I do have to wonder if anyone has ever tried to detect running out of TCP ports (a "short") by using overflow checking.> > I wouldn't want to be more gnulib than gnulib :) (Not that I looked at > it -- I intentionally didn't.)And that's a good thing - since I'm tainted by knowing what gnulib has done, I'm trying to make sure I keep it possible for you to come up with a clean-room implementation if we want it, even though I shouldn't write it myself. ...> >> Comments? > > > > As long as we have good unit tests for the particular use cases we > > care about detecting, I can live with either a generic macro or with > > type-specific functional interfaces. I'm a bit worried that using > > functions may fail to catch overflow on the implicit type conversions > > of arguments passed to the function; maybe it's possible to do a > > hybrid, by writing a function that checks for type equality but then > > calls a function for the actual overflow check, so that we are ensured > > that callers use the correct types, but have the ease of reading the > > overflow check in a function that doesn't require even more macro > > expansions to see what it is doing. > > > > Let me try this (later, only a sketch now): > > - use macros for checking that the input and output types are (ii) > unsigned (i) integer types, ignore their (iii) particular ranges > > - convert the inputs to uintmax_t and perform the operation in uintmax_t > unconditionally, only *remembering* -- using the same in-advance checks > -- if UINTMAX_MAX is overflowed,Perhaps you can do this with a function signature like: bool check_op_overflow (uint64_t a, uint64_t b, uint64_t max, uint64_t *r)> > - deduce the actual result limit from ((typeof (*r))-1), > > - assign the low-order bits in any case -- if either the uintmax_t > operation overflows or the actual result limit is exceeded, it all > happens in unsigned, so it's well-defined (modular reduction). Return > the overflow status correctly.and then utilize it something like: #define CHECK_OP_OVERFLOW(a, b, r) \ ({ \ bool overflow; uint64_t tmp; \ CHECK_UNSIGNED_INT (a); CHECK_UNSIGNED_INT (b); \ overflow = check_op_overflow (a, b, (typeof (*(r)))-1, &tmp); \ *(r) = tmp; \ overflow; \ }) and maybe even have CHECK_UNSIGNED_INT() permit signed values >= 0, rather than insisting that the underlying type be unsigned. At any rate, the approach sounds good to me. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Laszlo Ersek
2021-Nov-19 14:35 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
On 11/18/21 19:12, Eric Blake wrote:> Perhaps you can do this with a function signature like: > > bool check_op_overflow (uint64_t a, uint64_t b, uint64_t max, uint64_t *r) > >> >> - deduce the actual result limit from ((typeof (*r))-1), >> >> - assign the low-order bits in any case -- if either the uintmax_t >> operation overflows or the actual result limit is exceeded, it all >> happens in unsigned, so it's well-defined (modular reduction). Return >> the overflow status correctly. > > and then utilize it something like: > > #define CHECK_OP_OVERFLOW(a, b, r) \ > ({ \ > bool overflow; uint64_t tmp; \ > CHECK_UNSIGNED_INT (a); CHECK_UNSIGNED_INT (b); \ > overflow = check_op_overflow (a, b, (typeof (*(r)))-1, &tmp); \ > *(r) = tmp; \ > overflow; \ > }) > > and maybe even have CHECK_UNSIGNED_INT() permit signed values >= 0, > rather than insisting that the underlying type be unsigned. > > At any rate, the approach sounds good to me.How about this: -------- #include <stdbool.h> #include <stdint.h> /* Assert, at compile time, that the expression "x" has some unsigned integer * type. * * The expression "x" is not evaluated, unless it has variably modified type. */ #define STATIC_ASSERT_UNSIGNED_INT(x) \ do { \ typedef char x_has_uint_type[(typeof (x))-1 > 0 ? 1 : -1]; \ } while (0) /* Assign the sum (a + b) to (*r), using uintmax_t modular arithmetic. * * Return true iff the addition overflows or the result exceeds (max). */ static inline bool check_add_overflow (uintmax_t a, uintmax_t b, uintmax_t max, uintmax_t *r) { *r = a + b; return a > UINTMAX_MAX - b || *r > max; } /* Assign the product (a * b) to (*r), using uintmax_t modular arithmetic. * * Return true iff the multiplication overflows or the result exceeds (max). */ static inline bool check_mul_overflow (uintmax_t a, uintmax_t b, uintmax_t max, uintmax_t *r) { *r = a * b; return b > 0 && (a > UINTMAX_MAX / b || *r > max); } /* Add (a) and (b), both of (possibly different) unsigned integer types, and * store the sum in (*r), which must also have some unsigned integer type. * * Each macro argument is evaluated exactly once, as long as it does not have * variably modified type. * * The macro evaluates to (false) if (*r) can represent the mathematical sum. * Otherwise, the macro evaluates to (true), and the low order bits of the * mathematical sum are stored to (*r). */ #define ADD_OVERFLOW(a, b, r) \ ({ \ bool overflow; \ uintmax_t tmp; \ \ STATIC_ASSERT_UNSIGNED_INT (a); \ STATIC_ASSERT_UNSIGNED_INT (b); \ STATIC_ASSERT_UNSIGNED_INT (*(r)); \ overflow = check_add_overflow ((a), (b), (typeof (*(r)))-1, &tmp); \ *(r) = tmp; \ overflow; \ }) /* Multiply (a) and (b), both of (possibly different) unsigned integer types, * and store the product in (*r), which must also have some unsigned integer * type. * * Each macro argument is evaluated exactly once, as long as it does not have * variably modified type. * * The macro evaluates to (false) if (*r) can represent the mathematical * product. Otherwise, the macro evaluates to (true), and the low order bits of * the mathematical product are stored to (*r). */ #define MUL_OVERFLOW(a, b, r) \ ({ \ bool overflow; \ uintmax_t tmp; \ \ STATIC_ASSERT_UNSIGNED_INT (a); \ STATIC_ASSERT_UNSIGNED_INT (b); \ STATIC_ASSERT_UNSIGNED_INT (*(r)); \ overflow = check_mul_overflow ((a), (b), (typeof (*(r)))-1, &tmp); \ *(r) = tmp; \ overflow; \ }) -------- Questions: (1) What is the usual style in nbdkit to highlight parameters of functions and function-like macros in documentation (comments)? Here I used parentheses. Normally I use quotes "". (2) In the actual functions, we return "true" for "overflow". My original conditions expressed "no overflow". In this source code, I *open-coded* the negations of my original conditions, using De Morgan's laws. That's because I dislike expressions with an outermost "!" operator -- whenever I face such an expression, I always work the outermost negation *into* the formula, as deeply as I can. However, I realize the resultant conditions may not be easy to read this way. Here's an alternative for the body of check_add_overflow(), based on the original formulation: register bool in_range; *r = a + b; in_range = a <= UINTMAX_MAX - b && *r <= max; return !in_range; (3) In the overflow conditions, I used parentheses as sparingly as possible; only when needed. I could also use as many as possible. - Compare, for the addition: return ((a > UINTMAX_MAX - b) || (*r > max)); or in_range = ((a <= UINTMAX_MAX - b) && (*r <= max)); return !in_range; - Compare, for the multiplication: return ((b > 0) && ((a > UINTMAX_MAX / b) || (*r > max))); or in_range = ((b == 0) || ((a <= UINTMAX_MAX / b) && (*r <= max))); return !in_range; What's the preferred form? Thanks, Laszlo