Laszlo Ersek
2021-Nov-18 17:24 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
On 11/18/21 16:27, Eric Blake wrote:
> On Thu, Nov 18, 2021 at 01:39:39PM +0100, Laszlo Ersek wrote:
>> (... long email)
>
> And quite insightful. I'm trimming to portions where I'm replying...
>
>> - signed integers are a huge complication, while at the same time, their
>> use (or usefulness) is extremely limited for dealing with memory
>> allocations, sizes, indexing, file offsets, and so on.
>
> Actually, the emacs project has been moving towards explicitly using
> signed types for memory sizes. Why? Because compilers have modes
> where they can diagnose signed overflows, but the C standard requires
> unsigned types to perform modulo 2^N arithmetic without warning. So
> using a signed type (gnulib calls it idx_t) that is otherwise the same
> width as size_t makes it possible for the compiler to help diagnose
> non-portable code.
>
> But yes, writing code that avoids overflow using signed types is a
> much trickier proposition.
>
>> I wouldn't try to do this in a type-transparent manner. I'd first revert
>> commit b31859402d14 ("common/include/checked-overflow.h: Simplify",
>> 2021-11-11), and then add separate functions for uint64_t and size_t.
>
> Writing functions would allow a and b to undergo integer promotions,
> but the return type pointer forces an explicit choice of the result
> type. Unless we tell the compiler to warn on narrowing of integers to
> a function call, there is a risk that on 32-bit machines, someone
> passing a 64-bit off_t value to a 32-bit size_t overflow checker will
> undergo inadvertent overflow prior to the function getting a chance to
> perform its checks.
>
> One of the benefits of the gcc/clang builtins is that they are
> type-agnostic: even if I want to perform off_t*int and store the
> result in size_t, the builtin will correctly tell me if the
> infinite-precision result would overflow the destination, without me
> having to figure out whether promotion or implicit narrowing affected
> things.
> Using a function loses some of that benefit when compared to
> a fully-generic macro. However, writing a fully-generic macro that
> correctly handles mismatched operand types is indeed harder.
>
>> If we don't want to be 100% standard C, just make it work for gcc-4.2,
>> then a type-generic(-ish) solution could exist using "typeof" and
>> (perhaps) statement expressions. (Both are GCC extensions that exist in
>> 4.2:
>> - <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Statement-Exprs.html>
>> - <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Typeof.html>.)
>
> Indeed, that's also an intermediate option - we require gcc (even on
> RHEL 7) for our automatic cleanup usage, so relying on the features
> present in the oldest supported gcc is acceptable, rather than having
> to strive for strict C compliance.
>
>> To go into some details:
>>
>> (1) The idea is that we want the following C expression to evaluate to
>> true, without OP overflowing first:
>>
>>   a OP b <= max
>>
>> where "a", "b", "max" are all of the same unsigned integer type, and OP
>> is either "+" or "*".
>
> Your analysis proceeds with all of "a", "b", and "max" of the same
> unsigned types. I understand why you skipped signed types (the
> analysis is harder); but is it worth supporting cases where "a" and
> "b" are possibly different widths? The gnulib file intprops.h does
> this, but of course, due to the incompatible licensing, me repeating
> how it does it here is not in our best interest (however, I will point
> out that it results in a MUCH larger preprocessor expansion than your
> simpler solution that enforces same size inputs).

I imagine we could take each input (i.e., "a", "b") as uintmax_t,
perform the addition / multiplication in uintmax_t (with the safeguards
I proposed earlier adapted to UINTMAX_MAX), and then explicitly compare
the result against the maximum of the desired result type, such as
UINT64_MAX, SIZE_MAX, and so on.
I didn't think of this because I'm acutely aware of types, promotions
etc when I write C code, so "these two types must be the same" is not
something I'd be bothered by.

>> (9) And so the C-language functions are just:
>>
>> bool
>> add_size_t_overflow (size_t a, size_t b, size_t *result)
>> {
>>   if (a <= SIZE_MAX - b) {
>>     *result = a + b;
>>     return false;
>>   }
>>   return true;
>
> The compiler builtins ALSO assign the low-order bits of the
> infinite-precision operation into *result on overflow, behaving more
> like:
>
>   *result = a + b;
>   return ! (a <= SIZE_MAX - b);

Ah, interesting. I didn't notice that. I figured "output is
indeterminate on error" is generally acceptable. Is there a particular
use for the low-order bits, on overflow?

> for unsigned types (where C has guaranteed modulo semantics), and a
> much more complicated solution with signed types (basically, perform
> the operation in the unsigned counterparts, and then cast those
> low-order bits back to signed at the end).
>
>> }
>>
>> bool
>> mul_size_t_overflow (size_t a, size_t b, size_t *result)
>> {
>>   if (b == 0 || a <= SIZE_MAX / b) {
>>     *result = a * b;
>>     return false;
>>   }
>>   return true;
>> }
>>
>> (10) The (unsigned) type-generic macros could look something like this.
>> First, we'd need to check if the types of "a", "b" and "result" were (i)
>> integers, (ii) unsigned, and (iii) had identical range.
>
> Check (i) is a nice safety valve. Check (ii) is essential to avoid
> the nastiness that would be required for correct results with signed
> types, but with a larger macro, we could indeed support it (gnulib
> demonstrates that). But I'm inclined to insist on (ii) until we
> encounter a case where we really care about signed overflow; insisting
> that we use unsigned sizes (contrary to emacs' choice of using a
> signed idx_t type for compiler assistance) is easier to maintain, even
> if it reduces the places where the compiler can help us.
> Check (iii)
> is the one I question as being strictly necessary; if we can correctly
> predict the size of the result after arithmetic promotion, we can
> determine the maximum value that we must fit within even if the inputs
> underwent a size change.

Yes, this might work with the above (vague) uintmax_t-based approach; we
could use ((typeof (*r))-1) as the desired actual limit.

> ...
>> I don't consider bit-fields at all because they are pure evil anyway. :)
>
> Too true!
>
> ...
>> (11) Then we could do this, for example:
>>
>> #include <stdbool.h>
>> #include <string.h>
>>
>> /* This macro does not evaluate any one of its arguments, unless the
>>  * argument has variably modified type.
>>  */
>> #define HAVE_SAME_UINT_TYPES(a, b) \
>>   ((typeof (a))1 / 2 == 0 && \
>>    (typeof (b))1 / 2 == 0 && \
>>    (typeof (a))-1 > 0 && \
>>    (typeof (b))-1 > 0 && \
>>    (typeof (a))-1 == (typeof (b))-1 && \
>>    sizeof a == sizeof b)
>
> Yes, this is relying on gcc's typeof extension, but we said above that
> should be okay.
>
>> /* This macro evaluates each of "a", "b" and "r" exactly once, assuming
>>  * none of them has variably modified type.
>>  */
>> #define ADD_UINT_OVERFLOW(a, b, r) \
>>   ({ \
>>     typedef char static_assert_1[-1 + 2 * HAVE_SAME_UINT_TYPES(a, b)]; \
>>     typedef char static_assert_2[-1 + 2 * HAVE_SAME_UINT_TYPES(a, *r)]; \
>
> We could perhaps relax this to:
>
>   typedef char static_assert[-1 + 2 * HAVE_SAME_UINT_TYPES((a)+(b), *r)]; \
>
> if we merely care about the promoted result type having the same type
> as where we are storing the result, and if we also assert that
> sizeof(*r)>=sizeof(int) (for result types smaller than int, the logic
> is a lot messier, because of arithmetic promotion).

Right, I didn't consider types with lesser integer conversion rank than
that of "unsigned int" -- I don't think we'd use "short" for memory
allocation, sizes, indexing, file offsets etc.
I wouldn't want to be more gnulib than gnulib :) (Not that I looked at
it -- I intentionally didn't.)

>>     typeof (a) a2 = a, b2 = b, max = -1, result; \
>>     void *r2 = r; \
>>     bool overflow = true; \
>>     \
>>     if (a2 <= max - b2) { \
>>       result = a2 + b2; \
>>       memcpy(r2, &result, sizeof result); \
>>       overflow = false; \
>>     } \
>>     overflow; \
>>   })
>>
>> Note that using this would require a bit more pedantry than usual at the
>> *call sites*; for example, adding constants "1" and "2u" would not work.
>
> Per the above, constant "1" would never work (it is a signed
> constant). That's why figuring out a macro that works with arbitrary
> type inputs, so long as their integer promotion doesn't overflow
> (which is what the compiler builtins do), is nicer, but indeed harder.
> Using a function instead of a macro gets you the implicit promotion
> (you can pass 1 to a function taking size_t, whereas the macro
> computing typeof(1) will see a signed type).
>
>> TBH, I think the macros I created above are hideous. I like call sites
>> to be explicit about types, and so I'd prefer the much simpler,
>> type-specific add_size_t_overflow() and mul_size_t_overflow() functions.
>
> Your macros were not too hideous, but only because they made limiting
> assumptions like forcing unsigned types and not caring about
> arithmetic promotion. The gnulib macros are more generic, but even
> more hideous.
>
>> (Beyond reverting b31859402d14, I'd even replace the type-specific
>> ADD_UINT64_T_OVERFLOW, MUL_UINT64_T_OVERFLOW, ADD_SIZE_T_OVERFLOW,
>> MUL_SIZE_T_OVERFLOW function-like macros with actual functions. Just let
>> the compiler inline whatever it likes.)
>>
>> Comments?
>
> As long as we have good unit tests for the particular use cases we
> care about detecting, I can live with either a generic macro or with
> type-specific functional interfaces.
> I'm a bit worried that using
> functions may fail to catch overflow on the implicit type conversions
> of arguments passed to the function; maybe it's possible to do a
> hybrid, by writing a macro that checks for type equality but then
> calls a function for the actual overflow check, so that we are ensured
> that callers use the correct types, but have the ease of reading the
> overflow check in a function that doesn't require even more macro
> expansions to see what it is doing.

Let me try this (later, only a sketch now):

- use macros for checking that the input and output types are (ii)
  unsigned (i) integer types, ignore their (iii) particular ranges

- convert the inputs to uintmax_t and perform the operation in uintmax_t
  unconditionally, only *remembering* -- using the same in-advance
  checks -- if UINTMAX_MAX is overflowed,

- deduce the actual result limit from ((typeof (*r))-1),

- assign the low-order bits in any case -- if either the uintmax_t
  operation overflows or the actual result limit is exceeded, it all
  happens in unsigned, so it's well-defined (modular reduction). Return
  the overflow status correctly.

Thanks,
Laszlo
Eric Blake
2021-Nov-18 18:12 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
On Thu, Nov 18, 2021 at 06:24:09PM +0100, Laszlo Ersek wrote:
> > Your analysis proceeds with all of "a", "b", and "max" of the same
> > unsigned types. I understand why you skipped signed types (the
> > analysis is harder); but is it worth supporting cases where "a" and
> > "b" are possibly different widths? The gnulib file intprops.h does
> > this, but of course, due to the incompatible licensing, me repeating
> > how it does it here is not in our best interest (however, I will point
> > out that it results in a MUCH larger preprocessor expansion than your
> > simpler solution that enforces same size inputs).
>
> I imagine we could take each input (i.e., "a", "b") as uintmax_t,
> perform the addition / multiplication in uintmax_t (with the safeguards
> I proposed earlier adapted to UINTMAX_MAX), and then explicitly compare
> the result against the maximum of the desired result type, such as
> UINT64_MAX, SIZE_MAX, and so on.
>
> I didn't think of this because I'm acutely aware of types, promotions
> etc when I write C code, so "these two types must be the same" is not
> something I'd be bothered by.

It's also a tradeoff on how generic we happen to be, vs. whether all
our callers actually do use the same size. If the code is easier to
maintain by insisting on callers matching certain patterns, that's okay
as long as we can easily detect callers that fail those patterns. But
if the cost of detecting out-of-spec callers is worse than just biting
the bullet and writing the code to support mixed types, then being
fully generic wins out.

> >> (9) And so the C-language functions are just:
> >>
> >> bool
> >> add_size_t_overflow (size_t a, size_t b, size_t *result)
> >> {
> >>   if (a <= SIZE_MAX - b) {
> >>     *result = a + b;
> >>     return false;
> >>   }
> >>   return true;
> >
> > The compiler builtins ALSO assign the low-order bits of the
> > infinite-precision operation into *result on overflow, behaving more
> > like:
> >
> >   *result = a + b;
> >   return ! (a <= SIZE_MAX - b);
>
> Ah, interesting. I didn't notice that. I figured "output is
> indeterminate on error" is generally acceptable. Is there a particular
> use for the low-order bits, on overflow?

I suspect there may be some backwards compatibility at play, and most
hardware is already able to give us that for free. Even though you're
right that most sane users of these macros won't use the result on
overflow, I'm more worried about the subtle bugs that would occur on
less-tested platforms if we don't actually provide a deterministic
value (particularly if most of our testing occurs on platforms where
the compiler builtins do have deterministic behavior).

...

> Right, I didn't consider types with lesser integer conversion rank than
> that of "unsigned int" -- I don't think we'd use "short" for memory
> allocation, sizes, indexing, file offsets etc.

While it sounds impractical, I do have to wonder if anyone has ever
tried to detect running out of TCP ports (a "short") by using overflow
checking.

> I wouldn't want to be more gnulib than gnulib :) (Not that I looked at
> it -- I intentionally didn't.)

And that's a good thing - since I'm tainted by knowing what gnulib has
done, I'm trying to make sure I keep it possible for you to come up
with a clean-room implementation if we want it, even though I shouldn't
write it myself.

...

> >> Comments?
> >
> > As long as we have good unit tests for the particular use cases we
> > care about detecting, I can live with either a generic macro or with
> > type-specific functional interfaces.
> > I'm a bit worried that using
> > functions may fail to catch overflow on the implicit type conversions
> > of arguments passed to the function; maybe it's possible to do a
> > hybrid, by writing a macro that checks for type equality but then
> > calls a function for the actual overflow check, so that we are ensured
> > that callers use the correct types, but have the ease of reading the
> > overflow check in a function that doesn't require even more macro
> > expansions to see what it is doing.
>
> Let me try this (later, only a sketch now):
>
> - use macros for checking that the input and output types are (ii)
>   unsigned (i) integer types, ignore their (iii) particular ranges
>
> - convert the inputs to uintmax_t and perform the operation in
>   uintmax_t unconditionally, only *remembering* -- using the same
>   in-advance checks -- if UINTMAX_MAX is overflowed,

Perhaps you can do this with a function signature like:

  bool check_op_overflow (uint64_t a, uint64_t b, uint64_t max,
                          uint64_t *r)

> - deduce the actual result limit from ((typeof (*r))-1),
>
> - assign the low-order bits in any case -- if either the uintmax_t
>   operation overflows or the actual result limit is exceeded, it all
>   happens in unsigned, so it's well-defined (modular reduction). Return
>   the overflow status correctly.

and then utilize it something like:

  #define CHECK_OP_OVERFLOW(a, b, r) \
    ({ \
      bool overflow; uint64_t tmp; \
      CHECK_UNSIGNED_INT (a); CHECK_UNSIGNED_INT (b); \
      overflow = check_op_overflow (a, b, (typeof (*(r)))-1, &tmp); \
      *(r) = tmp; \
      overflow; \
    })

and maybe even have CHECK_UNSIGNED_INT() permit signed values >= 0,
rather than insisting that the underlying type be unsigned.

At any rate, the approach sounds good to me.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org