Laszlo Ersek
2021-Nov-18 12:39 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
(... long email)

On 11/17/21 19:07, Richard W.M. Jones wrote:

> On Mon, Nov 15, 2021 at 02:11:01PM +0100, Laszlo Ersek wrote:
>> (2) Should nbdkit continue building on RHEL7? In the OCaml upgrade
>> thread
>> <https://listman.redhat.com/archives/libguestfs/2021-November/msg00095.html>
>> I thought we practically abandoned RHEL7 for the v2v projects.
>
> If I sound schizophrenic on RHEL 7 that'll be because I'm in two minds
> still about supporting it.
>
> In this particular case I was trying to test the OCaml changes on the
> oldest version of OCaml I have easy access to and found that there
> were various problems compiling on RHEL 7.  I fixed the other
> problems, but this fix was a bit more controversial.
>
>> (3) Red Hat Developer Toolset 10 (and 11 Beta) are available for
>> RHEL7:
>>
>> https://developers.redhat.com/products/developertoolset/overview
>> https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/10/html/10.1_release_notes/system_requirements
>> https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/11-beta/html/11.0_release_notes/system_requirements
>>
>> providing gcc 10 and 11 respectively.
>>
>> We could say that building nbdkit on RHEL7 requires devtoolset.
>
> This doesn't sound very convenient, but there's a stronger reason to
> keep stuff working with GCC 3/4, and that's because OpenBSD sticks
> with GCC 4.2, as it was the last GPLv2+ version.  I think the reason
> OpenBSD does this is wrong, but that's their choice.  (Of course they
> have Clang.)
>
>> (1) I think that APIs that lie are wrong.  I could contribute
>> standard C implementations if we really think nbdkit should continue
>> building on RHEL7.  The difference with the built-ins should only be
>> in performance, not in functionality.
>
> I agree.  If you want to have a go at standard C versions that would
> be great.  Even unsigned addition looks very hard.
> I found this lengthy paper on the subject:
> https://www.cs.utah.edu/~regehr/papers/overflow12.pdf

I've not checked the paper, but I can say two things at once:

- "~regehr" means it's going to be great :)

- signed integers are a huge complication, while at the same time,
  their use (or usefulness) is extremely limited for dealing with
  memory allocations, sizes, indexing, file offsets, and so on.

I wouldn't try to do this in a type-transparent manner.  I'd first
revert commit b31859402d14 ("common/include/checked-overflow.h:
Simplify", 2021-11-11), and then add separate functions for uint64_t
and size_t.

If we don't want to be 100% standard C, just make it work for gcc-4.2,
then a type-generic(-ish) solution could exist using "typeof" and
(perhaps) statement expressions.  (Both are GCC extensions that exist
in 4.2:

- <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Statement-Exprs.html>
- <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Typeof.html>.)

To go into some details:


(1) The idea is that we want the following C expression to evaluate to
true, without OP overflowing first:

    a OP b <= max

where "a", "b", "max" are all of the same unsigned integer type, and
OP is either "+" or "*".


(2) In case OP is "+":

    a + b <= max

we can always subtract "b" from both sides:

    a <= max - b

This condition is checkable in C as-is, because "max - b" cannot
underflow.  If the condition evaluates to true, then "a + b" won't
overflow.


(3) In case OP is "*":

    a * b <= max

we need to see if "b" is zero, at first.  If "b" is zero, the
multiplication is safe.  If "b" is nonzero, then we can divide both
sides by it:

    a <= max / b

This condition is checkable in C as-is, because "max / b" cannot wrap,
or involve a division by zero.
However, unlike in the addition/subtraction case (2), the *equivalence*
of this "rearrangement as a division" with the original

    a * b <= max

is not immediately clear, given that the "/" operator in C (between
integer operands) is not an exact division, but division with a
remainder.  As the standard says, "the result ... is the algebraic
quotient with any fractional part discarded".

The usual way to prove "equivalence" between two logical formulae is to
prove "implication" in both directions.  That is, assuming "b" is
nonzero, let's prove both of the following implications between
C-language expressions:

    (a <= max / b) IMPLIES (a * b <= max)    [1]

    (a * b <= max) IMPLIES (a <= max / b)    [2]


(5) At first, let's express what the

    max / b == q    [3]

expression in C means, expressed *mathematically*.  That way, we can
investigate [1] and [2] in pure math.

The C expression [3] means, mathematically speaking:

    max = q * b + r    [4]

where "q" is a nonnegative integer, and "r" is a nonnegative integer
strictly smaller than "b".  And, this decomposition is unique, given
any pair of (max, b).  (And we posited earlier that "b" is nonzero, of
course.)


(6) Now, going back to [1] and [2]:

- wherever we have a (max / b) C-language expression, substitute "q"
  (from [3]);

- wherever we have a standalone "max" expression, substitute
  "q * b + r" (from [4]).

We get the following *mathematical* (not C!) expressions:

    (a <= q) IMPLIES (a * b <= q * b + r)    [5]

    (a * b <= q * b + r) IMPLIES (a <= q)    [6]


(7) Implication [5] is trivially true, as we multiply both sides of the
premise inequality (a <= q) with the positive integer "b" at first:

    a * b <= q * b

and then add the nonnegative integer "r" to the right hand side only:

    a * b <= q * b + r


(8) Implication [6] is slightly more tricky.
First, let's divide both sides of the premise inequality
(a * b <= q * b + r) with the positive integer "b":

    a * b      q * b + r
    -----  <=  ---------
      b            b

That simplifies to:

    a <= q + (r/b)    [7]

Here's where we consider the restriction on "r" from [4]: "r" is a
nonnegative integer *strictly smaller* than "b".  That gives us, for
the (r/b) addend in separation:

    0 <= (r / b) < 1

Which means that we can tack another (strict) inequality to the right
side of [7]:

    a <= q + (r/b) < q + 1    [8]
                   ^^^^^^^

(Because adding (r/b) to "q" produces a smaller number than adding 1 to
"q" does.)

Considering the two far sides of [8] together, we get

    a < q + 1

Then, as both "a" and "q" are integers, this is equivalent to

    a <= q

which is what we wanted to reach in [6].


(9) And so the C-language functions are just:

    bool
    add_size_t_overflow (size_t a, size_t b, size_t *result)
    {
      if (a <= SIZE_MAX - b) {
        *result = a + b;
        return false;
      }
      return true;
    }

    bool
    mul_size_t_overflow (size_t a, size_t b, size_t *result)
    {
      if (b == 0 || a <= SIZE_MAX / b) {
        *result = a * b;
        return false;
      }
      return true;
    }


(10) The (unsigned) type-generic macros could look something like this.
First, we'd need to check if the types of "a", "b" and "result" were
(i) integers, (ii) unsigned, and (iii) had identical range.  If not, we
should get a compilation failure (static assert, of sorts).

Let's assume at least that none of our integer types have padding bits,
and that signed integer types use two's complement.  (Although the
signed representation may be irrelevant in the discussion below.)  I
don't consider bit-fields at all because they are pure evil anyway. :)

(i) The "integer" check for any arithmetic type can be done by dividing
((type)1) by 2.  Every floating point type can represent 1/2 exactly
(so the result compares unequal to zero), and for every integer type,
the result is zero.

(ii) The "unsigned" check can be done by seeing if ((type)-1) is
positive.
This works even for _Bool, plus -1 is safely expressible by any signed
integer type (IOW, even if the check fails, it is safe to evaluate).

(iii) "identical range" can be expressed by ((type1)-1 == (type2)-1),
with the additional -- perhaps redundant -- restriction
(sizeof a == sizeof b).  This relies on our assumption about integer
representation (no padding bits).

The static assert can be done by intentionally violating a C constraint
(which in turn requires a diagnostic to be emitted), dependent on an
integer constant expression that combines (i), (ii) and (iii).  A
common trick is to create an array typedef, where the size of the array
is -1 (if the condition we want to enforce fails) vs. +1 (otherwise).


(11) Then we could do this, for example:

    #include <stdbool.h>
    #include <string.h>

    /* This macro does not evaluate any one of its arguments, unless
     * the argument has variably modified type.
     */
    #define HAVE_SAME_UINT_TYPES(a, b)         \
      ((typeof (a))1 / 2 == 0 &&               \
       (typeof (b))1 / 2 == 0 &&               \
       (typeof (a))-1 > 0 &&                   \
       (typeof (b))-1 > 0 &&                   \
       (typeof (a))-1 == (typeof (b))-1 &&     \
       sizeof a == sizeof b)

    /* This macro evaluates each of "a", "b" and "r" exactly once,
     * assuming neither has variably modified type.
     */
    #define ADD_UINT_OVERFLOW(a, b, r)                                       \
      ({                                                                     \
        typedef char static_assert_1[-1 + 2 * HAVE_SAME_UINT_TYPES(a, b)];   \
        typedef char static_assert_2[-1 + 2 * HAVE_SAME_UINT_TYPES(a, *r)];  \
        typeof (a) a2 = a, b2 = b, max = -1, result;                         \
        void *r2 = r;                                                        \
        bool overflow = true;                                                \
                                                                             \
        if (a2 <= max - b2) {                                                \
          result = a2 + b2;                                                  \
          memcpy(r2, &result, sizeof result);                                \
          overflow = false;                                                  \
        }                                                                    \
        overflow;                                                            \
      })

Note that using this would require a bit more pedantry than usual at
the *call sites*; for example, adding constants "1" and "2u" would not
work.

TBH, I think the macros I created above are hideous.  I like call sites
to be explicit about types, and so I'd prefer the much simpler,
type-specific add_size_t_overflow() and mul_size_t_overflow()
functions.
(Beyond reverting b31859402d14, I'd even replace the type-specific
ADD_UINT64_T_OVERFLOW, MUL_UINT64_T_OVERFLOW, ADD_SIZE_T_OVERFLOW,
MUL_SIZE_T_OVERFLOW function-like macros with actual functions.  Just
let the compiler inline whatever it likes.)

Comments?

Thanks
Laszlo
Laszlo Ersek
2021-Nov-18 13:06 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
Some additional comments:

On 11/18/21 13:39, Laszlo Ersek wrote:

> The usual way to prove "equivalence" between two logical formulae is
> to prove "implication" in both directions.  That is, assuming "b" is
> nonzero, let's prove both of the following implications between
> C-language expressions:
>
>     (a <= max / b) IMPLIES (a * b <= max)    [1]
>
>     (a * b <= max) IMPLIES (a <= max / b)    [2]

- We can call [1] "correctness": wherever our reformulated condition
  works, we're sure to avoid overflow.  I.e., our reworked formula
  isn't "too lax" (= a security issue).

- We can call [2] "completeness": wherever our original condition can
  be satisfied, our reformulated condition will evaluate to "true" as
  well.  In other words, our reworked formula isn't "too strict" (= a
  needless limit).

- In both [1] and [2], it is understood (as I posited initially) that
  the multiplication operation "a * b" does not overflow.  We posit
  that for the sake of argument; after the (proven) rearrangements,
  we'll only use multiplication if it indeed does not overflow.  So
  perhaps it would be more precise to call [1] and [2] mathematical
  expressions already (not C expressions), just redefine "/" to be the
  C-style division (but let * have infinite precision).

> /* This macro does not evaluate any one of its arguments, unless the
>  * argument has variably modified type.
>  */
> #define HAVE_SAME_UINT_TYPES(a, b)         \
>   ((typeof (a))1 / 2 == 0 &&               \
>    (typeof (b))1 / 2 == 0 &&               \
>    (typeof (a))-1 > 0 &&                   \
>    (typeof (b))-1 > 0 &&                   \
>    (typeof (a))-1 == (typeof (b))-1 &&     \
>    sizeof a == sizeof b)

Sorry, forgot the parens around (a) and (b) in the sizeof operands --
normally they are not needed, but here "a" and "b" are macro arguments,
so the parens are needed.

Thanks,
Laszlo
Eric Blake
2021-Nov-18 15:27 UTC
[Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback
On Thu, Nov 18, 2021 at 01:39:39PM +0100, Laszlo Ersek wrote:

> (... long email)

And quite insightful.  I'm trimming to portions where I'm replying...

> - signed integers are a huge complication, while at the same time,
>   their use (or usefulness) is extremely limited for dealing with
>   memory allocations, sizes, indexing, file offsets, and so on.

Actually, the emacs project has been moving towards explicitly using
signed types for memory sizes.  Why?  Because compilers have modes
where they can diagnose signed overflows, but the C standard requires
unsigned types to perform modulo 2^N arithmetic without warning.  So
using a signed type (gnulib calls it idx_t) that is otherwise the same
width as size_t makes it possible for the compiler to help diagnose
non-portable code.  But yes, writing code that avoids overflow using
signed types is a much trickier proposition.

> I wouldn't try to do this in a type-transparent manner.  I'd first
> revert commit b31859402d14 ("common/include/checked-overflow.h:
> Simplify", 2021-11-11), and then add separate functions for uint64_t
> and size_t.

Writing functions would allow a and b to undergo integer promotions,
but the return type pointer forces an explicit choice of the result
type.  Unless we tell the compiler to warn on narrowing of integers to
a function call, there is a risk that on 32-bit machines, someone
passing a 64-bit off_t value to a 32-bit size_t overflow checker will
undergo inadvertent overflow prior to the function getting a chance to
perform its checks.

One of the benefits of the gcc/clang builtins is that they are
type-agnostic: even if I want to perform off_t*int and store the result
in size_t, the builtin will correctly tell me if the infinite-precision
result would overflow the destination, without me having to figure out
whether promotion or implicit narrowing affected things.  Using a
function loses some of that benefit when compared to a fully-generic
macro.
However, writing a fully-generic macro that correctly handles
mismatched operand types is indeed harder.

> If we don't want to be 100% standard C, just make it work for
> gcc-4.2, then a type-generic(-ish) solution could exist using
> "typeof" and (perhaps) statement expressions.  (Both are GCC
> extensions that exist in 4.2:
> - <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Statement-Exprs.html>
> - <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Typeof.html>.)

Indeed, that's also an intermediate option - we require gcc (even on
RHEL 7) for our automatic cleanup usage, so relying on the features
present in the oldest supported gcc is acceptable, rather than having
to strive for strict C compliance.

> To go into some details:
>
> (1) The idea is that we want the following C expression to evaluate
> to true, without OP overflowing first:
>
>     a OP b <= max
>
> where "a", "b", "max" are all of the same unsigned integer type, and
> OP is either "+" or "*".

Your analysis proceeds with all of "a", "b", and "max" of the same
unsigned types.  I understand why you skipped signed types (the
analysis is harder); but is it worth supporting cases where "a" and "b"
are possibly different widths?  The gnulib file intprops.h does this,
but of course, due to the incompatible licensing, me repeating how it
does it here is not in our best interest (however, I will point out
that it results in a MUCH larger preprocessor expansion than your
simpler solution that enforces same size inputs).

> (9) And so the C-language functions are just:
>
>     bool
>     add_size_t_overflow (size_t a, size_t b, size_t *result)
>     {
>       if (a <= SIZE_MAX - b) {
>         *result = a + b;
>         return false;
>       }
>       return true;
>     }

The compiler builtins ALSO assign the low-order bits of the
infinite-precision operation into *result on overflow, behaving more
like:

    *result = a + b;
    return !(a <= SIZE_MAX - b);

for unsigned types (where C has guaranteed modulo semantics), and a
much more complicated solution with signed types (basically, perform
the operation in the unsigned counterparts, and then cast those
low-order bits back to signed at the end).

>     bool
>     mul_size_t_overflow (size_t a, size_t b, size_t *result)
>     {
>       if (b == 0 || a <= SIZE_MAX / b) {
>         *result = a * b;
>         return false;
>       }
>       return true;
>     }
>
> (10) The (unsigned) type-generic macros could look something like
> this.  First, we'd need to check if the types of "a", "b" and
> "result" were (i) integers, (ii) unsigned, and (iii) had identical
> range.

Check (i) is a nice safety valve.

Check (ii) is essential to avoid the nastiness that would be required
for correct results with signed types, but with a larger macro, we
could indeed support it (gnulib demonstrates that).  But I'm inclined
to insist on (ii) until we encounter a case where we really care about
signed overflow; insisting that we use unsigned sizes (contrary to
emacs' choice of using a signed idx_t type for compiler assistance) is
easier to maintain, even if it reduces the places where the compiler
can help us.

Check (iii) is the one I question as being strictly necessary; if we
can correctly predict the size of the result after arithmetic
promotion, we can determine the maximum value that we must fit within
even if the inputs underwent a size change.

...

> I don't consider bit-fields at all because they are pure evil
> anyway. :)

Too true!

...

> (11) Then we could do this, for example:
>
>     #include <stdbool.h>
>     #include <string.h>
>
>     /* This macro does not evaluate any one of its arguments, unless
>      * the argument has variably modified type.
>      */
>     #define HAVE_SAME_UINT_TYPES(a, b)         \
>       ((typeof (a))1 / 2 == 0 &&               \
>        (typeof (b))1 / 2 == 0 &&               \
>        (typeof (a))-1 > 0 &&                   \
>        (typeof (b))-1 > 0 &&                   \
>        (typeof (a))-1 == (typeof (b))-1 &&     \
>        sizeof a == sizeof b)

Yes, this is relying on gcc's typeof extension, but we said above that
should be okay.

>     /* This macro evaluates each of "a", "b" and "r" exactly once,
>      * assuming neither has variably modified type.
>      */
>     #define ADD_UINT_OVERFLOW(a, b, r)                                       \
>       ({                                                                     \
>         typedef char static_assert_1[-1 + 2 * HAVE_SAME_UINT_TYPES(a, b)];   \
>         typedef char static_assert_2[-1 + 2 * HAVE_SAME_UINT_TYPES(a, *r)];  \

We could perhaps relax this to:

    typedef char static_assert[-1 + 2 * HAVE_SAME_UINT_TYPES((a)+(b), *r)]; \

if we merely care about the promoted result type having the same type
as where we are storing the result, and if we also assert that
sizeof(*r) >= sizeof(int) (for result types smaller than int, the logic
is a lot messier, because of arithmetic promotion).

>         typeof (a) a2 = a, b2 = b, max = -1, result;                         \
>         void *r2 = r;                                                        \
>         bool overflow = true;                                                \
>                                                                              \
>         if (a2 <= max - b2) {                                                \
>           result = a2 + b2;                                                  \
>           memcpy(r2, &result, sizeof result);                                \
>           overflow = false;                                                  \
>         }                                                                    \
>         overflow;                                                            \
>       })
>
> Note that using this would require a bit more pedantry than usual at
> the *call sites*; for example, adding constants "1" and "2u" would
> not work.

Per the above, constant "1" would never work (it is a signed constant).
That's why figuring out a macro that works with arbitrary type inputs,
so long as their integer promotion doesn't overflow (which is what the
compiler builtins do) is nicer, but indeed harder.  Using a function
instead of a macro gets you the implicit promotion (you can pass 1 to a
function taking size_t, whereas the macro computing typeof(1) will see
a signed type).

> TBH, I think the macros I created above are hideous.  I like call
> sites to be explicit about types, and so I'd prefer the much simpler,
> type-specific add_size_t_overflow() and mul_size_t_overflow()
> functions.

Your macros were not too hideous, but only because they made limiting
assumptions like forcing unsigned types and not caring about arithmetic
promotion.  The gnulib macros are more generic, but even more hideous.

> (Beyond reverting b31859402d14, I'd even replace the type-specific
> ADD_UINT64_T_OVERFLOW, MUL_UINT64_T_OVERFLOW, ADD_SIZE_T_OVERFLOW,
> MUL_SIZE_T_OVERFLOW function-like macros with actual functions.  Just
> let the compiler inline whatever it likes.)
>
> Comments?

As long as we have good unit tests for the particular use cases we care
about detecting, I can live with either a generic macro or with
type-specific functional interfaces.  I'm a bit worried that using
functions may fail to catch overflow on the implicit type conversions
of arguments passed to the function; maybe it's possible to do a
hybrid, by writing a macro that checks for type equality but then calls
a function for the actual overflow check, so that we are ensured that
callers use the correct types, but have the ease of reading the
overflow check in a function that doesn't require even more macro
expansions to see what it is doing.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org