Juneyoung Lee via llvm-dev
2020-Oct-08 23:13 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
> It is important to note that this applies to trap representations and notto unspecified values. A structure or union never has a trap representation. Yes, nondeterministic bits would work for padding of struct/union, as described in (3) The third case is the value of struct/union padding. For the members of struct/union, it is allowed to have trap representation, so poison can be used. Juneyoung On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <hubert.reinterpretcast at gmail.com> wrote:> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hello all, >> >> Thank everyone who participated in the (impromptu) round table discussion >> on Tuesday. >> For those who are interested, I share the summary of the discussion. >> Also, I share a short-term plan regarding this issue and relevant patches. >> >> >> *Fixing Miscompilations using Freeze* >> ----------------------------------- >> >> To reduce the cost of fixing miscompilations using freeze instruction, we >> need to >> optimize freeze away whenever possible. >> Using the no-undef/poison assumption from the source language (C/C++ in >> this context) can play a significant role. >> To make use the assumptions, here are short-term goals: >> >> *1. Preserve no-undef/poison assumption of function arguments from C/C++ >> when valid.* >> >> There is an ongoing relevant patch (that is written by others): >> https://reviews.llvm.org/D81678 >> >> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when >> valid.* >> >> Reading an indeterminate value from an lvalue that does not have char or >> std::byte type is UB [1]. >> Since reading an lvalue is lowered to `load` in IR, we suggest attaching >> a new >> !noundef metadata to such `load`s. >> The IR-side change is here: https://reviews.llvm.org/D89050 >> The clang-side change is going to be made after D81678 is reviewed, >> because it is likely >> that this patch will have a lot of changes in clang tests. >> >> >> *Replacing Undef with Poison* >> --------------------------- >> >> Since undef is known to be the source of many optimizations due to its >> complexity, >> we'd like to suggest gradually moving towards using poison only. >> To make it, (1) `poison` constant should be introduced into LLVM IR >> first, and (2) >> transformations that introduce `undef` should be updated to introduce >> `poison` instead. >> >> For the step (2), we need an experimental result showing that it does not >> cause >> performance degradation. This relies on better support for freeze (the >> no-undef/poison analysis patches). >> >> *1. Introduce a new `poison` constant into IR*: >> https://reviews.llvm.org/D71126 >> >> Note that `poison` constant can be used as a true placeholder value as >> well. >> Undef cannot be used in general because it is less undefined than poison. >> >> *2. Update transformations that introduce `undef` to introduce `poison` >> instead* >> >> (1) There are transformations that introduce `undef` as a placeholder >> (e.g. phi operand >> from an unreachable block). >> For these, `poison` can be used instead. >> >> (2) The value of an uninitialized object (automatic or dynamic). >> They are indeterminate values in C/C++, so okay to use poison instead. >> A tricky case is a bitfield access, and we have two possible solutions: >> >> - i. Introduce a very-packed struct type >> ``` >> <C> >> struct { >> int a:2, b:6; >> } s; >> >> v = s.a; >> >> => >> >> <IR> >> >> s = alloca >> >> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >> v = extractvalue tmp, 0 >> ``` >> * Pros: Can be used to precisely lower C/C++'s struct typed function >> argument into IR >> (currently clang coerces a struct into int if small enough; I'll explain >> about this detail if anyone requests) >> * Cons: Since optimizations aren’t aware of the new type, they should >> be updated >> >> - ii. Use load-freeze >> ``` >> <C> >> struct { >> int a:2, b:6; >> } s; >> >> v = s.a; >> >> => >> >> <IR> >> s = alloca >> >> // Poison bits are frozen and returned >> tmp = *load freeze* i8* s >> v = tmp & 3 >> ``` >> * Pros: The change is simpler >> * Cons: Store forwarding isn’t free; needs insertion of freeze >> (store x, p; v = load freeze p => store x, p; v = freeze x) >> >> >> (3) The third case is the value of struct/union padding. >> Padding is filled with unspecified value in C, so it is too undefined to >> use poison. >> We can fill it with defined bits nondeterministically chosen at >> allocation time (freeze poison). >> >> ``` >> <C> >> struct { >> char a; // 3 bytes padding >> int b; >> } s; >> >> v = s.b; >> >> => >> >> <IR> >> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent >> manner >> // s[0], s[4~7]: poison >> // s[1~3]: let's fill these bytes with nondet. bits >> >> s2 = gep (bitcast s to i8*), 4 >> v = load i32 s2 >> ``` >> >> >> Thanks, >> Juneyoung >> >> >> >> [1] >> C11 6.2.6.1.5: If the stored value of an object has such a representation >> and is read by an lvalue expression that does not have character type, the >> behavior is undefined. >> (Similarly, C17 6.2.6.1.5) >> > It is important to note that this applies to trap representations and not > to unspecified values. A structure or union never has a trap representation. > > >> C++14 8.5.12: If an indeterminate value is produced by an evaluation, the >> behavior is undefined except in the following cases: If an indeterminate >> value of unsigned narrow character type ... >> (Similarly, C++17 11.6.12 , C++11 4.1.1) >> > While loading undef for the unsigned character type case merely produces > undef, for C++, operations such as sign-extend or zero-extend on an undef > i8 is also undefined behaviour. > > >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/dc379e78/attachment.html>
Hubert Tong via llvm-dev
2020-Oct-09 01:45 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> wrote:> > It is important to note that this applies to trap representations and > not to unspecified values. A structure or union never has a trap > representation. > Yes, nondeterministic bits would work for padding of struct/union, as > described in (3) The third case is the value of struct/union padding. > For the members of struct/union, it is allowed to have trap > representation, so poison can be used. >At what point are the members considered poison? For copying/passing/returning a struct or union, there is no UB even if some members are uninitialized.> > Juneyoung > > On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong < > hubert.reinterpretcast at gmail.com> wrote: > >> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Hello all, >>> >>> Thank everyone who participated in the (impromptu) round table >>> discussion on Tuesday. >>> For those who are interested, I share the summary of the discussion. >>> Also, I share a short-term plan regarding this issue and relevant >>> patches. >>> >>> >>> *Fixing Miscompilations using Freeze* >>> ----------------------------------- >>> >>> To reduce the cost of fixing miscompilations using freeze instruction, >>> we need to >>> optimize freeze away whenever possible. >>> Using the no-undef/poison assumption from the source language (C/C++ in >>> this context) can play a significant role. >>> To make use the assumptions, here are short-term goals: >>> >>> *1. Preserve no-undef/poison assumption of function arguments from C/C++ >>> when valid.* >>> >>> There is an ongoing relevant patch (that is written by others): >>> https://reviews.llvm.org/D81678 >>> >>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when >>> valid.* >>> >>> Reading an indeterminate value from an lvalue that does not have char or >>> std::byte type is UB [1]. >>> Since reading an lvalue is lowered to `load` in IR, we suggest attaching >>> a new >>> !noundef metadata to such `load`s. >>> The IR-side change is here: https://reviews.llvm.org/D89050 >>> The clang-side change is going to be made after D81678 is reviewed, >>> because it is likely >>> that this patch will have a lot of changes in clang tests. >>> >>> >>> *Replacing Undef with Poison* >>> --------------------------- >>> >>> Since undef is known to be the source of many optimizations due to its >>> complexity, >>> we'd like to suggest gradually moving towards using poison only. >>> To make it, (1) `poison` constant should be introduced into LLVM IR >>> first, and (2) >>> transformations that introduce `undef` should be updated to introduce >>> `poison` instead. >>> >>> For the step (2), we need an experimental result showing that it does >>> not cause >>> performance degradation. This relies on better support for freeze (the >>> no-undef/poison analysis patches). >>> >>> *1. Introduce a new `poison` constant into IR*: >>> https://reviews.llvm.org/D71126 >>> >>> Note that `poison` constant can be used as a true placeholder value as >>> well. >>> Undef cannot be used in general because it is less undefined than poison. >>> >>> *2. Update transformations that introduce `undef` to introduce `poison` >>> instead* >>> >>> (1) There are transformations that introduce `undef` as a placeholder >>> (e.g. phi operand >>> from an unreachable block). >>> For these, `poison` can be used instead. >>> >>> (2) The value of an uninitialized object (automatic or dynamic). >>> They are indeterminate values in C/C++, so okay to use poison instead. >>> A tricky case is a bitfield access, and we have two possible solutions: >>> >>> - i. Introduce a very-packed struct type >>> ``` >>> <C> >>> struct { >>> int a:2, b:6; >>> } s; >>> >>> v = s.a; >>> >>> => >>> >>> <IR> >>> >>> s = alloca >>> >>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >>> v = extractvalue tmp, 0 >>> ``` >>> * Pros: Can be used to precisely lower C/C++'s struct typed function >>> argument into IR >>> (currently clang coerces a struct into int if small enough; I'll explain >>> about this detail if anyone requests) >>> * Cons: Since optimizations aren’t aware of the new type, they should >>> be updated >>> >>> - ii. Use load-freeze >>> ``` >>> <C> >>> struct { >>> int a:2, b:6; >>> } s; >>> >>> v = s.a; >>> >>> => >>> >>> <IR> >>> s = alloca >>> >>> // Poison bits are frozen and returned >>> tmp = *load freeze* i8* s >>> v = tmp & 3 >>> ``` >>> * Pros: The change is simpler >>> * Cons: Store forwarding isn’t free; needs insertion of freeze >>> (store x, p; v = load freeze p => store x, p; v = freeze x) >>> >>> >>> (3) The third case is the value of struct/union padding. >>> Padding is filled with unspecified value in C, so it is too undefined to >>> use poison. >>> We can fill it with defined bits nondeterministically chosen at >>> allocation time (freeze poison). >>> >>> ``` >>> <C> >>> struct { >>> char a; // 3 bytes padding >>> int b; >>> } s; >>> >>> v = s.b; >>> >>> => >>> >>> <IR> >>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent >>> manner >>> // s[0], s[4~7]: poison >>> // s[1~3]: let's fill these bytes with nondet. bits >>> >>> s2 = gep (bitcast s to i8*), 4 >>> v = load i32 s2 >>> ``` >>> >>> >>> Thanks, >>> Juneyoung >>> >>> >>> >>> [1] >>> C11 6.2.6.1.5: If the stored value of an object has such a >>> representation and is read by an lvalue expression that does not have >>> character type, the behavior is undefined. >>> (Similarly, C17 6.2.6.1.5) >>> >> It is important to note that this applies to trap representations and not >> to unspecified values. A structure or union never has a trap representation. >> >> >>> C++14 8.5.12: If an indeterminate value is produced by an evaluation, >>> the behavior is undefined except in the following cases: If an >>> indeterminate value of unsigned narrow character type ... >>> (Similarly, C++17 11.6.12 , C++11 4.1.1) >>> >> While loading undef for the unsigned character type case merely produces >> undef, for C++, operations such as sign-extend or zero-extend on an undef >> i8 is also undefined behaviour. >> >> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> > > -- > > Juneyoung Lee > Software Foundation Lab, Seoul National University >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201008/fc759927/attachment.html>
Juneyoung Lee via llvm-dev
2020-Oct-09 03:08 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
It is UB when a poison is passed to certain operations that raise UB on
poison, such as division by poison/dereferencing poison pointer/branching
on poison condition/etc.
Otherwise, poison is simply propagated, but it does not raise UB
Copying poison bytes is okay:
// Members are initialized to poison at object creation.
p = alloca {i8, i32} // p[0], p[4~7] are poison
q = alloca {i8, i32} // we want to copy p to q
v = load i8* p[0] // v is poison
store i8 v, i8* q[0] // poison is simply copied; no UB happened
Similarly, passing/returning poison is allowed as well.
Juneyoung
On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:
> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at
sf.snu.ac.kr>
> wrote:
>
>> > It is important to note that this applies to trap representations
and
>> not to unspecified values. A structure or union never has a trap
>> representation.
>> Yes, nondeterministic bits would work for padding of struct/union, as
>> described in (3) The third case is the value of struct/union padding.
>> For the members of struct/union, it is allowed to have trap
>> representation, so poison can be used.
>>
> At what point are the members considered poison? For
> copying/passing/returning a struct or union, there is no UB even if some
> members are uninitialized.
>
>
>>
>> Juneyoung
>>
>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
>> hubert.reinterpretcast at gmail.com> wrote:
>>
>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hello all,
>>>>
>>>> Thank everyone who participated in the (impromptu) round table
>>>> discussion on Tuesday.
>>>> For those who are interested, I share the summary of the
discussion.
>>>> Also, I share a short-term plan regarding this issue and
relevant
>>>> patches.
>>>>
>>>>
>>>> *Fixing Miscompilations using Freeze*
>>>> -----------------------------------
>>>>
>>>> To reduce the cost of fixing miscompilations using freeze
instruction,
>>>> we need to
>>>> optimize freeze away whenever possible.
>>>> Using the no-undef/poison assumption from the source language
(C/C++ in
>>>> this context) can play a significant role.
>>>> To make use the assumptions, here are short-term goals:
>>>>
>>>> *1. Preserve no-undef/poison assumption of function arguments
from
>>>> C/C++ when valid.*
>>>>
>>>> There is an ongoing relevant patch (that is written by others):
>>>> https://reviews.llvm.org/D81678
>>>>
>>>> *2. Preserve no-undef/poison assumption of lvalue reads in
C/C++ when
>>>> valid.*
>>>>
>>>> Reading an indeterminate value from an lvalue that does not
have char or
>>>> std::byte type is UB [1].
>>>> Since reading an lvalue is lowered to `load` in IR, we suggest
>>>> attaching a new
>>>> !noundef metadata to such `load`s.
>>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>>> The clang-side change is going to be made after D81678 is
reviewed,
>>>> because it is likely
>>>> that this patch will have a lot of changes in clang tests.
>>>>
>>>>
>>>> *Replacing Undef with Poison*
>>>> ---------------------------
>>>>
>>>> Since undef is known to be the source of many optimizations due
to its
>>>> complexity,
>>>> we'd like to suggest gradually moving towards using poison
only.
>>>> To make it, (1) `poison` constant should be introduced into
LLVM IR
>>>> first, and (2)
>>>> transformations that introduce `undef` should be updated to
introduce
>>>> `poison` instead.
>>>>
>>>> For the step (2), we need an experimental result showing that
it does
>>>> not cause
>>>> performance degradation. This relies on better support for
freeze (the
>>>> no-undef/poison analysis patches).
>>>>
>>>> *1. Introduce a new `poison` constant into IR*:
>>>> https://reviews.llvm.org/D71126
>>>>
>>>> Note that `poison` constant can be used as a true placeholder
value as
>>>> well.
>>>> Undef cannot be used in general because it is less undefined
than
>>>> poison.
>>>>
>>>> *2. Update transformations that introduce `undef` to introduce
`poison`
>>>> instead*
>>>>
>>>> (1) There are transformations that introduce `undef` as a
placeholder
>>>> (e.g. phi operand
>>>> from an unreachable block).
>>>> For these, `poison` can be used instead.
>>>>
>>>> (2) The value of an uninitialized object (automatic or
dynamic).
>>>> They are indeterminate values in C/C++, so okay to use poison
instead.
>>>> A tricky case is a bitfield access, and we have two possible
solutions:
>>>>
>>>> - i. Introduce a very-packed struct type
>>>> ```
>>>> <C>
>>>> struct {
>>>> int a:2, b:6;
>>>> } s;
>>>>
>>>> v = s.a;
>>>>
>>>> =>
>>>>
>>>> <IR>
>>>>
>>>> s = alloca
>>>>
>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>>> v = extractvalue tmp, 0
>>>> ```
>>>> * Pros: Can be used to precisely lower C/C++'s struct
typed function
>>>> argument into IR
>>>> (currently clang coerces a struct into int if small enough;
I'll
>>>> explain about this detail if anyone requests)
>>>> * Cons: Since optimizations aren’t aware of the new type,
they should
>>>> be updated
>>>>
>>>> - ii. Use load-freeze
>>>> ```
>>>> <C>
>>>> struct {
>>>> int a:2, b:6;
>>>> } s;
>>>>
>>>> v = s.a;
>>>>
>>>> =>
>>>>
>>>> <IR>
>>>> s = alloca
>>>>
>>>> // Poison bits are frozen and returned
>>>> tmp = *load freeze* i8* s
>>>> v = tmp & 3
>>>> ```
>>>> * Pros: The change is simpler
>>>> * Cons: Store forwarding isn’t free; needs insertion of
freeze
>>>> (store x, p; v = load freeze p => store x, p; v = freeze
x)
>>>>
>>>>
>>>> (3) The third case is the value of struct/union padding.
>>>> Padding is filled with unspecified value in C, so it is too
undefined
>>>> to use poison.
>>>> We can fill it with defined bits nondeterministically chosen at
>>>> allocation time (freeze poison).
>>>>
>>>> ```
>>>> <C>
>>>> struct {
>>>> char a; // 3 bytes padding
>>>> int b;
>>>> } s;
>>>>
>>>> v = s.b;
>>>>
>>>> =>
>>>>
>>>> <IR>
>>>> s = alloca {i8, i32} // alloca initializes bytes in a
type-dependent
>>>> manner
>>>> // s[0], s[4~7]: poison
>>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>>
>>>> s2 = gep (bitcast s to i8*), 4
>>>> v = load i32 s2
>>>> ```
>>>>
>>>>
>>>> Thanks,
>>>> Juneyoung
>>>>
>>>>
>>>>
>>>> [1]
>>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>>> representation and is read by an lvalue expression that does
not have
>>>> character type, the behavior is undefined.
>>>> (Similarly, C17 6.2.6.1.5)
>>>>
>>> It is important to note that this applies to trap representations
and
>>> not to unspecified values. A structure or union never has a trap
>>> representation.
>>>
>>>
>>>> C++14 8.5.12: If an indeterminate value is produced by an
evaluation,
>>>> the behavior is undefined except in the following cases: If an
>>>> indeterminate value of unsigned narrow character type ...
>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>>
>>> While loading undef for the unsigned character type case merely
produces
>>> undef, for C++, operations such as sign-extend or zero-extend on an
undef
>>> i8 is also undefined behaviour.
>>>
>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
>>
>> --
>>
>> Juneyoung Lee
>> Software Foundation Lab, Seoul National University
>>
>
--
Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/a7ae424f/attachment-0001.html>