Juneyoung Lee via llvm-dev
2020-Oct-08  16:11 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
Hello all,
Thank everyone who participated in the (impromptu) round table discussion
on Tuesday.
For those who are interested, I share the summary of the discussion.
Also, I share a short-term plan regarding this issue and relevant patches.
*Fixing Miscompilations using Freeze*
-----------------------------------
To reduce the cost of fixing miscompilations using freeze instruction, we
need to
optimize freeze away whenever possible.
Using the no-undef/poison assumption from the source language (C/C++ in
this context) can play a significant role.
To make use the assumptions, here are short-term goals:
*1. Preserve no-undef/poison assumption of function arguments from C/C++
when valid.*
There is an ongoing relevant patch (that is written by others):
https://reviews.llvm.org/D81678
*2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when
valid.*
Reading an indeterminate value from an lvalue that does not have char or
std::byte type is UB [1].
Since reading an lvalue is lowered to `load` in IR, we suggest attaching a
new
!noundef metadata to such `load`s.
The IR-side change is here: https://reviews.llvm.org/D89050
The clang-side change is going to be made after D81678 is reviewed, because
it is likely
that this patch will have a lot of changes in clang tests.
*Replacing Undef with Poison*
---------------------------
Since undef is known to be the source of many optimizations due to its
complexity,
we'd like to suggest gradually moving towards using poison only.
To make it, (1) `poison` constant should be introduced into LLVM IR first,
and (2)
transformations that introduce `undef` should be updated to introduce
`poison` instead.
For the step (2), we need an experimental result showing that it does not
cause
performance degradation. This relies on better support for freeze (the
no-undef/poison analysis patches).
*1. Introduce a new `poison` constant into IR*:
https://reviews.llvm.org/D71126
Note that `poison` constant can be used as a true placeholder value as well.
Undef cannot be used in general because it is less undefined than poison.
*2. Update transformations that introduce `undef` to introduce `poison`
instead*
(1) There are transformations that introduce `undef` as a placeholder (e.g.
phi operand
from an unreachable block).
For these, `poison` can be used instead.
(2) The value of an uninitialized object (automatic or dynamic).
They are indeterminate values in C/C++, so okay to use poison instead.
A tricky case is a bitfield access, and we have two possible solutions:
- i. Introduce a very-packed struct type
```
<C>
struct {
  int a:2, b:6;
} s;
v = s.a;
=>
<IR>
s = alloca
tmp = load *{{i2, i6}}** s ; load as a very packed struct type
v = extractvalue tmp, 0
```
  * Pros: Can be used to precisely lower C/C++'s struct typed function
argument into IR
(currently clang coerces a struct into int if small enough; I'll explain
about this detail if anyone requests)
  * Cons: Since optimizations aren’t aware of the new type, they should be
updated
- ii. Use load-freeze
```
<C>
struct {
  int a:2, b:6;
} s;
v = s.a;
=>
<IR>
s = alloca
// Poison bits are frozen and returned
tmp = *load freeze* i8* s
v = tmp & 3
```
  * Pros: The change is simpler
  * Cons: Store forwarding isn’t free; needs insertion of freeze
    (store x, p; v = load freeze p => store x, p; v = freeze x)
(3) The third case is the value of struct/union padding.
Padding is filled with unspecified value in C, so it is too undefined to
use poison.
We can fill it with defined bits nondeterministically chosen at allocation
time (freeze poison).
```
<C>
struct {
  char a; // 3 bytes padding
  int b;
} s;
v = s.b;
=>
<IR>
s = alloca {i8, i32} // alloca initializes bytes in a type-dependent manner
// s[0], s[4~7]: poison
// s[1~3]: let's fill these bytes with nondet. bits
s2 = gep (bitcast s to i8*), 4
v = load i32 s2
```
Thanks,
Juneyoung
[1]
C11 6.2.6.1.5: If the stored value of an object has such a representation
and is read by an lvalue expression that does not have character type, the
behavior is undefined.
(Similarly, C17 6.2.6.1.5)
C++14 8.5.12: If an indeterminate value is produced by an evaluation, the
behavior is undefined except in the following cases: If an indeterminate
value of unsigned narrow character type ...
(Similarly, C++17 11.6.12 , C++11 4.1.1)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/400ad5c0/attachment.html>
Hubert Tong via llvm-dev
2020-Oct-08  20:37 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hello all, > > Thank everyone who participated in the (impromptu) round table discussion > on Tuesday. > For those who are interested, I share the summary of the discussion. > Also, I share a short-term plan regarding this issue and relevant patches. > > > *Fixing Miscompilations using Freeze* > ----------------------------------- > > To reduce the cost of fixing miscompilations using freeze instruction, we > need to > optimize freeze away whenever possible. > Using the no-undef/poison assumption from the source language (C/C++ in > this context) can play a significant role. > To make use the assumptions, here are short-term goals: > > *1. Preserve no-undef/poison assumption of function arguments from C/C++ > when valid.* > > There is an ongoing relevant patch (that is written by others): > https://reviews.llvm.org/D81678 > > *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when > valid.* > > Reading an indeterminate value from an lvalue that does not have char or > std::byte type is UB [1]. > Since reading an lvalue is lowered to `load` in IR, we suggest attaching a > new > !noundef metadata to such `load`s. > The IR-side change is here: https://reviews.llvm.org/D89050 > The clang-side change is going to be made after D81678 is reviewed, > because it is likely > that this patch will have a lot of changes in clang tests. > > > *Replacing Undef with Poison* > --------------------------- > > Since undef is known to be the source of many optimizations due to its > complexity, > we'd like to suggest gradually moving towards using poison only. > To make it, (1) `poison` constant should be introduced into LLVM IR first, > and (2) > transformations that introduce `undef` should be updated to introduce > `poison` instead. > > For the step (2), we need an experimental result showing that it does not > cause > performance degradation. This relies on better support for freeze (the > no-undef/poison analysis patches). > > *1. Introduce a new `poison` constant into IR*: > https://reviews.llvm.org/D71126 > > Note that `poison` constant can be used as a true placeholder value as > well. > Undef cannot be used in general because it is less undefined than poison. > > *2. Update transformations that introduce `undef` to introduce `poison` > instead* > > (1) There are transformations that introduce `undef` as a placeholder > (e.g. phi operand > from an unreachable block). > For these, `poison` can be used instead. > > (2) The value of an uninitialized object (automatic or dynamic). > They are indeterminate values in C/C++, so okay to use poison instead. > A tricky case is a bitfield access, and we have two possible solutions: > > - i. Introduce a very-packed struct type > ``` > <C> > struct { > int a:2, b:6; > } s; > > v = s.a; > > => > > <IR> > > s = alloca > > tmp = load *{{i2, i6}}** s ; load as a very packed struct type > v = extractvalue tmp, 0 > ``` > * Pros: Can be used to precisely lower C/C++'s struct typed function > argument into IR > (currently clang coerces a struct into int if small enough; I'll explain > about this detail if anyone requests) > * Cons: Since optimizations aren’t aware of the new type, they should be > updated > > - ii. Use load-freeze > ``` > <C> > struct { > int a:2, b:6; > } s; > > v = s.a; > > => > > <IR> > s = alloca > > // Poison bits are frozen and returned > tmp = *load freeze* i8* s > v = tmp & 3 > ``` > * Pros: The change is simpler > * Cons: Store forwarding isn’t free; needs insertion of freeze > (store x, p; v = load freeze p => store x, p; v = freeze x) > > > (3) The third case is the value of struct/union padding. > Padding is filled with unspecified value in C, so it is too undefined to > use poison. > We can fill it with defined bits nondeterministically chosen at allocation > time (freeze poison). > > ``` > <C> > struct { > char a; // 3 bytes padding > int b; > } s; > > v = s.b; > > => > > <IR> > s = alloca {i8, i32} // alloca initializes bytes in a type-dependent manner > // s[0], s[4~7]: poison > // s[1~3]: let's fill these bytes with nondet. bits > > s2 = gep (bitcast s to i8*), 4 > v = load i32 s2 > ``` > > > Thanks, > Juneyoung > > > > [1] > C11 6.2.6.1.5: If the stored value of an object has such a representation > and is read by an lvalue expression that does not have character type, the > behavior is undefined. > (Similarly, C17 6.2.6.1.5) >It is important to note that this applies to trap representations and not to unspecified values. A structure or union never has a trap representation.> C++14 8.5.12: If an indeterminate value is produced by an evaluation, the > behavior is undefined except in the following cases: If an indeterminate > value of unsigned narrow character type ... > (Similarly, C++17 11.6.12 , C++11 4.1.1) >While loading undef for the unsigned character type case merely produces undef, for C++, operations such as sign-extend or zero-extend on an undef i8 is also undefined behaviour.> > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201008/c992b897/attachment-0001.html>
Juneyoung Lee via llvm-dev
2020-Oct-08  23:13 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
> It is important to note that this applies to trap representations and notto unspecified values. A structure or union never has a trap representation. Yes, nondeterministic bits would work for padding of struct/union, as described in (3) The third case is the value of struct/union padding. For the members of struct/union, it is allowed to have trap representation, so poison can be used. Juneyoung On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <hubert.reinterpretcast at gmail.com> wrote:> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hello all, >> >> Thank everyone who participated in the (impromptu) round table discussion >> on Tuesday. >> For those who are interested, I share the summary of the discussion. >> Also, I share a short-term plan regarding this issue and relevant patches. >> >> >> *Fixing Miscompilations using Freeze* >> ----------------------------------- >> >> To reduce the cost of fixing miscompilations using freeze instruction, we >> need to >> optimize freeze away whenever possible. >> Using the no-undef/poison assumption from the source language (C/C++ in >> this context) can play a significant role. >> To make use the assumptions, here are short-term goals: >> >> *1. Preserve no-undef/poison assumption of function arguments from C/C++ >> when valid.* >> >> There is an ongoing relevant patch (that is written by others): >> https://reviews.llvm.org/D81678 >> >> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when >> valid.* >> >> Reading an indeterminate value from an lvalue that does not have char or >> std::byte type is UB [1]. >> Since reading an lvalue is lowered to `load` in IR, we suggest attaching >> a new >> !noundef metadata to such `load`s. >> The IR-side change is here: https://reviews.llvm.org/D89050 >> The clang-side change is going to be made after D81678 is reviewed, >> because it is likely >> that this patch will have a lot of changes in clang tests. >> >> >> *Replacing Undef with Poison* >> --------------------------- >> >> Since undef is known to be the source of many optimizations due to its >> complexity, >> we'd like to suggest gradually moving towards using poison only. >> To make it, (1) `poison` constant should be introduced into LLVM IR >> first, and (2) >> transformations that introduce `undef` should be updated to introduce >> `poison` instead. >> >> For the step (2), we need an experimental result showing that it does not >> cause >> performance degradation. This relies on better support for freeze (the >> no-undef/poison analysis patches). >> >> *1. Introduce a new `poison` constant into IR*: >> https://reviews.llvm.org/D71126 >> >> Note that `poison` constant can be used as a true placeholder value as >> well. >> Undef cannot be used in general because it is less undefined than poison. >> >> *2. Update transformations that introduce `undef` to introduce `poison` >> instead* >> >> (1) There are transformations that introduce `undef` as a placeholder >> (e.g. phi operand >> from an unreachable block). >> For these, `poison` can be used instead. >> >> (2) The value of an uninitialized object (automatic or dynamic). >> They are indeterminate values in C/C++, so okay to use poison instead. >> A tricky case is a bitfield access, and we have two possible solutions: >> >> - i. Introduce a very-packed struct type >> ``` >> <C> >> struct { >> int a:2, b:6; >> } s; >> >> v = s.a; >> >> => >> >> <IR> >> >> s = alloca >> >> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >> v = extractvalue tmp, 0 >> ``` >> * Pros: Can be used to precisely lower C/C++'s struct typed function >> argument into IR >> (currently clang coerces a struct into int if small enough; I'll explain >> about this detail if anyone requests) >> * Cons: Since optimizations aren’t aware of the new type, they should >> be updated >> >> - ii. Use load-freeze >> ``` >> <C> >> struct { >> int a:2, b:6; >> } s; >> >> v = s.a; >> >> => >> >> <IR> >> s = alloca >> >> // Poison bits are frozen and returned >> tmp = *load freeze* i8* s >> v = tmp & 3 >> ``` >> * Pros: The change is simpler >> * Cons: Store forwarding isn’t free; needs insertion of freeze >> (store x, p; v = load freeze p => store x, p; v = freeze x) >> >> >> (3) The third case is the value of struct/union padding. >> Padding is filled with unspecified value in C, so it is too undefined to >> use poison. >> We can fill it with defined bits nondeterministically chosen at >> allocation time (freeze poison). >> >> ``` >> <C> >> struct { >> char a; // 3 bytes padding >> int b; >> } s; >> >> v = s.b; >> >> => >> >> <IR> >> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent >> manner >> // s[0], s[4~7]: poison >> // s[1~3]: let's fill these bytes with nondet. bits >> >> s2 = gep (bitcast s to i8*), 4 >> v = load i32 s2 >> ``` >> >> >> Thanks, >> Juneyoung >> >> >> >> [1] >> C11 6.2.6.1.5: If the stored value of an object has such a representation >> and is read by an lvalue expression that does not have character type, the >> behavior is undefined. >> (Similarly, C17 6.2.6.1.5) >> > It is important to note that this applies to trap representations and not > to unspecified values. A structure or union never has a trap representation. > > >> C++14 8.5.12: If an indeterminate value is produced by an evaluation, the >> behavior is undefined except in the following cases: If an indeterminate >> value of unsigned narrow character type ... >> (Similarly, C++17 11.6.12 , C++11 4.1.1) >> > While loading undef for the unsigned character type case merely produces > undef, for C++, operations such as sign-extend or zero-extend on an undef > i8 is also undefined behaviour. > > >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/dc379e78/attachment.html>
Seemingly Similar Threads
- Undef and Poison round table follow-up & a plan
- Undef and Poison round table follow-up & a plan
- Undef and Poison round table follow-up & a plan
- Undef and Poison round table follow-up & a plan
- LangRef semantics for shufflevector with undef mask is incorrect