Juneyoung Lee via llvm-dev
2020-Oct-08 23:13 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
> It is important to note that this applies to trap representations and notto unspecified values. A structure or union never has a trap representation. Yes, nondeterministic bits would work for padding of struct/union, as described in (3) The third case is the value of struct/union padding. For the members of struct/union, it is allowed to have trap representation, so poison can be used. Juneyoung On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <hubert.reinterpretcast at gmail.com> wrote:> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hello all, >> >> Thank everyone who participated in the (impromptu) round table discussion >> on Tuesday. >> For those who are interested, I share the summary of the discussion. >> Also, I share a short-term plan regarding this issue and relevant patches. >> >> >> *Fixing Miscompilations using Freeze* >> ----------------------------------- >> >> To reduce the cost of fixing miscompilations using freeze instruction, we >> need to >> optimize freeze away whenever possible. >> Using the no-undef/poison assumption from the source language (C/C++ in >> this context) can play a significant role. >> To make use the assumptions, here are short-term goals: >> >> *1. Preserve no-undef/poison assumption of function arguments from C/C++ >> when valid.* >> >> There is an ongoing relevant patch (that is written by others): >> https://reviews.llvm.org/D81678 >> >> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when >> valid.* >> >> Reading an indeterminate value from an lvalue that does not have char or >> std::byte type is UB [1]. >> Since reading an lvalue is lowered to `load` in IR, we suggest attaching >> a new >> !noundef metadata to such `load`s. >> The IR-side change is here: https://reviews.llvm.org/D89050 >> The clang-side change is going to be made after D81678 is reviewed, >> because it is likely >> that this patch will have a lot of changes in clang tests. >> >> >> *Replacing Undef with Poison* >> --------------------------- >> >> Since undef is known to be the source of many optimizations due to its >> complexity, >> we'd like to suggest gradually moving towards using poison only. >> To make it, (1) `poison` constant should be introduced into LLVM IR >> first, and (2) >> transformations that introduce `undef` should be updated to introduce >> `poison` instead. >> >> For the step (2), we need an experimental result showing that it does not >> cause >> performance degradation. This relies on better support for freeze (the >> no-undef/poison analysis patches). >> >> *1. Introduce a new `poison` constant into IR*: >> https://reviews.llvm.org/D71126 >> >> Note that `poison` constant can be used as a true placeholder value as >> well. >> Undef cannot be used in general because it is less undefined than poison. >> >> *2. Update transformations that introduce `undef` to introduce `poison` >> instead* >> >> (1) There are transformations that introduce `undef` as a placeholder >> (e.g. phi operand >> from an unreachable block). >> For these, `poison` can be used instead. >> >> (2) The value of an uninitialized object (automatic or dynamic). >> They are indeterminate values in C/C++, so okay to use poison instead. >> A tricky case is a bitfield access, and we have two possible solutions: >> >> - i. Introduce a very-packed struct type >> ``` >> <C> >> struct { >> int a:2, b:6; >> } s; >> >> v = s.a; >> >> => >> >> <IR> >> >> s = alloca >> >> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >> v = extractvalue tmp, 0 >> ``` >> * Pros: Can be used to precisely lower C/C++'s struct typed function >> argument into IR >> (currently clang coerces a struct into int if small enough; I'll explain >> about this detail if anyone requests) >> * Cons: Since optimizations aren’t aware of the new type, they should >> be updated >> >> - ii. Use load-freeze >> ``` >> <C> >> struct { >> int a:2, b:6; >> } s; >> >> v = s.a; >> >> => >> >> <IR> >> s = alloca >> >> // Poison bits are frozen and returned >> tmp = *load freeze* i8* s >> v = tmp & 3 >> ``` >> * Pros: The change is simpler >> * Cons: Store forwarding isn’t free; needs insertion of freeze >> (store x, p; v = load freeze p => store x, p; v = freeze x) >> >> >> (3) The third case is the value of struct/union padding. >> Padding is filled with unspecified value in C, so it is too undefined to >> use poison. >> We can fill it with defined bits nondeterministically chosen at >> allocation time (freeze poison). >> >> ``` >> <C> >> struct { >> char a; // 3 bytes padding >> int b; >> } s; >> >> v = s.b; >> >> => >> >> <IR> >> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent >> manner >> // s[0], s[4~7]: poison >> // s[1~3]: let's fill these bytes with nondet. bits >> >> s2 = gep (bitcast s to i8*), 4 >> v = load i32 s2 >> ``` >> >> >> Thanks, >> Juneyoung >> >> >> >> [1] >> C11 6.2.6.1.5: If the stored value of an object has such a representation >> and is read by an lvalue expression that does not have character type, the >> behavior is undefined. >> (Similarly, C17 6.2.6.1.5) >> > It is important to note that this applies to trap representations and not > to unspecified values. A structure or union never has a trap representation. > > >> C++14 8.5.12: If an indeterminate value is produced by an evaluation, the >> behavior is undefined except in the following cases: If an indeterminate >> value of unsigned narrow character type ... >> (Similarly, C++17 11.6.12 , C++11 4.1.1) >> > While loading undef for the unsigned character type case merely produces > undef, for C++, operations such as sign-extend or zero-extend on an undef > i8 is also undefined behaviour. > > >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/dc379e78/attachment.html>
Hubert Tong via llvm-dev
2020-Oct-09 01:45 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> wrote:> > It is important to note that this applies to trap representations and > not to unspecified values. A structure or union never has a trap > representation. > Yes, nondeterministic bits would work for padding of struct/union, as > described in (3) The third case is the value of struct/union padding. > For the members of struct/union, it is allowed to have trap > representation, so poison can be used. >At what point are the members considered poison? For copying/passing/returning a struct or union, there is no UB even if some members are uninitialized.> > Juneyoung > > On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong < > hubert.reinterpretcast at gmail.com> wrote: > >> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Hello all, >>> >>> Thank everyone who participated in the (impromptu) round table >>> discussion on Tuesday. >>> For those who are interested, I share the summary of the discussion. >>> Also, I share a short-term plan regarding this issue and relevant >>> patches. >>> >>> >>> *Fixing Miscompilations using Freeze* >>> ----------------------------------- >>> >>> To reduce the cost of fixing miscompilations using freeze instruction, >>> we need to >>> optimize freeze away whenever possible. >>> Using the no-undef/poison assumption from the source language (C/C++ in >>> this context) can play a significant role. >>> To make use the assumptions, here are short-term goals: >>> >>> *1. Preserve no-undef/poison assumption of function arguments from C/C++ >>> when valid.* >>> >>> There is an ongoing relevant patch (that is written by others): >>> https://reviews.llvm.org/D81678 >>> >>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when >>> valid.* >>> >>> Reading an indeterminate value from an lvalue that does not have char or >>> std::byte type is UB [1]. >>> Since reading an lvalue is lowered to `load` in IR, we suggest attaching >>> a new >>> !noundef metadata to such `load`s. >>> The IR-side change is here: https://reviews.llvm.org/D89050 >>> The clang-side change is going to be made after D81678 is reviewed, >>> because it is likely >>> that this patch will have a lot of changes in clang tests. >>> >>> >>> *Replacing Undef with Poison* >>> --------------------------- >>> >>> Since undef is known to be the source of many optimizations due to its >>> complexity, >>> we'd like to suggest gradually moving towards using poison only. >>> To make it, (1) `poison` constant should be introduced into LLVM IR >>> first, and (2) >>> transformations that introduce `undef` should be updated to introduce >>> `poison` instead. >>> >>> For the step (2), we need an experimental result showing that it does >>> not cause >>> performance degradation. This relies on better support for freeze (the >>> no-undef/poison analysis patches). >>> >>> *1. Introduce a new `poison` constant into IR*: >>> https://reviews.llvm.org/D71126 >>> >>> Note that `poison` constant can be used as a true placeholder value as >>> well. >>> Undef cannot be used in general because it is less undefined than poison. >>> >>> *2. Update transformations that introduce `undef` to introduce `poison` >>> instead* >>> >>> (1) There are transformations that introduce `undef` as a placeholder >>> (e.g. phi operand >>> from an unreachable block). >>> For these, `poison` can be used instead. >>> >>> (2) The value of an uninitialized object (automatic or dynamic). >>> They are indeterminate values in C/C++, so okay to use poison instead. >>> A tricky case is a bitfield access, and we have two possible solutions: >>> >>> - i. Introduce a very-packed struct type >>> ``` >>> <C> >>> struct { >>> int a:2, b:6; >>> } s; >>> >>> v = s.a; >>> >>> => >>> >>> <IR> >>> >>> s = alloca >>> >>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >>> v = extractvalue tmp, 0 >>> ``` >>> * Pros: Can be used to precisely lower C/C++'s struct typed function >>> argument into IR >>> (currently clang coerces a struct into int if small enough; I'll explain >>> about this detail if anyone requests) >>> * Cons: Since optimizations aren’t aware of the new type, they should >>> be updated >>> >>> - ii. Use load-freeze >>> ``` >>> <C> >>> struct { >>> int a:2, b:6; >>> } s; >>> >>> v = s.a; >>> >>> => >>> >>> <IR> >>> s = alloca >>> >>> // Poison bits are frozen and returned >>> tmp = *load freeze* i8* s >>> v = tmp & 3 >>> ``` >>> * Pros: The change is simpler >>> * Cons: Store forwarding isn’t free; needs insertion of freeze >>> (store x, p; v = load freeze p => store x, p; v = freeze x) >>> >>> >>> (3) The third case is the value of struct/union padding. >>> Padding is filled with unspecified value in C, so it is too undefined to >>> use poison. >>> We can fill it with defined bits nondeterministically chosen at >>> allocation time (freeze poison). >>> >>> ``` >>> <C> >>> struct { >>> char a; // 3 bytes padding >>> int b; >>> } s; >>> >>> v = s.b; >>> >>> => >>> >>> <IR> >>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent >>> manner >>> // s[0], s[4~7]: poison >>> // s[1~3]: let's fill these bytes with nondet. bits >>> >>> s2 = gep (bitcast s to i8*), 4 >>> v = load i32 s2 >>> ``` >>> >>> >>> Thanks, >>> Juneyoung >>> >>> >>> >>> [1] >>> C11 6.2.6.1.5: If the stored value of an object has such a >>> representation and is read by an lvalue expression that does not have >>> character type, the behavior is undefined. >>> (Similarly, C17 6.2.6.1.5) >>> >> It is important to note that this applies to trap representations and not >> to unspecified values. A structure or union never has a trap representation. >> >> >>> C++14 8.5.12: If an indeterminate value is produced by an evaluation, >>> the behavior is undefined except in the following cases: If an >>> indeterminate value of unsigned narrow character type ... >>> (Similarly, C++17 11.6.12 , C++11 4.1.1) >>> >> While loading undef for the unsigned character type case merely produces >> undef, for C++, operations such as sign-extend or zero-extend on an undef >> i8 is also undefined behaviour. >> >> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> > > -- > > Juneyoung Lee > Software Foundation Lab, Seoul National University >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201008/fc759927/attachment.html>
Juneyoung Lee via llvm-dev
2020-Oct-09 03:08 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
It is UB when a poison is passed to certain operations that raise UB on poison, such as division by poison/dereferencing poison pointer/branching on poison condition/etc. Otherwise, poison is simply propagated, but it does not raise UB Copying poison bytes is okay: // Members are initialized to poison at object creation. p = alloca {i8, i32} // p[0], p[4~7] are poison q = alloca {i8, i32} // we want to copy p to q v = load i8* p[0] // v is poison store i8 v, i8* q[0] // poison is simply copied; no UB happened Similarly, passing/returning poison is allowed as well. Juneyoung On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong < hubert.reinterpretcast at gmail.com> wrote:> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> > wrote: > >> > It is important to note that this applies to trap representations and >> not to unspecified values. A structure or union never has a trap >> representation. >> Yes, nondeterministic bits would work for padding of struct/union, as >> described in (3) The third case is the value of struct/union padding. >> For the members of struct/union, it is allowed to have trap >> representation, so poison can be used. >> > At what point are the members considered poison? For > copying/passing/returning a struct or union, there is no UB even if some > members are uninitialized. > > >> >> Juneyoung >> >> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong < >> hubert.reinterpretcast at gmail.com> wrote: >> >>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> Hello all, >>>> >>>> Thank everyone who participated in the (impromptu) round table >>>> discussion on Tuesday. >>>> For those who are interested, I share the summary of the discussion. >>>> Also, I share a short-term plan regarding this issue and relevant >>>> patches. >>>> >>>> >>>> *Fixing Miscompilations using Freeze* >>>> ----------------------------------- >>>> >>>> To reduce the cost of fixing miscompilations using freeze instruction, >>>> we need to >>>> optimize freeze away whenever possible. >>>> Using the no-undef/poison assumption from the source language (C/C++ in >>>> this context) can play a significant role. >>>> To make use the assumptions, here are short-term goals: >>>> >>>> *1. Preserve no-undef/poison assumption of function arguments from >>>> C/C++ when valid.* >>>> >>>> There is an ongoing relevant patch (that is written by others): >>>> https://reviews.llvm.org/D81678 >>>> >>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when >>>> valid.* >>>> >>>> Reading an indeterminate value from an lvalue that does not have char or >>>> std::byte type is UB [1]. >>>> Since reading an lvalue is lowered to `load` in IR, we suggest >>>> attaching a new >>>> !noundef metadata to such `load`s. >>>> The IR-side change is here: https://reviews.llvm.org/D89050 >>>> The clang-side change is going to be made after D81678 is reviewed, >>>> because it is likely >>>> that this patch will have a lot of changes in clang tests. >>>> >>>> >>>> *Replacing Undef with Poison* >>>> --------------------------- >>>> >>>> Since undef is known to be the source of many optimizations due to its >>>> complexity, >>>> we'd like to suggest gradually moving towards using poison only. >>>> To make it, (1) `poison` constant should be introduced into LLVM IR >>>> first, and (2) >>>> transformations that introduce `undef` should be updated to introduce >>>> `poison` instead. >>>> >>>> For the step (2), we need an experimental result showing that it does >>>> not cause >>>> performance degradation. This relies on better support for freeze (the >>>> no-undef/poison analysis patches). >>>> >>>> *1. Introduce a new `poison` constant into IR*: >>>> https://reviews.llvm.org/D71126 >>>> >>>> Note that `poison` constant can be used as a true placeholder value as >>>> well. >>>> Undef cannot be used in general because it is less undefined than >>>> poison. >>>> >>>> *2. Update transformations that introduce `undef` to introduce `poison` >>>> instead* >>>> >>>> (1) There are transformations that introduce `undef` as a placeholder >>>> (e.g. phi operand >>>> from an unreachable block). >>>> For these, `poison` can be used instead. >>>> >>>> (2) The value of an uninitialized object (automatic or dynamic). >>>> They are indeterminate values in C/C++, so okay to use poison instead. >>>> A tricky case is a bitfield access, and we have two possible solutions: >>>> >>>> - i. Introduce a very-packed struct type >>>> ``` >>>> <C> >>>> struct { >>>> int a:2, b:6; >>>> } s; >>>> >>>> v = s.a; >>>> >>>> => >>>> >>>> <IR> >>>> >>>> s = alloca >>>> >>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >>>> v = extractvalue tmp, 0 >>>> ``` >>>> * Pros: Can be used to precisely lower C/C++'s struct typed function >>>> argument into IR >>>> (currently clang coerces a struct into int if small enough; I'll >>>> explain about this detail if anyone requests) >>>> * Cons: Since optimizations aren’t aware of the new type, they should >>>> be updated >>>> >>>> - ii. Use load-freeze >>>> ``` >>>> <C> >>>> struct { >>>> int a:2, b:6; >>>> } s; >>>> >>>> v = s.a; >>>> >>>> => >>>> >>>> <IR> >>>> s = alloca >>>> >>>> // Poison bits are frozen and returned >>>> tmp = *load freeze* i8* s >>>> v = tmp & 3 >>>> ``` >>>> * Pros: The change is simpler >>>> * Cons: Store forwarding isn’t free; needs insertion of freeze >>>> (store x, p; v = load freeze p => store x, p; v = freeze x) >>>> >>>> >>>> (3) The third case is the value of struct/union padding. >>>> Padding is filled with unspecified value in C, so it is too undefined >>>> to use poison. >>>> We can fill it with defined bits nondeterministically chosen at >>>> allocation time (freeze poison). >>>> >>>> ``` >>>> <C> >>>> struct { >>>> char a; // 3 bytes padding >>>> int b; >>>> } s; >>>> >>>> v = s.b; >>>> >>>> => >>>> >>>> <IR> >>>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent >>>> manner >>>> // s[0], s[4~7]: poison >>>> // s[1~3]: let's fill these bytes with nondet. bits >>>> >>>> s2 = gep (bitcast s to i8*), 4 >>>> v = load i32 s2 >>>> ``` >>>> >>>> >>>> Thanks, >>>> Juneyoung >>>> >>>> >>>> >>>> [1] >>>> C11 6.2.6.1.5: If the stored value of an object has such a >>>> representation and is read by an lvalue expression that does not have >>>> character type, the behavior is undefined. >>>> (Similarly, C17 6.2.6.1.5) >>>> >>> It is important to note that this applies to trap representations and >>> not to unspecified values. A structure or union never has a trap >>> representation. >>> >>> >>>> C++14 8.5.12: If an indeterminate value is produced by an evaluation, >>>> the behavior is undefined except in the following cases: If an >>>> indeterminate value of unsigned narrow character type ... >>>> (Similarly, C++17 11.6.12 , C++11 4.1.1) >>>> >>> While loading undef for the unsigned character type case merely produces >>> undef, for C++, operations such as sign-extend or zero-extend on an undef >>> i8 is also undefined behaviour. >>> >>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>> >> >> -- >> >> Juneyoung Lee >> Software Foundation Lab, Seoul National University >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/a7ae424f/attachment-0001.html>