Juneyoung Lee via llvm-dev
2020-Oct-10 01:29 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
> > Okay, it's just not immediately undefined behaviour. The C model has more > issues because of the problem with how "trap representation" is defined > (which precludes trap representations for unsigned char, two's complement > signed char, etc.).This interpretation is further stressed because C only explicitly ascribes> undefined behaviour to trap representations on loads and stores.In this case, freeze(poison) can be used to represent an uninitialized value, because using freeze(poison) never raises UB. However, I couldn't find relevant statements from C standard about these two. Could you elaborate a bit more please? Thanks for raising this. So, it can be that there are:> - values that induce broader undefined behaviour (with certain operations), > - values that may change on their own, and > - values that become defined to an unspecified value (not of the first > category) at some point in time.The unspecified values put into padding in C appear to be in the last> category. Is that how LLVM treats them?With the current version of LLVM, there is slightly a gap, and with our suggestion this becomes correct. https://godbolt.org/z/nxhnTK Reading padding is optimized to undef, but according to the previous link ( http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787), this isn't valid because unspecified value is a 'stable' value. In our suggestion, well-defined bits are stored into padding at object creation, so it becomes okay. Juneyoung On Fri, Oct 9, 2020 at 11:58 PM Hubert Tong < hubert.reinterpretcast at gmail.com> wrote:> On Thu, Oct 8, 2020 at 11:54 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> > wrote: > >> // Members are initialized to poison at object creation. >>>> p = alloca {i8, i32} // p[0], p[4~7] are poison >>>> p[0] is an i8, so it shouldn't be poison? >>> >>> >> My interpretation of standard is that reading uninitialized char can also >> yield trap representation. >> If uninitialized, char variable has indeterminate value, and C/C++ does >> not seem to forbid reading trap representation from it. >> > Okay, it's just not immediately undefined behaviour. The C model has more > issues because of the problem with how "trap representation" is defined > (which precludes trap representations for unsigned char, two's complement > signed char, etc.). This interpretation is further stressed because C only > explicitly ascribes undefined behaviour to trap representations on loads > and stores. > > >> C++14 explicitly has an example that shows it is indeterminate value at >> 3.3.2.1 : >> ``` >> The point of declaration for a name is immediately after its complete >> declarator (Clause 8) and before its initializer (if any), except as noted >> below. [Example: >> unsigned char x = 12; >> { unsigned char x = x; } >> Here the second x is initialized with its own (*indeterminate*) value. >> —end example] >> ``` >> >> It seems there was a phrase saying that reading indeterminate value as an >> unsigned char should yield unspecified value in the C++14 draft in the >> past, but it is removed: >> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787 >> The removed phrase did not exist in C++11, so I believe it is fine to use >> poison for uninitialized char types. >> > Thanks for raising this. So, it can be that there are: > - values that induce broader undefined behaviour (with certain operations), > - values that may change on their own, and > - values that become defined to an unspecified value (not of the first > category) at some point in time. > > The unspecified values put into padding in C appear to be in the last > category. Is that how LLVM treats them? > > >> >> Juneyoung >> >> On Fri, Oct 9, 2020 at 12:19 PM Hubert Tong < >> hubert.reinterpretcast at gmail.com> wrote: >> >>> On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee < >>> juneyoung.lee at sf.snu.ac.kr> wrote: >>> >>>> It is UB when a poison is passed to certain operations that raise UB on >>>> poison, such as division by poison/dereferencing poison pointer/branching >>>> on poison condition/etc. >>>> >>> Got it. Thanks. >>> >>> >>>> Otherwise, poison is simply propagated, but it does not raise UB >>>> Copying poison bytes is okay: >>>> >>>> // Members are initialized to poison at object creation. >>>> p = alloca {i8, i32} // p[0], p[4~7] are poison >>>> >>> p[0] is an i8, so it shouldn't be poison? >>> >>> >>>> q = alloca {i8, i32} // we want to copy p to q >>>> v = load i8* p[0] // v is poison >>>> store i8 v, i8* q[0] // poison is simply copied; no UB happened >>>> >>>> Similarly, passing/returning poison is allowed as well. >>>> >>>> Juneyoung >>>> >>>> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong < >>>> hubert.reinterpretcast at gmail.com> wrote: >>>> >>>>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee < >>>>> juneyoung.lee at sf.snu.ac.kr> wrote: >>>>> >>>>>> > It is important to note that this applies to trap representations >>>>>> and not to unspecified values. A structure or union never has a trap >>>>>> representation. >>>>>> Yes, nondeterministic bits would work for padding of struct/union, as >>>>>> described in (3) The third case is the value of struct/union padding. >>>>>> For the members of struct/union, it is allowed to have trap >>>>>> representation, so poison can be used. >>>>>> >>>>> At what point are the members considered poison? For >>>>> copying/passing/returning a struct or union, there is no UB even if some >>>>> members are uninitialized. >>>>> >>>>> >>>>>> >>>>>> Juneyoung >>>>>> >>>>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong < >>>>>> hubert.reinterpretcast at gmail.com> wrote: >>>>>> >>>>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < >>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>>> Hello all, >>>>>>>> >>>>>>>> Thank everyone who participated in the (impromptu) round table >>>>>>>> discussion on Tuesday. >>>>>>>> For those who are interested, I share the summary of the discussion. >>>>>>>> Also, I share a short-term plan regarding this issue and relevant >>>>>>>> patches. >>>>>>>> >>>>>>>> >>>>>>>> *Fixing Miscompilations using Freeze* >>>>>>>> ----------------------------------- >>>>>>>> >>>>>>>> To reduce the cost of fixing miscompilations using freeze >>>>>>>> instruction, we need to >>>>>>>> optimize freeze away whenever possible. >>>>>>>> Using the no-undef/poison assumption from the source language >>>>>>>> (C/C++ in >>>>>>>> this context) can play a significant role. >>>>>>>> To make use the assumptions, here are short-term goals: >>>>>>>> >>>>>>>> *1. Preserve no-undef/poison assumption of function arguments from >>>>>>>> C/C++ when valid.* >>>>>>>> >>>>>>>> There is an ongoing relevant patch (that is written by others): >>>>>>>> https://reviews.llvm.org/D81678 >>>>>>>> >>>>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ >>>>>>>> when valid.* >>>>>>>> >>>>>>>> Reading an indeterminate value from an lvalue that does not have >>>>>>>> char or >>>>>>>> std::byte type is UB [1]. >>>>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest >>>>>>>> attaching a new >>>>>>>> !noundef metadata to such `load`s. >>>>>>>> The IR-side change is here: https://reviews.llvm.org/D89050 >>>>>>>> The clang-side change is going to be made after D81678 is reviewed, >>>>>>>> because it is likely >>>>>>>> that this patch will have a lot of changes in clang tests. >>>>>>>> >>>>>>>> >>>>>>>> *Replacing Undef with Poison* >>>>>>>> --------------------------- >>>>>>>> >>>>>>>> Since undef is known to be the source of many optimizations due to >>>>>>>> its complexity, >>>>>>>> we'd like to suggest gradually moving towards using poison only. >>>>>>>> To make it, (1) `poison` constant should be introduced into LLVM IR >>>>>>>> first, and (2) >>>>>>>> transformations that introduce `undef` should be updated to >>>>>>>> introduce `poison` instead. >>>>>>>> >>>>>>>> For the step (2), we need an experimental result showing that it >>>>>>>> does not cause >>>>>>>> performance degradation. This relies on better support for freeze >>>>>>>> (the >>>>>>>> no-undef/poison analysis patches). >>>>>>>> >>>>>>>> *1. Introduce a new `poison` constant into IR*: >>>>>>>> https://reviews.llvm.org/D71126 >>>>>>>> >>>>>>>> Note that `poison` constant can be used as a true placeholder value >>>>>>>> as well. >>>>>>>> Undef cannot be used in general because it is less undefined than >>>>>>>> poison. >>>>>>>> >>>>>>>> *2. Update transformations that introduce `undef` to introduce >>>>>>>> `poison` instead* >>>>>>>> >>>>>>>> (1) There are transformations that introduce `undef` as a >>>>>>>> placeholder (e.g. phi operand >>>>>>>> from an unreachable block). >>>>>>>> For these, `poison` can be used instead. >>>>>>>> >>>>>>>> (2) The value of an uninitialized object (automatic or dynamic). >>>>>>>> They are indeterminate values in C/C++, so okay to use poison >>>>>>>> instead. >>>>>>>> A tricky case is a bitfield access, and we have two >>>>>>>> possible solutions: >>>>>>>> >>>>>>>> - i. Introduce a very-packed struct type >>>>>>>> ``` >>>>>>>> <C> >>>>>>>> struct { >>>>>>>> int a:2, b:6; >>>>>>>> } s; >>>>>>>> >>>>>>>> v = s.a; >>>>>>>> >>>>>>>> => >>>>>>>> >>>>>>>> <IR> >>>>>>>> >>>>>>>> s = alloca >>>>>>>> >>>>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >>>>>>>> v = extractvalue tmp, 0 >>>>>>>> ``` >>>>>>>> * Pros: Can be used to precisely lower C/C++'s struct typed >>>>>>>> function argument into IR >>>>>>>> (currently clang coerces a struct into int if small enough; I'll >>>>>>>> explain about this detail if anyone requests) >>>>>>>> * Cons: Since optimizations aren’t aware of the new type, they >>>>>>>> should be updated >>>>>>>> >>>>>>>> - ii. Use load-freeze >>>>>>>> ``` >>>>>>>> <C> >>>>>>>> struct { >>>>>>>> int a:2, b:6; >>>>>>>> } s; >>>>>>>> >>>>>>>> v = s.a; >>>>>>>> >>>>>>>> => >>>>>>>> >>>>>>>> <IR> >>>>>>>> s = alloca >>>>>>>> >>>>>>>> // Poison bits are frozen and returned >>>>>>>> tmp = *load freeze* i8* s >>>>>>>> v = tmp & 3 >>>>>>>> ``` >>>>>>>> * Pros: The change is simpler >>>>>>>> * Cons: Store forwarding isn’t free; needs insertion of freeze >>>>>>>> (store x, p; v = load freeze p => store x, p; v = freeze x) >>>>>>>> >>>>>>>> >>>>>>>> (3) The third case is the value of struct/union padding. >>>>>>>> Padding is filled with unspecified value in C, so it is too >>>>>>>> undefined to use poison. >>>>>>>> We can fill it with defined bits nondeterministically chosen at >>>>>>>> allocation time (freeze poison). >>>>>>>> >>>>>>>> ``` >>>>>>>> <C> >>>>>>>> struct { >>>>>>>> char a; // 3 bytes padding >>>>>>>> int b; >>>>>>>> } s; >>>>>>>> >>>>>>>> v = s.b; >>>>>>>> >>>>>>>> => >>>>>>>> >>>>>>>> <IR> >>>>>>>> s = alloca {i8, i32} // alloca initializes bytes in a >>>>>>>> type-dependent manner >>>>>>>> // s[0], s[4~7]: poison >>>>>>>> // s[1~3]: let's fill these bytes with nondet. bits >>>>>>>> >>>>>>>> s2 = gep (bitcast s to i8*), 4 >>>>>>>> v = load i32 s2 >>>>>>>> ``` >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Juneyoung >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> [1] >>>>>>>> C11 6.2.6.1.5: If the stored value of an object has such a >>>>>>>> representation and is read by an lvalue expression that does not have >>>>>>>> character type, the behavior is undefined. >>>>>>>> (Similarly, C17 6.2.6.1.5) >>>>>>>> >>>>>>> It is important to note that this applies to trap representations >>>>>>> and not to unspecified values. A structure or union never has a trap >>>>>>> representation. >>>>>>> >>>>>>> >>>>>>>> C++14 8.5.12: If an indeterminate value is produced by an >>>>>>>> evaluation, the behavior is undefined except in the following cases: If an >>>>>>>> indeterminate value of unsigned narrow character type ... >>>>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1) >>>>>>>> >>>>>>> While loading undef for the unsigned character type case merely >>>>>>> produces undef, for C++, operations such as sign-extend or zero-extend on >>>>>>> an undef i8 is also undefined behaviour. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> llvm-dev at lists.llvm.org >>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Juneyoung Lee >>>>>> Software Foundation Lab, Seoul National University >>>>>> >>>>> >>>> >>>> -- >>>> >>>> Juneyoung Lee >>>> Software Foundation Lab, Seoul National University >>>> >>> >> >> -- >> >> Juneyoung Lee >> Software Foundation Lab, Seoul National University >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201010/67634be8/attachment.html>
Hubert Tong via llvm-dev
2020-Oct-10 04:08 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
On Fri, Oct 9, 2020 at 9:29 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> wrote:> Okay, it's just not immediately undefined behaviour. The C model has more >> issues because of the problem with how "trap representation" is defined >> (which precludes trap representations for unsigned char, two's complement >> signed char, etc.). > > This interpretation is further stressed because C only explicitly ascribes >> undefined behaviour to trap representations on loads and stores. > > > In this case, freeze(poison) can be used to represent an uninitialized > value, because using freeze(poison) never raises UB. > However, I couldn't find relevant statements from C standard about these > two. Could you elaborate a bit more please? >The definition of trap representation in C17 subclause 6.2.6.1 paragraph 5 refers to object representations that do not represent a value of the type. No such object representations can exist for an unsigned char because all of the bits in the object representation of that type are value bits and every combination of value bits represent a value of the type (see C17 subclause 6.2.6.2 paragraph 1). It seems C17 still allows trap representations for two's complement signed char, but that is only if SCHAR_MIN == -SCHAR_MAX.> Thanks for raising this. So, it can be that there are: >> - values that induce broader undefined behaviour (with certain >> operations), >> - values that may change on their own, and >> - values that become defined to an unspecified value (not of the first >> category) at some point in time. > > > > The unspecified values put into padding in C appear to be in the last >> category. Is that how LLVM treats them? > > > With the current version of LLVM, there is slightly a gap, and with our > suggestion this becomes correct. > https://godbolt.org/z/nxhnTK > Reading padding is optimized to undef, but according to the previous link ( > http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787), this > isn't valid because unspecified value is a 'stable' value. > In our suggestion, well-defined bits are stored into padding at object > creation, so it becomes okay. > > Juneyoung > > On Fri, Oct 9, 2020 at 11:58 PM Hubert Tong < > hubert.reinterpretcast at gmail.com> wrote: > >> On Thu, Oct 8, 2020 at 11:54 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> >> wrote: >> >>> // Members are initialized to poison at object creation. >>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison >>>>> p[0] is an i8, so it shouldn't be poison? >>>> >>>> >>> My interpretation of standard is that reading uninitialized char can >>> also yield trap representation. >>> If uninitialized, char variable has indeterminate value, and C/C++ does >>> not seem to forbid reading trap representation from it. >>> >> Okay, it's just not immediately undefined behaviour. The C model has more >> issues because of the problem with how "trap representation" is defined >> (which precludes trap representations for unsigned char, two's complement >> signed char, etc.). This interpretation is further stressed because C only >> explicitly ascribes undefined behaviour to trap representations on loads >> and stores. >> >> >>> C++14 explicitly has an example that shows it is indeterminate value at >>> 3.3.2.1 : >>> ``` >>> The point of declaration for a name is immediately after its complete >>> declarator (Clause 8) and before its initializer (if any), except as noted >>> below. [Example: >>> unsigned char x = 12; >>> { unsigned char x = x; } >>> Here the second x is initialized with its own (*indeterminate*) value. >>> —end example] >>> ``` >>> >>> It seems there was a phrase saying that reading indeterminate value as >>> an unsigned char should yield unspecified value in the C++14 draft in the >>> past, but it is removed: >>> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787 >>> The removed phrase did not exist in C++11, so I believe it is fine to >>> use poison for uninitialized char types. >>> >> Thanks for raising this. So, it can be that there are: >> - values that induce broader undefined behaviour (with certain >> operations), >> - values that may change on their own, and >> - values that become defined to an unspecified value (not of the first >> category) at some point in time. >> >> The unspecified values put into padding in C appear to be in the last >> category. Is that how LLVM treats them? >> >> >>> >>> Juneyoung >>> >>> On Fri, Oct 9, 2020 at 12:19 PM Hubert Tong < >>> hubert.reinterpretcast at gmail.com> wrote: >>> >>>> On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee < >>>> juneyoung.lee at sf.snu.ac.kr> wrote: >>>> >>>>> It is UB when a poison is passed to certain operations that raise UB >>>>> on poison, such as division by poison/dereferencing poison >>>>> pointer/branching on poison condition/etc. >>>>> >>>> Got it. Thanks. >>>> >>>> >>>>> Otherwise, poison is simply propagated, but it does not raise UB >>>>> Copying poison bytes is okay: >>>>> >>>>> // Members are initialized to poison at object creation. >>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison >>>>> >>>> p[0] is an i8, so it shouldn't be poison? >>>> >>>> >>>>> q = alloca {i8, i32} // we want to copy p to q >>>>> v = load i8* p[0] // v is poison >>>>> store i8 v, i8* q[0] // poison is simply copied; no UB happened >>>>> >>>>> Similarly, passing/returning poison is allowed as well. >>>>> >>>>> Juneyoung >>>>> >>>>> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong < >>>>> hubert.reinterpretcast at gmail.com> wrote: >>>>> >>>>>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee < >>>>>> juneyoung.lee at sf.snu.ac.kr> wrote: >>>>>> >>>>>>> > It is important to note that this applies to trap representations >>>>>>> and not to unspecified values. A structure or union never has a trap >>>>>>> representation. >>>>>>> Yes, nondeterministic bits would work for padding of struct/union, >>>>>>> as described in (3) The third case is the value of struct/union padding. >>>>>>> For the members of struct/union, it is allowed to have trap >>>>>>> representation, so poison can be used. >>>>>>> >>>>>> At what point are the members considered poison? For >>>>>> copying/passing/returning a struct or union, there is no UB even if some >>>>>> members are uninitialized. >>>>>> >>>>>> >>>>>>> >>>>>>> Juneyoung >>>>>>> >>>>>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong < >>>>>>> hubert.reinterpretcast at gmail.com> wrote: >>>>>>> >>>>>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < >>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>> >>>>>>>>> Hello all, >>>>>>>>> >>>>>>>>> Thank everyone who participated in the (impromptu) round table >>>>>>>>> discussion on Tuesday. >>>>>>>>> For those who are interested, I share the summary of the >>>>>>>>> discussion. >>>>>>>>> Also, I share a short-term plan regarding this issue and relevant >>>>>>>>> patches. >>>>>>>>> >>>>>>>>> >>>>>>>>> *Fixing Miscompilations using Freeze* >>>>>>>>> ----------------------------------- >>>>>>>>> >>>>>>>>> To reduce the cost of fixing miscompilations using freeze >>>>>>>>> instruction, we need to >>>>>>>>> optimize freeze away whenever possible. >>>>>>>>> Using the no-undef/poison assumption from the source language >>>>>>>>> (C/C++ in >>>>>>>>> this context) can play a significant role. >>>>>>>>> To make use the assumptions, here are short-term goals: >>>>>>>>> >>>>>>>>> *1. Preserve no-undef/poison assumption of function arguments from >>>>>>>>> C/C++ when valid.* >>>>>>>>> >>>>>>>>> There is an ongoing relevant patch (that is written by others): >>>>>>>>> https://reviews.llvm.org/D81678 >>>>>>>>> >>>>>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ >>>>>>>>> when valid.* >>>>>>>>> >>>>>>>>> Reading an indeterminate value from an lvalue that does not have >>>>>>>>> char or >>>>>>>>> std::byte type is UB [1]. >>>>>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest >>>>>>>>> attaching a new >>>>>>>>> !noundef metadata to such `load`s. >>>>>>>>> The IR-side change is here: https://reviews.llvm.org/D89050 >>>>>>>>> The clang-side change is going to be made after D81678 is >>>>>>>>> reviewed, because it is likely >>>>>>>>> that this patch will have a lot of changes in clang tests. >>>>>>>>> >>>>>>>>> >>>>>>>>> *Replacing Undef with Poison* >>>>>>>>> --------------------------- >>>>>>>>> >>>>>>>>> Since undef is known to be the source of many optimizations due to >>>>>>>>> its complexity, >>>>>>>>> we'd like to suggest gradually moving towards using poison only. >>>>>>>>> To make it, (1) `poison` constant should be introduced into LLVM >>>>>>>>> IR first, and (2) >>>>>>>>> transformations that introduce `undef` should be updated to >>>>>>>>> introduce `poison` instead. >>>>>>>>> >>>>>>>>> For the step (2), we need an experimental result showing that it >>>>>>>>> does not cause >>>>>>>>> performance degradation. This relies on better support for freeze >>>>>>>>> (the >>>>>>>>> no-undef/poison analysis patches). >>>>>>>>> >>>>>>>>> *1. Introduce a new `poison` constant into IR*: >>>>>>>>> https://reviews.llvm.org/D71126 >>>>>>>>> >>>>>>>>> Note that `poison` constant can be used as a true placeholder >>>>>>>>> value as well. >>>>>>>>> Undef cannot be used in general because it is less undefined than >>>>>>>>> poison. >>>>>>>>> >>>>>>>>> *2. Update transformations that introduce `undef` to introduce >>>>>>>>> `poison` instead* >>>>>>>>> >>>>>>>>> (1) There are transformations that introduce `undef` as a >>>>>>>>> placeholder (e.g. phi operand >>>>>>>>> from an unreachable block). >>>>>>>>> For these, `poison` can be used instead. >>>>>>>>> >>>>>>>>> (2) The value of an uninitialized object (automatic or dynamic). >>>>>>>>> They are indeterminate values in C/C++, so okay to use poison >>>>>>>>> instead. >>>>>>>>> A tricky case is a bitfield access, and we have two >>>>>>>>> possible solutions: >>>>>>>>> >>>>>>>>> - i. Introduce a very-packed struct type >>>>>>>>> ``` >>>>>>>>> <C> >>>>>>>>> struct { >>>>>>>>> int a:2, b:6; >>>>>>>>> } s; >>>>>>>>> >>>>>>>>> v = s.a; >>>>>>>>> >>>>>>>>> => >>>>>>>>> >>>>>>>>> <IR> >>>>>>>>> >>>>>>>>> s = alloca >>>>>>>>> >>>>>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >>>>>>>>> v = extractvalue tmp, 0 >>>>>>>>> ``` >>>>>>>>> * Pros: Can be used to precisely lower C/C++'s struct typed >>>>>>>>> function argument into IR >>>>>>>>> (currently clang coerces a struct into int if small enough; I'll >>>>>>>>> explain about this detail if anyone requests) >>>>>>>>> * Cons: Since optimizations aren’t aware of the new type, they >>>>>>>>> should be updated >>>>>>>>> >>>>>>>>> - ii. Use load-freeze >>>>>>>>> ``` >>>>>>>>> <C> >>>>>>>>> struct { >>>>>>>>> int a:2, b:6; >>>>>>>>> } s; >>>>>>>>> >>>>>>>>> v = s.a; >>>>>>>>> >>>>>>>>> => >>>>>>>>> >>>>>>>>> <IR> >>>>>>>>> s = alloca >>>>>>>>> >>>>>>>>> // Poison bits are frozen and returned >>>>>>>>> tmp = *load freeze* i8* s >>>>>>>>> v = tmp & 3 >>>>>>>>> ``` >>>>>>>>> * Pros: The change is simpler >>>>>>>>> * Cons: Store forwarding isn’t free; needs insertion of freeze >>>>>>>>> (store x, p; v = load freeze p => store x, p; v = freeze x) >>>>>>>>> >>>>>>>>> >>>>>>>>> (3) The third case is the value of struct/union padding. >>>>>>>>> Padding is filled with unspecified value in C, so it is too >>>>>>>>> undefined to use poison. >>>>>>>>> We can fill it with defined bits nondeterministically chosen at >>>>>>>>> allocation time (freeze poison). >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> <C> >>>>>>>>> struct { >>>>>>>>> char a; // 3 bytes padding >>>>>>>>> int b; >>>>>>>>> } s; >>>>>>>>> >>>>>>>>> v = s.b; >>>>>>>>> >>>>>>>>> => >>>>>>>>> >>>>>>>>> <IR> >>>>>>>>> s = alloca {i8, i32} // alloca initializes bytes in a >>>>>>>>> type-dependent manner >>>>>>>>> // s[0], s[4~7]: poison >>>>>>>>> // s[1~3]: let's fill these bytes with nondet. bits >>>>>>>>> >>>>>>>>> s2 = gep (bitcast s to i8*), 4 >>>>>>>>> v = load i32 s2 >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Juneyoung >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> C11 6.2.6.1.5: If the stored value of an object has such a >>>>>>>>> representation and is read by an lvalue expression that does not have >>>>>>>>> character type, the behavior is undefined. >>>>>>>>> (Similarly, C17 6.2.6.1.5) >>>>>>>>> >>>>>>>> It is important to note that this applies to trap representations >>>>>>>> and not to unspecified values. A structure or union never has a trap >>>>>>>> representation. >>>>>>>> >>>>>>>> >>>>>>>>> C++14 8.5.12: If an indeterminate value is produced by an >>>>>>>>> evaluation, the behavior is undefined except in the following cases: If an >>>>>>>>> indeterminate value of unsigned narrow character type ... >>>>>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1) >>>>>>>>> >>>>>>>> While loading undef for the unsigned character type case merely >>>>>>>> produces undef, for C++, operations such as sign-extend or zero-extend on >>>>>>>> an undef i8 is also undefined behaviour. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> LLVM Developers mailing list >>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Juneyoung Lee >>>>>>> Software Foundation Lab, Seoul National University >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> Juneyoung Lee >>>>> Software Foundation Lab, Seoul National University >>>>> >>>> >>> >>> -- >>> >>> Juneyoung Lee >>> Software Foundation Lab, Seoul National University >>> >> > > -- > > Juneyoung Lee > Software Foundation Lab, Seoul National University >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201010/ee33dc5c/attachment-0001.html>
Juneyoung Lee via llvm-dev
2020-Oct-12 04:48 UTC
[llvm-dev] Undef and Poison round table follow-up & a plan
> > The definition of trap representation in C17 subclause 6.2.6.1 paragraph 5 > refers to object representations that do not represent a value of the type. > No such object representations can exist for an unsigned char because all > of the bits in the object representation of that type are value bits and > every combination of value bits represent a value of the type (see C17 > subclause 6.2.6.2 paragraph 1). It seems C17 still allows trap > representations for two's complement signed char, but that is only if > SCHAR_MIN == -SCHAR_MAX.I see, thank you for the info. Juneyoung On Sat, Oct 10, 2020 at 1:08 PM Hubert Tong < hubert.reinterpretcast at gmail.com> wrote:> On Fri, Oct 9, 2020 at 9:29 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> > wrote: > >> Okay, it's just not immediately undefined behaviour. The C model has more >>> issues because of the problem with how "trap representation" is defined >>> (which precludes trap representations for unsigned char, two's complement >>> signed char, etc.). >> >> This interpretation is further stressed because C only explicitly >>> ascribes undefined behaviour to trap representations on loads and stores. >> >> >> In this case, freeze(poison) can be used to represent an uninitialized >> value, because using freeze(poison) never raises UB. >> However, I couldn't find relevant statements from C standard about these >> two. Could you elaborate a bit more please? >> > The definition of trap representation in C17 subclause 6.2.6.1 paragraph 5 > refers to object representations that do not represent a value of the type. > No such object representations can exist for an unsigned char because all > of the bits in the object representation of that type are value bits and > every combination of value bits represent a value of the type (see C17 > subclause 6.2.6.2 paragraph 1). It seems C17 still allows trap > representations for two's complement signed char, but that is only if > SCHAR_MIN == -SCHAR_MAX. > > >> Thanks for raising this. So, it can be that there are: >>> - values that induce broader undefined behaviour (with certain >>> operations), >>> - values that may change on their own, and >>> - values that become defined to an unspecified value (not of the first >>> category) at some point in time. >> >> >> >> The unspecified values put into padding in C appear to be in the last >>> category. Is that how LLVM treats them? >> >> >> With the current version of LLVM, there is slightly a gap, and with our >> suggestion this becomes correct. >> https://godbolt.org/z/nxhnTK >> Reading padding is optimized to undef, but according to the previous link >> (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787), >> this isn't valid because unspecified value is a 'stable' value. >> In our suggestion, well-defined bits are stored into padding at object >> creation, so it becomes okay. >> >> Juneyoung >> >> On Fri, Oct 9, 2020 at 11:58 PM Hubert Tong < >> hubert.reinterpretcast at gmail.com> wrote: >> >>> On Thu, Oct 8, 2020 at 11:54 PM Juneyoung Lee < >>> juneyoung.lee at sf.snu.ac.kr> wrote: >>> >>>> // Members are initialized to poison at object creation. >>>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison >>>>>> p[0] is an i8, so it shouldn't be poison? >>>>> >>>>> >>>> My interpretation of standard is that reading uninitialized char can >>>> also yield trap representation. >>>> If uninitialized, char variable has indeterminate value, and C/C++ does >>>> not seem to forbid reading trap representation from it. >>>> >>> Okay, it's just not immediately undefined behaviour. The C model has >>> more issues because of the problem with how "trap representation" is >>> defined (which precludes trap representations for unsigned char, two's >>> complement signed char, etc.). This interpretation is further stressed >>> because C only explicitly ascribes undefined behaviour to trap >>> representations on loads and stores. >>> >>> >>>> C++14 explicitly has an example that shows it is indeterminate value at >>>> 3.3.2.1 : >>>> ``` >>>> The point of declaration for a name is immediately after its complete >>>> declarator (Clause 8) and before its initializer (if any), except as noted >>>> below. [Example: >>>> unsigned char x = 12; >>>> { unsigned char x = x; } >>>> Here the second x is initialized with its own (*indeterminate*) value. >>>> —end example] >>>> ``` >>>> >>>> It seems there was a phrase saying that reading indeterminate value as >>>> an unsigned char should yield unspecified value in the C++14 draft in the >>>> past, but it is removed: >>>> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787 >>>> The removed phrase did not exist in C++11, so I believe it is fine to >>>> use poison for uninitialized char types. >>>> >>> Thanks for raising this. So, it can be that there are: >>> - values that induce broader undefined behaviour (with certain >>> operations), >>> - values that may change on their own, and >>> - values that become defined to an unspecified value (not of the first >>> category) at some point in time. >>> >>> The unspecified values put into padding in C appear to be in the last >>> category. Is that how LLVM treats them? >>> >>> >>>> >>>> Juneyoung >>>> >>>> On Fri, Oct 9, 2020 at 12:19 PM Hubert Tong < >>>> hubert.reinterpretcast at gmail.com> wrote: >>>> >>>>> On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee < >>>>> juneyoung.lee at sf.snu.ac.kr> wrote: >>>>> >>>>>> It is UB when a poison is passed to certain operations that raise UB >>>>>> on poison, such as division by poison/dereferencing poison >>>>>> pointer/branching on poison condition/etc. >>>>>> >>>>> Got it. Thanks. >>>>> >>>>> >>>>>> Otherwise, poison is simply propagated, but it does not raise UB >>>>>> Copying poison bytes is okay: >>>>>> >>>>>> // Members are initialized to poison at object creation. >>>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison >>>>>> >>>>> p[0] is an i8, so it shouldn't be poison? >>>>> >>>>> >>>>>> q = alloca {i8, i32} // we want to copy p to q >>>>>> v = load i8* p[0] // v is poison >>>>>> store i8 v, i8* q[0] // poison is simply copied; no UB happened >>>>>> >>>>>> Similarly, passing/returning poison is allowed as well. >>>>>> >>>>>> Juneyoung >>>>>> >>>>>> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong < >>>>>> hubert.reinterpretcast at gmail.com> wrote: >>>>>> >>>>>>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee < >>>>>>> juneyoung.lee at sf.snu.ac.kr> wrote: >>>>>>> >>>>>>>> > It is important to note that this applies to trap representations >>>>>>>> and not to unspecified values. A structure or union never has a trap >>>>>>>> representation. >>>>>>>> Yes, nondeterministic bits would work for padding of struct/union, >>>>>>>> as described in (3) The third case is the value of struct/union padding. >>>>>>>> For the members of struct/union, it is allowed to have trap >>>>>>>> representation, so poison can be used. >>>>>>>> >>>>>>> At what point are the members considered poison? For >>>>>>> copying/passing/returning a struct or union, there is no UB even if some >>>>>>> members are uninitialized. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Juneyoung >>>>>>>> >>>>>>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong < >>>>>>>> hubert.reinterpretcast at gmail.com> wrote: >>>>>>>> >>>>>>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev < >>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>> >>>>>>>>>> Hello all, >>>>>>>>>> >>>>>>>>>> Thank everyone who participated in the (impromptu) round table >>>>>>>>>> discussion on Tuesday. >>>>>>>>>> For those who are interested, I share the summary of the >>>>>>>>>> discussion. >>>>>>>>>> Also, I share a short-term plan regarding this issue and relevant >>>>>>>>>> patches. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Fixing Miscompilations using Freeze* >>>>>>>>>> ----------------------------------- >>>>>>>>>> >>>>>>>>>> To reduce the cost of fixing miscompilations using freeze >>>>>>>>>> instruction, we need to >>>>>>>>>> optimize freeze away whenever possible. >>>>>>>>>> Using the no-undef/poison assumption from the source language >>>>>>>>>> (C/C++ in >>>>>>>>>> this context) can play a significant role. >>>>>>>>>> To make use the assumptions, here are short-term goals: >>>>>>>>>> >>>>>>>>>> *1. Preserve no-undef/poison assumption of function arguments >>>>>>>>>> from C/C++ when valid.* >>>>>>>>>> >>>>>>>>>> There is an ongoing relevant patch (that is written by others): >>>>>>>>>> https://reviews.llvm.org/D81678 >>>>>>>>>> >>>>>>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ >>>>>>>>>> when valid.* >>>>>>>>>> >>>>>>>>>> Reading an indeterminate value from an lvalue that does not have >>>>>>>>>> char or >>>>>>>>>> std::byte type is UB [1]. >>>>>>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest >>>>>>>>>> attaching a new >>>>>>>>>> !noundef metadata to such `load`s. >>>>>>>>>> The IR-side change is here: https://reviews.llvm.org/D89050 >>>>>>>>>> The clang-side change is going to be made after D81678 is >>>>>>>>>> reviewed, because it is likely >>>>>>>>>> that this patch will have a lot of changes in clang tests. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Replacing Undef with Poison* >>>>>>>>>> --------------------------- >>>>>>>>>> >>>>>>>>>> Since undef is known to be the source of many optimizations due >>>>>>>>>> to its complexity, >>>>>>>>>> we'd like to suggest gradually moving towards using poison only. >>>>>>>>>> To make it, (1) `poison` constant should be introduced into LLVM >>>>>>>>>> IR first, and (2) >>>>>>>>>> transformations that introduce `undef` should be updated to >>>>>>>>>> introduce `poison` instead. >>>>>>>>>> >>>>>>>>>> For the step (2), we need an experimental result showing that it >>>>>>>>>> does not cause >>>>>>>>>> performance degradation. This relies on better support for freeze >>>>>>>>>> (the >>>>>>>>>> no-undef/poison analysis patches). >>>>>>>>>> >>>>>>>>>> *1. Introduce a new `poison` constant into IR*: >>>>>>>>>> https://reviews.llvm.org/D71126 >>>>>>>>>> >>>>>>>>>> Note that `poison` constant can be used as a true placeholder >>>>>>>>>> value as well. >>>>>>>>>> Undef cannot be used in general because it is less undefined than >>>>>>>>>> poison. >>>>>>>>>> >>>>>>>>>> *2. Update transformations that introduce `undef` to introduce >>>>>>>>>> `poison` instead* >>>>>>>>>> >>>>>>>>>> (1) There are transformations that introduce `undef` as a >>>>>>>>>> placeholder (e.g. phi operand >>>>>>>>>> from an unreachable block). >>>>>>>>>> For these, `poison` can be used instead. >>>>>>>>>> >>>>>>>>>> (2) The value of an uninitialized object (automatic or dynamic). >>>>>>>>>> They are indeterminate values in C/C++, so okay to use poison >>>>>>>>>> instead. >>>>>>>>>> A tricky case is a bitfield access, and we have two >>>>>>>>>> possible solutions: >>>>>>>>>> >>>>>>>>>> - i. Introduce a very-packed struct type >>>>>>>>>> ``` >>>>>>>>>> <C> >>>>>>>>>> struct { >>>>>>>>>> int a:2, b:6; >>>>>>>>>> } s; >>>>>>>>>> >>>>>>>>>> v = s.a; >>>>>>>>>> >>>>>>>>>> => >>>>>>>>>> >>>>>>>>>> <IR> >>>>>>>>>> >>>>>>>>>> s = alloca >>>>>>>>>> >>>>>>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type >>>>>>>>>> v = extractvalue tmp, 0 >>>>>>>>>> ``` >>>>>>>>>> * Pros: Can be used to precisely lower C/C++'s struct typed >>>>>>>>>> function argument into IR >>>>>>>>>> (currently clang coerces a struct into int if small enough; I'll >>>>>>>>>> explain about this detail if anyone requests) >>>>>>>>>> * Cons: Since optimizations aren’t aware of the new type, they >>>>>>>>>> should be updated >>>>>>>>>> >>>>>>>>>> - ii. Use load-freeze >>>>>>>>>> ``` >>>>>>>>>> <C> >>>>>>>>>> struct { >>>>>>>>>> int a:2, b:6; >>>>>>>>>> } s; >>>>>>>>>> >>>>>>>>>> v = s.a; >>>>>>>>>> >>>>>>>>>> => >>>>>>>>>> >>>>>>>>>> <IR> >>>>>>>>>> s = alloca >>>>>>>>>> >>>>>>>>>> // Poison bits are frozen and returned >>>>>>>>>> tmp = *load freeze* i8* s >>>>>>>>>> v = tmp & 3 >>>>>>>>>> ``` >>>>>>>>>> * Pros: The change is simpler >>>>>>>>>> * Cons: Store forwarding isn’t free; needs insertion of freeze >>>>>>>>>> (store x, p; v = load freeze p => store x, p; v = freeze x) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (3) The third case is the value of struct/union padding. >>>>>>>>>> Padding is filled with unspecified value in C, so it is too >>>>>>>>>> undefined to use poison. >>>>>>>>>> We can fill it with defined bits nondeterministically chosen at >>>>>>>>>> allocation time (freeze poison). >>>>>>>>>> >>>>>>>>>> ``` >>>>>>>>>> <C> >>>>>>>>>> struct { >>>>>>>>>> char a; // 3 bytes padding >>>>>>>>>> int b; >>>>>>>>>> } s; >>>>>>>>>> >>>>>>>>>> v = s.b; >>>>>>>>>> >>>>>>>>>> => >>>>>>>>>> >>>>>>>>>> <IR> >>>>>>>>>> s = alloca {i8, i32} // alloca initializes bytes in a >>>>>>>>>> type-dependent manner >>>>>>>>>> // s[0], s[4~7]: poison >>>>>>>>>> // s[1~3]: let's fill these bytes with nondet. bits >>>>>>>>>> >>>>>>>>>> s2 = gep (bitcast s to i8*), 4 >>>>>>>>>> v = load i32 s2 >>>>>>>>>> ``` >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Juneyoung >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> C11 6.2.6.1.5: If the stored value of an object has such a >>>>>>>>>> representation and is read by an lvalue expression that does not have >>>>>>>>>> character type, the behavior is undefined. >>>>>>>>>> (Similarly, C17 6.2.6.1.5) >>>>>>>>>> >>>>>>>>> It is important to note that this applies to trap representations >>>>>>>>> and not to unspecified values. A structure or union never has a trap >>>>>>>>> representation. >>>>>>>>> >>>>>>>>> >>>>>>>>>> C++14 8.5.12: If an indeterminate value is produced by an >>>>>>>>>> evaluation, the behavior is undefined except in the following cases: If an >>>>>>>>>> indeterminate value of unsigned narrow character type ... >>>>>>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1) >>>>>>>>>> >>>>>>>>> While loading undef for the unsigned character type case merely >>>>>>>>> produces undef, for C++, operations such as sign-extend or zero-extend on >>>>>>>>> an undef i8 is also undefined behaviour. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers mailing list >>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Juneyoung Lee >>>>>>>> Software Foundation Lab, Seoul National University >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Juneyoung Lee >>>>>> Software Foundation Lab, Seoul National University >>>>>> >>>>> >>>> >>>> -- >>>> >>>> Juneyoung Lee >>>> Software Foundation Lab, Seoul National University >>>> >>> >> >> -- >> >> Juneyoung Lee >> Software Foundation Lab, Seoul National University >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201012/f0a1a247/attachment.html>