Tom Stellard via llvm-dev
2016-Oct-03 20:51 UTC
[llvm-dev] Is this undefined behavior optimization legal?
Hi,

I've found a test case where SelectionDAG is doing an undefined behavior optimization, and I need help determining whether or not this is legal.

Here is the example IR:

    define void @test(<4 x i8> addrspace(1)* %out, float %a) {
      %uint8 = fptoui float %a to i8
      %vec = insertelement <4 x i8> <i8 0, i8 0, i8 0, i8 0>, i8 %uint8, i32 0
      store <4 x i8> %vec, <4 x i8> addrspace(1)* %out
      ret void
    }

Since %vec is a 32-bit vector, a common way to implement this function on a target with 32-bit registers would be to zero initialize a 32-bit register to hold the initial vector and then 'mask' and 'or' the inserted value with the initial vector. In AMDGPU assembly it would look something like:

    v_mov_b32 v0, 0
    v_cvt_u32_f32_e32 v1, s0
    v_and_b32 v1, v1, 0x000000ff
    v_or_b32 v0, v0, v1

The optimization the SelectionDAG does for us in this function, though, ends up removing the mask operation, which gives us:

    v_mov_b32 v0, 0
    v_cvt_u32_f32_e32 v1, s0
    v_or_b32 v0, v0, v1

The reason the SelectionDAG is doing this is that it knows that the result of %uint8 = fptoui float %a to i8 is undefined when the result needs more than 8 bits. So, it assumes that the result will only set the low 8 bits, because anything else would be undefined behavior and the program would be broken. This assumption is what causes it to remove the 'and' operation.

So effectively, what has happened here is that by inserting the result of an operation with undefined behavior into one lane of a vector, we have overwritten all the other lanes of the vector.

Is this optimization legal? To me it seems wrong that undefined behavior in one lane of a vector could affect another lane. However, given that LLVM IR is SSA and we are technically creating a new vector and not modifying the old one, then maybe it's OK. I'm just not sure.

Appreciate any insight people may have.

Thanks,
Tom
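The effect of dropping the mask can be modeled outside the compiler. What follows is a minimal sketch in Python, not AMDGPU or LLVM code. The names `pack_masked`, `pack_unmasked`, and `lanes` are hypothetical, `cvt` stands in for whatever 32-bit value the conversion leaves in v1, and lane 0 is assumed to occupy the low byte of the register.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def pack_masked(cvt, init=0):
    """Model v_and_b32 followed by v_or_b32: only lane 0 can change."""
    return (init | (cvt & 0xFF)) & MASK32


def pack_unmasked(cvt, init=0):
    """Model the optimized sequence: v_or_b32 with the mask removed."""
    return (init | cvt) & MASK32


def lanes(reg):
    """Split a 32-bit value into four i8 lanes, lane 0 in the low byte (assumed layout)."""
    return [(reg >> (8 * i)) & 0xFF for i in range(4)]


if __name__ == "__main__":
    # 42 and 255 fit in an i8; 300 stands in for a result that does not.
    for cvt in (42, 255, 300):
        print(cvt, lanes(pack_masked(cvt)), lanes(pack_unmasked(cvt)))
```

For 42 and 255 both sequences produce the same lanes. For 300 the masked form keeps lanes 1 to 3 at zero, while the unmasked form lets the extra bit spill into lane 1.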
Hal Finkel via llvm-dev
2016-Oct-03 20:58 UTC
[llvm-dev] Is this undefined behavior optimization legal?
----- Original Message -----
> From: "Tom Stellard via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Monday, October 3, 2016 3:51:40 PM
> Subject: [llvm-dev] Is this undefined behavior optimization legal?
>
> [...]
>
> Is this optimization legal? To me it seems wrong that undefined behavior
> in one lane of a vector could affect another lane. However, given that
> LLVM IR is SSA and we are technically creating a new vector and not
> modifying the old one, then maybe it's OK. I'm just not sure.
>
> Appreciate any insight people may have.

So, to be clear, for values of %a that do not trigger undefined behavior (i.e. that really do produce an integer that can be represented in the i8), the code does indeed store <4 x i8> <i8 %uint8, i8 0, i8 0, i8 0> into *%out? If so, this seems legal to me.

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Tom Stellard via llvm-dev
2016-Oct-03 21:13 UTC
[llvm-dev] Is this undefined behavior optimization legal?
On Mon, Oct 03, 2016 at 03:58:01PM -0500, Hal Finkel wrote:
> [...]
>
> So, to be clear, for values of %a that do not trigger undefined behavior
> (i.e. that really do produce an integer that can be represented in the
> i8), the code does indeed store <4 x i8> <i8 %uint8, i8 0, i8 0, i8 0>
> into *%out? If so, this seems legal to me.

That is correct. When there is no undefined behavior, the high 24 bits (representing lanes 1, 2, and 3) of the stored value are always 0.

-Tom
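The claim that the two instruction sequences agree whenever the conversion result fits in 8 bits is small enough to check exhaustively. This is a sketch under the same assumptions as the thread's description (zero-initialized register, lane 0 in the low byte); `store_masked`, `store_unmasked`, and `first_disagreement` are hypothetical helper names, not part of LLVM.

```python
def store_masked(cvt):
    """32-bit value stored when the 'and' with 0xff is kept."""
    return (0 | (cvt & 0xFF)) & 0xFFFFFFFF


def store_unmasked(cvt):
    """32-bit value stored when the 'and' has been removed."""
    return (0 | cvt) & 0xFFFFFFFF


def first_disagreement(limit):
    """Smallest conversion result below `limit` where the two stores differ, or None."""
    for cvt in range(limit):
        if store_masked(cvt) != store_unmasked(cvt):
            return cvt
    return None


if __name__ == "__main__":
    print(first_disagreement(256))      # in-range results only
    print(first_disagreement(1 << 16))  # includes out-of-range results
```

Within 0..255 there is no disagreement and the high 24 bits stay zero. The first value at which the stores differ is 256, the first result that no longer fits in an i8.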
Mehdi Amini via llvm-dev
2016-Oct-03 21:27 UTC
[llvm-dev] Is this undefined behavior optimization legal?
> On Oct 3, 2016, at 1:51 PM, Tom Stellard via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> Is this optimization legal? To me it seems wrong that undefined behavior
> in one lane of a vector could affect another lane.

Doesn't undefined behavior anywhere in a program make the whole program undefined? I'm not sure why you think there should be a limit on what the optimizer can do specifically to the vector lanes when we don't usually impose any limit.

There might be a question about your fptoui conversion here, though: is it guaranteed to write zeros to the upper bits of the 32-bit register? In the IR it produces an i8 value and inserts it into a vector. It isn't clear to me which combine or transformation knows that the fptoui will zero the upper part of the register.

--
Mehdi
Kevin Choi via llvm-dev
2016-Oct-03 22:04 UTC
[llvm-dev] Is this undefined behavior optimization legal?
> This assumption is what causes it to remove the 'and' operation.

Correct me if I'm wrong, but this assumption appears to be flawed. The initialization values are escaping side effects, and removing them makes a correct program incorrect.

-Kevin
Friedman, Eli via llvm-dev
2016-Oct-03 23:10 UTC
[llvm-dev] Is this undefined behavior optimization legal?
On 10/3/2016 1:51 PM, Tom Stellard via llvm-dev wrote:
> [...]
>
> Is this optimization legal? To me it seems wrong that undefined behavior
> in one lane of a vector could affect another lane. However, given that
> LLVM IR is SSA and we are technically creating a new vector and not
> modifying the old one, then maybe it's OK. I'm just not sure.
>
> Appreciate any insight people may have.

The way insertelement is defined, inserting an element never affects the other elements of the vector ("Its element values are those of val ..."). So the question is whether you're triggering undefined behavior in some other way.

Looking at LangRef for fptoui, it says "If the value cannot fit in ty2, the results are undefined", i.e. the value is equivalent to the constant "undef". Therefore, you should end up storing "<4 x i8> <undef, 0, 0, 0>", not "<4 x i8> undef".

Note that there's a tradeoff here: saying that fptoui for out-of-range values doesn't have undefined behavior allows us to simplify control flow and hoist operations more aggressively.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
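The distinction between "<4 x i8> <undef, 0, 0, 0>" and "<4 x i8> undef" can be phrased as a check on the stored 32-bit value. This is an illustrative sketch only: `matches_undef_lane0` is a hypothetical name, and lane 0 is assumed to sit in the low byte, as in the AMDGPU sequence discussed in the thread.

```python
def matches_undef_lane0(stored):
    """True if `stored` equals <4 x i8> <x, 0, 0, 0> for some i8 value x.

    Lane 0 may hold any of the 256 byte values (it is undef), but lanes
    1 to 3, i.e. the high 24 bits, must remain zero.
    """
    return 0 <= stored <= 0xFFFFFFFF and (stored >> 8) == 0


if __name__ == "__main__":
    print(matches_undef_lane0(0x0000002C))  # lane 0 arbitrary, other lanes zero
    print(matches_undef_lane0(0x0000012C))  # lane 1 disturbed
```

Under the reading that only the fptoui result is undef, a stored value such as 0x0000012C falls outside the permitted set. Under the reading that the whole program has undefined behavior, no such check applies, because any stored value would be acceptable.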