thr3ads.net - llvm dev - [llvm-dev] Is this undefined behavior optimization legal? [Oct 2016]

Tom Stellard via llvm-dev

2016-Oct-03 20:51 UTC

[llvm-dev] Is this undefined behavior optimization legal?

Hi,

I've found a test case where SelectionDAG is doing an undefined behavior
optimization, and I need help determining whether or not this is legal.

Here is the example IR: 

define void @test(<4 x i8> addrspace(1)* %out, float %a) {
  %uint8 = fptoui float %a to i8
  %vec = insertelement <4 x i8> <i8 0, i8 0, i8 0, i8 0>, i8 %uint8, i32 0
  store <4 x i8> %vec, <4 x i8> addrspace(1)* %out
  ret void
}

Since %vec is a 32-bit vector, a common way to implement this function on a
target with 32-bit registers would be to zero initialize a 32-bit register to
hold the initial vector and then 'mask' and 'or' the inserted value with the
initial vector.  In AMDGPU assembly it would look something like:

v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_and_b32 v1, v1, 0x000000ff
v_or_b32 v0, v0, v1

The optimization that SelectionDAG performs on this function, though, ends
up removing the mask operation, which gives us:

v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_or_b32 v0, v0, v1

SelectionDAG does this because it knows that the result of
%uint8 = fptoui float %a to i8 is undefined when the result needs more than
8 bits.  So, it assumes that the result will only set the low 8 bits, because
anything else would be undefined behavior and the program would be broken.
This assumption is what causes it to remove the 'and' operation.

So effectively, by inserting the result of an operation with undefined
behavior into one lane of a vector, we have overwritten all the other lanes
of the vector.
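
To make the effect concrete, here is a minimal Python sketch that models the two instruction sequences above as 32-bit integer arithmetic. The function names are hypothetical, and modeling the convert as "truncate toward zero, clamp to the unsigned 32-bit range" is an assumption about the hardware instruction, not something stated in this thread. It only illustrates that the two sequences agree when the converted value fits in 8 bits and disagree in the upper lanes when it does not.

```python
MASK32 = 0xFFFFFFFF

def cvt_u32_f32(x):
    """Assumed model of the 32-bit convert: truncate toward zero, clamp to [0, 2**32 - 1]."""
    return max(0, min(int(x), MASK32))

def pack_with_mask(a):
    v0 = 0                     # v_mov_b32 v0, 0
    v1 = cvt_u32_f32(a)        # v_cvt_u32_f32_e32 v1, s0
    v1 &= 0x000000FF           # v_and_b32 v1, v1, 0x000000ff
    return (v0 | v1) & MASK32  # v_or_b32 v0, v0, v1

def pack_without_mask(a):
    v0 = 0                     # v_mov_b32 v0, 0
    v1 = cvt_u32_f32(a)        # v_cvt_u32_f32_e32 v1, s0
    return (v0 | v1) & MASK32  # v_or_b32 v0, v0, v1 (mask removed)

def lanes(word):
    """Split a 32-bit word into four i8 lanes, lane 0 in the low byte."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

print(lanes(pack_with_mask(200.0)), lanes(pack_without_mask(200.0)))
print(lanes(pack_with_mask(300.0)), lanes(pack_without_mask(300.0)))
```

With an in-range input such as 200.0 both versions leave lanes 1, 2 and 3 at zero; with 300.0 the unmasked version spills the ninth bit into lane 1.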

Is this optimization legal?  To me it seems wrong that undefined behavior
in one lane of a vector could affect another lane.  However, given that LLVM IR
is SSA and we are technically creating a new vector and not modifying the old
one, then maybe it's OK.  I'm just not sure.

Appreciate any insight people may have.

Thanks,
Tom

Hal Finkel via llvm-dev

2016-Oct-03 20:58 UTC

So, to be clear, for values of %a that are not undefined behavior (i.e. that
really do produce an integer that can be represented in the i8), the code does
indeed store <4 x i8> <i8 %uint8, i8 0, i8 0, i8 0> into *%out? If
so, this seems legal to me.

 -Hal
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Tom Stellard via llvm-dev

2016-Oct-03 21:13 UTC

On Mon, Oct 03, 2016 at 03:58:01PM -0500, Hal Finkel wrote:
> So, to be clear, for values of %a that are not undefined behavior (i.e.
> that really do produce an integer that can be represented in the i8), the
> code does indeed store <4 x i8> <i8 %uint8, i8 0, i8 0, i8 0> into *%out?
> If so, this seems legal to me.
> 
That is correct. When there is no undefined behavior, the high 24 bits
(representing lanes 1, 2, 3) of the stored value are always 0.

-Tom

Mehdi Amini via llvm-dev

2016-Oct-03 21:27 UTC

> On Oct 3, 2016, at 1:51 PM, Tom Stellard via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Is this optimization legal?  To me it seems wrong that undefined behavior
> in one lane of a vector could affect another lane.  
Doesn’t undefined behavior in a program mean that the whole program is
undefined? I’m not sure why you think there should be a limit on what the
optimizer can do specifically across vector lanes when we don’t usually
impose any limit.

There might be a question about your fptoui conversion here though: is it
guaranteed to write zeros to the upper bits of the 32-bit register?
In the IR it produces an i8 value and inserts it into a vector. It isn’t clear
to me which combine / transformation knows that the fptoui will zero the upper
part of the register.
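
The thread does not identify which combine is responsible, so the following is only a generic illustration of the kind of known-bits reasoning that can make an 'and' look redundant. It is a Python sketch with a hypothetical helper name, not LLVM's actual code: if every bit the mask would clear is already assumed to be zero, the mask changes nothing and can be dropped.

```python
ALL_BITS_32 = 0xFFFFFFFF

def and_is_redundant(known_zero, mask):
    """Return True if (x & mask) == x for every 32-bit x whose known_zero bits are 0."""
    cleared_by_mask = ~mask & ALL_BITS_32
    # The 'and' only matters if it clears a bit that might still be set.
    return (cleared_by_mask & ~known_zero & ALL_BITS_32) == 0

# Assumption: the convert result fits in 8 bits, so bits 8..31 are treated as known zero.
assumed_known_zero = 0xFFFFFF00
print(and_is_redundant(assumed_known_zero, 0x000000FF))  # True
print(and_is_redundant(0x00000000, 0x000000FF))          # False
```

The first call reflects the assumption described in the original message; the second shows that without that assumption the mask has to stay.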

— 
Mehdi



Kevin Choi via llvm-dev

2016-Oct-03 22:04 UTC

> This assumption is what causes it to remove the 'and' operation.
Correct me if I'm wrong, but this assumption appears to be flawed. The
initialization values are side effects that escape, and removing them turns
a correct program into an incorrect one.

-Kevin


Friedman, Eli via llvm-dev

2016-Oct-03 23:10 UTC

On 10/3/2016 1:51 PM, Tom Stellard via llvm-dev wrote:
> Is this optimization legal?  To me it seems wrong that undefined behavior
> in one lane of a vector could affect another lane.  However, given that
> LLVM IR is SSA and we are technically creating a new vector and not
> modifying the old one, then maybe it's OK.  I'm just not sure.
>
> Appreciate any insight people may have.
The way insertelement is defined, inserting an element never affects the
other elements of the vector ("Its element values are those of val...").
So the question is whether you're triggering undefined behavior in some
other way. Looking at LangRef for fptoui, it says "If the value cannot fit
in ty2, the results are undefined", i.e. the value is equivalent to the
constant "undef".  Therefore, you should end up storing
"<4 x i8> <undef, 0, 0, 0>", not "<4 x i8> undef".
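
The difference between the two readings can be sketched as two predicates over the stored 32-bit word. This is an illustrative Python model with hypothetical names, assuming lane 0 occupies the low byte: under the first reading only lane 0 may hold an arbitrary value, while under the second any bit pattern is acceptable.

```python
def allowed_lane0_undef(word):
    """Reading 1: <4 x i8> <undef, 0, 0, 0>. Lane 0 is arbitrary, lanes 1-3 must be 0."""
    return 0 <= word <= 0xFFFFFFFF and (word >> 8) == 0

def allowed_whole_vector_undef(word):
    """Reading 2: <4 x i8> undef. Any 32-bit pattern is acceptable."""
    return 0 <= word <= 0xFFFFFFFF

# 0x12C has a bit set in lane 1, so the two readings disagree about it.
print(allowed_lane0_undef(0x12C), allowed_whole_vector_undef(0x12C))
# 0x2C only touches lane 0, so both readings accept it.
print(allowed_lane0_undef(0x2C), allowed_whole_vector_undef(0x2C))
```

A stored word with nonzero upper lanes is therefore only acceptable under the second reading.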

Note that there's a tradeoff here: saying that fptoui for out-of-range 
values doesn't have undefined behavior allows us to simplify control 
flow and hoist operations more aggressively.

-Eli

-- 

Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
