thr3ads.net - llvm dev - [llvm-dev] Branch is not optimized because of right shift [Apr 2020]

If this information is useful, please help other people find it:
Share via:

Stefanos Baziotis via llvm-dev

2020-Apr-05 21:00 UTC

[llvm-dev] Branch is not optimized because of right shift

Hi,
> I think the IR in both of your examples makes things harder for thecompiler than expected from the original C source.

Note that both versions are from clang with -O2. The first is with version
9.0 and the second is with the trunk.
> but in the branch only %0 is used. Sinking the lshr too early made theanalysis harder.

Yes, exactly! That's what I figured too.
> The version in https://godbolt.org/z/_ipKhb  is probably the easiest foranalysis (basically the original C source code built with `clang -O0 -S
-emit-llvm`, followed by running `opt -mem2reg`). There’s a patch under
review that adds support for conditional range propagation (
https://reviews.llvm.org/D76611) and with the patch it can be simplified to
the code below by running `opt  -ipsccp -simplifycfg
-instcombine`>
> define i32 @test(i32 %0) {
>   %.off = add i32 %0, -16
>   %2 = icmp ult i32 %.off, 12
>   %spec.select = select i1 %2, i32 66, i32 45
>   ret i32 %spec.select
> }
>
>
> The reason it does not yet work in the default pipeline is that thebranch condition will be combined to use an AND before the conditional
propagation and D76611 does not support that yet. But once it lands that
should be an easy extension.>
> The problems with your two versions could be addressed as well of coursefor that special case relatively easily I think, but the challenge is to
fix it in a general and efficient (compile-time wise) way. I hope the
conditional propagation should cover such cases soon though.

Ok, I checked the patch, best of luck! I see how it does the
transformation, it's interesting. I think that
https://reviews.llvm.org/rGe9963034314edf49a12ea5e29f694d8f9f52734a could
maybe help in such things.
Any idea about how the compiler could remove the lshr and use a add -16?

Best,
Stefanos Baziotis


Στις Κυρ, 5 Απρ 2020 στις 10:06 μ.μ., ο/η Florian Hahn <
florian_hahn at apple.com> έγραψε:
> Hi,
>
> > On Apr 5, 2020, at 14:08, Stefanos Baziotis via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > To be more specific for everyone:
> > - First of all, I hope it's clear that in the original (C) code,
the
> region - 0x8 > 1000 branch should
> > be eliminated. That is because it is inside a block that has ensured
> that 8 < region < 12. But for some reason,
> > it is not eliminated. I'm trying to find that reason. :)
> > - I understand that in the -O2 .ll output, we take %0 and do it a
right
> shift. That means that we don't have
> > any range info about %0 or %0 >>= 1, so we can't eliminate
the branch.
> What I don't understand
> > is why we chose to reuse %0 and do it a right shift and instead we
> didn't have a phi there.
> > - Finally, I saw that putting nuw in the addition with -8 eliminates
the
> branch. This makes sense since,
> > we know probably know from the code above that %0 >>= 1 is less
than 12.
> Although, I don't understand
> > why the right shift was translated to an add with -16.
> >
> > I hope this made some sense. You may ignore the last 2 and focus on
the
> first, i.e. what optimization
> > should have been done but it's not and what we can do about it.
> >
> > A clearer version of the .ll: https://godbolt.org/z/2t4RU5
>
> I think the IR in both of your examples makes things harder for the
> compiler than expected from the original C source.
>
> With the IR in your original example (https://godbolt.org/z/BL-4jL), I
> think the problem is that the branch condition is '%0 - 16 < 12’,
which
> allows us to limit the range of `%0 - 16` easily, but in the branch only %0
> is used. Sinking the lshr too early made the analysis harder.
>
> In the second example (https://godbolt.org/z/2t4RU5), things are hard to
> simplify because we need to use the information from the condition of one
> select to simplify the earlier select.
>
> The version in https://godbolt.org/z/_ipKhb  is probably the easiest for
> analysis (basically the original C source code built with `clang -O0 -S
> -emit-llvm`, followed by running `opt -mem2reg`). There’s a patch under
> review that adds support for conditional range propagation (
> https://reviews.llvm.org/D76611) and with the patch it can be simplified
> to the code below by running `opt  -ipsccp -simplifycfg -instcombine`
>
> define i32 @test(i32 %0) {
>   %.off = add i32 %0, -16
>   %2 = icmp ult i32 %.off, 12
>   %spec.select = select i1 %2, i32 66, i32 45
>   ret i32 %spec.select
> }
>
>
> The reason it does not yet work in the default pipeline is that the branch
> condition will be combined to use an AND before the conditional propagation
> and D76611 does not support that yet. But once it lands that should be an
> easy extension.
>
> The problems with your two versions could be addressed as well of course
> for that special case relatively easily I think, but the challenge is to
> fix it in a general and efficient (compile-time wise) way. I hope the
> conditional propagation should cover such cases soon though.
>
> Cheers,
> Florian-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200406/a7bf6bf7/attachment.html>

Stefanos Baziotis via llvm-dev

2020-Apr-05 21:20 UTC

head link

[llvm-dev] Branch is not optimized because of right shift

> Any idea about how the compiler could remove the lshr and use a add -16?Actually, I just figured that doing this test is like solving this:

8 <= x/2 <= 13
16 <= x <= 26
0 <= x - 16 <= 10 => 0 <= x < 11
The left part is know since it's unsigned
The right part could be done x <= 11 => x < 12 because it's
actually an
integer division.
Wow... I would be really happy to know what pass does that.

Best,
- Stefanos

Στις Δευ, 6 Απρ 2020 στις 12:00 π.μ., ο/η Stefanos Baziotis <
stefanos.baziotis at gmail.com> έγραψε:
> Hi,
>
> > I think the IR in both of your examples makes things harder for the
> compiler than expected from the original C source.
>
> Note that both versions are from clang with -O2. The first is with version
> 9.0 and the second is with the trunk.
>
> > but in the branch only %0 is used. Sinking the lshr too early made the
> analysis harder.
>
> Yes, exactly! That's what I figured too.
>
> > The version in https://godbolt.org/z/_ipKhb  is probably the easiest
> for analysis (basically the original C source code built with `clang -O0 -S
> -emit-llvm`, followed by running `opt -mem2reg`). There’s a patch under
> review that adds support for conditional range propagation (
> https://reviews.llvm.org/D76611) and with the patch it can be simplified
> to the code below by running `opt  -ipsccp -simplifycfg -instcombine`
> >
> > define i32 @test(i32 %0) {
> >   %.off = add i32 %0, -16
> >   %2 = icmp ult i32 %.off, 12
> >   %spec.select = select i1 %2, i32 66, i32 45
> >   ret i32 %spec.select
> > }
> >
> >
> > The reason it does not yet work in the default pipeline is that the
> branch condition will be combined to use an AND before the conditional
> propagation and D76611 does not support that yet. But once it lands that
> should be an easy extension.
> >
> > The problems with your two versions could be addressed as well of
course
> for that special case relatively easily I think, but the challenge is to
> fix it in a general and efficient (compile-time wise) way. I hope the
> conditional propagation should cover such cases soon though.
>
> Ok, I checked the patch, best of luck! I see how it does the
> transformation, it's interesting. I think that
> https://reviews.llvm.org/rGe9963034314edf49a12ea5e29f694d8f9f52734a could
> maybe help in such things.
> Any idea about how the compiler could remove the lshr and use a add -16?
>
> Best,
> Stefanos Baziotis
>
>
> Στις Κυρ, 5 Απρ 2020 στις 10:06 μ.μ., ο/η Florian Hahn <
> florian_hahn at apple.com> έγραψε:
>
>> Hi,
>>
>> > On Apr 5, 2020, at 14:08, Stefanos Baziotis via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> > To be more specific for everyone:
>> > - First of all, I hope it's clear that in the original (C)
code, the
>> region - 0x8 > 1000 branch should
>> > be eliminated. That is because it is inside a block that has
ensured
>> that 8 < region < 12. But for some reason,
>> > it is not eliminated. I'm trying to find that reason. :)
>> > - I understand that in the -O2 .ll output, we take %0 and do it a
right
>> shift. That means that we don't have
>> > any range info about %0 or %0 >>= 1, so we can't
eliminate the branch.
>> What I don't understand
>> > is why we chose to reuse %0 and do it a right shift and instead we
>> didn't have a phi there.
>> > - Finally, I saw that putting nuw in the addition with -8
eliminates
>> the branch. This makes sense since,
>> > we know probably know from the code above that %0 >>= 1 is
less than
>> 12. Although, I don't understand
>> > why the right shift was translated to an add with -16.
>> >
>> > I hope this made some sense. You may ignore the last 2 and focus
on the
>> first, i.e. what optimization
>> > should have been done but it's not and what we can do about
it.
>> >
>> > A clearer version of the .ll: https://godbolt.org/z/2t4RU5
>>
>> I think the IR in both of your examples makes things harder for the
>> compiler than expected from the original C source.
>>
>> With the IR in your original example (https://godbolt.org/z/BL-4jL), I
>> think the problem is that the branch condition is '%0 - 16 <
12’, which
>> allows us to limit the range of `%0 - 16` easily, but in the branch
only %0
>> is used. Sinking the lshr too early made the analysis harder.
>>
>> In the second example (https://godbolt.org/z/2t4RU5), things are hard
to
>> simplify because we need to use the information from the condition of
one
>> select to simplify the earlier select.
>>
>> The version in https://godbolt.org/z/_ipKhb  is probably the easiest
for
>> analysis (basically the original C source code built with `clang -O0 -S
>> -emit-llvm`, followed by running `opt -mem2reg`). There’s a patch under
>> review that adds support for conditional range propagation (
>> https://reviews.llvm.org/D76611) and with the patch it can be
simplified
>> to the code below by running `opt  -ipsccp -simplifycfg -instcombine`
>>
>> define i32 @test(i32 %0) {
>>   %.off = add i32 %0, -16
>>   %2 = icmp ult i32 %.off, 12
>>   %spec.select = select i1 %2, i32 66, i32 45
>>   ret i32 %spec.select
>> }
>>
>>
>> The reason it does not yet work in the default pipeline is that the
>> branch condition will be combined to use an AND before the conditional
>> propagation and D76611 does not support that yet. But once it lands
that
>> should be an easy extension.
>>
>> The problems with your two versions could be addressed as well of
course
>> for that special case relatively easily I think, but the challenge is
to
>> fix it in a general and efficient (compile-time wise) way. I hope the
>> conditional propagation should cover such cases soon though.
>>
>> Cheers,
>> Florian
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200406/25b8ad99/attachment.html>

Stefanos Baziotis via llvm-dev

2020-Apr-05 21:21 UTC

head link

[llvm-dev] Branch is not optimized because of right shift

Typo: The last 3 x's are x-16 of course.

Στις Δευ, 6 Απρ 2020 στις 12:20 π.μ., ο/η Stefanos Baziotis <
stefanos.baziotis at gmail.com> έγραψε:
> > Any idea about how the compiler could remove the lshr and use a add
-16?
> Actually, I just figured that doing this test is like solving this:
>
> 8 <= x/2 <= 13
> 16 <= x <= 26
> 0 <= x - 16 <= 10 => 0 <= x < 11
> The left part is know since it's unsigned
> The right part could be done x <= 11 => x < 12 because it's
actually an
> integer division.
> Wow... I would be really happy to know what pass does that.
>
> Best,
> - Stefanos
>
> Στις Δευ, 6 Απρ 2020 στις 12:00 π.μ., ο/η Stefanos Baziotis <
> stefanos.baziotis at gmail.com> έγραψε:
>
>> Hi,
>>
>> > I think the IR in both of your examples makes things harder for
the
>> compiler than expected from the original C source.
>>
>> Note that both versions are from clang with -O2. The first is with
>> version 9.0 and the second is with the trunk.
>>
>> > but in the branch only %0 is used. Sinking the lshr too early made
the
>> analysis harder.
>>
>> Yes, exactly! That's what I figured too.
>>
>> > The version in https://godbolt.org/z/_ipKhb  is probably the
easiest
>> for analysis (basically the original C source code built with `clang
-O0 -S
>> -emit-llvm`, followed by running `opt -mem2reg`). There’s a patch under
>> review that adds support for conditional range propagation (
>> https://reviews.llvm.org/D76611) and with the patch it can be
simplified
>> to the code below by running `opt  -ipsccp -simplifycfg -instcombine`
>> >
>> > define i32 @test(i32 %0) {
>> >   %.off = add i32 %0, -16
>> >   %2 = icmp ult i32 %.off, 12
>> >   %spec.select = select i1 %2, i32 66, i32 45
>> >   ret i32 %spec.select
>> > }
>> >
>> >
>> > The reason it does not yet work in the default pipeline is that
the
>> branch condition will be combined to use an AND before the conditional
>> propagation and D76611 does not support that yet. But once it lands
that
>> should be an easy extension.
>> >
>> > The problems with your two versions could be addressed as well of
>> course for that special case relatively easily I think, but the
challenge
>> is to fix it in a general and efficient (compile-time wise) way. I hope
the
>> conditional propagation should cover such cases soon though.
>>
>> Ok, I checked the patch, best of luck! I see how it does the
>> transformation, it's interesting. I think that
>> https://reviews.llvm.org/rGe9963034314edf49a12ea5e29f694d8f9f52734a
could
>> maybe help in such things.
>> Any idea about how the compiler could remove the lshr and use a add
-16?
>>
>> Best,
>> Stefanos Baziotis
>>
>>
>> Στις Κυρ, 5 Απρ 2020 στις 10:06 μ.μ., ο/η Florian Hahn <
>> florian_hahn at apple.com> έγραψε:
>>
>>> Hi,
>>>
>>> > On Apr 5, 2020, at 14:08, Stefanos Baziotis via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > To be more specific for everyone:
>>> > - First of all, I hope it's clear that in the original (C)
code, the
>>> region - 0x8 > 1000 branch should
>>> > be eliminated. That is because it is inside a block that has
ensured
>>> that 8 < region < 12. But for some reason,
>>> > it is not eliminated. I'm trying to find that reason. :)
>>> > - I understand that in the -O2 .ll output, we take %0 and do
it a
>>> right shift. That means that we don't have
>>> > any range info about %0 or %0 >>= 1, so we can't
eliminate the branch.
>>> What I don't understand
>>> > is why we chose to reuse %0 and do it a right shift and
instead we
>>> didn't have a phi there.
>>> > - Finally, I saw that putting nuw in the addition with -8
eliminates
>>> the branch. This makes sense since,
>>> > we know probably know from the code above that %0 >>= 1
is less than
>>> 12. Although, I don't understand
>>> > why the right shift was translated to an add with -16.
>>> >
>>> > I hope this made some sense. You may ignore the last 2 and
focus on
>>> the first, i.e. what optimization
>>> > should have been done but it's not and what we can do
about it.
>>> >
>>> > A clearer version of the .ll: https://godbolt.org/z/2t4RU5
>>>
>>> I think the IR in both of your examples makes things harder for the
>>> compiler than expected from the original C source.
>>>
>>> With the IR in your original example
(https://godbolt.org/z/BL-4jL), I
>>> think the problem is that the branch condition is '%0 - 16 <
12’, which
>>> allows us to limit the range of `%0 - 16` easily, but in the branch
only %0
>>> is used. Sinking the lshr too early made the analysis harder.
>>>
>>> In the second example (https://godbolt.org/z/2t4RU5), things are
hard
>>> to simplify because we need to use the information from the
condition of
>>> one select to simplify the earlier select.
>>>
>>> The version in https://godbolt.org/z/_ipKhb  is probably the
easiest
>>> for analysis (basically the original C source code built with
`clang -O0 -S
>>> -emit-llvm`, followed by running `opt -mem2reg`). There’s a patch
under
>>> review that adds support for conditional range propagation (
>>> https://reviews.llvm.org/D76611) and with the patch it can be
>>> simplified to the code below by running `opt  -ipsccp -simplifycfg
>>> -instcombine`
>>>
>>> define i32 @test(i32 %0) {
>>>   %.off = add i32 %0, -16
>>>   %2 = icmp ult i32 %.off, 12
>>>   %spec.select = select i1 %2, i32 66, i32 45
>>>   ret i32 %spec.select
>>> }
>>>
>>>
>>> The reason it does not yet work in the default pipeline is that the
>>> branch condition will be combined to use an AND before the
conditional
>>> propagation and D76611 does not support that yet. But once it lands
that
>>> should be an easy extension.
>>>
>>> The problems with your two versions could be addressed as well of
course
>>> for that special case relatively easily I think, but the challenge
is to
>>> fix it in a general and efficient (compile-time wise) way. I hope
the
>>> conditional propagation should cover such cases soon though.
>>>
>>> Cheers,
>>> Florian
>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200406/564113c1/attachment.html>

Florian Hahn via llvm-dev

2020-Apr-05 21:38 UTC

head link

[llvm-dev] Branch is not optimized because of right shift

> On Apr 5, 2020, at 22:20, Stefanos Baziotis <stefanos.baziotis at
gmail.com> wrote:
> 
> > Any idea about how the compiler could remove the lshr and use a add
-16?
> Actually, I just figured that doing this test is like solving this:
> 
> 8 <= x/2 <= 13
> 16 <= x <= 26
> 0 <= x - 16 <= 10 => 0 <= x < 11
> The left part is know since it's unsigned
> The right part could be done x <= 11 => x < 12 because it's
actually an integer division.
> Wow... I would be really happy to know what pass does that.
I’d guess a combination of instcombine rules together with some other
transforms. You could probably use `-print-after-all` (`clang -mllvm
-print-after-all` if you are using clang) to track down the relevant
passes/steps.

llvm dev - Apr 2020 - Branch is not optimized because of right shift

[llvm-dev] Branch is not optimized because of right shift

[llvm-dev] Branch is not optimized because of right shift

[llvm-dev] Branch is not optimized because of right shift

[llvm-dev] Branch is not optimized because of right shift