thr3ads.net - llvm dev - [llvm-dev] rotl: undocumented LLVM instruction? [Nov 2016]

Ryan Taylor via llvm-dev

2016-Nov-03 21:20 UTC

[llvm-dev] rotl: undocumented LLVM instruction?

Does setting ISD::ROTL to Expand (via setOperationAction) not work?

You could also mark it Custom and add a custom lowering hook, if that's what
you're looking for.

On Thu, Nov 3, 2016 at 5:12 PM, Phil Tomson <phil.a.tomson at gmail.com> wrote:
> ... or perhaps to rephrase:
>
> In 3.9 it seems to be doing a smaller combine much sooner, whereas in 3.6
> it deferred that till later in the instruction selection pattern matching -
> the latter was giving us better results because it seems to match a larger
> pattern than the former did in the earlier stage.
>
> Phil
>
> On Thu, Nov 3, 2016 at 2:07 PM, Phil Tomson <phil.a.tomson at gmail.com> wrote:
>
>> Is there any way to get it to delay this optimization, where it goes from
>> this:
>>
>> Initial selection DAG: BB#0 'bclr64:entry'
>> SelectionDAG has 14 nodes:
>>   t0: ch = EntryToken
>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>             t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>           t6: i64 = sub t4, Constant:i64<1>
>>         t7: i64 = shl Constant:i64<1>, t6
>>       t9: i64 = xor t7, Constant:i64<-1>
>>     t10: i64 = and t2, t9
>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>
>>
>>
>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>
>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>
>> Combining: t11: i64 = Register %R1
>>
>> Combining: t10: i64 = and t2, t9
>>
>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>  ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>
>> ...to this:
>>
>> Optimized lowered selection DAG: BB#0 'bclr64:entry'
>> SelectionDAG has 13 nodes:
>>   t0: ch = EntryToken
>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>           t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>         t17: i64 = add t4, Constant:i64<-1>
>>       t15: i64 = rotl Constant:i64<-2>, t17
>>     t10: i64 = and t2, t15
>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>
>>
>> That early combine of the xor and shl into a rotl ends up giving us
>> suboptimal results compared with 3.6.
>>
>> For example, in 3.6 the generated code is simply:
>>
>> bclr64:                                 # @bclr64
>> # BB#0:                                 # %entry
>>     addI    r1, r1, -1, 64
>>     bclr        r1, r0, r1, 64
>>     jabs        r511
>>
>> Whereas with 3.9 the generated code is:
>>
>> bclr64:                                 # @bclr64
>> # BB#0:                                 # %entry
>>     addI    r1, r1, -1, 64
>>     movimm        r2, -2, 64
>>     rol        r1, r2, r1, 64
>>     bitop1        r1, r0, r1, AND, 64
>>     jabs        r511
>>
>>
>> ... it seems to be negatively impacting some of our larger benchmarks as
>> well, which used to contain several bclr (bit clear) instructions but now
>> contain far fewer.
>>
>> Phil
>>
>>
>>
>>
>> On Wed, Nov 2, 2016 at 4:10 PM, Ryan Taylor <ryta1203 at gmail.com> wrote:
>>
>>> I believe some of the ISDs were introduced to allow for DAG
>>> optimizations under the assumption that some of the major architectures
>>> directly support these types of instructions.
>>>
>>> -Ryan
>>>
>>> On Wed, Nov 2, 2016 at 6:24 PM, Phil Tomson via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> We've recently moved our project from LLVM 3.6 to LLVM 3.9. I noticed
>>>> one of our code generation tests is breaking in 3.9.
>>>>
>>>> The test is:
>>>>
>>>>  ; RUN: llc < %s -march=xstg | FileCheck %s
>>>>
>>>> define i64 @bclr64(i64 %a, i64 %b) nounwind readnone {
>>>> entry:
>>>> ; CHECK: bclr     r1, r0, r1, 64
>>>>   %sub = sub i64 %b, 1
>>>>   %shl = shl i64 1, %sub
>>>>   %xor = xor i64 %shl, -1
>>>>   %and = and i64 %a, %xor
>>>>   ret i64 %and
>>>> }
>>>>
>>>> I ran llc with -debug to get a better idea of what's going on and found:
>>>>
>>>> Initial selection DAG: BB#0 'bclr64:entry'
>>>> SelectionDAG has 14 nodes:
>>>>   t0: ch = EntryToken
>>>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>>>             t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>>>           t6: i64 = sub t4, Constant:i64<1>
>>>>         t7: i64 = shl Constant:i64<1>, t6
>>>>       t9: i64 = xor t7, Constant:i64<-1>
>>>>     t10: i64 = and t2, t9
>>>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>>
>>>>
>>>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>
>>>> Combining: t11: i64 = Register %R1
>>>>
>>>> Combining: t10: i64 = and t2, t9
>>>>
>>>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>>>  ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>>>
>>>> Combining: t10: i64 = and t2, t15
>>>>
>>>> Combining: t15: i64 = rotl Constant:i64<-2>, t6
>>>>
>>>> Combining: t14: i64 = Constant<-2>
>>>>
>>>> Combining: t6: i64 = sub t4, Constant:i64<1>
>>>>  ... into: t17: i64 = add t4, Constant:i64<-1>
>>>>
>>>> Combining: t15: i64 = rotl Constant:i64<-2>, t17
>>>>
>>>>
>>>>
>>>> These rotl instructions weren't showing up when I ran llc 3.6, and
>>>> that's completely changing the generated code at the end, which means the
>>>> test fails (and it's less optimal than it was in 3.6).
>>>>
>>>> I've been looking in the LLVM language docs (3.9 version) and I don't
>>>> see any documentation on 'rotl'. What does it do? Why isn't it in the docs?
>>>>
>>>> Phil
>>>>
>>>
>>

Phil Tomson via llvm-dev

2016-Nov-03 21:24 UTC

I could try setting ISD::ROTL to Expand... however, we do have a rol op and
we'd like the ISD::ROTL to map to it. If I set it to Expand, it's not going
to do that, right?

I think in this case we're just getting the ISD::ROTL a bit too soon in the
process and that's causing us to miss other optimization opportunities
later on.

Phil


Ryan Taylor via llvm-dev

2016-Nov-03 21:27 UTC

Change the DAG combine (in the DAGCombiner) so that it doesn't form the rotl
that early.


Krzysztof Parzyszek via llvm-dev

2016-Nov-03 23:41 UTC

One option may be to prevent the formation of ROTL, if possible, and then
generate rol by hand. Marking it as "expand" would likely stop the DAG
combiner from creating it. Then you could "preprocess" the selection DAG
before instruction selection and do the pattern matching yourself.

-Krzysztof


