thr3ads.net - llvm dev - [LLVMdev] Disable loop unroll pass [Nov 2012]

Ivan Llopard

2012-Nov-22 11:00 UTC

[LLVMdev] Disable loop unroll pass

Hi Shuxin, Eli,

On 22/11/2012 03:19, Shuxin Yang wrote:
> Hi, Ivan:
>
>     My $0.02. hasZeroCostLooping() disabling unrolling does not seem to be
> appropriate for other architectures, at least the one I worked on before.
I appreciate your feedback. Could you give an example where building a
hw loop is not appropriate for your target?
>
>    You mentioned:
> >Currently, we cannot detect them because the loop unroller is
> >unrolling them before entering into the codegen. Looking at its
> >implementation, it.
>
>   Could you please articulate why CG fails to recognize it?
Well, just because the loop unrolling pass runs before the CG is called.
>  I remember that in gcc, hw loop recognition is done in an RTL pass, and
> in Open64, one student(?) added some stuff in Scalar Opt, instead of
> CodeGen, just for HW loops.
> I recall there was only one reason that sounded valid: preventing the
> loop from becoming too big to fit the HW constraint.
It sounds very similar to our implementation. We've implemented the hw
loop builder at IR level, just before isel, with new intrinsics that
provide hw loop semantics. While intrinsics may look a bit tricky and
additional isel code is needed to recognize them, this approach benefits
from the existing scalar evolution functionality to detect trip counts.
Therefore, it's based on the same interface as the loop unroller but, for
architectural reasons, we have stronger constraints: e.g. we cannot build
hw loops on loops with multiple exits.

The loop topology is important and our hw loop builder depends on it. I
agree that hasZeroCostLoop may seem too restrictive.
What about something like hasZeroCostLoopTopology(Loop *L, unsigned TripCount)
to complement the first one?
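
To make the shape of such a query concrete, here is a minimal sketch in plain C rather than against the LLVM API. The `loop_summary` struct, its fields, and the `MAX_HW_TRIP_COUNT` limit are hypothetical stand-ins invented for illustration; only the single-exit constraint and the idea of a per-loop query that also sees the trip count come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a target would inspect; not an LLVM type. */
struct loop_summary {
    unsigned num_exits;   /* number of exit edges of the loop */
    unsigned trip_count;  /* constant trip count, 0 if only known at run time */
};

/* Assumed hardware limit on the loop counter, for illustration only. */
enum { MAX_HW_TRIP_COUNT = 65535 };

/* Returns true when the loop could be lowered to a hardware loop, in which
 * case the unroller would be asked to leave it alone. */
bool has_zero_cost_loop_topology(const struct loop_summary *loop)
{
    assert(loop != NULL);
    if (loop->num_exits != 1)                  /* multiple exits are rejected */
        return false;
    if (loop->trip_count > MAX_HW_TRIP_COUNT)  /* counter would not fit */
        return false;
    return true;                               /* static or dynamic count is fine */
}
```

A loop with a single exit and a run-time trip count passes this check, while any loop with two or more exits is left to the regular unrolling heuristics.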
>
>    The cost implied by hasZeroCostLoop() highly depends on the underlying
> architecture; therefore the higher level opts don't know how to utilize
> this interface for cost modeling.
> Maybe we can add a pretty vague interface, say
>    hw-please-advice-unrolling-factor(the loop, current-unrolling-factor),
> to encapsulate whatever reasons the arch might have to curtail
> aggressive unrolling?
There are already some internal parameters in the loop unroller to drive
the heuristics. We use -unroll-count to skip unrolling.
But someone may want to enable unrolling even if the target says
otherwise. IMHO, each target could provide internal flags to disable hw
loop building and let the unroller work "normally".

Ivan
>
>    I'm an LLVM newbie, so don't take my words seriously.
>
> Have a happy holiday!
>
> Shuxin
>
>
> On 11/21/2012 02:19 PM, Ivan Llopard wrote:
>> Hi Hal,
>>
>> On 21/11/2012 22:38, Hal Finkel wrote:
>>> ----- Original Message -----
>>>> From: "Ivan Llopard" <ivanllopard at gmail.com>
>>>> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
>>>> Sent: Wednesday, November 21, 2012 10:31:07 AM
>>>> Subject: [LLVMdev] Disable loop unroll pass
>>>>
>>>> Hi,
>>>>
>>>> We have a target which has hardware support for zero-overhead loops.
>>>> Currently, we cannot detect them because the loop unroller is
>>>> unrolling them before entering into the codegen. Looking at its
>>>> implementation, it seems that it checks if it is profitable to
>>>> unroll a loop or not based on certain parameters.
>>>>
>>>> Given that building zero-cost loops relies on more or less the same
>>>> constraints as the loop unroll pass, I wonder if it is reasonable to
>>>> add yet another target hook to prevent loop unrolling (something like
>>>> hasZeroOverheadLooping or hasZeroCostLooping) for targets that
>>>> support zero-cost looping.
>>>
>>> Ivan,
>>>
>>> Please feel free to extend the ScalarTargetTransformInfo interface
>>> (in include/llvm/TargetTransformInfo.h) to provide
>>> target-customizable parameters to the loop unroller. This is on my
>>> TODO list, but if you'd like to work on this, that would be great.
>>
>> Sure! I'll propose a patch ASAP.
>>
>>>
>>> Are there any cases in which loop unrolling is beneficial on your 
>>> target?
>>
>> I'd say that it's always beneficial to emit hardware loops whenever
>> possible, either for static or dynamic trip counts, whether we look
>> for smaller or faster code.
>>
>> Ivan
>>
>>>
>>>   -Hal
>>>
>>>>
>>>> Does Hexagon provide the same loop support? How have you addressed
>>>> this?
>>>>
>>>> Ivan
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
>

Gang Yu

2012-Nov-22 14:03 UTC

[LLVMdev] Disable loop unroll pass

I am the designer of the Open64 hwloop structure, but I am not a student.

Hope the following helps:

To transform a loop into a hwloop, we need help from the optimizer. For example, we transform
   while(k3>=10){
     sum+=k1;
     k3 --;
   }
into the form:
   zdl_loop(k3-9) {
      sum+=k1;
   }
So we introduced a new ZDLBR operator in WHIRL (the Open64 optimizer's
intermediate representation), which represents the loop in WHIRL as:
LABEL L2050 0 {line: 0}
LOOP_INFO 0 1 1
   I4I4LDID 73 <1,2,.preg_I4> T<4,.predef_I4,4> # k3
   I4I4LDID 77 <1,2,.preg_I4> T<4,.predef_I4,4> # <preg> 
 END_LOOP_INFO
   I4I4LDID 74 <1,2,.preg_I4> T<4,.predef_I4,4> # k1
   I4I4LDID 75 <1,2,.preg_I4> T<4,.predef_I4,4> # sum
  I4ADD
 I4STID 75 <1,2,.preg_I4> T<4,.predef_I4,4> # sum {line: 5}
 ZDLBR L2050 {line: 0}
Then we let CG do the rest. This design abstracts the general operations in
the optimizer and leaves the target-specific part to CG, where ZDLBR remains
a simulated op until CG loop optimization has finished. We implemented
multi-level nested hwloops with this approach. We believe GCC's three doloop
expand names do the same.
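
The equivalence behind the while-to-zdl_loop rewrite can be checked with a small C sketch. The function names are made up for illustration, and the clamp to zero for k3 < 10 is an assumption of this sketch, since the message does not say how that case is guarded.

```c
#include <assert.h>

/* Original form: the exit test and the decrement run on every iteration. */
int sum_while(int k3, int k1, int sum)
{
    while (k3 >= 10) {
        sum += k1;
        k3--;
    }
    return sum;
}

/* Counted form: the trip count k3 - 9 is computed once, before the loop.
 * The clamp to zero covers k3 < 10 and is an addition of this sketch. */
int sum_counted(int k3, int k1, int sum)
{
    int trips = (k3 >= 10) ? k3 - 9 : 0;
    assert(trips >= 0);
    for (int n = 0; n < trips; n++)   /* stands in for zdl_loop(trips) */
        sum += k1;
    return sum;
}
```

Both functions return the same value for any k3 that does not overflow; on hardware with zero-overhead loops, the counter `n` and its compare in the second form would be handled by the loop hardware instead of by instructions in the body.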

For more details, please take a look at

http://wiki.open64.net/index.php/Zero_Delay_Loop

Thanks
Gang

Shuxin Yang

2012-Nov-22 18:55 UTC

[LLVMdev] Disable loop unroll pass

> I appreciate your feedback. Could you give an example where building
> a hw loop is not appropriate for your target?
>
In my case, unrolling and hw loops are orthogonal. So long as a loop is
countable and its size doesn't exceed some threshold, it can be converted
into a hw loop. So it is still desirable to unroll the loop.

One benefit of unrolling is that it exposes inter-iteration redundancies,
and the downstream redundancy elimination can *easily* take care of them.
We would otherwise have to resort to inter-iteration redundancy
eliminators to remove such redundancies in a *HARD* way, which is
sometimes impossible.

As far as I know, LLVM doesn't have inter-iteration redundancy
elimination (like predictive commoning), and it doesn't have scalar
replacement to promote subscripted variables to registers (i.e. the 2nd
load in a[i]... a[i-n] is a load from a register; GVN can promote a[i-n]
to a register only if n==1).
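
A small C sketch of the effect described above, with hypothetical function names. The offset of 1 is chosen only to keep the example short, and the unrolled version shares the repeated load by hand to show what a redundancy eliminator would be expected to produce once both uses sit in one loop body. The `restrict` qualifiers state the assumption that the two arrays do not overlap.

```c
#include <stddef.h>

/* Rolled loop: a[i] is loaded in iteration i and again, as a[i - 1],
 * in iteration i + 1. The redundancy spans two iterations. */
void diff_rolled(const int *restrict a, int *restrict b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        b[i] = a[i] - a[i - 1];
}

/* Unrolled by two: both uses of a[i] now sit in the same loop body,
 * so the second load can be replaced by the value already in hand. */
void diff_unrolled(const int *restrict a, int *restrict b, size_t n)
{
    size_t i = 1;
    for (; i + 1 < n; i += 2) {
        int cur = a[i];               /* one load, used twice below */
        b[i] = cur - a[i - 1];
        b[i + 1] = a[i + 1] - cur;
    }
    if (i < n)                        /* leftover iteration when n is even */
        b[i] = a[i] - a[i - 1];
}
```

The unrolled body performs three loads for two results instead of four. The load of a[i - 1] at the top of each unrolled body is still redundant with the previous body's a[i + 1], which is the kind of cross-iteration reuse that predictive commoning or scalar replacement would target.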

Thanks
Shuxin

Shuxin Yang

2012-Nov-22 19:17 UTC

[LLVMdev] Disable loop unroll pass

Hi, Gang:

    I remember there were different voices when you checked in the code.
I agree with them although I didn't reply to your mail on Open64's mailing
list.

   The transformation you illustrate involves two operations:
   1) promote the WHILE-loop into a DO-loop (i.e. a noncountable loop to a
countable loop)
   2) get rid of the trip-count dec/inc and compare.

  1) is irrelevant to HW loops. Any scalar optimizer should handle 1).
It is not difficult at all to handle 2) in CodeGen, and it is unnecessary
to introduce an operator just for that purpose.

Shuxin
