Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Disable loop unroll pass"
2012 Nov 21 · 2 · [LLVMdev] Disable loop unroll pass
Hi Hal,
On 21/11/2012 22:38, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Ivan Llopard" <ivanllopard at gmail.com>
>> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
>> Sent: Wednesday, November 21, 2012 10:31:07 AM
>> Subject: [LLVMdev] Disable loop unroll pass
>>
>> Hi,
>>
>> We've a
2012 Nov 21 · 0 · [LLVMdev] Disable loop unroll pass
----- Original Message -----
> From: "Ivan Llopard" <ivanllopard at gmail.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, November 21, 2012 10:31:07 AM
> Subject: [LLVMdev] Disable loop unroll pass
>
> Hi,
>
> We've a target which has hardware support for zero-overhead loops.
> Currently, we
2012 Nov 22 · 0 · [LLVMdev] Disable loop unroll pass
Hi, Ivan:
My $0.02: hasZeroCostLooping() disabling unrolling does not seem to be
appropriate for other architectures, at least the one I worked on before.
You mentioned:
> Currently, we cannot detect them because the loop unroller is
> unrolling them before entering into the codegen. Looking at its
> implementation, it.
Could you please articulate why CG fails to recognize it?
2012 Nov 22 · 3 · [LLVMdev] Disable loop unroll pass
Hi Shuxin, Eli,
On 22/11/2012 03:19, Shuxin Yang wrote:
> Hi, Ivan:
>
> My $0.02: hasZeroCostLooping() disabling unrolling does not seem to be
> appropriate for other architectures, at least the one I worked on before.
I appreciate your feedback. Could you give an example where building a
hw loop is not appropriate for your target?
>
> You mentioned:
>
2012 Nov 21 · 2 · [LLVMdev] Disable loop unroll pass
I just wanted to add to Krzysztof's response. I'm not sure if you're
referring to the case when a compile-time trip count loop is completely
unrolled or for a loop with a run-time trip count, which would be partially
unrolled. For Hexagon, if we partially unroll a loop, we'd also like to use
our hardware loop instructions. That is, unrolling and hardware loops are
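To make the distinction concrete, a small illustrative sketch (not code from
the thread): a compile-time trip count can be fully unrolled away, while a
run-time trip count only admits partial/runtime unrolling, which still leaves
a back edge that codegen could turn into a hardware loop.

  // Illustrative only.
  int full_unroll_candidate(const int *a) {
    int sum = 0;
    for (int i = 0; i < 4; ++i)   // constant trip count: can be unrolled
      sum += a[i];                // completely, leaving no loop at all
    return sum;
  }

  int partial_unroll_candidate(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)   // run-time trip count: only partial/runtime
      sum += a[i];                // unrolling applies, so a loop survives and
    return sum;                   // can still become a hardware loop
  }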
2012 Nov 21 · 0 · [LLVMdev] Disable loop unroll pass
On 11/21/2012 10:31 AM, Ivan Llopard wrote:
>
> Does Hexagon provide the same loop support? How have you addressed this?
Yes, Hexagon has hardware support for loops: you set up the loop start
address, number of iterations and indicate where the loop ends, and the
hardware would execute the code repeatedly until the count goes down to
zero.
I'm not aware of any specific changes that
2012 Nov 21 · 0 · [LLVMdev] Disable loop unroll pass
Hi Brendon, Krzysztof,
Thanks for your responses.
On 21/11/2012 20:49, Brendon Cahoon wrote:
> I just wanted to add to Krzysztof's response. I'm not sure if you're
> referring to the case when a compile-time trip count loop is completely
> unrolled or for a loop with a run-time trip count, which would be partially
> unrolled. For Hexagon, if we partially unroll a loop,
2012 Nov 23 · 1 · [LLVMdev] Disable loop unroll pass
Hi, Ivan:
Sorry for deviating from the topic a bit. As I told you before, I'm an LLVM
newbie, so I cannot give you a conclusive answer on whether the proposed
interface is OK or not.
My personal opinion on these two interfaces is summarized below:
- hasZeroCostLoop()
  pro: it clearly states the HW support.
  con: having a zero-cost loop doesn't imply the benefit a HW loop could
  achieve.
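A minimal sketch of the kind of hook being debated; the
hasZeroCostLoop()/hasZeroCostLooping() name comes from the thread, but the
signature, placement, and how the unroller would consult it are assumptions
for illustration only:

  // Hypothetical target hook; not the actual proposed patch.
  struct MyTargetLoopInfo {
    // True if the target can execute counted loops with no per-iteration
    // branch/decrement cost (zero-overhead / hardware loops).
    bool hasZeroCostLooping() const { return true; }
  };

  // A generic unroller might then do (pseudo-code in comments):
  //   if (TLI.hasZeroCostLooping() && loopLooksLikeHWLoopCandidate(L))
  //     skip unrolling so codegen can still form the hardware loop.
  // The "con" above is exactly that this answers "is it free?" but not
  // "how much does a hardware loop actually buy for this loop?".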
2012 Nov 22 · 2 · [LLVMdev] Disable loop unroll pass
Hi, Gang:
I don't want to discuss Open64 internals on the LLVM mailing list. Let us
only focus on the design per se.
This mail and your previous mail combined give me the impression that:
The only reason you introduce the specific operator for HW loops in
Scalar Opt is simply because
you have a hard time figuring out the trip count in CodeGen.
This might be true for Open64's
2012 Nov 23 · 0 · [LLVMdev] Disable loop unroll pass
Hi Shuxin,
On 23/11/2012 00:17, Shuxin Yang wrote:
> Hi, Gang:
>
> I don't want to discuss Open64 internals on the LLVM mailing list. Let us
> only focus on the design per se.
> This mail and your previous mail combined give me the impression that:
>
> The only reason you introduce the specific operator for HW loops in
> Scalar Opt is simply because
>
2012 Nov 22 · 0 · [LLVMdev] Disable loop unroll pass
I am the designer of the Open64 hwloop structure, but I am not a student.
Hope the following helps:
To transform a loop into a hwloop, we need help from the optimizer. For example,
  while (k3 >= 10) {
    sum += k1;
    k3--;
  }
into the form:
  zdl_loop(k3-9) {
    sum += k1;
  }
So, we introduce a new ZDLBR WHIRL (Open64 optimizer intermediate representation) operator, which represents the loop in WHIRL as:
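The WHIRL dump itself is cut off in the excerpt. As a rough sketch in plain C
terms of the rewrite being described, with zdl_loop() modelled as an ordinary
counted loop whose trip count is computed once up front:

  // Sketch only: two equivalent forms of the loop above.
  int original(int k1, int k3) {
    int sum = 0;
    while (k3 >= 10) {            // trip count is implicit in the condition
      sum += k1;
      k3--;
    }
    return sum;
  }

  int counted(int k1, int k3) {
    int sum = 0;
    // When k3 >= 10 on entry, the body runs exactly k3 - 9 times, so the
    // count can be materialized once, which is what a zero-overhead
    // (zdl_loop) hardware loop needs.
    for (int trip = k3 - 9; trip > 0; --trip)
      sum += k1;
    return sum;
  }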
2017 Jan 31 · 3 · (RFC) Adjusting default loop fully unroll threshold
> On Jan 30, 2017, at 4:56 PM, Dehao Chen <dehao at google.com> wrote:
>
>
>
> On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Jan 30,
2017 Jan 30 · 2 · (RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Currently, the loop full unroller shares the same default threshold as the
> loop dynamic unroller and partial unroller. This seems conservative
> because, unlike dynamic/partial
2017 Jan 30 · 4 · (RFC) Adjusting default loop fully unroll threshold
Currently, the loop full unroller shares the same default threshold as the
loop dynamic unroller and partial unroller. This seems conservative because,
unlike dynamic/partial unrolling, full unrolling will not affect
LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to
double the threshold for the loop full unroller. This will change the codegen
of several SPEC CPU benchmarks:
Code
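For anyone wanting to reproduce this kind of threshold experiment locally,
the unroller's threshold is exposed as an internal option; something along
these lines should work, though the exact option spellings can change between
releases:

  opt -O2 -unroll-threshold=300 input.ll -S -o out.ll
  clang -O3 -mllvm -unroll-threshold=300 file.c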
2017 Jan 31 · 0 · (RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
> Another question is about PGO integration: is it already hooked there?
> Should we have a more aggressive threshold in a hot function? (Assuming
> we’re willing to spend some binary size there but not on the cold path).
>
>
> I would even wire the *unrolling* the other way: just
2020 May 22 · 4 · Loop Unroll
Hi,
I'm interested in finding a pass for loop unrolling in the LLVM compiler. I
tried opt --loop-unroll --unroll-count=4, but it doesn't work well.
Which pass can I use, and how?
I would also like to know if there is any way to mark the loops that I want
to be unrolled.
Thank you.
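On marking individual loops: with a reasonably recent Clang this is normally
done with loop pragmas, which are lowered to llvm.loop.unroll.* metadata that
the unroll pass then honors. A small sketch (the count of 4 is arbitrary):

  // Ask Clang/LLVM to unroll this particular loop by a factor of 4.
  void scale(float *a, int n, float k) {
  #pragma clang loop unroll_count(4)
    for (int i = 0; i < n; ++i)
      a[i] *= k;
  }
  // #pragma clang loop unroll(full) requests complete unrolling instead.

With the newer pass manager, opt -passes=loop-unroll is the spelling that
replaces --loop-unroll.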
2017 Jan 31 · 2 · (RFC) Adjusting default loop fully unroll threshold
Recollected the data from trunk head with stddev data and more threshold
data points attached:
Performance:
       stddev/mean    300      450      600      750
  403     0.37%      0.11%    0.11%    0.09%    0.79%
  433     0.14%      0.51%    0.25%   -0.63%   -0.29%
  445     0.08%      0.48%    0.89%    0.12%    0.83%
  447     0.16%      3.50%    2.69%    3.66%    3.59%
  453     0.11%      1.49%    0.45%   -0.07%    0.78%
  464     0.17%      0.75%    1.80%    1.86%    1.54%

Code size:
                     300      450      600      750
  403                0.56%    2.41%    2.74%    3.75%
2014 Jan 16 · 11 · [LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
I am starting to use the sample profiler to analyze new performance
opportunities. The loop unroller has popped up in several of the
benchmarks I'm running. In particular, libquantum. There is a ~12%
opportunity when the runtime unroller is triggered.
This helps functions like quantum_sigma_x
(http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149).
The function accounts
2012 Jun 14 · 5 · [LLVMdev] [PATCH] Refactoring the DFA generator
Hi,
I've refactored the DFA generator in TableGen because it takes too much
time to build the table for our BE, and I'd like to share it.
We have 15 functional units and 13 different itineraries which, in the
worst case, can produce 13! states. Fortunately, many of those states
are reused :-) but it still takes up to 11min to build the entire table.
This patch reduces the build time to
2016 Oct 13 · 2 · Loop Unrolling Fail in Simple Vectorized loop
Thanks for the explanation, but I am a little confused by the following.
Can't LLVM keep vectorizable_elements as a symbolic value and convert
the loop to, say:
  for (unsigned i = 0; i < vectorizable_elements; i += 2) {
    // main loop
  }
  for (unsigned i = 0; i < vectorizable_elements % 2; i++) {
    // fix up
  }
Why does it have to reason about the range of vectorizable_elements? Even
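A sketch of what the main-plus-fixup structure usually looks like when the
trip count stays symbolic; note the fix-up loop continues from wherever the
main loop stopped rather than restarting at zero (illustrative only):

  // Runtime-unrolled-by-2 form with a scalar remainder loop.
  void copy2(int *dst, const int *src, unsigned vectorizable_elements) {
    unsigned i = 0;
    // main loop: handles vectorizable_elements / 2 pairs
    for (; i + 1 < vectorizable_elements; i += 2) {
      dst[i]     = src[i];
      dst[i + 1] = src[i + 1];
    }
    // fix-up loop: at most vectorizable_elements % 2 iterations
    for (; i < vectorizable_elements; ++i)
      dst[i] = src[i];
  }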