similar to: [LLVMdev] Disable loop unroll pass

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Disable loop unroll pass"

2012 Nov 21 (2): [LLVMdev] Disable loop unroll pass
Hi Hal,

On 21/11/2012 22:38, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Ivan Llopard" <ivanllopard at gmail.com>
>> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
>> Sent: Wednesday, November 21, 2012 10:31:07 AM
>> Subject: [LLVMdev] Disable loop unroll pass
>>
>> Hi,
>>
>> We've a
2012 Nov 21 (0): [LLVMdev] Disable loop unroll pass
----- Original Message -----
> From: "Ivan Llopard" <ivanllopard at gmail.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, November 21, 2012 10:31:07 AM
> Subject: [LLVMdev] Disable loop unroll pass
>
> Hi,
>
> We've a target which has hardware support for zero-overhead loops.
> Currently, we
2012 Nov 22 (0): [LLVMdev] Disable loop unroll pass
Hi, Ivan:

My $0.02. hasZeroCostLooping() disabling unrolling does not seem to be
appropriate for other architectures, at least the one I worked on before.

You mentioned:
> Currently, we cannot detect them because the loop unroller is
> unrolling them before entering into the codegen. Looking at its implementation,
> it.

Could you please articulate why CG fails to recognize it?
2012 Nov 22 (3): [LLVMdev] Disable loop unroll pass
Hi Shuxin, Eli,

On 22/11/2012 03:19, Shuxin Yang wrote:
> Hi, Ivan:
>
> My $0.02. hasZeroCostLooping() disabling unrolling dose not seem
> to be
> appropriate for other architectures, at least the one I worked before.

I appreciate your feedback. Could you give an example where building a hw loop
is not appropriate for your target?

>
> You mentioned:
>
2012 Nov 21 (2): [LLVMdev] Disable loop unroll pass
I just wanted to add to Krzysztof's response. I'm not sure if you're referring to the case when a compile-time trip count loop is completely unrolled or for a loop with a run-time trip count, which would be partially unrolled. For Hexagon, if we partially unroll a loop, we'd also like to use our hardware loop instructions. That is, unrolling and hardware loops are
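
As a rough illustration of the combination described above (a hand-written C++ sketch, not compiler output; the function name and the assumption that the trip count is a multiple of 4 are mine):

  // The body is unrolled by 4 to expose more work per iteration, but a counted
  // loop remains; it is that residual loop a backend with zero-overhead loop
  // hardware (such as Hexagon's) can lower to a hardware loop.
  void scale_by_k(int *a, int k, unsigned n) {  // assumes n % 4 == 0
      for (unsigned i = 0; i < n; i += 4) {     // residual counted loop
          a[i]     *= k;
          a[i + 1] *= k;
          a[i + 2] *= k;
          a[i + 3] *= k;
      }
  }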
2012 Nov 21 (0): [LLVMdev] Disable loop unroll pass
On 11/21/2012 10:31 AM, Ivan Llopard wrote:
>
> Does Hexagon provides the same loop support? How have you addressed this?

Yes, Hexagon has hardware support for loops: you set up the loop start
address, number of iterations and indicate where the loop ends, and the
hardware would execute the code repeatedly until the count goes down to zero.
I'm not aware of any specific changes that
2012 Nov 21 (0): [LLVMdev] Disable loop unroll pass
Hi Brendon, Krzysztof,

Thanks for your responses.

On 21/11/2012 20:49, Brendon Cahoon wrote:
> I just wanted to add to Krzysztof's response. I'm not sure if you're
> referring to the case when a compile-time trip count loop is completely
> unrolled or for a loop with a run-time trip count, which would be partially
> unrolled. For Hexagon, if we partially unroll a loop,
2012 Nov 23 (1): [LLVMdev] Disable loop unroll pass
Hi, Ivan:

Sorry for deviating from the topic a bit. As I told you before, I'm an LLVM
newbie, so I cannot give you a conclusive answer on whether the proposed
interface is OK or not. My personal opinion on these two interfaces is
summarized below:

- hasZeroCostLoop()
  pro: it clearly states the HW support.
  con: having a zero-cost loop doesn't imply the benefit a HW loop could achieve.
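
A minimal C++ sketch of the trade-off between the two interfaces weighed above; every name and signature below is hypothetical, not the actual LLVM API:

  #include <cstdint>

  struct LoopSummary {
      bool     TripCountKnown;  // compile-time trip count available?
      uint64_t TripCount;       // valid only if TripCountKnown
      unsigned BodySizeInsts;   // rough size of the loop body
  };

  class TargetLoopHooks {
  public:
      virtual ~TargetLoopHooks() = default;

      // Interface 1: a bare capability bit ("the HW support exists").
      virtual bool hasZeroCostLoop() const { return false; }

      // Interface 2: a per-loop query, so the target can weigh the actual
      // benefit of keeping this particular loop for the hardware.
      virtual bool preferHardwareLoop(const LoopSummary &L) const {
          return hasZeroCostLoop() && L.BodySizeInsts > 4;
      }
  };

  // How a generic unroller might consult the hook before unrolling.
  bool shouldUnroll(const TargetLoopHooks &TTI, const LoopSummary &L) {
      if (TTI.preferHardwareLoop(L))
          return false;  // leave the loop intact for the zero-overhead loop
      // Illustrative size limit, standing in for the real cost model.
      return L.TripCountKnown && L.TripCount * L.BodySizeInsts <= 150;
  }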
2012 Nov 22 (2): [LLVMdev] Disable loop unroll pass
Hi, Gang:

I don't want to discuss Open64 internals on the LLVM mailing list. Let us only
focus on the design per se. Your mail and your previous mail combined give me
the impression that:

    the only reason you introduce the specific operator for the HW loop in
    Scalar Opt is simply that you have a hard time figuring out the trip count
    in CodeGen.

This might be true for Open64's
2012 Nov 23 (0): [LLVMdev] Disable loop unroll pass
Hi Shuxin,

On 23/11/2012 00:17, Shuxin Yang wrote:
> Hi, Gang:
>
> I don't want to discuss Open64 internal in LLVM mailing list. Let us
> only focus on the design per se.
> As your this mail and your previous mail combined give me a impression
> that :
>
> The only reason you introduce the specific operator for HW loop in
> Scalar Opt simply because
>
2012 Nov 22 (0): [LLVMdev] Disable loop unroll pass
I am the designer of the open64 hwloop structure, but I am not a student. Hope
the following helps:

To transform a loop into a hwloop, we need help from the optimizer. For
example, we transform

  while (k3 >= 10) {
    sum += k1;
    k3--;
  }

into the form:

  zdl_loop(k3 - 9) {
    sum += k1;
  }

So, we introduce a new ZDLBR whirl (open64 optimizer intermediate) operator,
which represents the loop in whirl as:
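
In plain C++ terms (an illustrative sketch, not Open64 WHIRL; the variable values are made up so the trip count can be checked), the rewrite amounts to:

  #include <cstdio>

  int main() {
      int k1 = 3, k3 = 25, sum = 0;

      // Original form:  while (k3 >= 10) { sum += k1; k3--; }
      // Counted form: the trip count (k3 - 9 when k3 >= 10, otherwise 0) is
      // computed once up front, which is the shape a zero-overhead hardware
      // loop (zdl_loop) can execute directly.
      int trip = (k3 >= 10) ? k3 - 9 : 0;
      for (int i = 0; i < trip; ++i)  // stands in for zdl_loop(k3 - 9)
          sum += k1;
      k3 -= trip;

      std::printf("sum = %d, k3 = %d\n", sum, k3);  // prints: sum = 48, k3 = 9
      return 0;
  }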
2017 Jan 31 (3): (RFC) Adjusting default loop fully unroll threshold
> On Jan 30, 2017, at 4:56 PM, Dehao Chen <dehao at google.com> wrote:
>
>
>
> On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Jan 30,
2017 Jan 30 (2): (RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Currently, loop fully unroller shares the same default threshold as loop
> dynamic unroller and partial unroller. This seems conservative because
> unlike dynamic/partial
2017 Jan 30 (4): (RFC) Adjusting default loop fully unroll threshold
Currently, the loop full unroller shares the same default threshold as the loop
dynamic unroller and partial unroller. This seems conservative because, unlike
dynamic/partial unrolling, full unrolling will not affect LSD/ICache
performance. In https://reviews.llvm.org/D28368, I proposed to double the
threshold for the loop full unroller. This will change the codegen of several
SPECCPU benchmarks: Code
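
A small C++ illustration of the distinction the proposal relies on (made-up functions, not taken from the SPEC benchmarks):

  // The first loop has a compile-time trip count of 8: full unrolling deletes
  // the loop entirely and leaves straight-line code with no back-edge.
  // The second loop's trip count is only known at run time, so it can only be
  // partially or runtime-unrolled, which keeps a loop (with a larger body) in
  // the generated code; hence the more conservative threshold there.
  int sum_fixed(const int *a) {
      int s = 0;
      for (int i = 0; i < 8; ++i)
          s += a[i];
      return s;
  }

  int sum_dynamic(const int *a, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)
          s += a[i];
      return s;
  }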
2017 Jan 31 (0): (RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
> Another question is about PGO integration: is it already hooked there?
> Should we have a more aggressive threshold in a hot function? (Assuming
> we’re willing to spend some binary size there but not on the cold path).
>
>
> I would even wire the *unrolling* the other way: just
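
A hypothetical sketch of the PGO-driven policy being discussed; the helper and the numbers are made up for illustration, not an existing LLVM option:

  // Pick a full-unroll threshold from profile data: spend code size where the
  // profile says the function is hot, and hold back on cold paths.
  unsigned pickFullUnrollThreshold(bool HasProfile, bool IsHot, bool IsCold) {
      const unsigned Default = 300, Hot = 600, Cold = 150;  // illustrative values
      if (!HasProfile)
          return Default;
      if (IsHot)
          return Hot;
      if (IsCold)
          return Cold;
      return Default;
  }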
2020 May 22 (4): Loop Unroll
Hi,

I'm interested in finding a pass for loop unrolling in the LLVM compiler. I
tried opt --loop-unroll --unroll-count=4, but it doesn't work well. Which pass
can I use, and how? I would also like to know if there is any way to mark the
loops that I want to be unrolled.

Thank you.
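
If the goal is to mark individual loops rather than force one unroll count on everything, Clang's loop pragmas are the usual route. A short C++ sketch (how aggressively the request is honoured still depends on the optimization level and the unroller's cost model):

  // Compile with, e.g.:  clang++ -O2 -S -emit-llvm unroll_demo.cpp
  void saxpy(float *x, const float *y, float a, int n) {
  #pragma clang loop unroll_count(4)  // request an unroll factor of 4 for this loop
      for (int i = 0; i < n; ++i)
          x[i] += a * y[i];
  }

  void scale8(float *x) {
  #pragma clang loop unroll(full)     // ask for full unrolling; trip count is known
      for (int i = 0; i < 8; ++i)
          x[i] *= 2.0f;
  }

At the IR level these pragmas become llvm.loop.unroll.* metadata on the loop, which is what the unrolling pass reads.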
2017 Jan 31 (2): (RFC) Adjusting default loop fully unroll threshold
Recollected the data from trunk head with stddev data and more threshold data
points attached:

Performance:
        stddev/mean   300     450     600     750
  403   0.37%         0.11%   0.11%   0.09%   0.79%
  433   0.14%         0.51%   0.25%  -0.63%  -0.29%
  445   0.08%         0.48%   0.89%   0.12%   0.83%
  447   0.16%         3.50%   2.69%   3.66%   3.59%
  453   0.11%         1.49%   0.45%  -0.07%   0.78%
  464   0.17%         0.75%   1.80%   1.86%   1.54%

Code size:
        300     450     600     750
  403   0.56%   2.41%   2.74%   3.75%
2014 Jan 16 (11): [LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
I am starting to use the sample profiler to analyze new performance opportunities. The loop unroller has popped up in several of the benchmarks I'm running. In particular, libquantum. There is a ~12% opportunity when the runtime unroller is triggered. This helps functions like quantum_sigma_x (http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149). The function accounts
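
For reference, the hot loop in quantum_sigma_x is roughly of this shape (paraphrased from the source linked above rather than copied, with invented names):

  // A tiny body with a run-time trip count: the pattern where runtime
  // (partial) unrolling amortizes the loop-control overhead over several
  // iterations.
  void sigma_x_like(unsigned long long *state, int size, int target) {
      for (int i = 0; i < size; ++i)
          state[i] ^= 1ULL << target;
  }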
2012 Jun 14 (5): [LLVMdev] [PATCH] Refactoring the DFA generator
Hi, I've refactored the DFA generator in TableGen because it takes too much time to build the table of our BE and I'd like to share it. We have 15 functional units and 13 different itineraries which, in the worst case, can produce 13! states. Fortunately, many of those states are reused :-) but it still takes up to 11min to build the entire table. This patch reduces the build time to
2016 Oct 13 (2): Loop Unrolling Fail in Simple Vectorized loop
Thanks for the explanation. But I am a little confused by the following. Can't
LLVM keep vectorizable_elements as a symbolic value and convert the loop to,
say:

  for (unsigned i = 0; i < vectorizable_elements; i += 2) {
    // main loop
  }
  for (unsigned i = 0; i < vectorizable_elements % 2; i++) {
    // fix up
  }

Why does it have to reason about the range of vectorizable_elements? Even
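
A C++ sketch of the shape the question is asking about, with a symbolic count n standing in for vectorizable_elements (the function and variable names are mine): the scalar fix-up loop simply continues from wherever the main loop stopped, so nothing about the range of n has to be known at compile time.

  void add_one(float *a, unsigned n) {
      unsigned i = 0;
      for (; i + 2 <= n; i += 2) {   // main (vector-width 2) loop
          a[i]     += 1.0f;
          a[i + 1] += 1.0f;
      }
      for (; i < n; ++i)             // scalar fix-up for the remaining n % 2 elements
          a[i] += 1.0f;
  }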