thr3ads.net - similar to: "[LLVMdev] [PATCH] Loop Rerolling Pass"

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] [PATCH] Loop Rerolling Pass"

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 21

On Oct 21, 2013, at 1:00 PM, Renato Golin <renato.golin at linaro.org> wrote: > Hi Arnold, > > To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. To detect memory access patterns you will want to look at the SCEV of a

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 23

On Oct 23, 2013, at 3:10 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > In the examples you gave there are no reduction variables in the loop vectorizer’s sense. But, they all have memory accesses that are strided. > > This is what I don't get. As far as I understood, a

[LLVMdev] LLVMdev Digest, Vol 112, Issue 56

2013 Oct 21

Has anyone worked with or used the LLVM backend or compiler for Haskell? David On Monday, October 21, 2013 5:26 PM, "llvmdev-request at cs.uiuc.edu" <llvmdev-request at cs.uiuc.edu> wrote: Send LLVMdev mailing list submissions to llvmdev at cs.uiuc.edu To subscribe or unsubscribe via the World Wide Web, visit http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev or,

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 21

Hi Arnold, To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. For instance, even if the relationship between each loop would be complicated, I know that in each loop, all three reads are sequential. So, at least, I could use a

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 23

On Oct 23, 2013, at 9:41 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > I don’t think that recognizing this as a reduction is going to get you far. A reduction is beneficial if the value reduced is only truly needed outside of a loop. > This is not the case here (we are storing/loading

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 21

On 21 October 2013 20:58, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > For example these should be the SCEVs of “int a[2*i] = ; a[2*i+1] =”: > > {ptr, +, 8}_loop > {ptr+4, +, 8}_loop > > Each access on its own requires a gather/scatter (2 loads/stores when > vectorized (VF=2) + inserts/extracts). But when we look at both at once we > see that we only
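
The two add-recurrences quoted above can be checked by hand. The following is a minimal C sketch, assuming 4-byte `int` elements. The `AddRec` struct and both helper functions are hypothetical illustrations, not LLVM's `ScalarEvolution` API. Each recurrence `{start,+,step}` yields byte offset `start + step*i` on iteration `i`. The check confirms that `{ptr,+,8}` and `{ptr+4,+,8}` together touch every byte of one contiguous range exactly once, which is why the pair can be treated as consecutive memory rather than as two gathers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an add-recurrence {start,+,step}: the byte offset
   accessed on loop iteration i is start + step * i. */
typedef struct {
    long start; /* byte offset from ptr at i = 0 */
    long step;  /* bytes advanced per iteration */
} AddRec;

static long addrec_at(AddRec r, long i) {
    return r.start + r.step * i;
}

/* True if two recurrences with the same step, each accessing elem_size
   bytes per iteration, cover [0, n_iters * step) with no gaps or overlaps. */
static bool pair_is_contiguous(AddRec a, AddRec b, long elem_size, long n_iters) {
    assert(a.step == b.step && elem_size > 0 && n_iters > 0);
    enum { MAX_BYTES = 4096 };
    unsigned char hits[MAX_BYTES] = {0};
    long total = n_iters * a.step;
    if (total <= 0 || total > MAX_BYTES)
        return false; /* keep the sketch bounded */
    for (long i = 0; i < n_iters; i++) {
        long offs[2] = { addrec_at(a, i), addrec_at(b, i) };
        for (int k = 0; k < 2; k++)
            for (long byte = 0; byte < elem_size; byte++) {
                long pos = offs[k] + byte;
                if (pos < 0 || pos >= total || hits[pos] != 0)
                    return false; /* outside the range, or an overlap */
                hits[pos] = 1;
            }
    }
    for (long pos = 0; pos < total; pos++)
        if (!hits[pos])
            return false; /* a gap: some byte is never accessed */
    return true;
}
```

With `a[2*i]` modeled as `(AddRec){0, 8}` and `a[2*i+1]` as `(AddRec){4, 8}`, the check succeeds; `a[3*i]` and `a[3*i+1]` (`{0,12}` and `{4,12}`) leave a 4-byte gap per iteration and fail it.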

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 23

On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > I don’t think that recognizing this as a reduction is going to get you > far. A reduction is beneficial if the value reduced is only truly needed > outside of a loop. > This is not the case here (we are storing/loading from the pointer). > Hi Arnold, Nadav, Let me resurrect this discussion a

[LLVMdev] Vectorization of pointer PHI nodes

2013 Oct 14

On 14 October 2013 19:31, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > Renato, can you post the C code for the function and the assembly that gcc > produces? > Attached. Your initial example could be well handled by vectorization of strided > loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that > this is what happened). But the LLVM-IR you

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 23

On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > In the examples you gave there are no reduction variables in the loop > vectorizer’s sense. But, they all have memory accesses that are strided. > This is what I don't get. As far as I understood, a reduction variable is the one that aggregates the computation done by the loop, and is used

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 24

On 23 October 2013 23:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > A reduction is something like: > > for (i= …) { > r+= a[i]; > } > return r; > Ok, so "reduction" is just a reduction in the map-reduce sense, and nothing else. You don’t need to transform them in the legality phase. Believe me ;). Look > at how we handle stride one
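
The reduction quoted above can be written out in C. This is a hand-written sketch, not the vectorizer's actual output. `sum_scalar` is the loop from the quote, and `sum_vf4` (a hypothetical name) shows the shape a reduction takes with VF = 4: independent per-lane partial sums, one horizontal add after the loop, and a scalar remainder. This only works because `r` is consumed after the loop. Regrouping the additions like this is exact for integers (assuming no signed overflow); for floating point it changes rounding, which is why float reductions typically need fast-math permission to reassociate.

```c
#include <assert.h>
#include <stddef.h>

/* The scalar reduction from the quote: r is only needed after the loop. */
static int sum_scalar(const int *a, size_t n) {
    int r = 0;
    for (size_t i = 0; i < n; i++)
        r += a[i];
    return r;
}

/* Hand-written model of the VF = 4 shape: four independent partial sums
   (one per lane), a horizontal add after the loop, then a scalar epilogue
   for the n % 4 leftover elements. */
static int sum_vf4(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    int lanes[4] = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (int l = 0; l < 4; l++)
            lanes[l] += a[i + l];
    int r = lanes[0] + lanes[1] + lanes[2] + lanes[3]; /* horizontal add */
    for (; i < n; i++)
        r += a[i];
    return r;
}
```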

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 30

As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC and on it proceeding sooner rather than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 18

>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can >do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of vectorizer: If vectorizer is the best place to perform the optimization, it should do so. This includes the cases like

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 30

One common concern raised for cases where the Loop Vectorizer generates types bigger than the target supports: currently, based on the VF, we check the cost and generate the expected set of instruction[s] for the bigger type. This has two challenges for bigger types: the cost is not always correct, and code generation may not produce efficient instruction[s]. Probably can depend on the support provided by below RFC by

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 15

Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly. -----Original Message----- From: Saito, Hideki Sent: Wednesday, June 15, 2016 1:39 PM To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access Ashutosh, First,

[LLVMdev] Vectorization of pointer PHI nodes

2013 Oct 14

Renato, can you post the C code for the function and the assembly that gcc produces? Your initial example could be well handled by vectorization of strided loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that this is what happened). But the LLVM-IR you sent has a store of 0 in there ;) and strides by 4. Thanks, Arnold Vectorization of strided loops: I am using float as the
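
The strided case behind the VLD3/VST3 discussion can be sketched in C. Assumptions: elements are packed as three consecutive `float`s (for example x, y, z), and the function names are hypothetical. `scale_xyz` is the scalar stride-3 loop. `scale_xyz_vf4` is a hand-written model of an interleaved-group lowering with VF = 4: load 12 consecutive floats, deinterleave them into three 4-lane arrays (roughly what NEON structure loads such as VLD3 do in hardware), operate lane-wise, re-interleave, and store, with a scalar epilogue for leftover elements.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar loop with stride-3 accesses: each iteration reads and writes three
   consecutive floats (one packed x, y, z element). */
static void scale_xyz(float *v, size_t n_elems, float sx, float sy, float sz) {
    for (size_t i = 0; i < n_elems; i++) {
        v[3 * i + 0] *= sx;
        v[3 * i + 1] *= sy;
        v[3 * i + 2] *= sz;
    }
}

/* Same computation shaped like an interleaved group with VF = 4. */
static void scale_xyz_vf4(float *v, size_t n_elems, float sx, float sy, float sz) {
    assert(v != NULL || n_elems == 0);
    size_t i = 0;
    for (; i + 4 <= n_elems; i += 4) {
        float *base = v + 3 * i; /* 12 consecutive floats */
        float x[4], y[4], z[4];
        for (int l = 0; l < 4; l++) { /* deinterleave */
            x[l] = base[3 * l + 0];
            y[l] = base[3 * l + 1];
            z[l] = base[3 * l + 2];
        }
        for (int l = 0; l < 4; l++) { /* lane-wise work */
            x[l] *= sx;
            y[l] *= sy;
            z[l] *= sz;
        }
        for (int l = 0; l < 4; l++) { /* re-interleave and store */
            base[3 * l + 0] = x[l];
            base[3 * l + 1] = y[l];
            base[3 * l + 2] = z[l];
        }
    }
    scale_xyz(v + 3 * i, n_elems - i, sx, sy, sz); /* scalar epilogue */
}
```

Because each lane performs exactly the same multiply as the scalar loop, the two versions produce bit-identical results; only the memory access pattern differs.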

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 21

Renato, can you post a hand-created vectorized IR of how a reduction would work on your example? I don’t think that recognizing this as a reduction is going to get you far. A reduction is beneficial if the value reduced is only truly needed outside of a loop. This is not the case here (we are storing/loading from the pointer). Your example is something like WRITEPTR = phi i8* [ outsideval,
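
A pointer PHI such as the `WRITEPTR` quoted above typically comes from C that post-increments a pointer. In this hypothetical sketch (function names are illustrative, and `dst` and `src` are assumed not to overlap), `copy_bytes` advances `dst` every iteration and stores through it. The pointer's value is therefore used inside the loop, which makes it an induction rather than a reduction. `copy_bytes_indexed` is the equivalent form with the stride made explicit as `dst[i]`, i.e. the add-recurrence `{dst,+,1}`. Both return the final write position so the two can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Post-incremented write pointer: after mem2reg, dst typically becomes a
   pointer PHI advanced by 1 byte per iteration and stored through inside
   the loop. Assumes dst and src do not overlap. */
static unsigned char *copy_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        *dst++ = src[i];
    return dst; /* final write position */
}

/* Same loop with the stride explicit: the store address is dst + i,
   i.e. the add-recurrence {dst,+,1}. */
static unsigned char *copy_bytes_indexed(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return dst + n;
}
```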

enabling interleaved access loop vectorization

2016 Aug 05

Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

On Jun 7, 2013, at 9:22 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 7 June 2013 14:49, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > It is not the vectorizer that is the issue, it is the ARM backend that currently translates vectorized floating point IR to NEON instructions (it should scalarize it if desired to do so - i.e. if people care about

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hey Arnold, I have run into this situation many times while benchmarking. I think it is best if this is addressed using a simple heuristic. For that, we need to identify the loop cost and decide if it makes sense to completely unroll the loop, or partially unroll. I am unsure of the optimal way to implement this though. I want to run it by the list to get any ideas floating around :) Thanks

[LLVMdev] Disable vectorization for unaligned data

2013 Jul 21

No, I am afraid not without computing alignment based on the scalar code. In order to limit vectorization to 16-byte aligned data we need to know that data is 16-byte aligned. The way we vectorize we won’t know that until after we have vectorized. As you have observed we will pass “4” to getMemoryOpCost in the loop vectorizer (as that is the only thing that can be inferred from a consecutive
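
The alignment point can be illustrated with a little arithmetic. In this sketch (helper names are hypothetical, and this is not how the loop vectorizer computes alignment), the alignment guaranteed for every access `base + i * stride` is the largest power of two dividing both the base alignment and the stride. A scalar `float` loop over a 16-byte-aligned base only guarantees 4 bytes per access. The same loop stepping 16 bytes at a time (VF = 4) guarantees 16, information that only appears once the loop has been vectorized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of align bytes (align must be a power of two). */
static bool is_aligned(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Alignment guaranteed for every address base + i * stride_bytes when base
   is base_align-aligned: the lowest set bit of (base_align | stride_bytes),
   i.e. the largest power of two dividing both. */
static size_t access_alignment(size_t base_align, size_t stride_bytes) {
    assert(base_align != 0 && stride_bytes != 0);
    size_t x = base_align | stride_bytes;
    return x & (~x + 1); /* isolate the lowest set bit */
}
```

For example, `access_alignment(16, 4)` gives 4 for the scalar loop, while `access_alignment(16, 16)` gives 16 for the VF = 4 loop.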