Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] [PATCH] Loop Rerolling Pass"
2013 Oct 21
0
[LLVMdev] First attempt at recognizing pointer reduction
On Oct 21, 2013, at 1:00 PM, Renato Golin <renato.golin at linaro.org> wrote:
> Hi Arnold,
>
> To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization.
To detect memory access patterns you will want to look at the SCEV of a
2013 Oct 23
0
[LLVMdev] First attempt at recognizing pointer reduction
On Oct 23, 2013, at 3:10 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> In the examples you gave there are no reduction variables in the loop vectorizer’s sense. But, they all have memory accesses that are strided.
>
> This is what I don't get. As far as I understood, a
2013 Oct 21
0
[LLVMdev] LLVMdev Digest, Vol 112, Issue 56
Has anyone worked with or used the LLVM backend or compiler for Haskell?
David
On Monday, October 21, 2013 5:26 PM, "llvmdev-request at cs.uiuc.edu" <llvmdev-request at cs.uiuc.edu> wrote:
2013 Oct 21
2
[LLVMdev] First attempt at recognizing pointer reduction
Hi Arnold,
To sum up my intentions, I want to understand how the reduction/induction
variable detection works in LLVM, so that I can know better how to detect
different patterns in memory, not just the stride vectorization.
For instance, even if the relationship between each loop would be
complicated, I know that in each loop, all three reads are sequential. So,
at least, I could use a
2013 Oct 23
0
[LLVMdev] First attempt at recognizing pointer reduction
On Oct 23, 2013, at 9:41 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> I don’t think that recognizing this as a reduction is going to get you far. A reduction is beneficial if the value reduced is only truly needed outside of a loop.
> This is not the case here (we are storing/loading
2013 Oct 21
1
[LLVMdev] First attempt at recognizing pointer reduction
On 21 October 2013 20:58, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> For example these should be the SCEVs of “int a[2*i] = ; a[2*i+1] =”:
>
> {ptr, +, 8}_loop
> {ptr+4, +, 8}_loop
>
> Each access on its own requires a gather/scatter (2 loads/stores when
> vectorized (VF=2) + inserts/extracts). But when we look at both at once we
> see that we only
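A minimal sketch, not taken from the thread, of how the two add recurrences quoted above relate to each other. It assumes a 4-byte int (as the step of 8 and the offset of 4 imply); the helper names addrec_offset and forms_contiguous_pair are hypothetical and are not part of LLVM's SCEV API.

```c
#include <stddef.h>

/* Byte offset from the base pointer at iteration i for an add
 * recurrence written as {start, +, step}. */
size_t addrec_offset(size_t start, size_t step, size_t i) {
    return start + step * i;
}

/* Returns 1 when two accesses of elem_size bytes that share the same
 * step sit next to each other and together fill one step-sized window,
 * so that over consecutive iterations they touch consecutive bytes.
 * Returns 0 otherwise. */
int forms_contiguous_pair(size_t start_a, size_t start_b,
                          size_t step, size_t elem_size) {
    size_t lo = start_a < start_b ? start_a : start_b;
    size_t hi = start_a < start_b ? start_b : start_a;
    return (hi - lo == elem_size) && (2 * elem_size == step);
}
```

Under these assumptions, forms_contiguous_pair(0, 4, 8, 4) returns 1 for the pair {ptr, +, 8} and {ptr+4, +, 8}: at iteration i the accesses land at byte offsets 8*i and 8*i + 4, which are adjacent 4-byte slots.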
2013 Oct 23
2
[LLVMdev] First attempt at recognizing pointer reduction
On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> I don’t think that recognizing this as a reduction is going to get you
> far. A reduction is beneficial if the value reduced is only truly needed
> outside of a loop.
> This is not the case here (we are storing/loading from the pointer).
>
Hi Arnold, Nadav,
Let me resurrect this discussion a
2013 Oct 14
1
[LLVMdev] Vectorization of pointer PHI nodes
On 14 October 2013 19:31, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Renato, can you post the C code for the function and the assembly that gcc
> produces?
>
Attached.
> Your initial example could be well handled by vectorization of strided
> loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that
> this is what happened). But the LLVM-IR you
2013 Oct 23
2
[LLVMdev] First attempt at recognizing pointer reduction
On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> In the examples you gave there are no reduction variables in the loop
> vectorizer’s sense. But, they all have memory accesses that are strided.
>
This is what I don't get. As far as I understood, a reduction variable is
the one that aggregates the computation done by the loop, and is used
2013 Oct 24
1
[LLVMdev] First attempt at recognizing pointer reduction
On 23 October 2013 23:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> A reduction is something like:
>
> for (i= …) {
> r+= a[i];
> }
> return r;
>
Ok, so "reduction" is just a reduction in the map-reduce sense, and nothing
else.
> You don’t need to transform them in the legality phase. Believe me ;). Look
> at how we handle stride one
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC, and on it proceeding sooner rather than later.
I plan to pitch in (e.g., perf experiments).
>Probably can depend on the support provided by below RFC by Michael:
> "Allow loop vectorizer to choose vector widths that generate illegal types"
>In that case Loop Vectorizer will
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can
>do a great job optimizing.
Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates
the output of vectorizer:
If vectorizer is the best place to perform the optimization, it should do so.
This includes the cases like
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern raised for cases where the Loop Vectorizer generates
bigger types than the target supports:
Based on VF, we currently check the cost and generate the expected set of
instruction[s] for the bigger type. This has two challenges: for bigger types the cost
is not always correct, and code generation may not generate efficient
instruction[s].
Probably can depend on the support provided by below RFC by
2016 Jun 15
3
[Proposal][RFC] Strided Memory Access Vectorization
Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly.
-----Original Message-----
From: Saito, Hideki
Sent: Wednesday, June 15, 2016 1:39 PM
To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access
Ashutosh,
First,
2013 Oct 14
0
[LLVMdev] Vectorization of pointer PHI nodes
Renato, can you post the C code for the function and the assembly that gcc produces?
Your initial example could be well handled by vectorization of strided loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that this is what happened). But the LLVM-IR you sent has a store of 0 in there ;) and strides by 4.
Thanks,
Arnold
Vectorization of strided loops:
I am using float as the
2013 Oct 21
0
[LLVMdev] First attempt at recognizing pointer reduction
Renato,
can you post a hand-created vectorized IR of how a reduction would work on your example?
I don’t think that recognizing this as a reduction is going to get you far. A reduction is beneficial if the value reduced is only truly needed outside of a loop.
This is not the case here (we are storing/loading from the pointer).
Your example is something like
WRITEPTR = phi i8* [ outsideval,
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved
access optimization ought to have a positive impact even without target
support.
Case in point - Hal enabled it on PPC last September. An important
difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC,
but, as I said below, I hope we can enable it on x86 with a conservative
cost function, and still get
2013 Jun 07
0
[LLVMdev] NEON vector instructions and the fast math IR flags
On Jun 7, 2013, at 9:22 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 7 June 2013 14:49, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> It is not the vectorizer that is the issue, it is the ARM backend that currently translates vectorized floating point IR to NEON instructions (it should scalarize it if desired to do so - i.e. if people care about
2013 Sep 27
0
[LLVMdev] Trip count and Loop Vectorizer
Hey Arnold,
I have run into this situation many times while benchmarking.
I think it is best if this is addressed using a simple heuristic. For that, we need to identify the loop cost and decide if it makes sense to completely unroll the loop, or partially unroll. I am unsure of the optimal way to implement this though.
I want to run it by the list to get any ideas floating around :)
Thanks
2013 Jul 21
0
[LLVMdev] Disable vectorization for unaligned data
No, I am afraid not without computing alignment based on the scalar code.
In order to limit vectorization to 16-byte aligned data we need to know that data is 16-byte aligned. The way we vectorize we won’t know that until after we have vectorized. As you have observed we will pass “4” to getMemoryOpCost in the loop vectorizer (as that is the only thing that can be inferred from a consecutive