Displaying 20 results from an estimated 30000 matches similar to: "[LLVMdev] Vectorizing alloca instructions"
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom,
Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA.
Thanks,
Nadav
On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
>
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
> %0 = alloca [4 x i32]
> %x = getelementptr [4 x i32]* %0, i32 0, i32 0
> %y = getelementptr [4
2013 Oct 24
1
[LLVMdev] Vectorizing alloca instructions
On Oct 24, 2013, at 3:00 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Just a note, I don't think you should or need to vectorize the actual alloca stuff. If you can simply transform the dynamically indexed load:
>
> Then running SROA and InstCombine will mop up the rest. So its mostly about getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog,
Thanks for looking at this. This has recently got itself onto my TODO list
too.
> I am not sure how much all this will improve the code quality for
horizontal reduction
> (donno how frequently such pattern of horizontal reduction from same
array occurs in real world/SPECS).
Actually the main loop of 470.lbm can be SLP vectorized like this. We have
three parts to it: A fully
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold,
Thanks for your reply.
I tried test case as suggested by you.
*void foo(int *a, int *sum) {*sum =
a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}*
so that it has a 'store' in its IR.
*IR before vectorization :*target datalayout =
"e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple =
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav,
I'd humbly suggest that rather than use 3 directly, you should add a shared
constant between these two passes, so when one changes, the other doesn't
need to be updated. It would also ensure this bit of info about what needs
to be updated isn't only contained in the comments..
On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Author:
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav,
Thanks for the quick reply !!
Ok, so as of now we are lacking capability to handle flat large reductions.
I did go through function vectorizeChainsInBlock() (line number 2862). In
this function,
we try to vectorize if we have phi nodes in the IR (several if's check for
phi nodes) i.e we try to
construct tree that starts at chains.
Any pointers on how to join multiple trees? I
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi,
I am trying to understand LLVM vectorization implementation and was looking
into both loop and SLP vectorization.
test case 1:
*int foo(int *a) {int sum = 0,i;for(i=0; i<16; i++) sum += a[i];return
sum;}*
This code is vectorized by loop vectorizer where we calculate scalar loop
cost as 4 and vector loop cost as 2.
Since vector loop cost is less and above reduction is legal to
2014 Mar 17
2
[LLVMdev] Improving SLPVectorizer for Julia
I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code. Being fairly new to LLVM, I could use some advice, particular from those familiar with the internals of SLPVectorizer.
The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 . Here is an example of the kind of LLVM
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel,
Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair".
Thanks,
Nadav
Sent from my iPhone
> On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Hey Nadav,
> I'd humbly suggest that rather than use 3 directly, you should
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all!
It takes the current SLP vectorizer too long to vectorize my scalar
code. I am talking here about functions that have a single, huge basic
block with O(10^6) instructions. Here's an example:
%0 = getelementptr float* %arg1, i32 49
%1 = load float* %0
%2 = getelementptr float* %arg1, i32 4145
%3 = load float* %2
%4 = getelementptr float* %arg2, i32 49
%5 = load
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav,
Okay.
1. The comment doesn't make this clear. I would suggest, at a minimum,
updating it to mention pairs specifically, to avoid the issue in #2
2. If the day comes when the selectiondag store vectorizer handles more
than pairs, and does so better, is anyone really going to remember this
random 3 exists in the other vectorizer?
I would posit, based on experience, the answer is
2014 Aug 07
2
[LLVMdev] MCJIT generates MOVAPS on unaligned address
> On Aug 7, 2014, at 2:57 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>
> Your .ll file does not have a data layout. Opt will not initialize the DataLayoutPass. The SLP vectorizer will not vectorize because there is no DataLayoutPass.
>
> debug-cmake/bin/opt -default-data-layout="e-m:e-i64:64-f80:128-n8:16:32:64-S128" -basicaa -slp-vectorizer -S
2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
It's not reproducible with 'opt'. I call the SLP pass from my
application and only then the wrong IR gets generated.
On the attached module I call via the function pass manager:
1) TargetLibraryInfo with the target triple
2) Set the data layout
3) Basic Alias Analysis
4) SLP vectorizer
This produces the wrong IR. On the other hand running the attached
module through 'opt
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Thanks so much for your feedback Simon.
I am not sure that what I am proposing here is at odds with what you're
referring to (here and in the PR you linked). The key difference AFAICT is
that the pattern I am referring to is probably more aptly described as
"reducing scalarization" than as "vectorization". The reason I say that is
that the inputs are vectors and the output
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Absolutely. We do it for scalars, so it would likely be a matter of just
extending it.
But that is one example. The issue of extracting elements, performing an
operation on each element individually and then rebuilding the vector is
likely more prevalent than that. At least I think that is the case, but
I'll do some analysis to see if it is so or not.
On Sat, Jan 11, 2020 at 6:15 PM Craig
2020 Jan 10
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
I have added a few PPC-specific DAG combines in the past that follow this
pattern on specific operations. Now that it appears that this would be
useful to do on yet another operation, I'm wondering what people think
about doing this in the target-independent DAG Combiner for any
legal/custom operation on the target.
TL; DR;
The generic pattern would look like this:
(build_vector (op
2013 Oct 30
2
[LLVMdev] loop vectorizer
The debug messages are misleading. They should read “trying to vectorize a list of …”; The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark ?
Thanks,
Nadav
On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote:
> The SLP vectorizer apparently did something in the prologue of the
2013 Oct 30
3
[LLVMdev] loop vectorizer
----- Original Message -----
>
>
> I ran the BB vectorizer as I guess this is the SLP vectorizer.
No, while the BB vectorizer is doing a form of SLP vectorization, there is a separate SLP vectorization pass which uses a different algorithm. You can pass -vectorize-slp to opt.
-Hal
>
> BBV: using target information
> BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...
2013 Oct 30
0
[LLVMdev] loop vectorizer
The SLP vectorizer apparently did something in the prologue of the
function (where storing of arguments on the stack happens) which then
got eliminated later on (since I don't see any vector instructions in
the final IR). Below the debug output of the SLP pass:
Args: opt -O1 -vectorize-slp -debug loop.ll -S
SLP: Analyzing blocks in _Z3barmmPfS_S_.
SLP: Found 2 stores to vectorize.
SLP: