Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Improving SLPVectorizer for Julia"
2014 Apr 17
2
[LLVMdev] Extend SLPVectorizer to struct operations that are isomorphic to vector operations?
While playing with SLPVectorizer, I notice that it will happily vectorize cases involving extractelement/insertelement, but won't vectorize isomorphic cases involving extractvalue/insertvalue (such as the attached example). Is that something that could be straightforward to add to SLPVectorizer, or are there hard issues? In particular, the transformation would seem to require casts of
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
> %0 = alloca [4 x i32]
> %x = getelementptr [4 x i32]* %0, i32 0, i32 0
> %y = getelementptr [4
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Thanks so much for your feedback Simon.
I am not sure that what I am proposing here is at odds with what you're
referring to (here and in the PR you linked). The key difference AFAICT is
that the pattern I am referring to is probably more aptly described as
"reducing scalarization" than as "vectorization". The reason I say that is
that the inputs are vectors and the output
2013 Oct 24
1
[LLVMdev] Vectorizing alloca instructions
On Oct 24, 2013, at 3:00 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Just a note, I don't think you should or need to vectorize the actual alloca stuff. If you can simply transform the dynamically indexed load:
>
> Then running SROA and InstCombine will mop up the rest. So it's mostly about getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi,
I've been playing around with the SLPVectorizer trying to get it to
vectorize this simple program:
define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
%0 = alloca [4 x i32]
%x = getelementptr [4 x i32]* %0, i32 0, i32 0
%y = getelementptr [4 x i32]* %0, i32 0, i32 1
%z = getelementptr [4 x i32]* %0, i32 0, i32 2
%w = getelementptr [4 x i32]* %0, i32 0, i32 3
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Absolutely. We do it for scalars, so it would likely be a matter of just
extending it.
But that is one example. The issue of extracting elements, performing an
operation on each element individually and then rebuilding the vector is
likely more prevalent than that. At least I think that is the case, but
I'll do some analysis to see if it is so or not.
On Sat, Jan 11, 2020 at 6:15 PM Craig
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom,
Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA.
Thanks,
Nadav
On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
>
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
On 4 February 2013 18:25, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> For cases where this approach breaks really badly we could consider adding
> a specialized api or parameters (like the type of a user/use). But we
> should do so only as a last resort and backed by actual code that would
> benefit from doing so.
>
Very sensible, more or less what I had in
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop:
for (int i = start ; i < end ; ++i )
for (int p = 0 ; p < 4 ; ++p )
a[i*4+p] = b[i*4+p] + c[i*4+p];
define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float*
noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
br i1 %arg2, label %L0, label %L1
L0:
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it:
from Transforms/IPO/PassManagerBuilder.cpp:
// Add the various vectorization passes and relevant cleanup passes for
// them since we are no longer in the middle of the main scalar pipeline.
MPM.add(createLoopVectorizePass(DisableUnrollLoops));
MPM.add(createInstructionCombiningPass());
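For orientation, the nested loop this thread discusses can be written as a self-contained C function. This is only a minimal sketch: the function names `add_nested` and `add_flat` are hypothetical and do not appear in the thread, and the flat variant is an illustration of why the inner four iterations touch contiguous elements, not the vectorizer's actual output.

```c
/* Nested form, following the loop quoted in the thread: each outer
 * iteration i touches the four contiguous floats at a[i*4 .. i*4+3]. */
void add_nested(float *a, const float *b, const float *c,
                long start, long end) {
    for (long i = start; i < end; ++i)
        for (int p = 0; p < 4; ++p)
            a[i * 4 + p] = b[i * 4 + p] + c[i * 4 + p];
}

/* Equivalent flat form (hypothetical helper): the same elements are
 * visited in the same order, so no shuffling is conceptually needed. */
void add_flat(float *a, const float *b, const float *c,
              long start, long end) {
    for (long j = start * 4; j < end * 4; ++j)
        a[j] = b[j] + c[j];
}
```

Both functions write the same element range, which is one way to see why a single 4-wide add per outer iteration would be the expected result after cleanup.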
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot.
Any idea why there are still shufflevector, insertelement, *and* bitcast
(!!) etc. instructions left? The original loop is so clean, a textbook
example I'd say. There is no need to shuffle anything. At least I don't
see it.
Frank
vector.ph: ; preds = %L5
%broadcast.splatinsert1 = insertelement <4 x
2020 Jan 10
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
I have added a few PPC-specific DAG combines in the past that follow this
pattern on specific operations. Now that it appears that this would be
useful to do on yet another operation, I'm wondering what people think
about doing this in the target-independent DAG Combiner for any
legal/custom operation on the target.
TL;DR:
The generic pattern would look like this:
(build_vector (op
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as
typedef float float4 __attribute__((ext_vector_type(4)));
typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
float4 to_float4(uchar4 in)
{
float4 out = {in.x, in.y, in.z, in.w};
return out;
}
Running
2007 Sep 27
3
[LLVMdev] Vector swizzling and write masks code generation
Hey,
as some of you may know we're in the process of experimenting with LLVM in
Gallium3D (Mesa's new driver model), where LLVM would be used both in the
software only (by just JIT executing shaders) and hardware (drivers will
implement LLVM code-generators) cases.
While the software only case is pretty straightforward I just realized I
missed something in my initial evaluation.
That
2012 Oct 24
3
[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
Hi,
I don't know if my LLVM IR code is faulty, or if I spotted a bug in the RegisterCoalescing Pass, so I'm posting my issue on the ML. Shader and print-before-all dump are given below.
The interesting part is the vreg6/vreg48 reduction: before RegCoalescing, the machine code is:
// BEFORE LOOP
... Some COPYs....
400B%vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2
2014 Jul 22
2
[LLVMdev] InsertElementInst and ExtractElementInst
Hello,
I am creating a <3 x i32> vector in LLVM IR. Then I insert 3 instructions
and later on I try to load one instruction from the vector. The
insertion seems to work; however, when I try to load a specific
instruction from the vector, it seems that it does not work.
This is the part of my IR:
%"ins or1" = insertelement <3 x i32> undef, i32 %38, i32 0
%"ins and2"
2015 Nov 02
2
[StructurizeCFG] Trouble with branches out of a loop
Hi,
I've been investigating the StructurizeCFG pass, and it looks like it has
trouble handling CFG edges that break out of a loop and go directly to the
function exit. Am I running up against a bug in the structurizer, or a
general limitation of the algorithm used? As an aside, is there any
documentation for the algorithm used? Is it based on a published paper?
The input IR I have is the
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone,
I think I have found a GVN / alias analysis related bug, but before
opening an issue on the tracker I wanted to see if I am missing something.
I have the following testcase:
define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) {
entry:
; Just some temporary storage
%tmp.0 = alloca i32
%tmp.1 = alloca i32
%tmp.i =
2013 Apr 05
4
[LLVMdev] A strange testing case of SROA
Hi,
The following is excerpted from dynamic-vector-gep.ll.
The resulting "extractelement" seems to always return 0.0f regardless
of the values idx1 and idx2 hold.
Am I missing something here, or is something fishy taking place?
Thanks
Shuxin
; CHECK: test6
; CHECK: insertelement <4 x float> zeroinitializer, float 1.000000e+00, i32 %idx1
; CHECK:
2008 Aug 01
0
[LLVMdev] Generating movq2dq using IRBuilder
Hi Dan,
Yes, they could be represented with insertelement and extractelement, but I
don't think they actually generate optimal code using movq2dq and such. Else
both bugs 2584 and 2585 would be fixed.
Anyway, I'm actually already encouraged to get involved myself. I'm quite
experienced with MMX and SSE but I'm still trying to learn more about how
LLVM does instruction selection