similar to: [LLVMdev] Improving SLPVectorizer for Julia

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Improving SLPVectorizer for Julia"

2014 Apr 17
2
[LLVMdev] Extend SLPVectorizer to struct operations that are isomorphic to vector operations?
While playing with SLPVectorizer, I notice that it will happily vectorize cases involving extractelement/insertelement, but won't vectorize isomorphic cases involving extractvalue/insertvalue (such as the attached example). Is that something that could be straightforward to add to SLPVectorizer, or are there some hard issue? In particular, the transformation would seem to require casts of
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > > I've been playing around with the SLPVectorizer trying to get it to > vectorize this simple program: > > define void @vector(i32 addrspace(1)* %out, i32 %index) { > entry: > %0 = alloca [4 x i32] > %x = getelementptr [4 x i32]* %0, i32 0, i32 0 > %y = getelementptr [4
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Thanks so much for your feedback Simon. I am not sure that what I am proposing here is at odds with what you're referring to (here and in the PR you linked). The key difference AFAICT is that the pattern I am referring to is probably more aptly described as "reducing scalarization" than as "vectorization". The reason I say that is that the inputs are vectors and the output
2013 Oct 24
1
[LLVMdev] Vectorizing alloca instructions
On Oct 24, 2013, at 3:00 PM, Chandler Carruth <chandlerc at google.com> wrote: > Just a note, I don't think you should or need to vectorize the actual alloca stuff. If you can simply transform the dynamically indexed load: > > Then running SROA and InstCombine will mop up the rest. So its mostly about getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Absolutely. We do it for scalars, so it would likely be a matter of just extending it. But that is one example. The issue of extracting elements, performing an operation on each element individually and then rebuilding the vector is likely more prevalent than that. At least I think that is the case, but I'll do some analysis to see if it is so or not. On Sat, Jan 11, 2020 at 6:15 PM Craig
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
On 4 February 2013 18:25, Arnold Schwaighofer <aschwaighofer at apple.com>wrote: > For cases where this approach breaks really badly we could consider adding > a specialized api or parameters (like the type of a user/use). But we > should do so only as a last resort and backed by actual code that would > benefit from doing so. > Very sensible, more or less what I had in
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything.At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x
2020 Jan 10
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
I have added a few PPC-specific DAG combines in the past that follow this pattern on specific operations. Now that it appears that this would be useful to do on yet another operation, I'm wondering what people think about doing this in the target-independent DAG Combiner for any legal/custom operation on the target. TL; DR; The generic pattern would look like this: (build_vector (op
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as typedef float float4 __attribute__((ext_vector_type(4))); typedef unsigned char uchar4 __attribute__((ext_vector_type(4))); float4 to_float4(uchar4 in) { float4 out = {in.x, in.y, in.z, in.w}; return out; } Running
2007 Sep 27
3
[LLVMdev] Vector swizzling and write masks code generation
Hey, as some of you may know we're in process of experimenting with LLVM in Gallium3D (Mesa's new driver model), where LLVM would be used both in the software only (by just JIT executing shaders) and hardware (drivers will implement LLVM code-generators) cases. While the software only case is pretty straight forward I just realized I missed something in my initial evaluation. That
2012 Oct 24
3
[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
Hi, I don't know if my llvm ir code is faulty, or if I spot a bug in the RegisterCoalescing Pass, so I'm posting my issue on the ML. Shader and print-before-all dump are given below. The interessing part is the vreg6/vreg48 reduction : before RegCoalescing, the machine code is : // BEFORE LOOP ... Some COPYs.... 400B%vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2
2014 Jul 22
2
[LLVMdev] InsertElementInst and ExtractElementInst
Hello, I am create a <3 x i32> vector in LLVM IR. Then I insert 3 instructions and later on I try to load one instruction from the vector. The insertion seems to work, however, when I try to load a specific instruction from a vector I seems that it does not work. This is the part of my IR: %"ins or1" = insertelement <3 x i32> undef, i32 %38, i32 0 %"ins and2"
2015 Nov 02
2
[StructurizeCFG] Trouble with branches out of a loop
Hi, I've been investigating the StructurizeCFG pass, and it looks like it has trouble handling CFG edges that break out of a loop and go directly to the function exit. Am I running up against a bug in the structurizer, or a general limitation of the algorithm used? As an aside, is there any documentation for the algorithm used? Is it based on a published paper? The input IR I have is the
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone, I think I have found an gvn / alias analysis related bug, but before opening an issue on the tracker I wanted to see if I am missing something. I have the following testcase: define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) { > entry: > ; Just some temporary storage > %tmp.0 = alloca i32 > %tmp.1 = alloca i32 > %tmp.i =
2013 Apr 05
4
[LLVMdev] A strange testing case of SROA
Hi, Following is excerpted from dynamic-vector-gep.ll. The resulting "extractelement" seems to always return 0.0f regardless the value idx1 and idx2 is holding. Am I missing something here or there is something fishy take place? Thanks Shuxin 101 ; CHECK: test6 102 ; CHECK: insertelement <4 x float> zeroinitializer, float 1.000000e+00, i32 %idx1 103 ; CHECK:
2008 Aug 01
0
[LLVMdev] Generating movq2dq using IRBuilder
Hi Dan, Yes, they could be represented with insertelement and extractelement, but I don't think they actually generate optimal code using movq2dq and such. Else both bugs 2584 and 2585 would be fixed. Anyway, I'm actually already encouraged to get involved myself. I'm quite experienced with MMX and SSE but I'm still trying to learn more about how LLVM does instruction selection