similar to: [LLVMdev] Will GetElementPtr work as expected

Displaying 20 results from an estimated 20000 matches similar to: "[LLVMdev] Will GetElementPtr work as expected"

2015 Jun 22
2
[LLVMdev] bb-vectorizer transforms only part of the block
The loads, stores and float arithmetic in the attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vector shuffles. (The function was designed so that no shuffle would be needed in order to vectorize it.) I tested this with LLVM 3.6 with the following command:
2019 Dec 09
2
[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case:

define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
.entry:
  %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
  %t44 = bitcast <3 x i32> %t43 to <3 x float>
  %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef,
2016 Oct 06
2
LoopVectorizer -- generating bad and unhandled shufflevector sequence
Hi, I have experimented with enabling the LoopVectorizer for SystemZ. I have come across a loop which, when vectorized, seems to be poorly generated. In short, there is a completely unnecessary sequence of shufflevector instructions that doesn't get optimized away anywhere. In other words, there is shuffling that leads back to the original vector: [0 1 2 3
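As a hand-written illustration (not taken from the report) of a shuffle sequence that leads back to the original lane order, and which a cleanup pass should be able to fold away:

  ; %v holds lanes [0 1 2 3]
  %t1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>  ; lanes [3 2 1 0]
  %t2 = shufflevector <4 x i32> %t1, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0> ; back to [0 1 2 3], i.e. %t2 == %v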
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
Yes, you need the latest ToT version of LLVM, or you can run -loop-vectorize -earlycse -instcombine -simplifycfg. The bitcast is essentially a no-op to satisfy the type system. This is how your example looks for me:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %.lhs = shl i64 %6, 2
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything, at least I don't see it. Frank

vector.ph:                                        ; preds = %L5
  %broadcast.splatinsert1 = insertelement <4 x
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
With LLVM 3.8 the JIT compiler engine generates an AVX512 instruction although I target an 'avx2' CPU (Intel Core i7). I just downloaded the most recent 3.8 and it still happens. It happens with this input module:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

define void @module_cFFEMJ(i64 %lo, i64 %hi, i64 %myId, i1 %ordered, i64 %start, i32* noalias align 32
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
On 06/23/2016 12:56 PM, Craig Topper wrote:
> Can you check what value "getHostCPUName" returned?

getHostCPUName() = skylake

> On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> > With LLVM 3.8 the JIT compiler engine generates an AVX512
> > instruction although I
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian! This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long-term vision, and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
2018 Feb 06
2
[RFC] Make LoopVectorize Aware of SLP Operations
Hello, We would like to propose making LoopVectorize aware of SLP operations, to improve the generated code for loops operating on struct fields or doing complex math. At the moment, LoopVectorize uses interleaving to vectorize loops that operate on values loaded/stored from consecutive addresses: vector loads/stores are generated to combine consecutive loads/stores and then shufflevector
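As a rough, hand-written sketch of the interleaving scheme described above (the names and types are invented, not taken from the RFC): for a loop over a struct of two floats, the vectorizer emits one wide load and then shufflevectors to de-interleave the two fields:

  ; four iterations over struct { float re; float im; } at once
  %wide = load <8 x float>, <8 x float>* %ptr, align 4
  %re = shufflevector <8 x float> %wide, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %im = shufflevector <8 x float> %wide, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>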
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
On Mon, Oct 10, 2016 at 1:14 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 10 October 2016 at 19:39, Alina Sbirlea <alina.sbirlea at gmail.com> wrote:
> > Now, for ARM archs Halide is currently generating explicit VSTn intrinsics,
> > with some of the patterns I described, and I found no reason why Halide
> > shouldn't generate a single
2013 Feb 07
1
[LLVMdev] alloca scalarization with dynamic indexing into vectors
Hi all, I have a question regarding dynamic indexing into a vector with GEP. I see that in the ScalarReplAggregates pass in the LLVM 3.2 release the call SROA::isSafeGEP() will now allow alloca scalarization in the case where a GEP index into a vector isn’t a constant. My question is: what is the expected behavior when the index is out of bounds of the vector? Is it undefined? I have an
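For reference, a minimal hand-written sketch of the pattern being discussed (modern IR syntax, invented names, not the poster's actual code): a GEP into a vector alloca whose element index is a runtime value:

  %vec = alloca <4 x float>, align 16
  %elt = getelementptr inbounds <4 x float>, <4 x float>* %vec, i32 0, i32 %idx  ; %idx is not a constant
  %val = load float, float* %elt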
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it. From Transforms/IPO/PassManagerBuilder.cpp:

  // Add the various vectorization passes and relevant cleanup passes for
  // them since we are no longer in the middle of the main scalar pipeline.
  MPM.add(createLoopVectorizePass(DisableUnrollLoops));
  MPM.add(createInstructionCombiningPass());
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
>> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > Thanks! Did you compile with any non-default flags other than -mllvm
>> > -vectorize?
>>
>> I used -O3 and -vectorize, no other non-default
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements this nested loop:

  for (int i = start; i < end; ++i)
    for (int p = 0; p < 4; ++p)
      a[i*4+p] = b[i*4+p] + c[i*4+p];

define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
  br i1 %arg2, label %L0, label %L1

L0:
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi, I am trying to understand the LLVM vectorization implementation and was looking into both loop and SLP vectorization.

Test case 1:

  int foo(int *a) {
    int sum = 0, i;
    for (i = 0; i < 16; i++)
      sum += a[i];
    return sum;
  }

This code is vectorized by the loop vectorizer, where we calculate the scalar loop cost as 4 and the vector loop cost as 2. Since the vector loop cost is less and the above reduction is legal to
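For context, here is a hand-written sketch (not the actual vectorizer output, register names invented) of the reduction epilogue the loop vectorizer emits for such a sum: the <4 x i32> partial sums are folded with shuffles and adds, and the scalar result is extracted:

  %rdx.shuf = shufflevector <4 x i32> %vec.sum, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %bin.rdx = add <4 x i32> %vec.sum, %rdx.shuf
  %rdx.shuf1 = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx1 = add <4 x i32> %bin.rdx, %rdx.shuf1
  %sum = extractelement <4 x i32> %bin.rdx1, i32 0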
2012 Jul 01
0
[LLVMdev] ConstantExpr refactoring
Renato Golin wrote:
> Hi all,
>
> It's been a long time, and I'm probably going to kill myself, but I
> want to try it anyway.

I don't think that turning off folding of constants is the right place to start. To implement this, start by adding new constants that are going to replace the existing ones. A good rule of thumb is "whatever the relocations in a given
2012 Feb 03
3
[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short
Hi Hal, this is one of the first test cases I would love to have improved vectorizer support for. I sent it out earlier, but I think it is a good time to look into it again, after the vectorizer was committed. The basic example is a set of scalar loads that load consecutive elements and store them back right away. For me this is an obvious case where vectorization is beneficial
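A hand-written sketch of that pattern (invented names, current IR syntax, not the original test case): scalar loads of consecutive elements stored straight back, which could become one vector load and one vector store:

  %a1 = getelementptr inbounds float, float* %a, i64 1
  %v0 = load float, float* %a
  %v1 = load float, float* %a1
  %b1 = getelementptr inbounds float, float* %b, i64 1
  store float %v0, float* %b
  store float %v1, float* %b1
  ; desired vectorized form:
  ;   %va = load <2 x float>, <2 x float>* %pa
  ;   store <2 x float> %va, <2 x float>* %pb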
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav, Thanks for the quick reply! OK, so as of now we lack the capability to handle flat large reductions. I did go through the function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several ifs check for phi nodes), i.e. we try to construct a tree that starts at chains. Any pointers on how to join multiple trees? I
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
Maybe it would be illustrative to give an IR example of the case I'm interested in. Consider:

define void @"julia_transform_bvn_derivs_hessian!"(double* %data, double* %data2, double* %data3, double* %out) {
  %element11 = getelementptr inbounds double, double* %data, i32 1
  %load10 = load double, double* %data
  %load11 = load double, double* %element11
  %element21 =
2019 Sep 26
2
Opportunity to split store of shuffled vector.
Hi there, I notice that LLVM seems to always generate vector instructions for vector operations in C, even when it's just simple stores:

  void foo(vector int* c) {
    (*c)[0] = 1;
    (*c)[1] = 2;
  }

  %0 = load <4 x i32>, <4 x i32>* %c, align 16
  %vecins1 = shufflevector <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>, <4 x i32> %0, <4 x i32> <i32 0, i32 1, i32 6,
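For comparison, here is a hand-written sketch (value names invented, not from the original mail) of the split form the subject line refers to: storing the two constant lanes as scalar stores instead of a load/shufflevector/store of the whole vector:

  %c0 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i32 0, i32 0
  store i32 1, i32* %c0
  %c1 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i32 0, i32 1
  store i32 2, i32* %c1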