similar to: [LLVMdev] MCJIT generates MOVAPS on unaligned address

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] MCJIT generates MOVAPS on unaligned address"

2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
It's not reproducible with 'opt'. I call the SLP pass from my application and only then the wrong IR gets generated. On the attached module I call via the function pass manager: 1) TargetLibraryInfo with the target triple 2) Set the data layout 3) Basic Alias Analysis 4) SLP vectorizer This produces the wrong IR. On the other hand running the attached module through 'opt
2014 Aug 07
2
[LLVMdev] MCJIT generates MOVAPS on unaligned address
> On Aug 7, 2014, at 2:57 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > > Your .ll file does not have a data layout. Opt will not initialize the DataLayoutPass. The SLP vectorizer will not vectorize because there is no DataLayoutPass. > > debug-cmake/bin/opt -default-data-layout="e-m:e-i64:64-f80:128-n8:16:32:64-S128" -basicaa -slp-vectorizer -S
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
I seem to have problem to get the SLP vectorizer to make use of the full 8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The function is attached, the CPU flags are: flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
Frank, It sounds like the SLP vectorizer thinks that it is more profitable to use 128bit wide operations (because 256bit operations are double pumped on Sandybridge). Did you see a different result on Haswell? Thanks, Nadav > On Jul 1, 2015, at 11:06 AM, Frank Winter <fwinter at jlab.org> wrote: > > I realized that the function parameters had no alignment attributes on them.
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
Hi Frank, What does --debug-only=vectorize says? You may try to get the datalayout and the triple on the IR header, just to make sure you got everything right. LLVM will honour those, and front-ends should create them correctly. --renato On 1 July 2015 at 19:06, Frank Winter <fwinter at jlab.org> wrote: > I realized that the function parameters had no alignment attributes on them.
2015 Jun 22
2
[LLVMdev] bb-vectorizer transforms only part of the block
The loads, stores and float arithmetic in attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vectorshuffles. (The function was designed so that no shuffle would be needed in order to vectorize it). I tested this with llvm 3.6 with the following command:
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop
2015 Jun 03
2
[LLVMdev] Replacing a repetitive sequence of code with a loop
Hey guys, in an HPC project I am working on I am given an LLVM program consisting of a linear sequence of repetitive junks of code with an uniform memory access pattern. Each code junk does the following: 1) loads some memory, 2) performs some arithmetic operations, 3) stores the result back to memory. The memory stride between consecutive junks is constant over the whole program, thus the
2013 Oct 28
2
[LLVMdev] loop vectorizer says Bad stride
Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
In the case when coming from C it was probably the loop unroller and SLP vectorizer which vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large. Thus, the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:
2013 Oct 28
0
[LLVMdev] loop vectorizer says Bad stride
Frank, It looks like the loop vectorizer is unable to tell that the two stores in your code never overlap. This is probably because of the sign-extend in your code. Can you extend the indices to 64bit ? Thanks, Nadav On Oct 28, 2013, at 1:38 PM, Frank Winter <fwinter at jlab.org> wrote: > Verifying function > running passes ... > LV: Checking a loop in "bar" > LV:
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
----- Original Message ----- > > Hi Nadav, > > that's the whole point of it. I can't in general make the index > calculation simpler. The example given is the simplest non-trivial > index function that is needed. It might well be that it's that > simple that the index calculation in this case can be thrown aways > altogether and - as you say - be replaced by
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything.At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x
2015 May 09
4
[LLVMdev] [LSR] hoisting loop invariants in reverse order
Hi, I was tracking down a performance regression and noticed that LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of indvars) in the reverse order of how they appear in the loop. This reverse order creates troubles for the StraightLineStrengthReduce pass I recently add. While I understand ultimately SLSR should be able to sort independent candidates in an optimal order,
2015 Jun 18
3
[LLVMdev] problem with replacing an instruction
I am trying to change this define void @main(float* noalias %arg0, float* noalias %arg1, float* noalias %arg2) { entrypoint: %0 = bitcast float* %arg1 to <4 x float>* intothis define void @main(float* noalias %arg0, float* noalias %arg1, float* noalias %arg2) { entrypoint: %0 = getelementptr float* %arg1, i64 0 %1 = bitcast float* %0 to <4 x float>* I must be close but
2015 Jul 27
3
[LLVMdev] i1* function argument on x86-64
I am running into a problem with 'i1*' as a function's argument which seems to have appeared since I switched to LLVM 3.6 (but can have other source, of course). If I look at the assembler that the MCJIT generates for an x86-64 target I see that the array 'i1*' is taken as a sequence of 1 bit wide elements. (I guess that's correct). However, I used to call the function