Displaying 20 results from an estimated 9219 matches for "float".

2015 Jun 22
2
[LLVMdev] bb-vectorizer transforms only part of the block
The loads, stores and float arithmetic in the attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vector shuffles. (The function was designed so that no shuffle would be needed in order to vectorize it). I tested this w...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
I seem to have a problem getting the SLP vectorizer to make use of the full 8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The function is attached, the CPU flags are: flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs...
2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: movaps 88(%rdx), %xmm0 where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16 byte aligned which 88 plus something aligned to 4 byte isn't. Here the...
2019 Dec 09
2
[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 u...
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some
2013 Nov 11
2
[LLVMdev] What's the Alias Analysis does clang use ?
...M community: I found basicaa seems not to tell must-not-alias for __restrict__ arguments in c/c++. It only compares two pointers and the underlying objects they point to. I wonder how clang does alias analysis for c/c++ keyword restrict. Let's assume we compile the following code: $cat myalias.cc float foo(float * __restrict__ v0, float * __restrict__ v1, float * __restrict__ v2, float * __restrict__ t) { float res; for (int i=0; i<10; ++i) { float x = v0[1]; float y = v1[1]; float z = v2[1]; res = x * 0.67 + y *...
2015 Nov 02
2
[StructurizeCFG] Trouble with branches out of a loop
...a loop and go directly to the function exit. Am I running up against a bug in the structurizer, or a general limitation of the algorithm used? As an aside, is there any documentation for the algorithm used? Is it based on a published paper? The input IR I have is the following: define <4 x float> @structurizer_test(<4 x float> %inp.coerce) { %1 = extractelement <4 x float> %inp.coerce, i32 0 %2 = fcmp ogt float %1, 0.000000e+00 br i1 %2, label %.lr.ph.i, label %._crit_edge.i .lr.ph.i: ; preds = %7, %0 %i.03.i = phi float [ %8,...
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load float* %4 %6 = getelementptr float* %arg2, i32 4145 %7 = load float* %6 %8 = fmul float %7, %1 %9 = fmul float %5, %3 %10...
2015 Jun 03
2
[LLVMdev] Replacing a repetitive sequence of code with a loop
...n consecutive junks is constant over the whole program, thus the whole program could be replaced with a single loop with its loop body containing a generic version of said code junk. Here's an example (a short one, the real world program would be much longer): define void @vec_plus_vec(float* noalias %arg0, float* noalias %arg1, float* noalias %arg2) { entrypoint: %0 = bitcast float* %arg1 to <4 x float>* %1 = bitcast float* %arg2 to <4 x float>* %2 = load <4 x float>* %0, align 16 %3 = load <4 x float>* %1, align 16 %4 = fadd <4 x float> %...
2013 Nov 12
0
[LLVMdev] What's the Alias Analysis does clang use ?
Hi, Your problem is that the function arguments, which are marked as noalias, are not being directly used as the base objects of the array accesses: > %v0.addr = alloca float*, align 8 > %v1.addr = alloca float*, align 8 > %v2.addr = alloca float*, align 8 > %t.addr = alloca float*, align 8 ... > store float* %v0, float** %v0.addr, align 8 > store float* %v1, float** %v1.addr, align 8 > store float* %v2, float** %v2.addr, align 8 > store float* %t,...
2019 Nov 10
2
Reassociation is blocking a vectorization
Hi Devs, I am looking at the bug https://bugs.llvm.org/show_bug.cgi?id=43953 and found that the following piece of IR %arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom %0 = load float, float* %arrayidx, align 4, !tbaa !2 %arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom %1 = load float, float* %arrayidx2, align 4, !tbaa !2 %sub = fsub fast float %0, %1 %add = fadd fast float %sum.0, %sub is transformed in...
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
...> > > Best regards, > Ralf > ; ModuleID = 'BAD.bc' > target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128" > > %0 = type { i8*, i8*, i8*, i8*, i32 } > %1 = type { float addrspace(1)*, i32, float addrspace(1)*, float addrspace(1)* } > > @sgv = internal constant [1 x i8] zeroinitializer > @fgv = internal constant [1 x i8] zeroinitializer > @lvgv = internal constant [0 x i8*] zeroinitializer > > declare float @llvm.sqrt.f32(float) nounwind readonl...
2013 Jul 05
0
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 07/04/2013 01:39 PM, Stéphane Letz wrote: > Hi, > > Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some
2010 May 29
3
[LLVMdev] Vectorized LLVM IR
...Letz wrote: We are actually testing LLVM for the Faust language (http://faust.grame.fr/). Currently Faust generates a C++ class from its .dsp Faust source file. So for the following simple Faust example: process = (+,+):*; which can be displayed as the following processor (takes 4 streams of float samples, does a "+" and then a "*" operation on the streams to produce a single output)...
2010 Jan 29
2
[LLVMdev] 64bit MRV problem: { float, float, float} -> { double, float }
Hey Duncan, hey everybody else, I just stumbled upon a problem in the latest llvm-gcc trunk which is related to my previous problem with the 64bit ABI and structs: Given the following code: struct float3 { float x, y, z; }; extern "C" void __attribute__((noinline)) test(float3 a, float3* res) { res->y = a.y; } int main(void) { float3 a; float3 res; test(a, &res); } llvm-gcc -c -emit-llvm -O3 produces this: %struct.float3 = type { float, float, float } define voi...
2013 Jul 04
3
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
Hi, Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information that are needed by the vectorization passes to
2009 Jun 30
2
[LLVMdev] JIT on Windows x64
Hi, I'm new to LLVM and have some questions about using the JIT on Windows x64. I am aware that this is currently broken but am attempting to use the hack/patch proposed in this bug http://llvm.org/bugs/show_bug.cgi?id=3739. I checked out the revision the patch was created for (66183) and applied it but the assembler generated seems to fail whenever it reaches a movaps instruction.
2013 Feb 19
2
[LLVMdev] Is it a bug or am I missing something ?
...ll, on following code: ; ModuleID = 'shufxbug.ll' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32" target triple = "i386-pc-linux-gnu" define void @sample_test(<4 x float>* nocapture %source, <8 x float>* nocapture %dest) nounwind noinline { L.entry: %0 = getelementptr <4 x float>* %source, i32 19 %1 = load <4 x float>* %0, align 16 %2 = extractelement <4 x float> %1, i32 0 %3 = insertelement <8 x float> <float 0.000000e+0...
2013 Oct 28
2
[LLVMdev] loop vectorizer says Bad stride
Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext i32 %arg0 to i64)) + %arg2),+,4}<%L0>Sink Scev: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2)(Induction step: 1) LV: Distance for store...
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under