thr3ads.net - search: "float"

Displaying 20 results from an estimated 9257 matches for "float".

[LLVMdev] bb-vectorizer transforms only part of the block

2015 Jun 22

The loads, stores and float arithmetic in the attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vector shuffles. (The function was designed so that no shuffle would be needed in order to vectorize it.) I tested this w...

[LLVMdev] SLP vectorizer on AVX feature

2015 Jul 01

I seem to have a problem getting the SLP vectorizer to make use of the full 8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The function is attached, the CPU flags are: flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs...

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

MCJIT, when lowering to x86-64, generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: movaps 88(%rdx), %xmm0 where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16-byte aligned, which 88 plus a 4-byte-aligned address is not guaranteed to be. Here the...

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

2019 Dec 09

Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 u...

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 11

...M community: I found that basicaa seems not to report must-not-alias for __restrict__ arguments in C/C++. It only compares two pointers and the underlying objects they point to. I wonder how clang does alias analysis for the C/C++ keyword restrict. Let's assume we compile the following code: $ cat myalias.cc float foo(float * __restrict__ v0, float * __restrict__ v1, float * __restrict__ v2, float * __restrict__ t) { float res; for (int i=0; i<10; ++i) { float x = v0[1]; float y = v1[1]; float z = v2[1]; res = x * 0.67 + y *...

[StructurizeCFG] Trouble with branches out of a loop

2015 Nov 02

...a loop and go directly to the function exit. Am I running up against a bug in the structurizer, or a general limitation of the algorithm used? As an aside, is there any documentation for the algorithm used? Is it based on a published paper? The input IR I have is the following: define <4 x float> @structurizer_test(<4 x float> %inp.coerce) { %1 = extractelement <4 x float> %inp.coerce, i32 0 %2 = fcmp ogt float %1, 0.000000e+00 br i1 %2, label %.lr.ph.i, label %._crit_edge.i .lr.ph.i: ; preds = %7, %0 %i.03.i = phi float [ %8,...

[LLVMdev] Modifications to SLP

2015 Jul 07

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load float* %4 %6 = getelementptr float* %arg2, i32 4145 %7 = load float* %6 %8 = fmul float %7, %1 %9 = fmul float %5, %3 %10...

[LLVMdev] Replacing a repetitive sequence of code with a loop

2015 Jun 03

...n consecutive chunks is constant over the whole program, thus the whole program could be replaced with a single loop with its loop body containing a generic version of said code chunk. Here's an example (a short one, the real world program would be much longer): define void @vec_plus_vec(float* noalias %arg0, float* noalias %arg1, float* noalias %arg2) { entrypoint: %0 = bitcast float* %arg1 to <4 x float>* %1 = bitcast float* %arg2 to <4 x float>* %2 = load <4 x float>* %0, align 16 %3 = load <4 x float>* %1, align 16 %4 = fadd <4 x float> %...

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 12

Hi, Your problem is that the function arguments, which are marked as noalias, are not being directly used as the base objects of the array accesses: > %v0.addr = alloca float*, align 8 > %v1.addr = alloca float*, align 8 > %v2.addr = alloca float*, align 8 > %t.addr = alloca float*, align 8 ... > store float* %v0, float** %v0.addr, align 8 > store float* %v1, float** %v1.addr, align 8 > store float* %v2, float** %v2.addr, align 8 > store float* %t,...

Reassociation is blocking a vectorization

2019 Nov 10

Hi Devs, I am looking at the bug https://bugs.llvm.org/show_bug.cgi?id=43953 and found that the following piece of IR %arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom %0 = load float, float* %arrayidx, align 4, !tbaa !2 %arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom %1 = load float, float* %arrayidx2, align 4, !tbaa !2 %sub = fsub fast float %0, %1 %add = fadd fast float %sum.0, %sub is transformed in...

[LLVMdev] spilling & xmm register usage

2010 Sep 29

...> > > Best regards, > Ralf > ; ModuleID = 'BAD.bc' > target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128" > > %0 = type { i8*, i8*, i8*, i8*, i32 } > %1 = type { float addrspace(1)*, i32, float addrspace(1)*, float addrspace(1)* } > > @sgv = internal constant [1 x i8] zeroinitializer > @fgv = internal constant [1 x i8] zeroinitializer > @lvgv = internal constant [0 x i8*] zeroinitializer > > declare float @llvm.sqrt.f32(float) nounwind readonl...

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 05

On 07/04/2013 01:39 PM, Stéphane Letz wrote: > Hi, > > Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some

[LLVMdev] Vectorized LLVM IR

2010 May 29

...Letz wrote: We are actually testing LLVM for the Faust language (http://faust.grame.fr/). Currently Faust generates a C++ class from its .dsp Faust source file. So for the following simple Faust example: process = (+,+):*; which can be displayed as the following processor (takes 4 streams of float samples, does a "+" and then a "*" operation on the streams to produce a single output)...

[LLVMdev] 64bit MRV problem: { float, float, float} -> { double, float }

2010 Jan 29

Hey Duncan, hey everybody else, I just stumbled upon a problem in the latest llvm-gcc trunk which is related to my previous problem with the 64bit ABI and structs: Given the following code: struct float3 { float x, y, z; }; extern "C" void __attribute__((noinline)) test(float3 a, float3* res) { res->y = a.y; } int main(void) { float3 a; float3 res; test(a, &res); } llvm-gcc -c -emit-llvm -O3 produces this: %struct.float3 = type { float, float, float } define voi...

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 04

Hi, Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information that is needed by the vectorization passes to

[LLVMdev] JIT on Windows x64

2009 Jun 30

Hi, I'm new to LLVM and have some questions about using the JIT on Windows x64. I am aware that this is currently broken but am attempting to use the hack/patch proposed in this bug http://llvm.org/bugs/show_bug.cgi?id=3739. I checked out the revision the patch was created for (66183) and applied it but the generated assembly seems to fail whenever it reaches a movaps instruction.

[LLVMdev] Is it a bug or am I missing something ?

2013 Feb 19

...ll, on following code: ; ModuleID = 'shufxbug.ll' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32" target triple = "i386-pc-linux-gnu" define void @sample_test(<4 x float>* nocapture %source, <8 x float>* nocapture %dest) nounwind noinline { L.entry: %0 = getelementptr <4 x float>* %source, i32 19 %1 = load <4 x float>* %0, align 16 %2 = extractelement <4 x float> %1, i32 0 %3 = insertelement <8 x float> <float 0.000000e+0...

[LLVMdev] loop vectorizer says Bad stride

2013 Oct 28

Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext i32 %arg0 to i64)) + %arg2),+,4}<%L0>Sink Scev: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2)(Induction step: 1) LV: Distance for store...

[LLVMdev] spilling & xmm register usage

2010 Sep 29

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on Linux compared to Windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated; the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under
