thr3ads.net - search: "floating"

Displaying 20 results from an estimated 9257 matches for "floating".

[LLVMdev] bb-vectorizer transforms only part of the block

2015 Jun 22

[LLVMdev] bb-vectorizer transforms only part of the block

The loads, stores and float arithmetic in attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vectorshuffles. (The function was designed so that no shuffle would be needed in order to vectorize it). I tested this with llvm 3.6 with the following command:

[LLVMdev] SLP vectorizer on AVX feature

2015 Jul 01

[LLVMdev] SLP vectorizer on AVX feature

I seem to have problem to get the SLP vectorizer to make use of the full 8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The function is attached, the CPU flags are: flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

[LLVMdev] MCJIT generates MOVAPS on unaligned address

MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: movaps 88(%rdx), %xmm0 where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16 byte aligned which 88 plus something aligned to 4 byte isn't. Here the ac...

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

2019 Dec 09

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef,

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

[LLVMdev] How to broaden the SLP vectorizer's search

On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 11

[LLVMdev] What's the Alias Analysis does clang use ?

Hi, LLVM community: I found basicaa seems not to tell must-not-alias for __restrict__ arguments in c/c++. It only compares two pointers and the underlying objects they point to. I wonder how clang does alias analysis for c/c++ keyword restrict. let assume we compile the following code: $cat myalias.cc float foo(float * __restrict__ v0, float * __restrict__ v1, float * __restrict__ v2, float *

[StructurizeCFG] Trouble with branches out of a loop

2015 Nov 02

[StructurizeCFG] Trouble with branches out of a loop

Hi, I've been investigating the StructurizeCFG pass, and it looks like it has trouble handling CFG edges that break out of a loop and go directly to the function exit. Am I running up against a bug in the structurizer, or a general limitation of the algorithm used? As an aside, is there any documentation for the algorithm used? Is it based on a published paper? The input IR I have is the

[LLVMdev] Modifications to SLP

2015 Jul 07

[LLVMdev] Modifications to SLP

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load

[LLVMdev] Replacing a repetitive sequence of code with a loop

2015 Jun 03

[LLVMdev] Replacing a repetitive sequence of code with a loop

Hey guys, in an HPC project I am working on I am given an LLVM program consisting of a linear sequence of repetitive junks of code with an uniform memory access pattern. Each code junk does the following: 1) loads some memory, 2) performs some arithmetic operations, 3) stores the result back to memory. The memory stride between consecutive junks is constant over the whole program, thus the

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 12

[LLVMdev] What's the Alias Analysis does clang use ?

Hi, Your problem is that the function arguments, which are makes as noalias, are not being directly used as the base objects of the array accesses: > %v0.addr = alloca float*, align 8 > %v1.addr = alloca float*, align 8 > %v2.addr = alloca float*, align 8 > %t.addr = alloca float*, align 8 ... > store float* %v0, float** %v0.addr, align 8 > store float* %v1, float** %v1.addr,

Reassociation is blocking a vectorization

2019 Nov 10

Reassociation is blocking a vectorization

Hi Devs, I am looking at the bug https://bugs.llvm.org/show_bug.cgi?id=43953 and found that following piece of ir %arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom %0 = load float, float* %arrayidx, align 4, !tbaa !2 %arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom %1 = load float, float* %arrayidx2, align 4, !tbaa !2 %sub = fsub fast float %0, %1

[LLVMdev] spilling & xmm register usage

2010 Sep 29

[LLVMdev] spilling & xmm register usage

On Sep 29, 2010, at 8:35 AMPDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 05

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

On 07/04/2013 01:39 PM, Stéphane Letz wrote: > Hi, > > Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some

[LLVMdev] Vectorized LLVM IR

2010 May 29

[LLVMdev] Vectorized LLVM IR

Le 29 mai 2010 à 01:08, Bill Wendling a écrit : > Hi Stéphane, > > The SSE support is the LLVM backend is fine. What is the code that's generated? Do you have some short examples of where LLVM doesn't do as well as the equivalent scalar code? > > -bw > > On May 28, 2010, at 12:13 PM, Stéphane Letz wrote: We are actually testing LLVM for the Faust language

[LLVMdev] 64bit MRV problem: { float, float, float} -> { double, float }

2010 Jan 29

[LLVMdev] 64bit MRV problem: { float, float, float} -> { double, float }

Hey Duncan, hey everybody else, I just stumbled upon a problem in the latest llvm-gcc trunk which is related to my previous problem with the 64bit ABI and structs: Given the following code: struct float3 { float x, y, z; }; extern "C" void __attribute__((noinline)) test(float3 a, float3* res) { res->y = a.y; } int main(void) { float3 a; float3 res; test(a,

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 04

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

Hi, Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to

[LLVMdev] JIT on Windows x64

2009 Jun 30

[LLVMdev] JIT on Windows x64

Hi, I'm new to LLVM and have some questions about using the JIT on Windows x64. I am aware that this is currently broken but am attempting to use the hack/patch proposed in this bug http://llvm.org/bugs/show_bug.cgi?id=3739. I checked out the revision the patch was created for (66183) and applied it but the assembler generated seems to fail whenever it reaches a movaps insctruction.

[LLVMdev] Is it a bug or am I missing something ?

2013 Feb 19

[LLVMdev] Is it a bug or am I missing something ?

Hi all, on following code: ; ModuleID = 'shufxbug.ll' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32" target triple = "i386-pc-linux-gnu" define void @sample_test(<4 x float>* nocapture %source, <8 x float>* nocapture %dest) nounwind noinline { L.entry:

[LLVMdev] loop vectorizer says Bad stride

2013 Oct 28

[LLVMdev] loop vectorizer says Bad stride

Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext

[LLVMdev] spilling & xmm register usage

2010 Sep 29

[LLVMdev] spilling & xmm register usage

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under

search for: floating