Displaying 20 results from an estimated 9248 matches for "floating".
2015 Jun 22
2
[LLVMdev] bb-vectorizer transforms only part of the block
The loads, stores and float arithmetic in the attached function should be
completely vectorizable. The bb-vectorizer does a good job at first, but
from instruction %96 on it goes wrong by adding unnecessary
vector shuffles. (The function was designed so that no shuffle would be
needed in order to vectorize it.)
I tested this with LLVM 3.6 with the following command:
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
I seem to have a problem getting the SLP vectorizer to make use of the full
8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The
function is attached, the CPU flags are:
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed
Single-Precision Floating-Point Values) on a non-aligned memory address:
movaps 88(%rdx), %xmm0
where %rdx comes in as a function argument with only natural alignment
(float*). This x86 instruction requires the memory address to be 16-byte
aligned, which 88 plus an address aligned to only 4 bytes is not guaranteed to be.
Here the ac...
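The alignment argument in this post can be checked with a few lines of C. This is a minimal sketch, not code from the thread: `is_aligned_16` and `movaps_safe` are hypothetical helper names, and the pointer-to-integer cast assumes a flat address space as on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p satisfies the 16-byte alignment that MOVAPS requires
 * for its memory operand. */
static bool is_aligned_16(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical helper: would an aligned 16-byte load from
 * base + byte_offset be legal? A float* only guarantees 4-byte
 * alignment, so the answer depends on the actual address. */
static bool movaps_safe(const float *base, size_t byte_offset) {
    assert(base != NULL);
    return is_aligned_16((const unsigned char *)base + byte_offset);
}
```

With a 16-byte-aligned base, offset 88 leaves a remainder of 8, so the access is misaligned; an unaligned load (MOVUPS) would be the safe choice when only natural float alignment is known.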
2019 Dec 09
2
[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay,
I'm looking at some missed optimizations caused by D70246. Here's a test case:
define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
.entry:
%t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
%t44 = bitcast <3 x i32> %t43 to <3 x float>
%t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
<i32 0, i32 undef,
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote:
> You might consider filing a bug (llvm.org/bugs) requesting a flag, but I
> don't know if the code owners want to expose such a flag.
I'm not sure that exposing raw access to that limit is a good idea, as
there are no guarantees that it'll stay the same. But maybe a flag
turning some
2013 Nov 11
2
[LLVMdev] What's the Alias Analysis does clang use ?
Hi, LLVM community:
I found that basicaa does not seem to report no-alias for __restrict__ arguments
in C/C++. It only compares two pointers and the underlying objects they
point to. I wonder how clang does alias analysis
for the C/C++ keyword restrict.
Let's assume we compile the following code:
$ cat myalias.cc
float foo(float * __restrict__ v0, float * __restrict__ v1, float *
__restrict__ v2, float *
2015 Nov 02
2
[StructurizeCFG] Trouble with branches out of a loop
Hi,
I've been investigating the StructurizeCFG pass, and it looks like it has
trouble handling CFG edges that break out of a loop and go directly to the
function exit. Am I running up against a bug in the structurizer, or a
general limitation of the algorithm used? As an aside, is there any
documentation for the algorithm used? Is it based on a published paper?
The input IR I have is the
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all!
It takes the current SLP vectorizer too long to vectorize my scalar
code. I am talking here about functions that have a single, huge basic
block with O(10^6) instructions. Here's an example:
%0 = getelementptr float* %arg1, i32 49
%1 = load float* %0
%2 = getelementptr float* %arg1, i32 4145
%3 = load float* %2
%4 = getelementptr float* %arg2, i32 49
%5 = load
2015 Jun 03
2
[LLVMdev] Replacing a repetitive sequence of code with a loop
Hey guys, in an HPC project I am working on I am given an LLVM program
consisting of a linear sequence of repetitive chunks of code with a
uniform memory access pattern. Each code chunk does the following: 1)
loads some memory, 2) performs some arithmetic operations, 3) stores the
result back to memory. The memory stride between consecutive chunks is
constant over the whole program, thus the
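The rolled form this post is after can be illustrated in C. This is only a sketch under stated assumptions: the function name `process_chunks`, the two-load chunk body, and the arithmetic are all hypothetical stand-ins, since the post does not show the actual chunk.

```c
#include <stddef.h>

/* Hypothetical chunk body: load, compute, store. Because the stride
 * between consecutive chunks is constant, n_chunks unrolled copies
 * collapse into one loop indexed by i * stride. */
static void process_chunks(const float *in, float *out,
                           size_t n_chunks, size_t stride) {
    for (size_t i = 0; i < n_chunks; ++i) {
        float a = in[i * stride];      /* 1) load some memory   */
        float b = in[i * stride + 1];
        float r = a * b + a;           /* 2) arithmetic         */
        out[i * stride] = r;           /* 3) store the result   */
    }
}
```

Each iteration touches the same offsets relative to `i * stride`, which is the property that makes the straight-line sequence expressible as a loop.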
2013 Nov 12
0
[LLVMdev] What's the Alias Analysis does clang use ?
Hi,
Your problem is that the function arguments, which are marked as noalias, are not being directly used as the base objects of the array accesses:
> %v0.addr = alloca float*, align 8
> %v1.addr = alloca float*, align 8
> %v2.addr = alloca float*, align 8
> %t.addr = alloca float*, align 8
...
> store float* %v0, float** %v0.addr, align 8
> store float* %v1, float** %v1.addr,
2019 Nov 10
2
Reassociation is blocking a vectorization
Hi Devs,
I am looking at the bug
https://bugs.llvm.org/show_bug.cgi?id=43953
and found that the following piece of IR
%arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
%0 = load float, float* %arrayidx, align 4, !tbaa !2
%arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
%1 = load float, float* %arrayidx2, align 4, !tbaa !2
%sub = fsub fast float %0, %1
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote:
> Hello everybody,
>
> I have stumbled upon a test case (the attached module is a slightly
> reduced version) that shows extremely reduced performance on linux
> compared to windows when executed using LLVM's JIT.
>
> We narrowed the problem down to the actual code being generated, the
> source IR on both systems
2013 Jul 05
0
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 07/04/2013 01:39 PM, Stéphane Letz wrote:
> Hi,
>
> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the generated C code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the LLVM IR version of the same program cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some
2010 May 29
3
[LLVMdev] Vectorized LLVM IR
On 29 May 2010 at 01:08, Bill Wendling wrote:
> Hi Stéphane,
>
> The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some short examples of where LLVM doesn't do as well as the equivalent scalar code?
>
> -bw
>
> On May 28, 2010, at 12:13 PM, Stéphane Letz wrote:
We are actually testing LLVM for the Faust language
2010 Jan 29
2
[LLVMdev] 64bit MRV problem: { float, float, float} -> { double, float }
Hey Duncan, hey everybody else,
I just stumbled upon a problem in the latest llvm-gcc trunk which is
related to my previous problem with the 64bit ABI and structs:
Given the following code:
struct float3 { float x, y, z; };
extern "C" void __attribute__((noinline)) test(float3 a, float3* res) {
res->y = a.y;
}
int main(void) {
float3 a;
float3 res;
test(a,
2013 Jul 04
3
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
Hi,
Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the generated C code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the LLVM IR version of the same program cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information that is needed by the vectorization passes to
2009 Jun 30
2
[LLVMdev] JIT on Windows x64
Hi,
I'm new to LLVM and have some questions about using the JIT on Windows
x64. I am aware that this is currently broken but am attempting to use
the hack/patch proposed in this bug
http://llvm.org/bugs/show_bug.cgi?id=3739.
I checked out the revision the patch was created for (66183) and applied
it, but the generated assembly seems to fail whenever it reaches a
movaps instruction.
2013 Feb 19
2
[LLVMdev] Is it a bug or am I missing something ?
Hi all,
on the following code:
; ModuleID = 'shufxbug.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i386-pc-linux-gnu"
define void @sample_test(<4 x float>* nocapture %source, <8 x float>* nocapture %dest) nounwind noinline {
L.entry:
2013 Oct 28
2
[LLVMdev] loop vectorizer says Bad stride
Verifying function
running passes ...
LV: Checking a loop in "bar"
LV: Found a loop: L0
LV: Found an induction variable.
LV: We need to do 0 pointer comparisons.
LV: Checking memory dependencies
LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float*
%arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to
i64)) + %arg2)
LV: Src Scev: {((4 * (sext
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody,
I have stumbled upon a test case (the attached module is a slightly
reduced version) that shows extremely reduced performance on linux
compared to windows when executed using LLVM's JIT.
We narrowed the problem down to the actual code being generated, the
source IR on both systems is the same.
Try compiling the attached module:
llc -O3 -filetype=asm -o BAD.s BAD.ll
Under