Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Duplicate loading of double constants"
2013 Aug 20
0
[LLVMdev] Duplicate loading of double constants
On 18 August 2013 22:38, Eugene Toder <eltoder at gmail.com> wrote:
> Hi,
>
> I found that in some cases llvm generates duplicate loads of double
> constants,
> e.g.
>
> $ cat t.c
> double f(double* p, int n)
> {
> double s = 0;
> if (n)
> s += *p;
> return s;
> }
> $ clang -S -O3 t.c -o -
> ...
> f:
2008 Jul 12
2
[LLVMdev] Shuffle regression
Hi all,
I think I found a regression in the shuffle instruction. I've attached a
replacement of fibonacci.cpp to reproduce the issue. It runs fine on release
2.3 but revision 52648 fails, and I suspect that the issue is still present.
2.3 generates the following x86 code:
03A10010 push ebp
03A10011 mov ebp,esp
03A10013 and esp,0FFFFFFF0h
03A10019
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).
Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: den 28 mars 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior
2008 Jul 12
0
[LLVMdev] Shuffle regression
I have fixed a related bug: 52740. Can you check if that fixes this
problem?
Evan
On Jul 11, 2008, at 6:43 PM, Nicolas Capens wrote:
> Hi all,
>
> I think I found a regression in the shuffle instruction. I’ve
> attached a replacement of fibonacci.cpp to reproduce the issue. It
> runs fine on release 2.3 but revision 52648 fails, and I suspect
> that the issue is still
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
(Changing subject line as diagnosis has changed)
I'm attaching the compiled code that I've been getting, both with
CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring
with CodeGenOpt::None, but that seems to be because ECX isn't being used
- it still gets set to 0x7fffffff by one of the calls to 76719BA1
I notice that X86::SQRTPD[m|r] appear in
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC have three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N.
The best compression preset (flac -8) uses lpc_order up to 12; it means that during encoding FLAC also uses unaccelerated C function.
I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post.
I have tried SSE execution domain fixup pass.
But I am not able to see any improvements.
I expect for the example below to use MOVDQA, PAND &c.
(On nehalem, ANDPS is extremely slower than PAND)
Please tell me if something would be wrong for me.
Thank you.
Takumi
Host: i386-mingw32
Build: trunk at 103373
foo.ll:
define <4 x i32> @foo(<4 x i32> %x,
2013 Feb 19
2
[LLVMdev] Is it a bug or am I missing something ?
Hi all,
on following code:
; ModuleID = 'shufxbug.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i386-pc-linux-gnu"
define void @sample_test(<4 x float>* nocapture %source, <8 x float>* nocapture %dest) nounwind noinline {
L.entry:
2017 Mar 01
2
[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm
Hi,
We seem to have found a bug in the LLVM 3.8 code generator. We are using MCJIT and have isolated working.ll and broken.ll after middle-end optimizations -- in the block merge128, notice that broken.ll has a fcmp une comparison to zero and a jump based on that branch:
merge128: ; preds = %true71, %false72
%_rtB_724 = load %B_repro_T*, %B_repro_T**
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>>> [...]
>>> movaps 32(%rdi), %xmm3
>>> movaps 48(%rdi), %xmm2
>>>
2013 Feb 19
0
[LLVMdev] Is it a bug or am I missing something ?
<<<<<<<<<<<<<<<<<<<<<<<<<<
; ModuleID = 'shufxbug.ll'
target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:6
4-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i386-pc-linux-gnu"
define void @sample_test(<4 x float>* nocapture
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive amount of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE and I
end up with SSE instructions(including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote:
> Hello. This is my 1st post.
ようこそ!
> I have tried SSE execution domain fixup pass.
> But I am not able to see any improvements.
Did you actually measure runtime, or did you look at assembly?
> I expect for the example below to use MOVDQA, PAND &c.
> (On nehalem, ANDPS is extremely slower than PAND)
Are you sure? The
2013 Feb 26
2
[LLVMdev] passing vector of booleans to functions
Hi all,
I'm currently trying to figure out the best way to pass vector of
booleans to other functions. Take this small example:
define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
%cmp = fcmp olt <4 x float> %a, %b
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
ret <4 x
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
All,
Attached is a patch that does two things. First it makes the use
of the current SSE code a run time option through the use
of speex_decoder_ctl() and speex_encoder_ctl
It does this twofold. First there is a modification to the configure.in
script which introduces a check based upon platform. It will compile in the
sse assembly if you are on an i?86 based platform by making a
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile attached IR with LLVM 3.6
llc -march=x86-64 -o f.S f.ll
it generates an aligned ADDPS with unaligned address. See attached f.S,
here an extract:
addq $12, %r9 # $12 is not a multiple of 4, thus for
xmm0 this is unaligned
xorl %esi, %esi
.align 16, 0x90
.LBB0_1: # %loop2
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> I've noticed that LLVM tends to generate suboptimal code and spill an
> excessive amount of registers in large functions, such as in those
> that are automatically generated by FFTW.
One problem might be that we're forcing the 16 stores to the out array to happen in source order, which
2008 Jun 17
2
[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86
Hi Nate!
I don't see how that would work. Select doesn't work per element.
Say we're trying to vectorize the following C++ code:
if(v[0] < 0) v[0] += 1.0f;
if(v[1] < 0) v[1] += 1.0f;
if(v[2] < 0) v[2] += 1.0f;
if(v[3] < 0) v[3] += 1.0f;
With SSE assembly this would be as simple as:
movaps xmm1, xmm0 // v in xmm0
cmpltps xmm1, zero // zero =