Displaying 20 results from an estimated 25 matches for "addsd".
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
...ng -- nothing could have been more explicit.
The really strange thing is that if the assignment to p[i] is removed
(the line marked with "xxx..."), then the code produced is optimal and
exactly what one expects. I show this result in "Output B", where you
get a beautiful sequence of addsd into register xmm2.
It's all very strange and it points to some questionable decision
making on the part of LLVM. I tried different versions of the sum()
function (eliminating the loop, for example) but it does not help.
Another observation is that the loop variable i (in foo) must be
involve...
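For context, a minimal C sketch of the pattern the snippet describes (the names sum, foo, p and i come from the snippet; everything else is an assumption, not the original test case):

/* Hypothetical reconstruction of the reported pattern. */
double sum(const double *q, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += q[i];
  return s;
}

void foo(double *p, const double *q, int n)
{
  for (int i = 0; i < n; ++i)
    p[i] = sum(q, n);   /* xxx... the store whose removal reportedly yields the optimal addsd sequence */
}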
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
...ng -- nothing could have been more explicit.
The really strange thing is that if the assignment to p[i] is removed
(the line marked with "xxx..."), then the code produced is optimal and
exactly what one expects. I show this result in "Output B", where you
get a beautiful sequence of addsd into register xmm2.
It's all very strange and it points to some questionable decision
making on the part of LLVM. I tried different versions of the sum()
function (eliminating the loop, for example) but it does not help.
Another observation is that the loop variable i (in foo) must be
involve...
2017 Mar 01
2
[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm
...q 184(%rsp), %rax
movq $0, 752(%rax)
movq 184(%rsp), %rax
movq $0, 760(%rax)
movq 176(%rsp), %rax
movsd 5608(%rax), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
mulsd 648(%rax), %xmm0
movsd 160(%rsp), %xmm1 # 8-byte Reload
# xmm1 = mem[0],zero
addsd %xmm0, %xmm1
movsd %xmm1, 672(%rax)
movq 176(%rsp), %rax
movsd 5648(%rax), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
mulsd 648(%rax), %xmm0
movsd %xmm0, 704(%rax)
movsd 192(%rsp), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
xorpd %xmm1, %xmm1
ucomisd %xmm1, %xmm0
movq 672(%ra...
2013 Aug 19
2
[LLVMdev] Duplicate loading of double constants
...int n)
{
  double s = 0;
  if (n)
    s += *p;
  return s;
}
$ clang -S -O3 t.c -o -
...
f: # @f
.cfi_startproc
# BB#0:
xorps %xmm0, %xmm0
testl %esi, %esi
je .LBB0_2
# BB#1:
xorps %xmm0, %xmm0
addsd (%rdi), %xmm0
.LBB0_2:
ret
...
Note that there are 2 xorps instructions, the one in BB#1 being clearly
redundant as it's dominated by the first one. The two xorps come from 2
FsFLD0SD generated by instruction selection and never eliminated by
machine passes. My guess would be machine CSE...
2016 Oct 12
4
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
>
The following tests pass at "-O3" and
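As a hedged illustration (not from the thread) of why -Ofast can legitimately change results: -ffast-math allows reassociation, and -ffp-contract=on allows fusing a multiply and add into one FMA, which rounds once instead of twice.

/* Illustrative only: with -ffp-contract=on or -Ofast the compiler may emit a
   fused multiply-add here, changing the low bits of the result relative to -O0. */
double axpy(double a, double x, double y)
{
  return a * x + y;
}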
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...pcklpd xmm1,xmm3
002E0520 mulpd xmm1,xmmword ptr ds:[2E00A0h]
002E0528 addpd xmm1,xmm0
002E052C movapd xmm3,xmmword ptr [esp+0A0h]
002E0535 movapd xmm0,xmm3
002E0539 unpckhpd xmm0,xmm0
002E053D movapd xmm2,xmm3
002E0541 movapd xmm6,xmm3
002E0545 addsd xmm2,xmm0
002E0549 movapd xmm3,xmmword ptr [esp+0B0h]
002E0552 addsd xmm2,xmm3
002E0556 movapd xmm7,xmm3
002E055A xorpd xmm3,xmm3
002E055E ucomisd xmm2,xmm3
002E0562 setnp al
002E0565 sete cl
002E0568 test al,cl
002E056A jne...
2010 Jun 07
1
[LLVMdev] XMM in X86 Backend
...ving an excessive use of xmm registers in the output assembly
produced by the x86 backend. Basically, for code like this
double test(double a, double b) {
double c;
c = 1.0 + sin (a + b*b);
return c;
}
llc produced something like....
movsd 16(%ebp), %xmm0
mulsd %xmm0, %xmm0
addsd 8(%ebp), %xmm0
movsd %xmm0, (%esp)
.......
fstpl -8(%ebp)
movsd -8(%ebp), %xmm0
addsd .LC1, %xmm0
movsd %xmm0, -8(%ebp)
fldl -8(%ebp)
Since the LLVM backend is using xmms, it involves a lot of register moves. llc has one
option, -mcpu=686, where...
2013 Aug 20
0
[LLVMdev] Duplicate loading of double constants
...eturn s;
> }
> $ clang -S -O3 t.c -o -
> ...
> f: # @f
> .cfi_startproc
> # BB#0:
> xorps %xmm0, %xmm0
> testl %esi, %esi
> je .LBB0_2
> # BB#1:
> xorps %xmm0, %xmm0
> addsd (%rdi), %xmm0
> .LBB0_2:
> ret
> ...
>
Thanks. Please file a bug for this on llvm.org/bugs.
The crux of the problem is that machine CSE runs before register allocation
and is consequently extremely conservative when doing CSE to avoid
potentially increasing register pressur...
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
...ous yet.
+0x00 movupd 16(%rsi), %xmm0
+0x05 movupd 16(%rsp), %xmm1
+0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ?
+0x0f movapd %xmm0, %xmm2
+0x13 mulsd %xmm2, %xmm2
+0x17 xorpd %xmm1, %xmm1
+0x1b addsd %xmm2, %xmm1
I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float> type. This is risky because the loads/stores are inefficient, but unfortunately...
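A hedged guess at the scalar shape behind the assembly above, purely for illustration (the real bh kernel may differ): a subtract/square/accumulate distance computation, which matches the subpd/mulsd/addsd sequence.

/* Illustrative sketch only, not the actual bh source. */
double dist2(const double a[3], const double b[3])
{
  double d = 0.0;
  for (int i = 0; i < 3; ++i) {
    double t = a[i] - b[i];
    d += t * t;
  }
  return d;
}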
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
...16(%rsi), %xmm0
> +0x05 movupd 16(%rsp), %xmm1
> +0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ?
> +0x0f movapd %xmm0, %xmm2
> +0x13 mulsd %xmm2, %xmm2
> +0x17 xorpd %xmm1, %xmm1
> +0x1b addsd %xmm2, %xmm1
>
> I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float> type. This is risky because the loads/stores are inefficient, but unfo...
2006 Apr 19
0
[LLVMdev] floating point exception and SSE2 instructions
..., %ecx
cmpl $0, %eax
jne LBB_sum_d_2 # cond_true.preheader
LBB_sum_d_1: # entry.bb9_crit_edge
pxor %xmm0, %xmm0
jmp LBB_sum_d_5 # bb9
LBB_sum_d_2: # cond_true.preheader
pxor %xmm0, %xmm0
xorl %edx, %edx
LBB_sum_d_3: # cond_true
addsd (%ecx), %xmm0
addl $8, %ecx
incl %edx
cmpl %eax, %edx
jne LBB_sum_d_3 # cond_true
LBB_sum_d_4: # bb9.loopexit
LBB_sum_d_5: # bb9
movsd %xmm0, (%esp)
fldl (%esp)
addl $12, %esp
ret
There is nothing here that should cause...
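For reference, a C-level sketch of what the generated code above computes (the poster emits LLVM assembly directly, so this source is an assumption, not their code):

/* Sketch: sum the n doubles starting at p, matching the addsd/addl $8 loop above. */
double sum_d(const double *p, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += p[i];
  return s;
}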
2016 Jun 27
3
Finding caller-saved registers at a function call site
...mm0
40069d: 00
40069e: f2 0f 59 c1    mulsd %xmm1,%xmm0       # val * 1.2
4006a2: f2 0f 11 4d f8 movsd %xmm1,-0x8(%rbp)  # Spill val to the stack
4006a7: e8 d4 ff ff ff callq 400680 <recurse>
4006ac: f2 0f 58 45 f8 addsd -0x8(%rbp),%xmm0  # recurse's return value + val
4006b1: 48 83 c4 10 add $0x10,%rsp
4006b5: 5d pop %rbp
4006b6: c3 retq
...
Notice how xmm1 (the storage location of "val", which is live across the
cal...
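A hedged C sketch of the pattern behind this disassembly (only the name recurse and the val * 1.2 / + val shape come from the listing; the rest is assumed): val is live across the call, so it is spilled to -0x8(%rbp) before the callq and reloaded by the addsd afterwards.

/* Hypothetical body; illustrates a value kept live across a call. */
double recurse(double val)
{
  if (val > 1000.0)
    return val;
  return recurse(val * 1.2) + val;   /* val must survive the recursive call */
}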
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
Hi,
I'm building a little JIT that creates functions to do array manipulations,
e.g. sum all the elements of a double* array. I'm writing this in Python, generating
LLVM assembly instructions and piping that through a call to ParseAssemblyString,
ExecutionEngine, etc.
It's working OK on integer values, but I'm getting nasty floating point exceptions
when I try this on double*
2018 Nov 15
2
[RFC][llvm-mca] Adding binary support to llvm-mca.
...ls. While the markers are presented as
function calls, in reality they are no-ops.
test:
pushq %rbp
movq %rsp, %rbp
movsd %xmm0, -8(%rbp)
movsd %xmm1, -16(%rbp)
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
xorps %xmm0, %xmm0
movsd %xmm0, -24(%rbp)
movsd -8(%rbp), %xmm0
mulsd -16(%rbp), %xmm0
addsd -24(%rbp), %xmm0
movsd %xmm0, -24(%rbp)
.Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
movsd -24(%rbp), %xmm0
popq %rbp
retq
.section .mca_code_regions,"",@progbits
.quad 42
.quad .Lmca_code_region_start_0
.quad .Lmca_code_region_end_0-.Lmca_code_region_start_0
The assembly has been t...
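A hedged source-level sketch of what the marked region computes, with the markers written as the no-op calls the RFC mentions (the marker names and signatures here are assumptions for illustration, not the RFC's actual spelling):

/* The region with ID 42 computes x = a*b + x starting from x = 0.
   mca_region_start/mca_region_end are hypothetical no-op markers. */
static void mca_region_start(int id) { (void)id; }
static void mca_region_end(int id)   { (void)id; }

double test(double a, double b)
{
  double x = 0.0;
  mca_region_start(42);
  x = a * b + x;
  mca_region_end(42);
  return x;
}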
2016 Jun 22
0
Finding caller-saved registers at a function call site
Hi Rob,
Rob Lyerly via llvm-dev wrote:
> I'm looking for a way to get all the caller-saved registers (both the
> register and the stack slot at which it was saved) for a given function
> call site in the backend. What's the best way to grab this
> information? Is it possible to get this information if I have the
> MachineInstr of the function call? I'm currently
2016 Jun 22
3
Finding caller-saved registers at a function call site
Hi everyone,
I'm looking for a way to get all the caller-saved registers (both the
register and the stack slot at which it was saved) for a given function
call site in the backend. What's the best way to grab this information?
Is it possible to get this information if I have the MachineInstr of the
function call? I'm currently targeting the AArch64 & X86 backends.
Thanks!
--
2013 Jul 15
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP
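A hedged example (not from the email) of the kind of straight-line code the SLP vectorizer targets: four independent, isomorphic adds that can be combined into a single vector add.

/* Illustrative only: candidates for one <4 x double> add under -fslp-vectorize. */
void add4(double *restrict a, const double *restrict b, const double *restrict c)
{
  a[0] = b[0] + c[0];
  a[1] = b[1] + c[1];
  a[2] = b[2] + c[2];
  a[3] = b[3] + c[3];
}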
2016 Jun 27
0
Finding caller-saved registers at a function call site
...9e: f2 0f 59 c1 mulsd %xmm1,%xmm0       # val * 1.2
> 4006a2: f2 0f 11 4d f8 movsd %xmm1,-0x8(%rbp)  # Spill val to the stack
> 4006a7: e8 d4 ff ff ff callq 400680 <recurse>
> 4006ac: f2 0f 58 45 f8 addsd -0x8(%rbp),%xmm0  # recurse's return value + val
> 4006b1: 48 83 c4 10 add $0x10,%rsp
> 4006b5: 5d pop %rbp
> 4006b6: c3 retq
> ...
>
> Notice how xmm1 (the storage location of "val"...
2018 Nov 21
2
[RFC][llvm-mca] Adding binary support to llvm-mca.
...> movq %rsp, %rbp
> > movsd %xmm0, -8(%rbp)
> > movsd %xmm1, -16(%rbp)
> > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> > xorps %xmm0, %xmm0
> > movsd %xmm0, -24(%rbp)
> > movsd -8(%rbp), %xmm0
> > mulsd -16(%rbp), %xmm0
> > addsd -24(%rbp), %xmm0
> > movsd %xmm0, -24(%rbp)
> > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> > movsd -24(%rbp), %xmm0
> > popq %rbp
> > retq
> > .section .mca_code_regions,"",@progbits
> > .quad 42
> > .quad .Lmca_...