Displaying 20 results from an estimated 25 matches for "addsd".
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
...ng -- nothing could have been more explicit.
The really strange thing is that if the assignment to p[i] is removed
(the line marked with "xxx..."), then the code produced is optimal and
exactly what one expects. I show this result in "Output B", where you
get a beautiful sequence of addsd into register xmm2.
It's all very strange and it points to some questionable decision
making on the part of LLVM. I tried different versions of the sum()
function (eliminating the loop, for example) but it does not help.
Another observation is that the loop variable i (in foo) must be
involve...
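For context, a minimal C sketch of the pattern the snippet describes (the names sum, foo, p and i come from the snippet; everything else is an assumption, not the original test case):

/* Hypothetical reconstruction of the reported pattern. */
double sum(const double *q, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += q[i];
  return s;
}

void foo(double *p, const double *q, int n)
{
  for (int i = 0; i < n; ++i)
    p[i] = sum(q, n);   /* xxx... the store whose removal reportedly yields the optimal addsd sequence */
}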
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
...ng -- nothing could have been more explicit.
The really strange thing is that if the assignment to p[i] is removed
(the line marked with "xxx..."), then the code produced is optimal and
exactly what one expects. I show this result in "Output B", where you
get a beautiful sequence of addsd into register xmm2.
It's all very strange and it points to some questionable decision
making on the part of LLVM. I tried different versions of the sum()
function (eliminating the loop, for example) but it does not help.
Another observation is that the loop variable i (in foo) must be
involve...
2017 Mar 01
2
[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm
...q 184(%rsp), %rax
movq $0, 752(%rax)
movq 184(%rsp), %rax
movq $0, 760(%rax)
movq 176(%rsp), %rax
movsd 5608(%rax), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
mulsd 648(%rax), %xmm0
movsd 160(%rsp), %xmm1 # 8-byte Reload
# xmm1 = mem[0],zero
addsd %xmm0, %xmm1
movsd %xmm1, 672(%rax)
movq 176(%rsp), %rax
movsd 5648(%rax), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
mulsd 648(%rax), %xmm0
movsd %xmm0, 704(%rax)
movsd 192(%rsp), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
xorpd %xmm1, %xmm1
ucomisd %xmm1, %xmm0
movq 672(%ra...
2013 Aug 19
2
[LLVMdev] Duplicate loading of double constants
...int n)
{
  double s = 0;
  if (n)
    s += *p;
  return s;
}
$ clang -S -O3 t.c -o -
...
f: # @f
.cfi_startproc
# BB#0:
xorps %xmm0, %xmm0
testl %esi, %esi
je .LBB0_2
# BB#1:
xorps %xmm0, %xmm0
addsd (%rdi), %xmm0
.LBB0_2:
ret
...
Note that there are 2 xorps instructions, the one in BB#1 being clearly
redundant as it's dominated by the first one. The two xorps come from 2
FsFLD0SD generated by instruction selection and never eliminated by
machine passes. My guess would be machine CSE...
2016 Oct 12
4
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
>
The following tests pass at "-O3" and
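As a hedged illustration (not from the thread) of why -Ofast can legitimately change results: -ffast-math allows reassociation, and -ffp-contract=on allows fusing a multiply and add into one FMA, which rounds once instead of twice.

/* Illustrative only: with -ffp-contract=on or -Ofast the compiler may emit a
   fused multiply-add here, changing the low bits of the result relative to -O0. */
double axpy(double a, double x, double y)
{
  return a * x + y;
}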
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...pcklpd xmm1,xmm3
002E0520 mulpd xmm1,xmmword ptr ds:[2E00A0h]
002E0528 addpd xmm1,xmm0
002E052C movapd xmm3,xmmword ptr [esp+0A0h]
002E0535 movapd xmm0,xmm3
002E0539 unpckhpd xmm0,xmm0
002E053D movapd xmm2,xmm3
002E0541 movapd xmm6,xmm3
002E0545 addsd xmm2,xmm0
002E0549 movapd xmm3,xmmword ptr [esp+0B0h]
002E0552 addsd xmm2,xmm3
002E0556 movapd xmm7,xmm3
002E055A xorpd xmm3,xmm3
002E055E ucomisd xmm2,xmm3
002E0562 setnp al
002E0565 sete cl
002E0568 test al,cl
002E056A jne...
2010 Jun 07
1
[LLVMdev] XMM in X86 Backend
...ving an excessive use of xmm registers in the output assembly
produced by the x86 backend. Basically, for code like this
double test(double a, double b) {
double c;
c = 1.0 + sin (a + b*b);
return c;
}
llc produced something like....
movsd 16(%ebp), %xmm0
mulsd %xmm0, %xmm0
addsd 8(%ebp), %xmm0
movsd %xmm0, (%esp)
.......
fstpl -8(%ebp)
movsd -8(%ebp), %xmm0
addsd .LC1, %xmm0
movsd %xmm0, -8(%ebp)
fldl -8(%ebp)
Since the LLVM backend is using xmms, it involves a lot of register moves. llc has one
option, -mcpu=686, where...
2013 Aug 20
0
[LLVMdev] Duplicate loading of double constants
...eturn s;
> }
> $ clang -S -O3 t.c -o -
> ...
> f: # @f
> .cfi_startproc
> # BB#0:
> xorps %xmm0, %xmm0
> testl %esi, %esi
> je .LBB0_2
> # BB#1:
> xorps %xmm0, %xmm0
> addsd (%rdi), %xmm0
> .LBB0_2:
> ret
> ...
>
Thanks. Please file a bug for this on llvm.org/bugs.
The crux of the problem is that machine CSE runs before register allocation
and is consequently extremely conservative when doing CSE to avoid
potentially increasing register pressur...
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
...ous yet.
+0x00 movupd 16(%rsi), %xmm0
+0x05 movupd 16(%rsp), %xmm1
+0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ?
+0x0f movapd %xmm0, %xmm2
+0x13 mulsd %xmm2, %xmm2
+0x17 xorpd %xmm1, %xmm1
+0x1b addsd %xmm2, %xmm1
I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float> type. This is risky because the loads/stores are inefficient, but unfortunately...
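A hedged guess at the scalar shape behind the assembly above, purely for illustration (the real bh kernel may differ): a subtract/square/accumulate distance computation, which matches the subpd/mulsd/addsd sequence.

/* Illustrative sketch only, not the actual bh source. */
double dist2(const double a[3], const double b[3])
{
  double d = 0.0;
  for (int i = 0; i < 3; ++i) {
    double t = a[i] - b[i];
    d += t * t;
  }
  return d;
}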
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
...16(%rsi), %xmm0
> +0x05 movupd 16(%rsp), %xmm1
> +0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ?
> +0x0f movapd %xmm0, %xmm2
> +0x13 mulsd %xmm2, %xmm2
> +0x17 xorpd %xmm1, %xmm1
> +0x1b addsd %xmm2, %xmm1
>
> I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float> type. This is risky because the loads/stores are inefficient, but unfo...
2006 Apr 19
0
[LLVMdev] floating point exception and SSE2 instructions
..., %ecx
cmpl $0, %eax
jne LBB_sum_d_2 # cond_true.preheader
LBB_sum_d_1: # entry.bb9_crit_edge
pxor %xmm0, %xmm0
jmp LBB_sum_d_5 # bb9
LBB_sum_d_2: # cond_true.preheader
pxor %xmm0, %xmm0
xorl %edx, %edx
LBB_sum_d_3: # cond_true
addsd (%ecx), %xmm0
addl $8, %ecx
incl %edx
cmpl %eax, %edx
jne LBB_sum_d_3 # cond_true
LBB_sum_d_4: # bb9.loopexit
LBB_sum_d_5: # bb9
movsd %xmm0, (%esp)
fldl (%esp)
addl $12, %esp
ret
There is nothing here that should cause...
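For reference, a C-level sketch of what the generated code above computes (the poster emits LLVM assembly directly, so this source is an assumption, not their code):

/* Sketch: sum the n doubles starting at p, matching the addsd/addl $8 loop above. */
double sum_d(const double *p, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += p[i];
  return s;
}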
2016 Jun 27
3
Finding caller-saved registers at a function call site
...mm0
40069d: 00
40069e: f2 0f 59 c1    mulsd %xmm1,%xmm0       # val * 1.2
4006a2: f2 0f 11 4d f8 movsd %xmm1,-0x8(%rbp)  # Spill val to the stack
4006a7: e8 d4 ff ff ff callq 400680 <recurse>
4006ac: f2 0f 58 45 f8 addsd -0x8(%rbp),%xmm0  # recurse's return value + val
4006b1: 48 83 c4 10 add $0x10,%rsp
4006b5: 5d pop %rbp
4006b6: c3 retq
...
Notice how xmm1 (the storage location of "val", which is live across the
cal...
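A hedged C sketch of the pattern behind this disassembly (only the name recurse and the val * 1.2 / + val shape come from the listing; the rest is assumed): val is live across the call, so it is spilled to -0x8(%rbp) before the callq and reloaded by the addsd afterwards.

/* Hypothetical body; illustrates a value kept live across a call. */
double recurse(double val)
{
  if (val > 1000.0)
    return val;
  return recurse(val * 1.2) + val;   /* val must survive the recursive call */
}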
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
Hi,
I'm building a little JIT that creates functions to do array manipulations,
e.g. sum all the elements of a double* array. I'm writing this in Python, generating
LLVM assembly instructions and piping that through a call to ParseAssemblyString,
ExecutionEngine, etc.
It's working OK on integer values, but I'm getting nasty floating point exceptions
when I try this on double*
2018 Nov 15
2
[RFC][llvm-mca] Adding binary support to llvm-mca.
...ls. While the markers are presented as
function calls, in reality they are no-ops.
test:
pushq %rbp
movq %rsp, %rbp
movsd %xmm0, -8(%rbp)
movsd %xmm1, -16(%rbp)
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
xorps %xmm0, %xmm0
movsd %xmm0, -24(%rbp)
movsd -8(%rbp), %xmm0
mulsd -16(%rbp), %xmm0
addsd -24(%rbp), %xmm0
movsd %xmm0, -24(%rbp)
.Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
movsd -24(%rbp), %xmm0
popq %rbp
retq
.section .mca_code_regions,"",@progbits
.quad 42
.quad .Lmca_code_region_start_0
.quad .Lmca_code_region_end_0-.Lmca_code_region_start_0
The assembly has been t...
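A hedged source-level sketch of what the marked region computes, with the markers written as the no-op calls the RFC mentions (the marker names and signatures here are assumptions for illustration, not the RFC's actual spelling):

/* The region with ID 42 computes x = a*b + x starting from x = 0.
   mca_region_start/mca_region_end are hypothetical no-op markers. */
static void mca_region_start(int id) { (void)id; }
static void mca_region_end(int id)   { (void)id; }

double test(double a, double b)
{
  double x = 0.0;
  mca_region_start(42);
  x = a * b + x;
  mca_region_end(42);
  return x;
}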
2016 Jun 22
0
Finding caller-saved registers at a function call site
Hi Rob,
Rob Lyerly via llvm-dev wrote:
> I'm looking for a way to get all the caller-saved registers (both the
> register and the stack slot at which it was saved) for a given function
> call site in the backend. What's the best way to grab this
> information? Is it possible to get this information if I have the
> MachineInstr of the function call? I'm currently
2016 Jun 22
3
Finding caller-saved registers at a function call site
Hi everyone,
I'm looking for a way to get all the caller-saved registers (both the
register and the stack slot at which it was saved) for a given function
call site in the backend. What's the best way to grab this information?
Is it possible to get this information if I have the MachineInstr of the
function call? I'm currently targeting the AArch64 & X86 backends.
Thanks!
--
2013 Jul 15
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP
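A hedged example (not from the email) of the kind of straight-line code the SLP vectorizer targets: four independent, isomorphic adds that can be combined into a single vector add.

/* Illustrative only: candidates for one <4 x double> add under -fslp-vectorize. */
void add4(double *restrict a, const double *restrict b, const double *restrict c)
{
  a[0] = b[0] + c[0];
  a[1] = b[1] + c[1];
  a[2] = b[2] + c[2];
  a[3] = b[3] + c[3];
}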
2016 Jun 27
0
Finding caller-saved registers at a function call site
...9e: f2 0f 59 c1 mulsd %xmm1,%xmm0       # val * 1.2
> 4006a2: f2 0f 11 4d f8 movsd %xmm1,-0x8(%rbp)  # Spill val to the stack
> 4006a7: e8 d4 ff ff ff callq 400680 <recurse>
> 4006ac: f2 0f 58 45 f8 addsd -0x8(%rbp),%xmm0  # recurse's return value + val
> 4006b1: 48 83 c4 10 add $0x10,%rsp
> 4006b5: 5d pop %rbp
> 4006b6: c3 retq
> ...
>
> Notice how xmm1 (the storage location of "val"...
2018 Nov 21
2
[RFC][llvm-mca] Adding binary support to llvm-mca.
...> movq %rsp, %rbp
> > movsd %xmm0, -8(%rbp)
> > movsd %xmm1, -16(%rbp)
> > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> > xorps %xmm0, %xmm0
> > movsd %xmm0, -24(%rbp)
> > movsd -8(%rbp), %xmm0
> > mulsd -16(%rbp), %xmm0
> > addsd -24(%rbp), %xmm0
> > movsd %xmm0, -24(%rbp)
> > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> > movsd -24(%rbp), %xmm0
> > popq %rbp
> > retq
> > .section .mca_code_regions,"",@progbits
> > .quad 42
> > .quad .Lmca_...