thr3ads.net - search: "mulsd"

[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm

2017 Mar 01

2

...cx
    movq %rax, 728(%rcx)
    movq 184(%rsp), %rax
    movq 728(%rax), %rcx
    movq %rcx, 736(%rax)
    movq 184(%rsp), %rax
    movq $0, 744(%rax)
    movq 184(%rsp), %rax
    movq $0, 752(%rax)
    movq 184(%rsp), %rax
    movq $0, 760(%rax)
    movq 176(%rsp), %rax
    movsd 5608(%rax), %xmm0        # xmm0 = mem[0],zero
    movq 184(%rsp), %rax
    mulsd 648(%rax), %xmm0
    movsd 160(%rsp), %xmm1         # 8-byte Reload
                                   # xmm1 = mem[0],zero
    addsd %xmm0, %xmm1
    movsd %xmm1, 672(%rax)
    movq 176(%rsp), %rax
    movsd 5648(%rax), %xmm0        # xmm0 = mem[0],zero
    movq 184(%rsp), %rax
    mulsd 648(%rax), %xmm0
    movsd %xmm0, 704(...
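The `fcmp une` in this thread's title is LLVM IR's "unordered or not equal" floating-point comparison: it is true when the operands differ or when either operand is NaN. Clang typically emits it for C's `!=` on doubles. The sketch below uses hypothetical helper names (not code from the thread) to spell out that truth table in C:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: same semantics as LLVM's `fcmp une`.
   C's != on doubles is true if the values differ OR either one is NaN. */
bool une(double a, double b) {
    return a != b;
}

/* Hypothetical helper: the same predicate written out explicitly as
   "unordered (some operand is NaN)" OR "ordered and not equal". */
bool une_explicit(double a, double b) {
    bool unordered = isnan(a) || isnan(b);
    return unordered || (a < b) || (a > b);
}
```

For example, `une(NAN, NAN)` is true even though both arguments are NaN, while `une(0.0, -0.0)` is false because the two zeros compare equal.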

[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

2016 Oct 12

4

On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
> The following tests pass at "-O3" and
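One way -ffast-math can legitimately change results is reassociation: floating-point addition is not associative, and -ffast-math allows the compiler to regroup sums. A minimal sketch with hypothetical function names, assuming IEEE 754 binary64 arithmetic without excess precision (SSE2 on x86_64, for example) and a build without -ffast-math, so each grouping is kept as written:

```c
/* Hypothetical: (a + b) + c, the grouping a strict build must preserve. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

/* Hypothetical: a + (b + c), a regrouping that -ffast-math would permit. */
double sum_right(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 0x1p53 (2^53), b = 1.0 and c = -0x1p53, `sum_left` returns 0.0 because 2^53 + 1 is a tie that rounds back to 2^53, while `sum_right` returns 1.0 because 1 - 2^53 is exactly representable.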

[LLVMdev] 2.6 JIT using wrong address for external functions

2009 Dec 07

4

...6: mov %r14d,%esi
>>> 0xfffffd7ff9302549: callq 0xfffffd800066f690
0xfffffd7ff930254e: cvtsi2sd %rax,%xmm0
0xfffffd7ff9302553: mov $0xfffffd7ff93024d0,%rax
0xfffffd7ff930255d: movsd (%rax),%xmm1
0xfffffd7ff9302561: movsd %xmm1,(%rsp)
0xfffffd7ff9302566: mulsd %xmm1,%xmm0

(gdb) x/i 0xfffffd800066f690
0xfffffd800066f690: Cannot access memory at address 0xfffffd800066f690

(gdb) disassemble 0x66f690
Dump of assembler code for function _ZN12ContextFrame13GetInt64ValueEPKS_ix:
0x000000000066f690 <_ZN12ContextFrame13GetInt64ValueEPKS_ix+0>: push %...
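The addresses in this gdb session are consistent with a 32-bit PC-relative call displacement being truncated. A direct `callq rel32` can only reach targets within about ±2 GiB of the next instruction, and the JIT'd code at 0xfffffd7ff93... is much farther than that from GetInt64Value at 0x66f690. The C sketch below uses hypothetical helper names and only illustrates the arithmetic; it is not the diagnosis reached in the thread. It keeps the low 32 bits of the intended displacement, sign-extends them as the CPU does, and adds them to the address of the next instruction:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the address a `callq rel32` really reaches when the
   intended displacement (target - next_ip) is cut down to 32 bits. */
uint64_t truncated_call_target(uint64_t next_ip, uint64_t target) {
    uint32_t rel32 = (uint32_t)(target - next_ip);   /* keep the low 32 bits */
    uint64_t disp = (rel32 & 0x80000000u)
                        ? (uint64_t)rel32 | 0xFFFFFFFF00000000ull /* sign-extend */
                        : (uint64_t)rel32;
    return next_ip + disp;                            /* wraps modulo 2^64 */
}

/* Hypothetical helper: a direct rel32 call reaches `target` only if the
   truncation above loses nothing. */
bool fits_rel32(uint64_t next_ip, uint64_t target) {
    return truncated_call_target(next_ip, target) == target;
}
```

Passing next_ip = 0xfffffd7ff930254e (the instruction after the `callq`) and target = 0x66f690 yields 0xfffffd800066f690, the unreachable address gdb reports. Reaching an arbitrary 64-bit address requires an indirect call instead, for example through a register loaded with the full address.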

[LLVMdev] llc -O# / opt -O# differences

2012 Jun 30

2

...x - a[0].y;
    return dx * dx;
}

Running through opt

$ llvm-as < x.ll | opt -O3 | llc > y.s

Produces the following:

_foo:                           ## @foo
    .cfi_startproc
## BB#0:                        ## %entry
    movsd (%rdi), %xmm0
    subsd (%rsi), %xmm0
    mulsd %xmm0, %xmm0
    ret
    .cfi_endproc

This also matches what clang compiles from the C function. However, running through llc with the same optimization flag

$ llc -O3 x.ll -o z.s

_foo:                           ## @foo
    .cfi_startproc
## BB#0:                        ## %en...
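The `opt -O3 | llc` listing loads a double through each pointer argument (%rdi, %rsi), subtracts, and squares the difference. Part of the gap between the two commands is that `llc -O3` only sets the code generator's optimization level; it does not run the IR optimization pipeline that `opt -O3` applies first. Because the C source is cut off in this snippet, the function below is only a hypothetical one with the same shape (the `point` struct and its layout are assumptions):

```c
/* Hypothetical point type; the thread's real definition is not shown. */
struct point {
    double x;
    double y;
};

/* Same shape as the opt -O3 output: load a->x (movsd (%rdi)), subtract
   b->x (subsd (%rsi)), then square the difference (mulsd). */
double foo(const struct point *a, const struct point *b) {
    double dx = a->x - b->x;
    return dx * dx;
}
```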

[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?

2012 Jan 04

1

...32; .endef
    .text
    .globl _f
    .align 16, 0x90
_f:                                     # @f
# BB#0:
    movl $-800, %eax                    # imm = 0xFFFFFFFFFFFFFCE0
    movsd _DA, %xmm0
    .align 16, 0x90
LBB0_1:                                 # =>This Inner Loop Header: Depth=1
    movsd _X+800(%eax), %xmm1
    mulsd %xmm0, %xmm1
    movsd _Y+800(%eax), %xmm2
    subsd %xmm1, %xmm2
    movsd %xmm2, _Y+800(%eax)
    addl $8, %eax
    jne LBB0_1
# BB#2:
    xorl %eax, %eax
    ret
    .data
    .globl _DA                          # @DA
    .align 8
_DA:
    .quad 4599075939470750515           # double 3.000000e-01
    .comm _Y,800,3                      # @Y
    .comm...
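Reading the listing back: _DA holds 0.3, the loop walks %eax from -800 up to 0 in steps of 8, so it touches 100 doubles in each of _X and _Y and computes Y[i] = Y[i] - DA * X[i], and the final `xorl %eax, %eax` returns 0. The C below is a hypothetical reconstruction (the real source file is not shown; the array length is inferred from the 800-byte `.comm _Y` and the loop bounds), plus a hypothetical helper that checks the `.quad` constant is the bit pattern of 0.3:

```c
#include <stdint.h>
#include <string.h>

#define N 100             /* 800 bytes / sizeof(double), inferred from the listing */

double DA = 0.3;          /* _DA: .quad 4599075939470750515 */
double X[N], Y[N];        /* _X and _Y */

/* Hypothetical reconstruction of _f: each iteration is movsd (load X[i]),
   mulsd (times DA), subsd (subtract from Y[i]), movsd (store Y[i]). */
int f(void) {
    for (int i = 0; i < N; i++)
        Y[i] -= DA * X[i];
    return 0;             /* xorl %eax, %eax */
}

/* Hypothetical helper: raw IEEE 754 bits of a double, as .quad emits them. */
uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```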

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

3

...king at some performance counters on Friday, but I did not find anything suspicious yet.

+0x00 movupd 16(%rsi), %xmm0
+0x05 movupd 16(%rsp), %xmm1
+0x0b subpd %xmm1, %xmm0     <———— 18% of the runtime of bh ?
+0x0f movapd %xmm0, %xmm2
+0x13 mulsd %xmm2, %xmm2
+0x17 xorpd %xmm1, %xmm1
+0x1b addsd %xmm2, %xmm1

I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float>...

[LLVMdev] XMM in X86 Backend

2010 Jun 07

1

Hi all,

I am observing an excessive use of XMM registers in the output assembly produced by the x86 backend. Basically, for code like this

double test(double a, double b)
{
    double c;
    c = 1.0 + sin(a + b*b);
    return c;
}

llc produced something like:

    movsd 16(%ebp), %xmm0
    mulsd %xmm0, %xmm0
    addsd 8(%ebp), %xmm0
    movsd %xmm0, (%esp)
    .......
    fstpl -8(%ebp)
    movsd -8(%ebp), %xmm0
    addsd .LC1, %xmm0
    movsd %xmm0, -8(%ebp)
    fldl -8(%ebp)

The LLVM backend is using XMM registers, which involves a lot of register moves. llc has...

[LLVMdev] RuntimeDyld bug in resolving addresses with offset?

2013 Jan 05

0

...std::string fileName = "rtdyldbug.o";
    myFun fptr = (myFun)getFunctionPointer(funName, fileName);
    double w[5] = {0, 0, 0, 0, 0};
    fptr(4, w);
    printf("%f \n", w[2]);
    return 0;
}

The printed result should be 148, but it is 132. The instruction that reads numbers[4] is

    mulsd _numbers+0x00000020(%rip),%xmm0

When I debugged at the assembly level, I found that the offset 0x20 is ignored: the resolved address points to numbers[0] instead of numbers[4]. I compiled the attached rtdyldbug.c with "clang -c -o rtdyldbug.o rtdyldbug.c". I compiled myrtdyld.cpp as...
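Why the offset matters: `numbers[4]` in a `double` array lives 4 × 8 = 32 = 0x20 bytes past the symbol, so the relocated operand must resolve to `_numbers + 0x20`. If only the symbol is resolved and that addend is dropped, the load reads `numbers[0]`. A hypothetical illustration; the array contents and the helper name are assumptions, not the values or code from the attached rtdyldbug.c:

```c
#include <stddef.h>

/* Hypothetical data; the real `numbers` array is not shown in the snippet. */
static const double numbers[5] = {10.0, 20.0, 30.0, 40.0, 50.0};

/* Hypothetical model of relocation resolution: effective address is
   symbol + addend. keep_addend = 0 mimics the reported bug (0x20 ignored). */
static double load_with_reloc(const double *symbol, size_t addend, int keep_addend) {
    const unsigned char *base = (const unsigned char *)symbol;
    const unsigned char *addr = keep_addend ? base + addend : base;
    return *(const double *)addr;
}
```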

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 23

0

...ers on Friday, but I did not find anything suspicious yet.
>
> +0x00 movupd 16(%rsi), %xmm0
> +0x05 movupd 16(%rsp), %xmm1
> +0x0b subpd %xmm1, %xmm0     <———— 18% of the runtime of bh ?
> +0x0f movapd %xmm0, %xmm2
> +0x13 mulsd %xmm2, %xmm2
> +0x17 xorpd %xmm1, %xmm1
> +0x1b addsd %xmm2, %xmm1
>
> I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use th...

[LLVMdev] 2.6 JIT using wrong address for external functions

2009 Dec 07

0

...>>> 0xfffffd7ff9302549: callq 0xfffffd800066f690
> 0xfffffd7ff930254e: cvtsi2sd %rax,%xmm0
> 0xfffffd7ff9302553: mov $0xfffffd7ff93024d0,%rax
> 0xfffffd7ff930255d: movsd (%rax),%xmm1
> 0xfffffd7ff9302561: movsd %xmm1,(%rsp)
> 0xfffffd7ff9302566: mulsd %xmm1,%xmm0
>
> (gdb) x/i 0xfffffd800066f690
> 0xfffffd800066f690: Cannot access memory at address 0xfffffd800066f690
>
> (gdb) disassemble 0x66f690
> Dump of assembler code for function _ZN12ContextFrame13GetInt64ValueEPKS_ix:
> 0x000000000066f690 <_ZN12Conte...

Finding caller-saved registers at a function call site

2016 Jun 27

3

...th clang/LLVM 3.8 (-O3) on Ubuntu 14.04 looks like this:

...
400694: ff c7                   inc    %edi                  # Add 1 to depth
400696: f2 0f 10 05 a2 92 05    movsd  0x592a2(%rip),%xmm0   # Move constant 1.2 into xmm0
40069d: 00
40069e: f2 0f 59 c1             mulsd  %xmm1,%xmm0           # val * 1.2
4006a2: f2 0f 11 4d f8          movsd  %xmm1,-0x8(%rbp)      # Spill val to the stack
4006a7: e8 d4 ff ff ff          callq  400680 <recurse>
4006ac: f2 0f 58 45 f8          addsd  -0x8(%rbp),%xmm0      # recurse's return value + val...
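In the System V x86-64 ABI every XMM register is caller-saved, so a double that must survive a call has to be saved by the caller. That is the `movsd %xmm1,-0x8(%rbp)` before `callq` and the reload folded into `addsd -0x8(%rbp),%xmm0` afterwards. A hypothetical C reconstruction of the function being disassembled; the base case and `MAX_DEPTH` are assumptions, since the snippet cuts the source off:

```c
/* Assumed recursion limit; not taken from the thread. */
#define MAX_DEPTH 3

/* Hypothetical reconstruction of `recurse`. */
double recurse(int depth, double val) {
    if (depth >= MAX_DEPTH)
        return 0.0;                       /* assumed base case */
    /* depth + 1 -> inc %edi
       val * 1.2 -> movsd <1.2>,%xmm0 ; mulsd %xmm1,%xmm0
       val is still needed after the call, but no XMM register is preserved
       across calls in this ABI, so it is spilled to -0x8(%rbp) and re-read
       by the addsd. */
    return recurse(depth + 1, val * 1.2) + val;
}
```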

[LLVMdev] pb05 results for current llvm/dragonegg

2012 Apr 03

1

Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3. The benchmarks for -msse3 and -msse4 appear identical (at least for degg+optnz). This is fortunate since there seems to be a bug in -msse4 on 2.33 GHz (T7600) Intel Core 2 Duo Merom (http://llvm.org/bugs/show_bug.cgi?id=12434). I've added two

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 15

2

...rmed into assembly labels. While the markers are presented as function calls, in reality they are no-ops.

test:
    pushq %rbp
    movq %rsp, %rbp
    movsd %xmm0, -8(%rbp)
    movsd %xmm1, -16(%rbp)
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
    xorps %xmm0, %xmm0
    movsd %xmm0, -24(%rbp)
    movsd -8(%rbp), %xmm0
    mulsd -16(%rbp), %xmm0
    addsd -24(%rbp), %xmm0
    movsd %xmm0, -24(%rbp)
.Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
    movsd -24(%rbp), %xmm0
    popq %rbp
    retq
    .section .mca_code_regions,"",@progbits
    .quad 42
    .quad .Lmca_code_region_start_0
    .quad .Lmca_code_region_end_0-.Lmca_code_region_start_0...
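The unoptimized `test` body above (arguments spilled to -8 and -16, an accumulator zeroed at -24, then a multiply and an add) is roughly what a C function of the following shape compiles to at -O0. This is a hypothetical reconstruction; the RFC's marker calls that become the `.Lmca_code_region_*` labels are left out because their exact spelling is not shown in this snippet:

```c
/* Hypothetical source for the `test` listing. The two statements in the
   body correspond to the instructions between the region start/end labels. */
double test(double a, double b) {   /* a -> -8(%rbp), b -> -16(%rbp) */
    double acc = 0.0;               /* xorps %xmm0,%xmm0 ; movsd %xmm0,-24(%rbp) */
    acc += a * b;                   /* movsd -8 ; mulsd -16 ; addsd -24 ; movsd to -24 */
    return acc;                     /* movsd -24(%rbp),%xmm0 */
}
```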

Finding caller-saved registers at a function call site

2016 Jun 22

0

Hi Rob,

Rob Lyerly via llvm-dev wrote:
> I'm looking for a way to get all the caller-saved registers (both the
> register and the stack slot at which it was saved) for a given function
> call site in the backend. What's the best way to grab this
> information? Is it possible to get this information if I have the
> MachineInstr of the function call? I'm currently

Finding caller-saved registers at a function call site

2016 Jun 22

3

Hi everyone,

I'm looking for a way to get all the caller-saved registers (both the register and the stack slot at which it was saved) for a given function call site in the backend. What's the best way to grab this information? Is it possible to get this information if I have the MachineInstr of the function call? I'm currently targeting the AArch64 & X86 backends.

Thanks!
--

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

0

On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:

> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP

Finding caller-saved registers at a function call site

2016 Jun 27

0

...looks like this:
>
> ...
> 400694: ff c7                   inc    %edi                  # Add 1 to depth
> 400696: f2 0f 10 05 a2 92 05    movsd  0x592a2(%rip),%xmm0   # Move constant 1.2 into xmm0
> 40069d: 00
> 40069e: f2 0f 59 c1             mulsd  %xmm1,%xmm0           # val * 1.2
> 4006a2: f2 0f 11 4d f8          movsd  %xmm1,-0x8(%rbp)      # Spill val to the stack
> 4006a7: e8 d4 ff ff ff          callq  400680 <recurse>
> 4006ac: f2 0f 58 45 f8          addsd  -0x8(%rbp),%xmm0      # re...

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 21

2

...; test:
>
>     pushq %rbp
>     movq %rsp, %rbp
>     movsd %xmm0, -8(%rbp)
>     movsd %xmm1, -16(%rbp)
> .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
>     xorps %xmm0, %xmm0
>     movsd %xmm0, -24(%rbp)
>     movsd -8(%rbp), %xmm0
>     mulsd -16(%rbp), %xmm0
>     addsd -24(%rbp), %xmm0
>     movsd %xmm0, -24(%rbp)
> .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
>     movsd -24(%rbp), %xmm0
>     popq %rbp
>     retq
>     .section .mca_code_regions,"",@progbits
>...

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 14

6

Hi,

LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge Mac (using SSE4, w/o AVX). Based on my performance measurements
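An example of the "similar independent instructions in straight-line code" the SLP vectorizer looks for: two isomorphic scalar additions on adjacent array elements, with no loop around them. With SLP vectorization enabled (via -fslp-vectorize, as the post describes), LLVM may merge the two `addsd` into a single `addpd`; whether it does depends on its cost model. The function name is hypothetical:

```c
/* Hypothetical example: two independent, isomorphic statements on
   consecutive elements, a candidate for one 2 x double operation.
   `restrict` promises the compiler that `out` does not alias `a` or `b`. */
void add_pairs(double *restrict out, const double *restrict a,
               const double *restrict b) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
}
```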

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 27

2

...> >     movsd %xmm0, -8(%rbp)
> >     movsd %xmm1, -16(%rbp)
> > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >     xorps %xmm0, %xmm0
> >     movsd %xmm0, -24(%rbp)
> >     movsd -8(%rbp), %xmm0
> >     mulsd -16(%rbp), %xmm0
> >     addsd -24(%rbp), %xmm0
> >     movsd %xmm0, -24(%rbp)
> > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >     movsd -24(%rbp), %xmm0
> >     popq %rbp
> >     retq
> >...
