Displaying 7 results from an estimated 7 matches for "subpd".
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...44h
002E0153 movdqa xmm5,xmm0
002E0157 xorpd xmm4,xmm4
002E015B mulpd xmm5,xmm4
002E015F pshufd xmm2,xmm3,44h
002E0164 movdqa xmm1,xmm2
002E0168 mulpd xmm1,xmm4
002E016C xorpd xmm7,xmm7
002E0170 movapd xmm4,xmmword ptr [esp+70h]
002E0176 subpd xmm4,xmm1
002E017A pshufd xmm3,xmm3,0EEh
002E017F subpd xmm4,xmm3
002E0183 subpd xmm4,xmm5
002E0187 fld qword ptr [esp+0F0h]
002E018E call 76719BA1 CALL
002E0193 imul ebx,eax,0Ch
002E0196 lea esi,[ebx+3]
002E0199 shl...
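This thread is about `llvm.x86.sse2.sqrt.pd` being emitted as a call instead of a `sqrtpd` instruction. For comparison, here is a minimal C sketch that requests packed square roots through the SSE2 intrinsic `_mm_sqrt_pd` from `<emmintrin.h>`. It assumes an x86 target with SSE2, which is the baseline on x86-64. The helper name `sqrt2` is hypothetical. To check whether the compiler really emits `sqrtpd` for it, inspect the generated assembly (for example with `clang -O2 -S`).

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_loadu_pd, _mm_sqrt_pd, _mm_storeu_pd */

/* Hypothetical helper: element-wise square root of two packed doubles.
   On an SSE2 target the _mm_sqrt_pd call is expected to become a single
   sqrtpd; which IR clang uses for it internally varies between releases. */
void sqrt2(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);  /* unaligned load, no alignment assumption */
    __m128d r = _mm_sqrt_pd(v);    /* square root of both lanes at once */
    _mm_storeu_pd(out, r);
}
```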
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based on the target triple?) then I don't think
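On the alignment side of this thread: aligned packed loads and stores such as `movapd` (`_mm_load_pd`) fault if the address is not 16-byte aligned, while `movupd` (`_mm_loadu_pd`) accepts any address. The sketch below picks the load form based on the pointer's alignment. The names `is_aligned16`, `load2`, and `hsum2` are hypothetical, and the sketch assumes an SSE2 target.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>    /* uintptr_t */

/* Hypothetical helper: 1 if p meets the 16-byte alignment required by
   aligned SSE loads/stores (movapd / _mm_load_pd), else 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: load two doubles, using the aligned form only when
   it is safe. _mm_loadu_pd (movupd) works for any address. */
__m128d load2(const double *p)
{
    return is_aligned16(p) ? _mm_load_pd(p) : _mm_loadu_pd(p);
}

/* Hypothetical helper: sum of the two lanes, to inspect what was loaded. */
double hsum2(__m128d v)
{
    __m128d hi = _mm_unpackhi_pd(v, v);      /* {v1, v1} */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi)); /* v0 + v1 */
}
```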
2012 Jan 04
1
[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?
...-S -O3 -o test2.s test.c -march=native
result:
.file "test.c"
.text
.p2align 4,,15
.globl _f
.def _f; .scl 2; .type 32; .endef
_f:
pushl %ebp
movddup _DA, %xmm2
movl %esp, %ebp
xorl %eax, %eax
.p2align 4,,10
L2:
movapd _Y(%eax), %xmm0
movapd _X(%eax), %xmm1
mulpd %xmm2, %xmm1
subpd %xmm1, %xmm0
movapd %xmm0, _Y(%eax)
addl $16, %eax
cmpl $800, %eax
jne L2
xorw %ax, %ax
leave
ret
.globl _DA
.data
.align 16
_DA:
.long 858993459
.long 1070805811
.comm _X, 800, 5
.comm _Y, 800, 5
It seems gcc emits more effective instructions. Are there any clang command-line
arguments to...
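The post does not include test.c, but the assembly reveals its shape. `_X` and `_Y` are 800-byte arrays, which is 100 doubles each. `_DA` holds the bit pattern 0x3FD3333333333333, which is 0.3. The loop computes Y - DA*X two doubles per iteration. The following is a hypothetical reconstruction, not the poster's source. It pairs the plain C loop with an equivalent hand-written SSE2 version using intrinsics from `<emmintrin.h>`. The names `f_scalar` and `f_sse2` are illustrative.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

#define N 100 /* .comm _X, 800 -> 800 bytes = 100 doubles */

/* Hypothetical reconstruction of the globals in test.c. */
double DA = 0.3;          /* _DA: 0x3FD3333333333333 */
_Alignas(16) double X[N]; /* 16-byte aligned so movapd-style loads are legal */
_Alignas(16) double Y[N];

/* Plain C loop: the kind of code GCC -O3 turned into movapd/mulpd/subpd. */
void f_scalar(void)
{
    for (int i = 0; i < N; i++)
        Y[i] -= DA * X[i];
}

/* Same computation with SSE2 intrinsics, two doubles per iteration. */
void f_sse2(void)
{
    __m128d da = _mm_set1_pd(DA);               /* broadcast DA to both lanes */
    for (int i = 0; i < N; i += 2) {
        __m128d x = _mm_load_pd(&X[i]);         /* aligned load (movapd) */
        __m128d y = _mm_load_pd(&Y[i]);
        y = _mm_sub_pd(y, _mm_mul_pd(x, da));   /* mulpd, then subpd */
        _mm_store_pd(&Y[i], y);                 /* aligned store */
    }
}
```

The GCC output broadcasts `_DA` with `movddup`, an SSE3 instruction enabled by `-march=native`. `_mm_set1_pd` expresses the same broadcast without naming a specific instruction.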
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
...ith the VEX prefix and without. I suspected that the problem is the movupd's that load xmm0 and xmm1. I started looking at some performance counters on Friday, but I did not find anything suspicious yet.
+0x00 movupd 16(%rsi), %xmm0
+0x05 movupd 16(%rsp), %xmm1
+0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ?
+0x0f movapd %xmm0, %xmm2
+0x13 mulsd %xmm2, %xmm2
+0x17 xorpd %xmm1, %xmm1
+0x1b addsd %xmm2, %xmm1
I spent less time on Bullet. Bullet also has one hot function (“resol...
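The bh source is not quoted here. The function below is only a hypothetical example with the same shape as the hot snippet: two unaligned loads (`movupd`), one packed subtraction (`subpd`), and then squaring. `dist2_sse2` is an illustrative name. The horizontal sum at the end is one way to finish the computation, not necessarily what bh does. The sketch assumes an SSE2 target.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical: squared distance between two 2-D points, hand-vectorized.
   _mm_loadu_pd typically compiles to movupd, so a and b need no alignment. */
double dist2_sse2(const double *a, const double *b)
{
    __m128d d  = _mm_sub_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)); /* {dx, dy} */
    __m128d sq = _mm_mul_pd(d, d);                              /* {dx*dx, dy*dy} */
    __m128d hi = _mm_unpackhi_pd(sq, sq);                       /* {dy*dy, dy*dy} */
    return _mm_cvtsd_f64(_mm_add_sd(sq, hi));                   /* dx*dx + dy*dy */
}
```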
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
...nd without. I suspected that the problem is the movupd's that load xmm0 and xmm1. I started looking at some performance counters on Friday, but I did not find anything suspicious yet.
>
> +0x00 movupd 16(%rsi), %xmm0
> +0x05 movupd 16(%rsp), %xmm1
> +0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ?
> +0x0f movapd %xmm0, %xmm2
> +0x13 mulsd %xmm2, %xmm2
> +0x17 xorpd %xmm1, %xmm1
> +0x1b addsd %xmm2, %xmm1
>
> I spent less time on Bullet. Bullet als...
2013 Jul 15
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command-line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP
2013 Jul 14
6
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi,
LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command-line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements
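To make "similar independent instructions in straight-line code" concrete, here is a hypothetical function (`scale_add4` is an illustrative name). It contains no loop, but its four statements perform the same operation, do not depend on each other, and touch adjacent memory. An SLP vectorizer looks for exactly this shape. With SSE (no AVX) it could pack each pair of lanes into one `mulpd`/`addpd`, though whether it does so depends on its cost model. Compiling with and without `-fslp-vectorize` and comparing the assembly is one way to check.

```c
/* Hypothetical straight-line code: four isomorphic, independent statements
   on adjacent elements. `restrict` tells the compiler out, a and b do not
   overlap, which an SLP vectorizer needs before it can merge the stores. */
void scale_add4(double *restrict out, const double *restrict a,
                const double *restrict b, double s)
{
    out[0] = a[0] + s * b[0];
    out[1] = a[1] + s * b[1];
    out[2] = a[2] + s * b[2];
    out[3] = a[3] + s * b[3];
}
```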