thr3ads.net - search: "subpd"

Displaying 7 results from an estimated 7 matches for "subpd".

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

...44h
002E0153 movdqa xmm5,xmm0
002E0157 xorpd xmm4,xmm4
002E015B mulpd xmm5,xmm4
002E015F pshufd xmm2,xmm3,44h
002E0164 movdqa xmm1,xmm2
002E0168 mulpd xmm1,xmm4
002E016C xorpd xmm7,xmm7
002E0170 movapd xmm4,xmmword ptr [esp+70h]
002E0176 subpd xmm4,xmm1
002E017A pshufd xmm3,xmm3,0EEh
002E017F subpd xmm4,xmm3
002E0183 subpd xmm4,xmm5
002E0187 fld qword ptr [esp+0F0h]
002E018E call 76719BA1 CALL
002E0193 imul ebx,eax,0Ch
002E0196 lea esi,[ebx+3]
002E0199 shl...

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based on the target triple?) then I don't think

[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?

2012 Jan 04

...-S -O3 -o test2.s test.c -march=native
result:
    .file "test.c"
    .text
    .p2align 4,,15
    .globl _f
    .def _f; .scl 2; .type 32; .endef
_f:
    pushl %ebp
    movddup _DA, %xmm2
    movl %esp, %ebp
    xorl %eax, %eax
    .p2align 4,,10
L2:
    movapd _Y(%eax), %xmm0
    movapd _X(%eax), %xmm1
    mulpd %xmm2, %xmm1
    subpd %xmm1, %xmm0
    movapd %xmm0, _Y(%eax)
    addl $16, %eax
    cmpl $800, %eax
    jne L2
    xorw %ax, %ax
    leave
    ret
    .globl _DA
    .data
    .align 16
_DA:
    .long 858993459
    .long 1070805811
    .comm _X, 800, 5
    .comm _Y, 800, 5
It seems gcc emits more efficient instructions. Are there any clang command arguments to...
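For orientation, a loop of roughly the following shape would be expected to produce the vectorized body in the gcc output above. This is a reconstruction from the assembly, not the poster's actual test.c. The array length of 100 is inferred from the 800-byte `.comm` sizes and the `cmpl $800` bound, the value 0.3 from the two `.long` words of `_DA`, and the names `X`, `Y`, `DA` and `f` from the symbols. The `void` return type is a guess.

```c
/* Hypothetical reconstruction of the source behind the listing.
   Sizes, the constant, and the return type are inferred or assumed. */
#define N 100

double X[N], Y[N];   /* 100 doubles each = 800 bytes, as in .comm */
double DA = 0.3;     /* bit pattern 0x3FD3333333333333 */

void f(void)
{
    int i;
    /* Two iterations at a time correspond to one mulpd/subpd pair. */
    for (i = 0; i < N; i++)
        Y[i] = Y[i] - DA * X[i];
}
```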

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

...ith the VEX prefix and without. I suspected that the problem is the movupd's that load xmm0 and xmm1. I started looking at some performance counters on Friday, but I did not find anything suspicious yet.
+0x00 movupd 16(%rsi), %xmm0
+0x05 movupd 16(%rsp), %xmm1
+0x0b subpd %xmm1, %xmm0 <---- 18% of the runtime of bh ?
+0x0f movapd %xmm0, %xmm2
+0x13 mulsd %xmm2, %xmm2
+0x17 xorpd %xmm1, %xmm1
+0x1b addsd %xmm2, %xmm1
I spent less time on Bullet. Bullet also has one hot function (“resol...

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 23

...nd without. I suspected that the problem is the movupd's that load xmm0 and xmm1. I started looking at some performance counters on Friday, but I did not find anything suspicious yet.
>
> +0x00 movupd 16(%rsi), %xmm0
> +0x05 movupd 16(%rsp), %xmm1
> +0x0b subpd %xmm1, %xmm0 <---- 18% of the runtime of bh ?
> +0x0f movapd %xmm0, %xmm2
> +0x13 mulsd %xmm2, %xmm2
> +0x17 xorpd %xmm1, %xmm1
> +0x1b addsd %xmm2, %xmm1
>
> I spent less time on Bullet. Bullet als...

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 14

Hi, LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandy Bridge Mac (using SSE4, w/o AVX). Based on my performance measurements
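As a rough illustration of the kind of straight-line code the message describes, the function below performs two similar, independent subtractions on adjacent doubles, with no loop involved. A vectorizer of this kind may be able to merge them into a single packed operation such as subpd. Whether it actually does depends on the compiler version and target. The function name `sub2` and its signature are made up for this sketch; only the “-fslp-vectorize” flag comes from the message.

```c
/* Hypothetical SLP candidate: two isomorphic, independent scalar
   operations on adjacent memory locations in straight-line code. */
void sub2(double *restrict out,
          const double *restrict a,
          const double *restrict b)
{
    out[0] = a[0] - b[0];   /* lane 0 */
    out[1] = a[1] - b[1];   /* lane 1 */
}
```

Compiling with something like `clang -O3 -fslp-vectorize -S` and reading the generated assembly is one way to check whether the two subtractions were combined.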
