thr3ads.net - search: "xmm0"

Displaying 20 results from an estimated 471 matches for "xmm0".

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 05

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote: > Unfortunately, another team, while doing internal testing has seen the > new path generating illegal insertps masks. A sample here: > > vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] > vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] > vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] > vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = > xmm4[0,1],xmm1[2],xmm4[3] >...

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

...s, in case any of it rings a bell: > Very cool, and thanks for the analysis! > > > Another problem I'm seeing is that in some cases we can't fold memory > anymore: > vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2] > vblendps $0x1, %xmm2, %xmm0, %xmm0 > becomes: > vmovaps -0xXX(%rdx), %xmm2 > vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0] > vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2] > > > Also, I see differences when some loads are shuffled, that I&...

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

...The generated instruction varies, but >>>>>> it seems to often be similar to (I >>>>>> don't have it in front of me, sorry): >>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>> Where the xmm register changes, and >>>>>> the second parameter is a memory access. >>>>>> ECX is always set to 0x7ffffff - >>&g...

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

...sed. Output from clang 3.4 for target corei7-avx: $ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math -march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */ $ objdump -dC --no-show-raw-insn ./a.out ... 00000000004004f0 <main>: 4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x> 4004f8: vpsrld $0x17,%xmm0,%xmm0 4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8> 400505: vcvtdq2ps %xmm0,%xmm1 400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18> 400511: vcvttps2dq %xmm1,%xmm...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 05

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

...14 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> >> wrote: >>> >>> Unfortunately, another team, while doing internal testing has seen the >>> new path generating illegal insertps masks. A sample here: >>> >>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] >>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] >>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = >>> xmm4[...

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

...or the analysis! >> >> >>> >>> >>> Another problem I'm seeing is that in some cases we can't fold memory >>> anymore: >>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2] >>> vblendps $0x1, %xmm2, %xmm0, %xmm0 >>> becomes: >>> vmovaps -0xXX(%rdx), %xmm2 >>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0] >>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = >>> xmm3[0,2],xmm0[1,2] >>> >>> >&...

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

...gt; > Very cool, and thanks for the analysis! > > >> >> >> Another problem I'm seeing is that in some cases we can't fold memory >> anymore: >> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2] >> vblendps $0x1, %xmm2, %xmm0, %xmm0 >> becomes: >> vmovaps -0xXX(%rdx), %xmm2 >> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0] >> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = >> xmm3[0,2],xmm0[1,2] >> >> >> Also, I see differences...

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

...t;>> >>>> >>>> >>>> Another problem I'm seeing is that in some cases we can't fold memory >>>> anymore: >>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2] >>>> vblendps $0x1, %xmm2, %xmm0, %xmm0 >>>> becomes: >>>> vmovaps -0xXX(%rdx), %xmm2 >>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = >>>> xmm2[3,0],xmm0[0,0] >>>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = >>>> xmm3[0,2],xmm0...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

...tructions. Example: ;;; define <4 x float> @foo(<4 x float> %A, <4 x float> %B) { %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0, i32 5, i32 2, i32 7> ret <4 x float> %1 } ;;; llc (-mcpu=corei7-avx): vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7] llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx): vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3] vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3] 2) On SSE4.1, we should try not to emit an insertps if the shuf...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 06

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

...ri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> >> wrote: >> >> >> Unfortunately, another team, while doing internal testing has seen the >> new path generating illegal insertps masks. A sample here: >> >> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] >> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] >> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = >> xmm4[0,1],xmm1[2],xmm...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 08

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

...il.com>> >>>> wrote: >>>>> >>>>> Unfortunately, another team, while doing internal testing has seen the >>>>> new path generating illegal insertps masks. A sample here: >>>>> >>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] >>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] >>>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xm...

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

[LLVMdev] SIMD instructions and memory alignment on X86

...>>>>>>> fault raising mechanism appears. >>>>>>> >>>>>>> The generated instruction varies, but it seems to often be similar >>>>>>> to (I don't have it in front of me, sorry): >>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>> Where the xmm register changes, and the second parameter is a memory >>>>>>> access. >>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is >>>>>>> part of the SSE e...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 19

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

...t regression in our internal codebase. In one particular case I observed a slowdown (around 1%); here is what I found when investigating on this slowdown. 1. With the new shuffle lowering, there is one case where we end up producing the following sequence: vmovss .LCPxx(%rip), %xmm1 vxorps %xmm0, %xmm0, %xmm0 vblendps $1, %xmm1, %xmm0, %xmm0 Before, we used to generate a simpler: vmovss .LCPxx(%rip), %xmm1 In this particular case, the 'vblendps' is redundant since the vmovss would zero the upper bits in %xmm1. I am not sure why we get this poor-codegen with your new shuffle...

[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm

2017 Mar 01

[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm

...# %merge128 movq 184(%rsp), %rcx movq %rax, 728(%rcx) movq 184(%rsp), %rax movq 728(%rax), %rcx movq %rcx, 736(%rax) movq 184(%rsp), %rax movq $0, 744(%rax) movq 184(%rsp), %rax movq $0, 752(%rax) movq 184(%rsp), %rax movq $0, 760(%rax) movq 176(%rsp), %rax movsd 5608(%rax), %xmm0 # xmm0 = mem[0],zero movq 184(%rsp), %rax mulsd 648(%rax), %xmm0 movsd 160(%rsp), %xmm1 # 8-byte Reload # xmm1 = mem[0],zero addsd %xmm0, %xmm1 movsd %xmm1, 672(%rax) movq 176(%rsp), %rax movsd 5648(%rax), %xmm0 # xmm0 = mem[0],zero movq 18...

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 23

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

...gt; %A, <4 x float> %B) { >> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0, >> i32 5, i32 2, i32 7> >> ret <4 x float> %1 >> } >> ;;; >> >> llc (-mcpu=corei7-avx): >> vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7] >> >> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx): >> vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3] >> vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3] >> >> >> 2)...

Vectorization of math function failed?

2020 Aug 31

Vectorization of math function failed?

...: clang++ -O3 -march=native -mtune=native -c -o vec.o vec.cc -lmvec -fno-math-errno And here is what I get: vec.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <_Z4fct1Dv4_f>: 0: 48 83 ec 48 sub $0x48,%rsp 4: c5 f8 29 04 24 vmovaps %xmm0,(%rsp) 9: e8 00 00 00 00 callq e <_Z4fct1Dv4_f+0xe> e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp) 14: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0 19: e8 00 00 00 00 callq 1e <_Z4fct1Dv4_f+0x1e> 1e: c5 f8 29 44 24 20 vmovaps %xmm0,0x20(%rsp) 24: c4 e3...

RFC: A proposal for vectorizing loops with calls to math functions using SVML

2016 Apr 01

RFC: A proposal for vectorizing loops with calls to math functions using SVML

...index, 4, !dbg !6 %5 = icmp eq i64 %index.next, 1000, !dbg !6 br i1 %5, label %middle.block, label %vector.body, !dbg !6, !llvm.loop !15 .LBB0_1: # %vector.body # =>This Inner Loop Header: Depth=1 movd %ebx, %xmm0 pshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0] paddd .LCPI0_0(%rip), %xmm0 cvtdq2ps %xmm0, %xmm0 movaps %xmm0, 16(%rsp) # 16-byte Spill shufps $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3] callq sinf movaps %xmm0...

2015 Oct 02

This conflict is with many optimizations incl. copy prop, coalescing, hoisting etc. Each could increase register pressure and with similar impact. Attempts to control the register pressure locally (within an optimization pass) tend to get hard to tune and maintain. Would it be a better way to describe eg in metadata how to undo an optimization? Optimizations that attempt to reduce pressure like

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

...c_compute_autocorrelation_asm_ia32_sse_lag_16 cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx @@ -596,7 +597,7 @@ movss xmm3, xmm2 movss xmm2, xmm0 - ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2 + ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2 movaps xmm1, xmm0 mulps xmm1, xmm2 addps xmm5, xmm1 @@ -619,6 +620,95 @@ ret ALIGN 16 +cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16 + ;[ebp + 20] == autoc[] +...

search for: xmm0