thr3ads.net - search: "movhlps"

[LLVMdev] RFC: AVX Pattern Specification [LONG]

2009 May 01

0

...if (N can be matched by movddup) { unsigned movddupcost = ... // can be either constant, or callback into subtarget info if (LowestCost > movddupcost) LowestCost = movddupcost; operands = [whatever] opcode = X86::MOVDDUP; } } if (N can be matched by movhlps) { unsigned movhlpscost = ... if (LowestCost > movhlpscost) LowestCost = movhlpscost; operands = [whatever] opcode = X86::MOVHLPS; } } ... } The advantage of doing this is that it moves the current heuristics for match ordering (which is a poor way to...
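
The selection scheme quoted above can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not LLVM code: `Candidate`, `selectCheapest`, the integer shuffle-mask representation, and any costs passed in are hypothetical stand-ins for the `[whatever]` and `...` left open in the post.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical description of one instruction that may implement a shuffle.
struct Candidate {
  std::string opcode;                                    // e.g. "MOVHLPS"
  std::function<bool(const std::vector<int>&)> matches;  // can it implement this mask?
  unsigned cost;                                         // constant here; could come from a subtarget query
};

// Returns the opcode of the cheapest matching candidate, or "" if none matches.
std::string selectCheapest(const std::vector<int>& mask,
                           const std::vector<Candidate>& candidates) {
  unsigned lowestCost = std::numeric_limits<unsigned>::max();
  std::string best;
  for (const Candidate& c : candidates) {
    assert(!c.opcode.empty() && "empty opcode is reserved for 'no match'");
    if (!c.matches(mask))
      continue;
    // Update cost and choice together, and only on a strict improvement.
    if (c.cost < lowestCost) {
      lowestCost = c.cost;
      best = c.opcode;
    }
  }
  return best;
}
```

With this shape, the order in which candidates are listed only matters for ties, because a later candidate replaces an earlier one only when its cost is strictly lower.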

[LLVMdev] Generalizing shuffle vector

2008 Sep 30

0

...r X86, legalize will convert each insertelement to become a vector > shuffle. We are very careful in combining vector shuffles because we don't > want to produce a vector shuffle whose mask is illegal or hard to code gen > so we end up in this code to generate a sequence of unpcks and movhlps for > this. With the new form, Legalize will divide the 8xf32 vector into two > 4xf32 and since the two sides are the same, it will generate quad word moves > to copy the values. I think this specific issue can be fixed without extending the IL-level syntax; DAGCombiner could easily be m...

[LLVMdev] Generalizing shuffle vector

2008 Sep 30

4

... For X86, legalize will convert each insertelement to become a vector shuffle. We are very careful in combining vector shuffles because we don't want to produce a vector shuffle whose mask is illegal or hard to code gen, so we end up in this code to generate a sequence of unpcks and movhlps for this. With the new form, Legalize will divide the 8xf32 vector into two 4xf32 and since the two sides are the same, it will generate quad word moves to copy the values. There are other cases when a user writes vector code, the generation of extract element and insert elements will cause...

[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...

2015 Jan 04

2

On Sun, Jan 4, 2015 at 3:20 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > On 24 Nov 2014, at 17:53, Chandler Carruth <chandlerc at gmail.com> wrote: > > > I'll be skimming the PRs to see if there are any really critical > regressions, but so far it looks pretty good. > > > > If you are actively disabling the new vector shuffling and have some PR

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

2014 Jul 23

4

...addq $32, %rdi addq $-8, %rdx jne .LBB0_3 # BB#4: movq %r8, %rdi movq %rax, %rdx jmp .LBB0_5 .LBB0_1: pxor %xmm1, %xmm1 .LBB0_5: # %middle.block paddd %xmm1, %xmm0 movdqa %xmm0, %xmm1 movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] paddd %xmm0, %xmm1 pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0] paddd %xmm1, %xmm0 movd %xmm0, %eax cmpq %rdx, %rsi je .LBB0_7 .align 16, 0x90 .LBB0_6:...

[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!

2010 Aug 02

0

...ulhw %xmm5, %xmm0 por %xmm0, %xmm4 psignw %xmm1, %xmm0 movdqa %xmm0, ($5, %eax) pcmpeqw %xmm7, %xmm0 movdqa ($4, %eax), %xmm1 movdqa %xmm7, ($1, %eax) pandn %xmm1, %xmm0 pmaxsw %xmm0, %xmm3 add $$16, %eax js 1b movhlps %xmm3, %xmm0 pmaxsw %xmm0, %xmm3 pshuflw $$0x0E, %xmm3, %xmm0 pmaxsw %xmm0, %xmm3 pshuflw $$0x01, %xmm3, %xmm0 pmaxsw %xmm0, %xmm3 movd %xmm3, %eax movzb %al, %eax >, 0, 10, %EAX<imp-def>, 9, %reg1303<kill>, 9, %reg1308,...

[LLVMdev] RFC: AVX Pattern Specification [LONG]

2009 May 01

4

On Friday 01 May 2009 13:46, Chris Lattner wrote: > Right, a lot of these problems can be solved by some nice refactoring > stuff. I'm also hoping that some of the complexity in defining > shuffle matching code can be helped by making the definition of the > shuffle patterns more declarative within the td file. It would be > really nice to say that "this shuffle does a

[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...

2015 Jan 05

3

...ily track down regressions. Thanks, Q. > > The amount of domain crossing is much lower now - but there are a number of float shuffles that now use double shuffles instead - fine from a domain point of view but rather unexpected. IIRC this often appeared in matrix transpose code - movlhps / movhlps being replaced by unpcklpd / unpckhpd is the one I seem to remember. > > Overall - a massive improvement - thank you! ...

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

2

...m1, [edi+64] + mulps xmm1, xmm0 + addps xmm4, xmm6 + addps xmm3, xmm1 + + add eax, 80 + add edi, 80 + sub ecx, 20 + + jae mul20_loop + + addps xmm3, xmm4 + movhlps xmm4, xmm3 + addps xmm3, xmm4 + movaps xmm4, xmm3 + shufps xmm4, xmm4, 0x55 + addss xmm3, xmm4 + movss [edx], xmm3 + + pop edi + pop edx + pop ecx + pop eb...
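
The tail of the excerpt above (from `addps xmm3, xmm4` to `movss [edx], xmm3`) folds two four-lane accumulators into a single float. The sketch below models those six instructions with plain arrays so the data flow can be followed without SSE hardware. `Vec4`, `horizontalSum`, and the helper functions are illustrative names, not part of the patch; each helper follows the Intel-syntax operand order (destination first) used in the excerpt.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;  // lane 0 is the lowest float of an xmm register

// ADDPS dst, src: lane-wise add.
Vec4 addps(Vec4 dst, const Vec4& src) {
  for (std::size_t i = 0; i < 4; ++i) dst[i] += src[i];
  return dst;
}

// MOVHLPS dst, src: high two lanes of src go to the low two lanes of dst;
// the high two lanes of dst are left unchanged.
Vec4 movhlps(Vec4 dst, const Vec4& src) {
  dst[0] = src[2];
  dst[1] = src[3];
  return dst;
}

// SHUFPS dst, src, imm8: low two result lanes are picked from dst,
// high two from src, each by a 2-bit selector in imm8.
Vec4 shufps(const Vec4& dst, const Vec4& src, unsigned imm) {
  assert(imm <= 0xFF);
  return Vec4{{dst[imm & 3], dst[(imm >> 2) & 3],
               src[(imm >> 4) & 3], src[(imm >> 6) & 3]}};
}

// ADDSS dst, src: add lane 0 only.
Vec4 addss(Vec4 dst, const Vec4& src) {
  dst[0] += src[0];
  return dst;
}

// Mirrors the reduction at the end of the excerpt, one statement per instruction.
float horizontalSum(Vec4 xmm3, Vec4 xmm4) {
  xmm3 = addps(xmm3, xmm4);         // merge the two accumulators
  xmm4 = movhlps(xmm4, xmm3);       // xmm4 low half = xmm3 high half
  xmm3 = addps(xmm3, xmm4);         // lane 0 = s0 + s2, lane 1 = s1 + s3
  xmm4 = xmm3;                      // movaps xmm4, xmm3
  xmm4 = shufps(xmm4, xmm4, 0x55);  // broadcast lane 1 to all lanes
  xmm3 = addss(xmm3, xmm4);         // lane 0 = s0 + s1 + s2 + s3
  return xmm3[0];                   // movss [edx], xmm3
}
```

Calling `horizontalSum` with accumulators `{1, 2, 3, 4}` and `{10, 20, 30, 40}` yields 110, the sum of all eight inputs. Only lane 0 of the final register is meaningful, which is why the excerpt stores it with `movss` rather than a full-width move.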
