Displaying 9 results from an estimated 9 matches for "movhlps".
2009 May 01
0
[LLVMdev] RFC: AVX Pattern Specification [LONG]
...if (N can be matched by movddup) {
  unsigned movddupcost = ... // can be either constant, or
                             // callback into subtarget info
  if (LowestCost > movddupcost) {
    LowestCost = movddupcost;
    operands = [whatever]
    opcode = X86::MOVDDUP;
  }
}
if (N can be matched by movhlps) {
  unsigned movhlpscost = ...
  if (LowestCost > movhlpscost) {
    LowestCost = movhlpscost;
    operands = [whatever]
    opcode = X86::MOVHLPS;
  }
}
...
}
The advantage of doing this is that it moves the current heuristics
for match ordering (which is a poor way to...
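The cost-based selection in the pseudocode above can be modelled in a few lines. This is only a sketch under stated assumptions: the candidate table, the cost values, and the name select_opcode are hypothetical and are not taken from the LLVM sources or from the thread. Masks are unary two-element (v2f64-style) shuffle masks, where <0,0> splats the low half and <1,1> splats the high half.

```python
# Hypothetical candidate table: (opcode name, unary 2-element mask, cost).
# The costs are invented for illustration; a real backend would take them
# from constants or from subtarget information.
CANDIDATES = [
    ("MOVDDUP", (0, 0), 1),   # duplicate the low 64-bit half
    ("MOVLHPS", (0, 0), 2),   # movlhps x, x also yields <0,0>
    ("MOVHLPS", (1, 1), 1),   # movhlps x, x yields <1,1>
    ("UNPCKHPD", (1, 1), 2),  # unpckhpd x, x also yields <1,1>
]


def select_opcode(mask):
    """Return the cheapest opcode name matching mask, or None if none match."""
    lowest_cost = None
    opcode = None
    for name, pattern, cost in CANDIDATES:
        if pattern != mask:
            continue  # this candidate cannot implement the shuffle
        if lowest_cost is None or lowest_cost > cost:
            lowest_cost = cost
            opcode = name
    return opcode
```

With this table, the choice between two instructions that implement the same mask depends only on the recorded costs, not on the order in which the patterns are tried.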
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
...r X86, legalize will convert each insertelement to become a vector
> shuffle. We are very careful in combining vector shuffles because we don't
> want to produce a vector shuffle whose mask is illegal or hard to code gen
> so we end up in this code to generate a sequence of unpcks and movhlps for
> this. With the new form, Legalize will divide the 8xf32 vector into two
> 4xf32 and since the two sides are the same, it will generate quad word moves
> to copy the values.
I think this specific issue can be fixed without extending the
IL-level syntax; DAGCombiner could easily be m...
2008 Sep 30
4
[LLVMdev] Generalizing shuffle vector
......
For X86, legalize will convert each insertelement to become a vector
shuffle. We are very careful in combining vector shuffles because we
don't want to produce a vector shuffle whose mask is illegal or hard
to code gen so we end up in this code to generate a sequence of unpcks
and movhlps for this. With the new form, Legalize will divide the
8xf32 vector into two 4xf32 and since the two sides are the same, it
will generate quad word moves to copy the values.
There are other cases when a user writes vector code, the generation of
extract element and insert elements will cause...
2015 Jan 04
2
[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...
On Sun, Jan 4, 2015 at 3:20 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> On 24 Nov 2014, at 17:53, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> > I'll be skimming the PRs to see if there are any really critical
> regressions, but so far it looks pretty good.
> >
> > If you are actively disabling the new vector shuffling and have some PR
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...addq $32, %rdi
addq $-8, %rdx
jne .LBB0_3
# BB#4:
movq %r8, %rdi
movq %rax, %rdx
jmp .LBB0_5
.LBB0_1:
pxor %xmm1, %xmm1
.LBB0_5: # %middle.block
paddd %xmm1, %xmm0
movdqa %xmm0, %xmm1
movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1]
paddd %xmm0, %xmm1
pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0]
paddd %xmm1, %xmm0
movd %xmm0, %eax
cmpq %rdx, %rsi
je .LBB0_7
.align 16, 0x90
.LBB0_6:...
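The %middle.block sequence in the listing above (movhlps, paddd, pshufd, paddd, movd) is a horizontal add of four 32-bit lanes. The following lane-level model is a sketch for following the data flow, not the compiler's own code. The helper names mirror the instructions, registers are modelled as Python lists of four integers with lane 0 first, and the function horizontal_add is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF  # paddd wraps each lane at 32 bits


def paddd(dst, src):
    """Lane-wise 32-bit add: dst[i] + src[i] modulo 2**32."""
    return [(d + s) & MASK32 for d, s in zip(dst, src)]


def movhlps(dst, src):
    """Copy the high 64 bits of src into the low 64 bits of dst."""
    return [src[2], src[3], dst[2], dst[3]]


def pshufd(imm, src):
    """Select lane (imm >> 2*i) & 3 of src for each result lane i."""
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


def horizontal_add(xmm0):
    """Model of the reduction tail: sum of the four lanes, read from lane 0."""
    xmm1 = list(xmm0)              # movdqa %xmm0, %xmm1
    xmm1 = movhlps(xmm1, xmm1)     # movhlps %xmm1, %xmm1
    xmm1 = paddd(xmm1, xmm0)       # paddd %xmm0, %xmm1
    xmm0 = pshufd(1, xmm1)         # pshufd $1, %xmm1, %xmm0
    xmm0 = paddd(xmm0, xmm1)       # paddd %xmm1, %xmm0
    return xmm0[0]                 # movd %xmm0, %eax
```

Only lane 0 of the final register holds the full sum; the other lanes contain partial sums that the movd discards.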
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...ulhw %xmm5, %xmm0
por %xmm0, %xmm4
psignw %xmm1, %xmm0
movdqa %xmm0, ($5, %eax)
pcmpeqw %xmm7, %xmm0
movdqa ($4, %eax), %xmm1
movdqa %xmm7, ($1, %eax)
pandn %xmm1, %xmm0
pmaxsw %xmm0, %xmm3
add $$16, %eax
js 1b
movhlps %xmm3, %xmm0
pmaxsw %xmm0, %xmm3
pshuflw $$0x0E, %xmm3, %xmm0
pmaxsw %xmm0, %xmm3
pshuflw $$0x01, %xmm3, %xmm0
pmaxsw %xmm0, %xmm3
movd %xmm3, %eax
movzb %al, %eax
>, 0, 10, %EAX<imp-def>, 9, %reg1303<kill>, 9, %reg1308,...
2009 May 01
4
[LLVMdev] RFC: AVX Pattern Specification [LONG]
On Friday 01 May 2009 13:46, Chris Lattner wrote:
> Right, a lot of these problems can be solved by some nice refactoring
> stuff. I'm also hoping that some of the complexity in defining
> shuffle matching code can be helped by making the definition of the
> shuffle patterns more declarative within the td file. It would be
> really nice to say that "this shuffle does a
2015 Jan 05
3
[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...
...ily track down regressions.
Thanks,
Q.
>
> The amount of domain crossing is much lower now - but there are a number of float shuffles that now use double shuffles instead - fine from a domain point of view but rather unexpected. IIRC this often appeared in matrix transpose code - movlhps / movhlps being replaced by unpcklpd / unpckhpd is the one I seem to remember.
>
> Overall - a massive improvement - thank you!
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.c...
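The substitution mentioned in this message (movlhps / movhlps replaced by unpcklpd / unpckhpd) moves the same bits, which a lane-level model makes easy to check. This is a sketch for illustration only: registers are modelled as lists of four 32-bit lanes with lane 0 first, and the helper functions are named after the instructions they imitate. Note that movhlps corresponds to unpckhpd with its two operands swapped.

```python
def movlhps(dst, src):
    """Low 64 bits of src go to the high 64 bits of dst."""
    return [dst[0], dst[1], src[0], src[1]]


def movhlps(dst, src):
    """High 64 bits of src go to the low 64 bits of dst."""
    return [src[2], src[3], dst[2], dst[3]]


def unpcklpd(dst, src):
    """Interleave the low 64-bit halves: result is [dst.lo, src.lo]."""
    return [dst[0], dst[1], src[0], src[1]]


def unpckhpd(dst, src):
    """Interleave the high 64-bit halves: result is [dst.hi, src.hi]."""
    return [dst[2], dst[3], src[2], src[3]]


a = [0, 1, 2, 3]
b = [4, 5, 6, 7]

# Same data movement, different execution domain (float vs. double).
same_low = movlhps(a, b) == unpcklpd(a, b)
same_high = movhlps(a, b) == unpckhpd(b, a)  # operands swapped
```

The model says nothing about domain-crossing penalties or encoding size; it only shows that the resulting lane contents are identical.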
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...m1, [edi+64]
+ mulps xmm1, xmm0
+ addps xmm4, xmm6
+ addps xmm3, xmm1
+
+ add eax, 80
+ add edi, 80
+ sub ecx, 20
+
+ jae mul20_loop
+
+ addps xmm3, xmm4
+ movhlps xmm4, xmm3
+ addps xmm3, xmm4
+ movaps xmm4, xmm3
+ shufps xmm4, xmm4, 0x55
+ addss xmm3, xmm4
+ movss [edx], xmm3
+
+ pop edi
+ pop edx
+ pop ecx
+ pop eb...