Displaying 7 results from an estimated 7 matches for "punpckldq".
2011 Oct 17
0
[LLVMdev] LLVM Build Bot failure on llmv-x86_64-ubuntu
...xff
.byte 254 # 0xfe
.byte 255 # 0xff
.text
.globl __unnamed_1
.align 16, 0x90
.type __unnamed_1,@function
__unnamed_1: # @2
.Ltmp0:
.cfi_startproc
# BB#0:
movd __unnamed_2+6(%rip), %xmm1
movd __unnamed_2+2(%rip), %xmm0
punpckldq %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
movzwl __unnamed_2+8(%rip), %eax
movd %eax, %xmm2
movzwl __unnamed_2+4(%rip), %eax
movd %eax, %xmm1
punpckldq %xmm2, %xmm1 # xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
punpckldq %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
m...
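The lane comments in the listing above describe `punpckldq` as interleaving the low 32-bit lanes of its two operands. A minimal Python sketch of that behaviour follows, assuming each XMM register is modelled as a list of four 32-bit lanes. The helpers `movd`, `punpckldq`, and `gather_four` are illustrative models written for this note, not a real API, and the input values are placeholders.

```python
def movd(value):
    """Model movd into an XMM register: lane 0 holds the value, upper lanes are zeroed."""
    return [value & 0xFFFFFFFF, 0, 0, 0]


def punpckldq(src, dst):
    """Model the 128-bit punpckldq in AT&T operand order (source first).

    Returns the new destination: dst[0], src[0], dst[1], src[1].
    """
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("each register is modelled as four 32-bit lanes")
    return [dst[0], src[0], dst[1], src[1]]


def gather_four(a, b, c, d):
    """Replay the three-unpack pattern from the listing with placeholder values."""
    xmm1 = movd(a)
    xmm0 = movd(b)
    xmm0 = punpckldq(xmm1, xmm0)   # xmm0 = [b, a, 0, 0]
    xmm2 = movd(c)
    xmm1 = movd(d)
    xmm1 = punpckldq(xmm2, xmm1)   # xmm1 = [d, c, 0, 0]
    return punpckldq(xmm1, xmm0)   # xmm0 = [b, d, a, c]


print(gather_four(1, 2, 3, 4))  # [2, 4, 1, 3]
```

Two low-lane unpacks followed by a third are enough to pack four scalar loads into one register, which appears to be what the generated code is doing here.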
2004 Sep 10
2
An assembly optimization and fix
...mov ebx, [esp + 36] ; ebx = data[]
+ movd mm3, [ebx - 4] ; mm3 = 0:last_error_0
+ movd mm2, [ebx - 8] ; mm2 = 0:data[-2]
+ movd mm1, [ebx - 12] ; mm1 = 0:data[-3]
+ movd mm0, [ebx - 16] ; mm0 = 0:data[-4]
+ movq mm5, mm3 ; mm5 = 0:last_error_0
+ psubd mm5, mm2 ; mm5 = 0:last_error_1
+ punpckldq mm3, mm5 ; mm3 = last_error_1:last_error_0
+ psubd mm2, mm1 ; mm2 = 0:data[-2] - data[-3]
+ psubd mm5, mm2 ; mm5 = 0:last_error_2
+ movq mm4, mm5 ; mm4 = 0:last_error_2
+ psubd mm4, mm2 ; mm4 = 0:last_error_2 - (data[-2] - data[-3])
+ paddd mm4, mm1 ; mm4 = 0:last_error_2 - (data[-2] - 2...
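The register comments in the patch above build each `last_error_k` by repeatedly subtracting neighbouring samples. A rough Python sketch of the same arithmetic follows, assuming `data` is a plain list of integers and `pos` is the index that `ebx` points at, so the `[ebx - 4]` load corresponds to `data[pos - 1]`. The function name is hypothetical, and 32-bit wrap-around is not modelled.

```python
def last_errors(data, pos):
    """Return (last_error_0, last_error_1, last_error_2) for the samples before pos."""
    if pos < 3 or pos > len(data):
        raise ValueError("need at least three samples before pos")
    d1, d2, d3 = data[pos - 1], data[pos - 2], data[pos - 3]
    last_error_0 = d1                          # movd mm3, [ebx - 4]
    last_error_1 = last_error_0 - d2           # psubd mm5, mm2
    last_error_2 = last_error_1 - (d2 - d3)    # psubd mm2, mm1 ; psubd mm5, mm2
    return last_error_0, last_error_1, last_error_2


print(last_errors([1, 4, 9, 16], 4))  # (16, 7, 2)
```

Only the first three differences are covered because the snippet is truncated before the remaining ones are fully shown.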
2016 Aug 12
4
Invoke loop vectorizer
...pslldq $8, %xmm1 ## xmm1 =
> zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
> pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1]
> paddq %xmm1, %xmm3
> pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1]
> punpckldq %xmm5, %xmm4 ## xmm4 =
> xmm4[0],xmm5[0],xmm4[1],xmm5[1]
> pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3]
>
>
>
> Note:
> It also vectorizes at SIZE=8.
>
> Not sure what the exact translation of options from clang-cl to clang is.
> Maybe try a...
2016 Aug 12
2
Invoke loop vectorizer
Hi Daniel,
I increased the size of your test to be 128 but -stats still shows no loop
optimized...
Xiaochu
On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
> It's not possible to know that A and B don't alias in this example. It's
> almost certainly not profitable to add a runtime check given the size of
> the loop.
>
>
>
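The quoted point about aliasing can be illustrated with the kind of overlap test a runtime check would perform before entering a vectorized loop. This is only a sketch, assuming each array is described by a start address and a byte length. The function name and the example addresses are made up for illustration and do not reflect how LLVM actually emits its checks.

```python
def ranges_overlap(a_start, a_len, b_start, b_len):
    """Return True if [a_start, a_start + a_len) and [b_start, b_start + b_len) share a byte."""
    if a_len < 0 or b_len < 0:
        raise ValueError("lengths must be non-negative")
    if a_len == 0 or b_len == 0:
        return False  # an empty range cannot share a byte with anything
    return a_start < b_start + b_len and b_start < a_start + a_len


# Hypothetical addresses: two adjacent 16-byte buffers do not alias,
# but shifting the second one back by 8 bytes makes them overlap.
print(ranges_overlap(0x1000, 16, 0x1010, 16))  # False
print(ranges_overlap(0x1000, 16, 0x1008, 16))  # True
```

For a loop with very few iterations, the cost of evaluating such a test can outweigh the gain from the vector body, which is the profitability concern raised in the quoted message.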
2005 Jul 20
1
MMX IDCT for theora-exp
...movq "r0","I(0)"\n" \
+ " punpckhwd "r5","r1"\n" \
+ " movq "r6","r0"\n" \
+ " punpcklwd "r7","r6"\n" \
+ " movq "r4","r5"\n" \
+ " punpckldq "r6","r4"\n" \
+ " punpckhdq "r6","r5"\n" \
+ " movq "r1","r6"\n" \
+ " movq "r4","J(4)"\n" \
+ " punpckhwd "r7","r0"\n" \
+ " movq "...
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed.
attached the updated patch to apply to svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2006 May 25
2
Compilation issues with s390
Hi all,
I'm trying to compile asterisk on the mainframe (s390 / s390x) and I am
running into issues. I was wondering if somebody could give a hand?
I'm thinking that I should be able to do this. I have noticed that Debian
even has binary RPMs out for Asterisk now. I'm trying to do this on SuSE
SLES8 (with the 2.4 kernel).
What I see is, an issue that arch=s390 isn't