search for: punpckldq

Displaying 7 results from an estimated 7 matches for "punpckldq".

2011 Oct 17
0
[LLVMdev] LLVM Build Bot failure on llmv-x86_64-ubuntu
...xff .byte 254 # 0xfe .byte 255 # 0xff .text .globl __unnamed_1 .align 16, 0x90 .type __unnamed_1,@function __unnamed_1: # @2 .Ltmp0: .cfi_startproc # BB#0: movd __unnamed_2+6(%rip), %xmm1 movd __unnamed_2+2(%rip), %xmm0 punpckldq %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] movzwl __unnamed_2+8(%rip), %eax movd %eax, %xmm2 movzwl __unnamed_2+4(%rip), %eax movd %eax, %xmm1 punpckldq %xmm2, %xmm1 # xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1] punpckldq %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] m...
2004 Sep 10
2
An assembly optimization and fix
...mov ebx, [esp + 36] ; ebx = data[] + movd mm3, [ebx - 4] ; mm3 = 0:last_error_0 + movd mm2, [ebx - 8] ; mm2 = 0:data[-2] + movd mm1, [ebx - 12] ; mm1 = 0:data[-3] + movd mm0, [ebx - 16] ; mm0 = 0:data[-4] + movq mm5, mm3 ; mm5 = 0:last_error_0 + psubd mm5, mm2 ; mm5 = 0:last_error_1 + punpckldq mm3, mm5 ; mm3 = last_error_1:last_error_0 + psubd mm2, mm1 ; mm2 = 0:data[-2] - data[-3] + psubd mm5, mm2 ; mm5 = 0:last_error_2 + movq mm4, mm5 ; mm4 = 0:last_error_2 + psubd mm4, mm2 ; mm4 = 0:last_error_2 - (data[-2] - data[-3]) + paddd mm4, mm1 ; mm4 = 0:last_error_2 - (data[-2] - 2...
2016 Aug 12
4
Invoke loop vectorizer
...pslldq $8, %xmm1 ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7] pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1] paddq %xmm1, %xmm3 pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1] punpckldq %xmm5, %xmm4 ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1] pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3] Note: It also vectorizes at SIZE=8. Not sure what the exact translation of options from clang-cl to clang is. Maybe try a...
2016 Aug 12
2
Invoke loop vectorizer
Hi Daniel, I increased the size of your test to be 128 but -stats still shows no loop optimized... Xiaochu On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > It's not possible to know that A and B don't alias in this example. It's > almost certainly not profitable to add a runtime check given the size of > the loop. > > >
2005 Jul 20
1
MMX IDCT for theora-exp
...movq "r0","I(0)"\n" \ + " punpckhwd "r5","r1"\n" \ + " movq "r6","r0"\n" \ + " punpcklwd "r7","r6"\n" \ + " movq "r4","r5"\n" \ + " punpckldq "r6","r4"\n" \ + " punpckhdq "r6","r5"\n" \ + " movq "r1","r6"\n" \ + " movq "r4","J(4)"\n" \ + " punpckhwd "r7","r0"\n" \ + " movq "...
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2006 May 25
2
Compilation issues with s390
Hi all, I'm trying to compile asterisk on the mainframe (s390 / s390x) and I am running into issues. I was wondering if somebody could give a hand? I'm thinking that I should be able to do this. I have noticed that Debian even has binary RPM's out for Asterisk now. I'm trying to do this on SuSE SLES8 (with the 2.4 kernel). What I see is, an issue that arch=s390 isn't