thr3ads.net - search: "punpcklbw"

2011 Oct 26

2

[LLVMdev] Lowering to MMX

...MMX registers). Take for example the following LLVM IR:

define internal void @unpack(i8*, i8*) {
  %3 = bitcast i8* %1 to i32*
  %4 = load i32* %3, align 1
  %5 = insertelement <2 x i32> undef, i32 %4, i32 0
  %6 = bitcast <2 x i32> %5 to x86_mmx
  %7 = call x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx %6, x86_mmx %6)
  %8 = bitcast i8* %0 to x86_mmx*
  store x86_mmx %7, x86_mmx* %8, align 1
  ret void
}

declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone

Which gives me the following assembly code:

push ebp
mov ebp,esp
and esp,0FFF...
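For readers unfamiliar with the intrinsic, the following is a minimal plain-C sketch of what the IR above appears to compute: it loads four bytes, interleaves them with themselves via punpcklbw, and stores eight bytes, so each source byte ends up written twice. The name `unpack_model` is hypothetical and not part of the original post.

```c
#include <stdint.h>

/* Hypothetical scalar model of the @unpack function in the IR above.
   punpcklbw(x, x) interleaves the low four bytes of x with themselves,
   so every source byte appears twice in the 8-byte result. */
static void unpack_model(uint8_t *dst, const uint8_t *src) {
    for (int i = 0; i < 4; i++) {
        dst[2 * i]     = src[i]; /* byte taken from the first operand  */
        dst[2 * i + 1] = src[i]; /* byte taken from the second operand */
    }
}
```

Calling `unpack_model(dst, src)` with `src = {1, 2, 3, 4}` fills `dst` with `{1, 1, 2, 2, 3, 3, 4, 4}`.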

2011 Oct 26

0

[LLVMdev] Lowering to MMX

...example the following LLVM IR:
>
> define internal void @unpack(i8*, i8*) {
> %3 = bitcast i8* %1 to i32*
> %4 = load i32* %3, align 1
> %5 = insertelement <2 x i32> undef, i32 %4, i32 0
> %6 = bitcast <2 x i32> %5 to x86_mmx
> %7 = call x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx %6, x86_mmx %6)
> %8 = bitcast i8* %0 to x86_mmx*
> store x86_mmx %7, x86_mmx* %8, align 1
> ret void
> }
> declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone
>
> Which gives me the following assembly code:
>
> push ebp
> mo...

2005 Aug 17

2

MMX loop filter for theora-exp

...le__(
+"pxor %%mm0,%%mm0\n" /* mm0 = 0 */
+"movq (%0),%%mm7\n" /* mm7 = _pix[0..8] */
+"lea (%1,%1,2),%%esi\n" /* esi = _ystride*3 */
+"movq (%0,%%esi),%%mm4\n" /* mm4 = _pix[0..8]+_ystride*3] */
+"movq %%mm7,%%mm6\n" /* mm6 = _pix[0..8] */
+"punpcklbw %%mm0,%%mm6\n" /* expand unsigned _pix[0..3] to 16 bits */
+"movq %%mm4,%%mm5\n"
+"punpckhbw %%mm0,%%mm7\n" /* expand unsigned _pix[4..8] to 16 bits */
+"punpcklbw %%mm0,%%mm4\n" /* expand other arrays too */
+"punpckhbw %%mm0,%%mm5\n"
+"psubw %%mm4...
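The "expand unsigned ... to 16 bits" idiom in this snippet relies on interleaving data bytes with a zeroed register. The following is a rough scalar sketch of that behaviour, assuming the usual little-endian lane order of an MMX register packed into a `uint64_t`; the function names `punpcklbw_model` and `punpckhbw_model` are made up for illustration and do not come from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of MMX punpcklbw: interleave the low four
   bytes of dst and src. Result byte 2*i comes from dst and byte 2*i+1
   comes from src. */
static uint64_t punpcklbw_model(uint64_t dst, uint64_t src) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = (dst >> (8 * i)) & 0xFF;
        uint64_t s = (src >> (8 * i)) & 0xFF;
        out |= d << (16 * i);     /* low byte of 16-bit lane i  */
        out |= s << (16 * i + 8); /* high byte of 16-bit lane i */
    }
    return out;
}

/* Hypothetical model of punpckhbw: the same interleave applied to the
   high four bytes of each operand. */
static uint64_t punpckhbw_model(uint64_t dst, uint64_t src) {
    return punpcklbw_model(dst >> 32, src >> 32);
}
```

With `src` set to zero, as `mm0` is in the snippet, each 16-bit lane of the result holds one zero-extended byte of `dst`; for example `punpcklbw_model(0x8877665544332211, 0)` yields `0x0044003300220011`.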

2011 Oct 25

0

[LLVMdev] Lowering to MMX

On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote:

> Hi all,
>
> I'm working on a graphics project which uses LLVM for dynamic code
> generation, and I noticed a major performance regression when upgrading
> from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it
> entirely).
>
> I found out that the performance regression is due to removing

2004 Aug 24

5

MMX/mmxext optimisations

quite some speed improvement indeed.
attached the updated patch to apply to svn/trunk.

j

-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin

2011 Oct 20

4

[LLVMdev] Lowering to MMX

Hi all, I'm working on a graphics project which uses LLVM for dynamic code generation, and I noticed a major performance regression when upgrading from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it entirely). I found out that the performance regression is due to removing support for lowering 64-bit vector operations to MMX, and using SSE2 instead. My code uses a

2005 Mar 23

3

[PATCH] promised MMX patches rc1

...\t" /* zero mm0 */
+" .balign 16 \n\t"
+"1: movq (%4), %%mm2 \n\t" /* load mm2 with _src */
+" movq %%mm2, %%mm3 \n\t" /* copy mm2 to mm3 */
+" punpckhbw %%mm0, %%mm2 \n\t" /* expand high part of _src to 16 bits */
+" punpcklbw %%mm0, %%mm3 \n\t" /* expand low part of _src to 16 bits */
+" paddsw (%1), %%mm3 \n\t" /* add low part with low part of residue */
+" paddsw 8(%1), %%mm2 \n\t" /* high with high */
+" packuswb %%mm2, %%mm3 \n\t" /* pack and saturate to mm3 */
+" le...
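As a reading aid, the visible part of this loop body (widen eight source bytes, add a signed 16-bit residue with saturation, pack back to unsigned bytes) can be sketched in scalar C roughly as follows. This is only an approximation under stated assumptions: the function and parameter names are hypothetical, the residue is assumed to be eight contiguous `int16_t` values, and where the packed result is stored is an assumption, since the snippet is cut off before that point.

```c
#include <stdint.h>

/* Signed saturating 16-bit add, modelling paddsw on a single lane. */
static int16_t sat_add_i16(int16_t a, int16_t b) {
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Signed 16-bit to unsigned 8-bit saturation, modelling packuswb
   on a single lane. */
static uint8_t sat_u8(int16_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar model of one iteration: eight pixels at a time.
   dst is an assumption; the original snippet is truncated before the store. */
static void recon8_model(uint8_t *dst, const uint8_t *src,
                         const int16_t *residue) {
    for (int i = 0; i < 8; i++) {
        int16_t wide = (int16_t)src[i]; /* punpcklbw/punpckhbw with zero */
        dst[i] = sat_u8(sat_add_i16(wide, residue[i]));
    }
}
```

For instance, a source pixel of 200 with a residue of 100 clamps to 255, and a source pixel of 0 with a residue of -5 clamps to 0.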

2005 Mar 23

0

[PATCH]

...\t" /* zero mm0 */
+" .balign 16 \n\t"
+"1: movq (%4), %%mm2 \n\t" /* load mm2 with _src */
+" movq %%mm2, %%mm3 \n\t" /* copy mm2 to mm3 */
+" punpckhbw %%mm0, %%mm2 \n\t" /* expand high part of _src to 16 bits */
+" punpcklbw %%mm0, %%mm3 \n\t" /* expand low part of _src to 16 bits */
+" paddsw (%1), %%mm3 \n\t" /* add low part with low part of residue */
+" paddsw 8(%1), %%mm2 \n\t" /* high with high */
+" packuswb %%mm2, %%mm3 \n\t" /* pack and saturate to mm3 */
+" le...
