Displaying 8 results from an estimated 8 matches for "punpcklbw".
2011 Oct 26
2
[LLVMdev] Lowering to MMX
...MMX registers). Take for example the following LLVM IR:
define internal void @unpack(i8*, i8*) {
%3 = bitcast i8* %1 to i32*
%4 = load i32* %3, align 1
%5 = insertelement <2 x i32> undef, i32 %4, i32 0
%6 = bitcast <2 x i32> %5 to x86_mmx
%7 = call x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx %6, x86_mmx %6)
%8 = bitcast i8* %0 to x86_mmx*
store x86_mmx %7, x86_mmx* %8, align 1
ret void
}
declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone
Which gives me the following assembly code:
push ebp
mov ebp,esp
and esp,0FFF...
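For readers unfamiliar with the intrinsic, the following is a minimal scalar sketch in C of what the @unpack IR above appears to compute: it loads four bytes, then interleaves them with themselves so that each byte is written twice. The names punpcklbw_model and unpack_model are hypothetical and are not part of LLVM or of the original post. The upper lanes of the vector are undef in the IR; zeroing them here is an assumption and does not affect the result, since punpcklbw reads only the low four bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of MMX punpcklbw: interleave the low four bytes of a and b.
   Result bytes, lowest address first: a0 b0 a1 b1 a2 b2 a3 b3. */
static void punpcklbw_model(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    uint8_t tmp[8]; /* temporary so that out may alias a or b */
    for (int i = 0; i < 4; i++) {
        tmp[2 * i]     = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(out, tmp, sizeof tmp);
}

/* Hypothetical C counterpart of @unpack(dst, src): %0 is the store
   target, %1 is the load source. */
static void unpack_model(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    uint8_t v[8] = {0};   /* upper four bytes are undef in the IR; zero is an assumption */
    memcpy(v, src, 4);    /* the unaligned 32-bit load */
    punpcklbw_model(dst, v, v);
}
```

Under this model, calling unpack_model on the source bytes 1, 2, 3, 4 writes the eight bytes 1, 1, 2, 2, 3, 3, 4, 4 to the destination.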
2011 Oct 26
0
[LLVMdev] Lowering to MMX
...example the following LLVM IR:
>
> define internal void @unpack(i8*, i8*) {
> %3 = bitcast i8* %1 to i32*
> %4 = load i32* %3, align 1
> %5 = insertelement <2 x i32> undef, i32 %4, i32 0
> %6 = bitcast <2 x i32> %5 to x86_mmx
> %7 = call x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx %6, x86_mmx %6)
> %8 = bitcast i8* %0 to x86_mmx*
> store x86_mmx %7, x86_mmx* %8, align 1
> ret void
> }
> declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone
>
> Which gives me the following assembly code:
>
> push ebp
> mo...
2005 Aug 17
2
MMX loop filter for theora-exp
...le__(
+"pxor %%mm0,%%mm0\n" /* mm0 = 0 */
+"movq (%0),%%mm7\n" /* mm7 = _pix[0..7] */
+"lea (%1,%1,2),%%esi\n" /* esi = _ystride*3 */
+"movq (%0,%%esi),%%mm4\n" /* mm4 = (_pix+_ystride*3)[0..7] */
+"movq %%mm7,%%mm6\n" /* mm6 = _pix[0..7] */
+"punpcklbw %%mm0,%%mm6\n" /* expand unsigned _pix[0..3] to 16 bits */
+"movq %%mm4,%%mm5\n"
+"punpckhbw %%mm0,%%mm7\n" /* expand unsigned _pix[4..7] to 16 bits */
+"punpcklbw %%mm0,%%mm4\n" /* expand other arrays too */
+"punpckhbw %%mm0,%%mm5\n"
+"psubw %%mm4...
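The punpcklbw/punpckhbw pairing with a zeroed register in the snippet above is the usual MMX idiom for zero-extending bytes to 16-bit words. In AT&T syntax the second operand is the destination, so its bytes land in the low half of each word and the zero bytes from mm0 land in the high half. A minimal scalar sketch in C, with the hypothetical name expand_u8_to_u16 (not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the zero-extension idiom:
     pxor      mm0, mm0      ; mm0 = 0
     punpcklbw mm0 -> lo     ; bytes 0..3 become four 16-bit words
     punpckhbw mm0 -> hi     ; bytes 4..7 become four 16-bit words */
static void expand_u8_to_u16(uint16_t lo[4], uint16_t hi[4], const uint8_t pix[8])
{
    assert(lo != 0 && hi != 0 && pix != 0);
    for (int i = 0; i < 4; i++) {
        lo[i] = pix[i];       /* low half of the 64-bit register */
        hi[i] = pix[i + 4];   /* high half of the 64-bit register */
    }
}
```

Widening to 16 bits first is what lets the following word instructions (such as the psubw that the snippet goes on to use) produce differences outside the 0..255 range without wrapping.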
2011 Oct 25
0
[LLVMdev] Lowering to MMX
On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote:
> Hi all,
>
> I'm working on a graphics project which uses LLVM for dynamic code
> generation, and I noticed a major performance regression when upgrading
> from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it
> entirely).
>
> I found out that the performance regression is due to removing
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed.
attached the updated patch to apply to svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2011 Oct 20
4
[LLVMdev] Lowering to MMX
Hi all,
I'm working on a graphics project which uses LLVM for dynamic code
generation, and I noticed a major performance regression when upgrading
from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it
entirely).
I found out that the performance regression is due to removing support
for lowering 64-bit vector operations to MMX, and using SSE2 instead. My
code uses a
2005 Mar 23
3
[PATCH] promised MMX patches rc1
...\t" /* zero mm0 */
+" .balign 16 \n\t"
+"1: movq (%4), %%mm2 \n\t" /* load mm2 with _src */
+" movq %%mm2, %%mm3 \n\t" /* copy mm2 to mm3 */
+" punpckhbw %%mm0, %%mm2 \n\t" /* expand high part of _src to 16 bits */
+" punpcklbw %%mm0, %%mm3 \n\t" /* expand low part of _src to 16 bits */
+" paddsw (%1), %%mm3 \n\t" /* add low part with low part of residue */
+" paddsw 8(%1), %%mm2 \n\t" /* high with high */
+" packuswb %%mm2, %%mm3 \n\t" /* pack and saturate to mm3 */
+" le...
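The visible part of the loop above expands eight source bytes to words, adds the 16-bit residue with signed saturation (paddsw), and packs back to bytes with unsigned saturation (packuswb). A rough scalar sketch of one such row in C follows; the function names and the dst parameter are hypothetical, since the snippet is cut off before the store and does not show where the result goes.

```c
#include <assert.h>
#include <stdint.h>

/* paddsw: signed 16-bit add that saturates instead of wrapping. */
static int16_t add_sat_s16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* packuswb: signed 16-bit word to unsigned byte, clamped to 0..255. */
static uint8_t pack_sat_u8(int16_t w)
{
    if (w < 0)   return 0;
    if (w > 255) return 255;
    return (uint8_t)w;
}

/* One row of eight pixels: dst[i] = clamp(src[i] + residue[i], 0, 255).
   dst is an assumed output buffer, not a name from the patch. */
static void recon_row8(uint8_t dst[8], const uint8_t src[8], const int16_t residue[8])
{
    assert(dst != 0 && src != 0 && residue != 0);
    for (int i = 0; i < 8; i++) {
        dst[i] = pack_sat_u8(add_sat_s16((int16_t)src[i], residue[i]));
    }
}
```

The MMX version does the same work four words at a time; the two saturating steps are what remove the need for explicit clamping branches.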
2005 Mar 23
0
[PATCH]
...\t" /* zero mm0 */
+" .balign 16 \n\t"
+"1: movq (%4), %%mm2 \n\t" /* load mm2 with _src */
+" movq %%mm2, %%mm3 \n\t" /* copy mm2 to mm3 */
+" punpckhbw %%mm0, %%mm2 \n\t" /* expand high part of _src to 16 bits */
+" punpcklbw %%mm0, %%mm3 \n\t" /* expand low part of _src to 16 bits */
+" paddsw (%1), %%mm3 \n\t" /* add low part with low part of residue */
+" paddsw 8(%1), %%mm2 \n\t" /* high with high */
+" packuswb %%mm2, %%mm3 \n\t" /* pack and saturate to mm3 */
+" le...