Displaying 5 results from an estimated 5 matches for "paddw".
2004 Aug 24
5
MMX/mmxext optimisations
Quite a speed improvement indeed.
Attached is the updated patch, which applies to svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2005 Aug 17
2
MMX loop filter for theora-exp
..."
+"punpckhbw %%mm0,%%mm3\n"
+"punpcklbw %%mm0,%%mm2\n"
+"psubw %%mm5,%%mm3\n"
+"psubw %%mm4,%%mm2\n"
+ /* mm3:mm2 = (_pix[_ystride*2]-_pix[_ystride]); */
+"PMULLW (V3),%%mm3\n" /* *3 */
+"PMULLW (V3),%%mm2\n" /* *3 */
+"paddw %%mm7,%%mm3\n" /* highpart */
+"paddw %%mm6,%%mm2\n"/* lowpart of _pix[0]-_pix[_ystride*3]+3*(_pix[_ystride*2]-_pix[_ystride]); */
+"paddw (V4),%%mm3\n" /* add 4 */
+"paddw (V4),%%mm2\n" /* add 4 */
+"psraw $3,%%mm3\n" /* >>3 f coefs high */...
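Going only by the comments in the excerpt above, the MMX sequence appears to compute a per-pixel filter value of the form (_pix[0] - _pix[_ystride*3] + 3*(_pix[_ystride*2] - _pix[_ystride]) + 4) >> 3. The following is a minimal scalar sketch of that arithmetic, not the theora-exp code itself: the names loop_filter_coef and asr3 are hypothetical, and the excerpt is truncated, so any clamping or later steps in the real filter are not modelled.

```c
#include <stdint.h>

/* Arithmetic shift right by 3 with floor semantics, as psraw behaves.
   Written out explicitly because >> on a negative int is
   implementation-defined in C. Hypothetical helper. */
static int asr3(int v)
{
    return v >= 0 ? v >> 3 : -((-v + 7) >> 3);
}

/* Scalar model of the value described by the excerpt's comments.
   pix points at the first of four samples spaced ystride bytes apart.
   Hypothetical name; not a function from theora-exp. */
static int loop_filter_coef(const uint8_t *pix, int ystride)
{
    int f = pix[0] - pix[ystride * 3]
          + 3 * (pix[ystride * 2] - pix[ystride]);
    return asr3(f + 4); /* +4 rounds before the shift by 3 */
}
```

The MMX version does the same work on eight pixels at once by unpacking bytes to 16-bit words (punpcklbw/punpckhbw) and handling the low and high halves in separate registers; the intermediate values stay within 16 bits because the largest magnitude is 255 + 3*255 + 4.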
2005 Jul 20
1
MMX IDCT for theora-exp
...t; \
+ " movq " r1","r5"\n" \
+ " pmulhw " r2","r1"\n" \
+ " movq " I(1)","r3"\n" \
+ " pmulhw " r7","r5"\n" \
+ " movq " C(1)","r0"\n" \
+ " paddw " r2","r4"\n" \
+ " paddw " r7","r6"\n" \
+ " paddw " r1","r2"\n" \
+ " movq " J(7)","r1"\n" \
+ " paddw " r5","r7"\n" \
+ " movq " r0"...
2009 Oct 13
3
Proposal for replacing asm code with intrinsics
...l is to replace all functions in assembly with compiler intrinsics, which compile into 1-2 assembly instructions and are much easier to maintain.
For example:
_mm_sad_epu8(__m128i, __m128i) will be compiled into a PSADBW instruction with compiler-allocated registers.
And code like:
psadbw mm4,mm5
paddw mm0,mm4
can be rewritten as
__m64 mm0, mm4, mm5, mm6, mm7; //of course using meaningful names
mm0 = _mm_add_pi16(mm0, _mm_sad_pu8(mm4, mm5));
The compiler will replace the variables with actual registers, ensuring better register allocation and instruction scheduling.
So, benefits are:
1) Easier to read & unde...
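For readers unfamiliar with the two instructions in the example above, here is a minimal, portable C sketch of what their 64-bit MMX forms compute. It is a scalar model for illustration, not the intrinsic code the proposal describes; the names model_psadbw and model_paddw are hypothetical, and a 64-bit register is represented as a uint64_t.

```c
#include <stdint.h>

/* PSADBW (64-bit form): sum of absolute differences of the eight
   unsigned bytes of a and b. The sum lands in the low 16 bits and
   the remaining bits are zero. Hypothetical model function. */
static uint64_t model_psadbw(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int)((a >> (8 * i)) & 0xFF);
        int y = (int)((b >> (8 * i)) & 0xFF);
        sum += (uint64_t)(x > y ? x - y : y - x);
    }
    return sum; /* at most 8 * 255 = 2040, so it fits in the low word */
}

/* PADDW (64-bit form): four independent 16-bit lanes added with
   wraparound, no carry between lanes. Hypothetical model function. */
static uint64_t model_paddw(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t lane = (uint16_t)((a >> (16 * i)) + (b >> (16 * i)));
        r |= (uint64_t)lane << (16 * i);
    }
    return r;
}
```

Under this model, the two-instruction sequence in the message corresponds to acc = model_paddw(acc, model_psadbw(a, b)), which accumulates one block's sum of absolute differences into the low lane of acc.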
2012 Nov 28
0
[LLVMdev] [llvm-commits] [dragonegg] r168787 - in /dragonegg/trunk: src/x86/Target.cpp src/x86/x86_builtins test/validator/c/copysignp.c
...Builder.CreateAnd(IntRHS, SignMask);
> + Value *Abs = Builder.CreateAnd(IntLHS, ConstantExpr::getNot(SignMask));
> + Value *IntRes = Builder.CreateOr(Abs, Sign);
> + Result = Builder.CreateBitCast(IntRes, VecTy);
> + return true;
> + }
> case paddb:
> case paddw:
> case paddd:
>
> Modified: dragonegg/trunk/src/x86/x86_builtins
> URL: http://llvm.org/viewvc/llvm-project/dragonegg/trunk/src/x86/x86_builtins?rev=168787&r1=168786&r2=168787&view=diff
> ==============================================================================...
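The quoted hunk above builds its result from three bitwise steps: mask the sign bit out of the right-hand operand, mask everything but the sign bit out of the left-hand operand, and OR the two together. A minimal scalar sketch of that same bit manipulation in C follows. It is an illustration only, not the dragonegg code: copysign_bits is a hypothetical name, and it assumes float is a 32-bit IEEE-754 value with the sign in the top bit.

```c
#include <stdint.h>
#include <string.h>

/* Returns a value with the magnitude of mag and the sign of sgn,
   using the and/and-not/or pattern from the quoted hunk.
   Assumes 32-bit IEEE-754 float. Hypothetical function. */
static float copysign_bits(float mag, float sgn)
{
    const uint32_t sign_mask = 0x80000000u;
    uint32_t m, s, r;
    float out;

    memcpy(&m, &mag, sizeof m);   /* reinterpret bits without aliasing issues */
    memcpy(&s, &sgn, sizeof s);

    r = (m & ~sign_mask)          /* magnitude bits of the first operand */
      | (s & sign_mask);          /* sign bit of the second operand */

    memcpy(&out, &r, sizeof out);
    return out;
}
```

The vector version in the patch applies the same masks to every lane at once, which is why it bitcasts to an integer vector first and back to the floating-point vector type at the end.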