search for: mm0

Displaying 20 results from an estimated 49 matches for "mm0".

2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2011 Oct 26
0
[LLVMdev] Lowering to MMX
...declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone > > Which gives me the following assembly code: > > push ebp > mov ebp,esp > and esp,0FFFFFFF0h > sub esp,20h > mov eax,dword ptr [ebp+0Ch] > movd xmm0,dword ptr [eax] > movapd xmmword ptr [esp],xmm0 > movq mm0,mmword ptr [esp] > punpcklbw mm0,mm0 > mov eax,dword ptr [ebp+8] > movq mmword ptr [eax],mm0 > emms > mov esp,ebp > pop ebp > ret > > The inner portion could loo...
2005 Aug 17
2
MMX loop filter for theora-exp
...tation of the loop_filter_v + pix cells are not striped so we can load them directly to MMX regs + +TODO: some instruction stalls can be avoided + +*/ + +static void loop_filter_v_mmx(unsigned char *_pix,int _ystride,int *_bv){ + int y; + _pix-=_ystride*2; + +__asm__ __volatile__( +"pxor %%mm0,%%mm0\n" /* mm0 = 0 */ +"movq (%0),%%mm7\n" /* mm7 = _pix[0..8] */ +"lea (%1,%1,2),%%esi\n" /* esi = _ystride*3 */ +"movq (%0,%%esi),%%mm4\n" /* mm4 = _pix[0..8]+_ystride*3] */ +"movq %%mm7,%%mm6\n" /* mm6 = _pix[0..8] */ +"punpcklbw %%mm0,%%mm6\n...
2011 Oct 26
2
[LLVMdev] Lowering to MMX
...86_mmx* %8, align 1 ret void } declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone Which gives me the following assembly code: push ebp mov ebp,esp and esp,0FFFFFFF0h sub esp,20h mov eax,dword ptr [ebp+0Ch] movd xmm0,dword ptr [eax] movapd xmmword ptr [esp],xmm0 movq mm0,mmword ptr [esp] punpcklbw mm0,mm0 mov eax,dword ptr [ebp+8] movq mmword ptr [eax],mm0 emms mov esp,ebp pop ebp ret The inner portion could look like this instead: movd m...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
...k with next(high) four values */ +); +} + +void oc_frag_recon_intra__mmx(unsigned char *_dst,int _dst_ystride, + const ogg_int16_t *_residue) { + + __asm__ __volatile__ ( +" mov $0x7, %%ecx \n\t" /* 8x loop */ +" .balign 16 \n\t" +"1: movq (V128), %%mm0 \n\t" /* Set mm0 to 0x0080008000800080 */ +" movq (%1), %%mm2 \n\t" /* First four input values */ +" movq %%mm0, %%mm1 \n\t" /* Set mm1 == mm0 */ +" movq 8(%1), %%mm3 \n\t" /* Next four input values */ +" decl %%ecx \n\t" /*...
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...clear. It looks like the successful code is doing an aggregate copy field-by-field while the failing code has lowered this to a memcpy. I would certainly expect the memcpy expansion to be smart enough to avoid using MM registers, though; that's a serious bug if it isn't. movd %xmm0, %rax movd %rax, %mm0 movq2dq %mm0, %xmm1 movq2dq %mm0, %xmm2 punpcklqdq %xmm2, %xmm1 ## xmm1 = xmm1[0],xmm2[0] movq 16(%rsp), %rax movd %rax, %mm0 movq2dq %mm0, %xmm0 punpcklqdq %xmm2, %xmm0 ## xmm0 = xmm0[0],xmm2[0] On Aug 31, 2010, at 11:18 AMPDT, Argyrios Kyrtzidis wrote: >...
2010 Aug 31
2
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...ear. It looks like the successful code is doing an aggregate copy field-by-field while the failing code has lowered this to a memcpy. I would certainly expect the memcpy expansion to be smart enough to avoid using MM registers, though; that's a serious bug if it isn't. > > movd %xmm0, %rax > movd %rax, %mm0 > movq2dq %mm0, %xmm1 > movq2dq %mm0, %xmm2 > punpcklqdq %xmm2, %xmm1 ## xmm1 = xmm1[0],xmm2[0] > movq 16(%rsp), %rax > movd %rax, %mm0 > movq2dq %mm0, %xmm0 > punpcklqdq %xmm2, %xmm0 ## xmm0 = xmm0[0],xmm2[0] > > > On Aug 31,...
2004 Sep 10
2
An assembly optimization and fix
...loading FLAC__uint64s to FPU regs - ; dword [esp] == last_error_0 - ; dword [esp + 4] == last_error_1 - ; dword [esp + 8] == last_error_2 - ; dword [esp + 12] == last_error_3 - ; eax == error ; ebx == &data[i] ; ecx == loop counter (i) - ; edx == temp - ; edi == save ; ebp == order ; mm0 == total_error_1:total_error_0 - ; mm1 == total_error_3:total_error_2 - ; mm2 == 0:total_error_4 - ; mm3/4 == 0:unpackarea - ; mm5 == abs(error_1):abs(error_0) - ; mm5 == abs(error_3):abs(error_2) + ; mm1 == total_error_2:total_error_3 + ; mm2 == :total_error_4 + ; mm3 == last_error_1:last_error_0...
2010 Aug 31
5
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...x movl $0, 16(%rsp) movl $0, 20(%rsp) movl $0, 8(%rsp) movl $0, 12(%rsp) movq 8(%rdi), %rsi leaq 16(%rsp), %rcx leaq 8(%rsp), %r8 callq __ZN7WebCore5mouniEPNS_15GraphicsContextEPNS_30GraphicsContextPlatformPrivateERKNS_9FloatRectERNS_10FloatPointES8_ movss 8(%rsp), %xmm1 movss 12(%rsp), %xmm0 subss 20(%rsp), %xmm0 subss 16(%rsp), %xmm1 ## kill: XMM1<def> XMM1<kill> XMM1<def> insertps $16, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0],xmm1[2,3] movq 16(%rsp), %xmm0 addq $24, %rsp ret $ opt -std-compile-opts unopt-fail.ll -o - | l...
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...is doing an aggregate copy field-by-field while the >> failing code has lowered this to a memcpy. I would certainly >> expect the memcpy expansion to be smart enough to avoid using MM >> registers, though; that's a serious bug if it isn't. >> >> movd %xmm0, %rax >> movd %rax, %mm0 >> movq2dq %mm0, %xmm1 >> movq2dq %mm0, %xmm2 >> punpcklqdq %xmm2, %xmm1 ## xmm1 = xmm1[0],xmm2[0] >> movq 16(%rsp), %rax >> movd %rax, %mm0 >> movq2dq %mm0, %xmm0 >> punpcklqdq %xmm2, %xmm0 ## xmm0 = xmm0[0],xmm...
2005 Mar 23
0
[PATCH]
...k with next(high) four values */ +); +} + +void oc_frag_recon_intra__mmx(unsigned char *_dst,int _dst_ystride, + const ogg_int16_t *_residue) { + + __asm__ __volatile__ ( +" mov $0x7, %%ecx \n\t" /* 8x loop */ +" .balign 16 \n\t" +"1: movq (V128), %%mm0 \n\t" /* Set mm0 to 0x0080008000800080 */ +" movq (%1), %%mm2 \n\t" /* First four input values */ +" movq %%mm0, %%mm1 \n\t" /* Set mm1 == mm0 */ +" movq 8(%1), %%mm3 \n\t" /* Next four input values */ +" decl %%ecx \n\t" /*...
2014 Dec 24
2
[LLVMdev] X86 disassembler is quite broken on handling REX
hi, i think the current X86 disassembler is quite broken and fails badly on handling REX for x86_64 code. below are some examples: $ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64 .text por %mm3, %mm0 $ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64 .text por %mm3, %mm0 $ echo "0x41,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64 .text <stdin>:1:1: warning: invalid instruction encoding 0x41...
2009 Aug 30
3
experimental patch for libtheora1.1beta3
...wo removed; correct the final sum here.*/ - "lea -32(%[ret],%[ret]),%[ret]\n\t" + /* Not working "lea -32(%[ret],%[ret]),%[ret]\n\t" */ + /* Like ret = ret+ret-32 */ + "add %[ret],%[ret]\n\t" + "sub 32,%[ret]\n\t" "movq 0x40(%[buf]),%%mm0\n\t" "cmp %[ret2],%[ret]\n\t" "movq 0x48(%[buf]),%%mm4\n\t" @@ -511,7 +514,11 @@ static unsigned oc_int_frag_satd_thresh_mmxext(const u "punpckhdq %%mm0,%%mm0\n\t" "paddd %%mm0,%%mm4\n\t" "movd %%mm4,%[ret2]\n\t" - &...
2014 Dec 24
2
[LLVMdev] X86 disassembler is quite broken on handling REX
...enough > to just drop the bit. Do you have other non-mmx examples? > > case TYPE_MM: \ > if (index > 7) \ > *valid = 0; \ > return prefix##_MM0 + index; > yes, exactly this place. but the question is: how do we know when to drop the REX.B? i dont know any non-MMX examples. it seems only MMX related instructions have this issue. thanks, Jun > > On Tue, Dec 23, 2014 at 10:17 PM, Jun Koi <junkoi2004 at gmail.com> wrote:...
2012 Nov 05
0
Diference in results from doBy::popMeans, multcomp::glht and contrast::contrast for a lme model
...38886 50.88872 16 0 31.40168 34.13165 1 summary(glht(m0, linfct=matrix(contr$X, nrow=1))) ## Linear Hypotheses: ## Estimate Std. Error t value Pr(>|t|) ## 1 == 0 32.7667 0.6439 50.89 <2e-16 *** # ok, results are the same! # random effects model require(nlme) mm0 <- lme(y~variedade, random=~1|bloco, da) summary(mm0) ## Fixed effects: y ~ variedade ## Value Std.Error DF t-value p-value ## (Intercept) 32.67807 1.5887794 16 20.568037 0 ## variedade2 -9.08144 0.9236918 16 -9.831679 0 contr <- contrast(mm0, list(variedade=&qu...
2011 Oct 25
0
[LLVMdev] Lowering to MMX
On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote: > Hi all, > > I'm working on a graphics project which uses LLVM for dynamic code > generation, and I noticed a major performance regression when upgrading > from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it > entirely). > > I found out that the performance regression is due to removing
2009 Oct 13
3
Proposal for replacing asm code with intrinsics
...o replace all functions in assembly with compiler intrinsic which compiles into 1-2 assembly instructions and are much easier to maintain. For example: _mm_sad_epu8(__m128, __m128) will be compiled in PSADBW instruction with compiler-allocated registers. And code like: psadbw mm4,mm5 paddw mm0,mm4 Can be re-written into _m64 mm0, mm4, mm5, mm6, mm7; //of course using meaningful names mm0= _mm_add_epi16(mm0, _mm_sad_pu8(mm4, mm5)); Compiler will replace variables with actual registers, ensuring better allocation and scheduling of them. So, benefits are: 1) Easier to read & understa...
2015 Mar 27
2
[LLVMdev] LLVM fails for inline asm with Link Time Optimization
...mov esi, dword ptr 28(%esp) 1> ^ 1><inline asm>:4:21 : error 0: invalid token in expression 1> movq mm1, [edi+ebx-$8] 1> ^ 1><inline asm>:5:12 : error 0: invalid operand for instruction 1> pxor mm0, mm0 Thanks Ashish On Fri, Mar 27, 2015 at 8:21 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > If you are getting a parse error it is very likely a different bug. In > that bug the issue is that we don't parse the function bodies to find > if some inline in t...
2009 Jul 09
2
[LLVMdev] Wrong encoding of movd on x64
...%7 = bitcast i8* %1 to i32* ; <i32*> [#uses=1] store i32 %6, i32* %7, align 1 ret void } It generates the following code: mov rax,1E417A8h mov rax,qword ptr [rax] mov rcx,1E417B0h mov rcx,qword ptr [rcx] movq mm0,mmword ptr [rcx] movq rax,mm1 mov dword ptr [rax],ecx Note the last movq. What was probably intended to be generated was "movd ecx, mm0". LLVM mistakenly sets the 'wide' bit of the REX prefix to 1, turning movd into movq. Also, reg and r/m encoding has been...
2011 Oct 20
4
[LLVMdev] Lowering to MMX
Hi all, I'm working on a graphics project which uses LLVM for dynamic code generation, and I noticed a major performance regression when upgrading from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it entirely). I found out that the performance regression is due to removing support for lowering 64-bit vector operations to MMX, and using SSE2 instead. My code uses a