search for: mm4

Displaying 20 results from an estimated 30 matches for "mm4".

2004 Sep 10
2
An assembly optimization and fix
...total_error_1:total_error_0 - ; mm1 == total_error_3:total_error_2 - ; mm2 == 0:total_error_4 - ; mm3/4 == 0:unpackarea - ; mm5 == abs(error_1):abs(error_0) - ; mm5 == abs(error_3):abs(error_2) + ; mm1 == total_error_2:total_error_3 + ; mm2 == :total_error_4 + ; mm3 == last_error_1:last_error_0 + ; mm4 == last_error_2:last_error_3 - pxor mm0, mm0 ; total_error_1 = total_error_0 = 0 - pxor mm1, mm1 ; total_error_3 = total_error_2 = 0 - pxor mm2, mm2 ; total_error_4 = 0 - mov ebx, [esp + 36] ; ebx = data[] - mov ecx, [ebx - 4] ; ecx == data[-1] last_error_0 = data[-1] - mov eax, [ebx -...
2005 Aug 17
2
MMX loop filter for theora-exp
...p_filter_v_mmx(unsigned char *_pix,int _ystride,int *_bv){ + int y; + _pix-=_ystride*2; + +__asm__ __volatile__( +"pxor %%mm0,%%mm0\n" /* mm0 = 0 */ +"movq (%0),%%mm7\n" /* mm7 = _pix[0..8] */ +"lea (%1,%1,2),%%esi\n" /* esi = _ystride*3 */ +"movq (%0,%%esi),%%mm4\n" /* mm4 = _pix[0..8]+_ystride*3] */ +"movq %%mm7,%%mm6\n" /* mm6 = _pix[0..8] */ +"punpcklbw %%mm0,%%mm6\n" /* expand unsigned _pix[0..3] to 16 bits */ +"movq %%mm4,%%mm5\n" +"punpckhbw %%mm0,%%mm7\n" /* expand unsigned _pix[4..8] to 16 bits */ +"...
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2010 Sep 29
1
Understanding linear contrasts in Anova using R
#I am trying to understand how R fits models for contrasts in a #simple one-way anova. This is an example, I am not stupid enough to want #to simultaneously apply all of these contrasts to real data. With a few #exceptions, the tests that I would compute by hand (or by other software) #will give the same t or F statistics. It is the contrast estimates that R produces #that I can't seem to
2009 Aug 30
3
experimental patch for libtheora1.1beta3
...t" + /* Not working "lea -32(%[ret],%[ret]),%[ret]\n\t" */ + /* Like ret = ret+ret-32 */ + "add %[ret],%[ret]\n\t" + "sub 32,%[ret]\n\t" "movq 0x40(%[buf]),%%mm0\n\t" "cmp %[ret2],%[ret]\n\t" "movq 0x48(%[buf]),%%mm4\n\t" @@ -511,7 +514,11 @@ static unsigned oc_int_frag_satd_thresh_mmxext(const u "punpckhdq %%mm0,%%mm0\n\t" "paddd %%mm0,%%mm4\n\t" "movd %%mm4,%[ret2]\n\t" - "lea (%[ret],%[ret2],2),%[ret]\n\t" + /* Not working "lea (%[ret],%[...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
...st ogg_int16_t *_residue){ +int i; + __asm__ __volatile__ ( +" movl $0x7, %7 \n\t" /* 8x loop */ +" pxor %%mm0, %%mm0 \n\t" /* zero mm0 */ +" movq (%4), %%mm2 \n\t" /* load mm2 with _src1 */ +" .balign 16 \n\t" +"1: movq (%6), %%mm4 \n\t" /* packed SRC2 */ +" movq %%mm2, %%mm3 \n\t" /* copy to mm3 */ +" movq %%mm4, %%mm5 \n\t" /* copy packed src2 to mm5 */ +" mov %3, %%eax \n\t" +" punpcklbw %%mm0, %%mm2 \n\t" /* expand low part of src1 to mm2 */ +" punpcklbw %%mm0...
2005 Apr 03
2
RTNETLINK answers: Invalid argument
Hi, On this Fedora Core Devel (Raw Hide) system, if I boot on a distribution kernel (based on 2.6.12rc1-bk2) the network is fine. If I build a custom 2.6.12-rc1-V0.7.43-06 or 2.6.12-rc1-mm4 kernel the network interface fails to initialise on boot with RTNETLINK answers: Invalid argument. What can possibly cause this? My kernel config should be mostly fine - I used it extensively at a time and diffing it with Red Hat does not show any obvious suspects (to me). Though I haven'...
2009 Oct 13
3
Proposal for replacing asm code with intrinsics
...My proposal is to replace all functions in assembly with compiler intrinsics, each of which compiles into 1-2 assembly instructions and is much easier to maintain. For example: _mm_sad_epu8(__m128i, __m128i) will be compiled into a PSADBW instruction with compiler-allocated registers. And code like: psadbw mm4,mm5 paddw mm0,mm4 can be rewritten as __m64 mm0, mm4, mm5, mm6, mm7; //of course using meaningful names mm0 = _mm_add_pi16(mm0, _mm_sad_pu8(mm4, mm5)); The compiler will replace variables with actual registers, ensuring better allocation and scheduling of them. So, benefits are: 1) Easier to r...
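A minimal sketch of the idea in the snippet above, assuming an SSE2 target and input lengths that are a multiple of 16 bytes. The helper name sad_bytes is hypothetical and does not come from the thread or from libtheora. It uses the 128-bit _mm_sad_epu8 form rather than the MMX registers in the quoted assembly, accumulates with a 64-bit add instead of paddw so long inputs cannot overflow, and falls back to plain C where SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper, for illustration only: sum of absolute differences
   between two byte arrays. Assumes n is a multiple of 16. */
unsigned sad_bytes(const uint8_t *a, const uint8_t *b, size_t n)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        /* Unaligned loads; the compiler picks the registers. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PSADBW yields one partial sum in each 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the upper half into the lower half and extract the result. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    /* Portable fallback with the same result. */
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
#endif
}
```

Calling sad_bytes(a, b, 16) on two 16-byte blocks returns the scalar sum of |a[i] - b[i]|; no register names appear anywhere in the source, which is the maintenance benefit the proposal argues for.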
2005 Mar 23
0
[PATCH]
...st ogg_int16_t *_residue){ +int i; + __asm__ __volatile__ ( +" movl $0x7, %7 \n\t" /* 8x loop */ +" pxor %%mm0, %%mm0 \n\t" /* zero mm0 */ +" movq (%4), %%mm2 \n\t" /* load mm2 with _src1 */ +" .balign 16 \n\t" +"1: movq (%6), %%mm4 \n\t" /* packed SRC2 */ +" movq %%mm2, %%mm3 \n\t" /* copy to mm3 */ +" movq %%mm4, %%mm5 \n\t" /* copy packed src2 to mm5 */ +" mov %3, %%eax \n\t" +" punpcklbw %%mm0, %%mm2 \n\t" /* expand low part of src1 to mm2 */ +" punpcklbw %%mm0...
2010 Oct 20
2
[LLVMdev] llvm register reload/spilling around calls
...mm regs I >> added, however the calling code did not change at all... > > Look in X86InstrControl.td. The call instructions are all prefixed > by: > > let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, > FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, > XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, > XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], > > This is the fixed list of call-clobbered registers. It should really > be controlled by the calling convention of the called function > instead....
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...wrote: > On 20.10.2010 05:00, Jakob Stoklund Olesen wrote: >> Look in X86InstrControl.td. The call instructions are all prefixed >> by: >> >> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, >> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], >> >> This is the fixed list of call-clobbered registers. It should really >> be controlled by the calling convention of the called fun...
2010 Oct 20
1
[LLVMdev] llvm register reload/spilling around calls
...10.2010 05:00, Jakob Stoklund Olesen wrote: >>> Look in X86InstrControl.td. The call instructions are all prefixed >>> by: >>> >>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, >>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], >>> >>> This is the fixed list of call-clobbered registers. It should really >>> be controlled by the calling conventio...
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...he xmm regs I > added, however the calling code did not change at all... Look in X86InstrControl.td. The call instructions are all prefixed by: let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], This is the fixed list of call-clobbered registers. It should really be controlled by the calling convention of the called function instead. T...
2002 Jan 22
1
glm.predict?
...6, 144, 164, 171, 200, 187, 169, 189, 168, 182, 208, 207, 193, 144, 178, 177, 176, 205, 153, 228, 227, 147, 173, 157, 214, 167, 140, 179, 204, 184, 151, 115, 173, 208, 135, 175, 136, 121, 189, 148, 174), .Names = c("Lead1.mm1", "Lead1.mm2", "Lead1.mm3", "Lead1.mm4", "Lead1.mm5", "Lead1.mm6", "Lead1.mm7", "Lead1.mm8", "Lead1.mm9", "Lead1.mm10", "Lead1.mm11", "Lead1.mm12", "Lead1.mm13", "Lead1.mm14", "Lead1.mm15", "Lead1.mm16", "L...
2007 Jun 19
3
[LLVMdev] TargetRegisterClass for Physical Register
...ally_ the case that it's in multiple classes). Does ValueType have something to do with that? In the same file, the VR64 register class has the following definition: def VR64 : RegisterClass<"X86", [v8i8, v4i16, v2i32, v1i64], 64, [MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7]>; So there are multiple ValueTypes here (the scalar registers each only have one corresponding to the bit size of the register). But still, if I have physical register MM2, that completely determines its register class. Is there some other architecture where the physical regi...
2010 Oct 20
3
[LLVMdev] llvm register reload/spilling around calls
Thanks for giving it a look! On 19.10.2010 23:21, Jakob Stoklund Olesen wrote: > On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote: > >> So I saw that the code is doing lots of register >> spilling/reloading. Now I understand that due to calling >> conventions, there's not really a way to avoid this - I tried using >> coldcc but apparently the backend
2008 Sep 03
2
[LLVMdev] Codegen/Register allocation question.
...ef,dead>, %FP2<imp-def,dead>, %FP3<imp-def,dead>, %FP4<imp-def,dead>, %FP5<imp-def,dead>, %FP6<imp-def,dead>, %ST0<imp-def,dead>, %ST1<imp-def,dead>, %MM0<imp-def,dead>, %MM1<imp-def,dead>, %MM2<imp-def,dead>, %MM3<imp-def,dead>, %MM4<imp-def,dead>, %MM5<imp-def,dead>, %MM6<imp-def,dead>, %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>, %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>, %XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<im...
2008 Sep 04
0
[LLVMdev] Codegen/Register allocation question.
...<imp-def,dead>, %FP3<imp-def,dead>, > %FP4<imp-def,dead>, %FP5<imp-def,dead>, %FP6<imp-def,dead>, > %ST0<imp-def,dead>, %ST1<imp-def,dead>, %MM0<imp-def,dead>, > %MM1<imp-def,dead>, %MM2<imp-def,dead>, %MM3<imp-def,dead>, > %MM4<imp-def,dead>, %MM5<imp-def,dead>, %MM6<imp-def,dead>, > %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>, > %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>, > %XMM5<imp-def,dead>, %XMM6<imp-def,dead>...
2007 Jun 18
2
[LLVMdev] TargetRegisterClass for Physical Register
How do I get the TargetRegisterClass for a physical register? SSARegMap::getRegClass only works for virtual registers. -Dave
2007 Jun 19
0
[LLVMdev] TargetRegisterClass for Physical Register
Take a look at getPhysicalRegisterRegClass( const MRegisterInfo *MRI, MVT::ValueType VT, unsigned reg) in ScheduleDAG.cpp. -- Christopher Lamb On Jun 18, 2007, at 4:52 PM, David A. Greene wrote: > How do I get the TargetRegisterClass for a physical register? > SSARegMap::getRegClass only works for virtual registers. > >