thr3ads.net - search: "mm4"

Displaying 20 results from an estimated 30 matches for "mm4".

2004 Sep 10

An assembly optimization and fix

...total_error_1:total_error_0 - ; mm1 == total_error_3:total_error_2 - ; mm2 == 0:total_error_4 - ; mm3/4 == 0:unpackarea - ; mm5 == abs(error_1):abs(error_0) - ; mm5 == abs(error_3):abs(error_2) + ; mm1 == total_error_2:total_error_3 + ; mm2 == :total_error_4 + ; mm3 == last_error_1:last_error_0 + ; mm4 == last_error_2:last_error_3 - pxor mm0, mm0 ; total_error_1 = total_error_0 = 0 - pxor mm1, mm1 ; total_error_3 = total_error_2 = 0 - pxor mm2, mm2 ; total_error_4 = 0 - mov ebx, [esp + 36] ; ebx = data[] - mov ecx, [ebx - 4] ; ecx == data[-1] last_error_0 = data[-1] - mov eax, [ebx -...
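The register comments in this excerpt (`total_error_0` through `total_error_4`, `last_error_0 = data[-1]`) read like a loop that accumulates the absolute residuals of fixed polynomial predictors of orders 0 to 4. That is the kind of statistic a FLAC-style encoder uses to pick a predictor order. This reading is an assumption.

Under it, a portable scalar sketch of what the MMX code vectorizes might look like the following. `fixed_residual_totals` is a hypothetical name, not the patch's function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch, not the code from the patch.
 * total_error[k] receives the sum of |k-th order difference| over n samples,
 * i.e. the residual cost of a fixed order-k polynomial predictor.
 * Assumes data[-4..-1] are valid history samples (cf. "last_error_0 = data[-1]"). */
static void fixed_residual_totals(const int32_t *data, size_t n,
                                  uint64_t total_error[5])
{
    /* Differences of order 0..3 ending at data[-1]. */
    int64_t last_error_0 = data[-1];
    int64_t last_error_1 = (int64_t)data[-1] - data[-2];
    int64_t last_error_2 = last_error_1 - ((int64_t)data[-2] - data[-3]);
    int64_t last_error_3 = last_error_2 -
        ((int64_t)data[-2] - 2 * (int64_t)data[-3] + data[-4]);

    for (int k = 0; k < 5; k++)
        total_error[k] = 0;

    for (size_t i = 0; i < n; i++) {
        int64_t error = data[i];          /* order 0: the sample itself */
        int64_t save = error;
        total_error[0] += (uint64_t)llabs(error);

        error -= last_error_0;            /* order 1 */
        total_error[1] += (uint64_t)llabs(error);
        last_error_0 = save; save = error;

        error -= last_error_1;            /* order 2 */
        total_error[2] += (uint64_t)llabs(error);
        last_error_1 = save; save = error;

        error -= last_error_2;            /* order 3 */
        total_error[3] += (uint64_t)llabs(error);
        last_error_2 = save; save = error;

        error -= last_error_3;            /* order 4 */
        total_error[4] += (uint64_t)llabs(error);
        last_error_3 = save;
    }
}
```

On a linear ramp every difference of order 2 or higher is zero, so `total_error[2]` through `total_error[4]` come out as 0 while the lower orders do not.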

MMX loop filter for theora-exp

2005 Aug 17

...p_filter_v_mmx(unsigned char *_pix,int _ystride,int *_bv){ + int y; + _pix-=_ystride*2; + +__asm__ __volatile__( +"pxor %%mm0,%%mm0\n" /* mm0 = 0 */ +"movq (%0),%%mm7\n" /* mm7 = _pix[0..7] */ +"lea (%1,%1,2),%%esi\n" /* esi = _ystride*3 */ +"movq (%0,%%esi),%%mm4\n" /* mm4 = _pix[_ystride*3+0..7] */ +"movq %%mm7,%%mm6\n" /* mm6 = _pix[0..7] */ +"punpcklbw %%mm0,%%mm6\n" /* expand unsigned _pix[0..3] to 16 bits */ +"movq %%mm4,%%mm5\n" +"punpckhbw %%mm0,%%mm7\n" /* expand unsigned _pix[4..7] to 16 bits */ +"...
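The `pxor %%mm0,%%mm0` / `punpcklbw %%mm0,...` / `punpckhbw %%mm0,...` pattern in this excerpt is the usual MMX idiom for widening unsigned bytes to 16-bit words. Interleaving each byte with a zero byte zero-extends it, so later filter arithmetic can go below 0 or above 255 without wrapping.

A scalar model of just that step follows; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of the unpack step, not code from the patch.
 * punpcklbw with a zeroed source interleaves bytes 0..3 of the destination
 * with zero bytes, which zero-extends them to 16-bit words; punpckhbw does
 * the same for bytes 4..7. */
static void unpack_lo_hi_u8(const uint8_t pix[8], int16_t lo[4], int16_t hi[4])
{
    for (int i = 0; i < 4; i++) {
        lo[i] = (int16_t)pix[i];      /* like punpcklbw %%mm0,%%mm6 */
        hi[i] = (int16_t)pix[i + 4];  /* like punpckhbw %%mm0,%%mm7 */
    }
}
```

Because the bytes are treated as unsigned, a value such as 128 stays 128 after widening rather than becoming -128.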

MMX/mmxext optimisations

2004 Aug 24

Quite some speed improvement indeed. Attached is the updated patch, to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin

Understanding linear contrasts in Anova using R

2010 Sep 29

I am trying to understand how R fits models for contrasts in a simple one-way ANOVA. This is an example; I am not stupid enough to want to simultaneously apply all of these contrasts to real data. With a few exceptions, the tests that I would compute by hand (or with other software) give the same t or F statistics. It is the contrast estimates that R produces that I can't seem to
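For reference, the usual by-hand test for a single contrast in a one-way ANOVA is given below. It assumes $k$ groups with means $\bar y_i$, group sizes $n_i$, $N$ total observations, a pooled mean squared error $\mathrm{MSE}$, and coefficients $c_i$ with $\sum_i c_i = 0$.

```latex
\hat\psi = \sum_{i=1}^{k} c_i\,\bar y_i, \qquad
\widehat{\mathrm{SE}}(\hat\psi) = \sqrt{\mathrm{MSE}\,\sum_{i=1}^{k}\frac{c_i^2}{n_i}}, \qquad
t = \frac{\hat\psi}{\widehat{\mathrm{SE}}(\hat\psi)} \quad \text{on } N-k \text{ df}
```

Multiplying every $c_i$ by a positive constant $a$ multiplies both $\hat\psi$ and its standard error by $a$. As a result, $t$ (and $F = t^2$) is unchanged while the estimate itself is not. How the contrast vectors are scaled is therefore one plausible reason estimates can disagree between tools even when the test statistics agree.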

experimental patch for libtheora1.1beta3

2009 Aug 30

...t" + /* Not working "lea -32(%[ret],%[ret]),%[ret]\n\t" */ + /* Like ret = ret+ret-32 */ + "add %[ret],%[ret]\n\t" + "sub $32,%[ret]\n\t" "movq 0x40(%[buf]),%%mm0\n\t" "cmp %[ret2],%[ret]\n\t" "movq 0x48(%[buf]),%%mm4\n\t" @@ -511,7 +514,11 @@ static unsigned oc_int_frag_satd_thresh_mmxext(const u "punpckhdq %%mm0,%%mm0\n\t" "paddd %%mm0,%%mm4\n\t" "movd %%mm4,%[ret2]\n\t" - "lea (%[ret],%[ret2],2),%[ret]\n\t" + /* Not working "lea (%[ret],%[...
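The patch comments say the replaced `lea -32(%[ret],%[ret]),%[ret]` computed `ret+ret-32`, and `lea (%[ret],%[ret2],2),%[ret]` computes `ret+2*ret2`. A small C model of that arithmetic shows the `add`/`sub` replacement yields the same value. The helper names are hypothetical, and the model uses 32-bit unsigned wraparound like the register.

Two caveats hold regardless of the model:

- In AT&T syntax the constant needs a `$` prefix (`sub $32,%[ret]`). Without it, `sub 32,%[ret]` subtracts the 32-bit value stored at address 32.
- Unlike `lea`, both `add` and `sub` overwrite EFLAGS.

```c
#include <stdint.h>

/* Hypothetical models of the two instruction forms, 32-bit wraparound. */
static uint32_t lea_minus32(uint32_t ret)              /* lea -32(ret,ret),ret */
{
    return ret + ret - 32u;
}

static uint32_t add_then_sub32(uint32_t ret)           /* add ret,ret ; sub $32,ret */
{
    ret += ret;
    ret -= 32u;
    return ret;
}

static uint32_t lea_scaled(uint32_t ret, uint32_t ret2) /* lea (ret,ret2,2),ret */
{
    return ret + ret2 * 2u;
}
```

In the visible lines a `cmp` follows shortly after the `sub`, so the flags it writes appear to be overwritten before they could matter. That is an inference from the excerpt only.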

[PATCH] promised MMX patches rc1

2005 Mar 23

...st ogg_int16_t *_residue){ +int i; + __asm__ __volatile__ ( +" movl $0x7, %7 \n\t" /* 8x loop */ +" pxor %%mm0, %%mm0 \n\t" /* zero mm0 */ +" movq (%4), %%mm2 \n\t" /* load mm2 with _src1 */ +" .balign 16 \n\t" +"1: movq (%6), %%mm4 \n\t" /* packed SRC2 */ +" movq %%mm2, %%mm3 \n\t" /* copy to mm3 */ +" movq %%mm4, %%mm5 \n\t" /* copy packed src2 to mm5 */ +" mov %3, %%eax \n\t" +" punpcklbw %%mm0, %%mm2 \n\t" /* expand low part of src1 to mm2 */ +" punpcklbw %%mm0...
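Only the start of this MMX routine survives. Its parameter names (`_src1`, `_src2`, `_residue`) and the zero-extend-then-combine pattern suggest an 8x8 block reconstructed from the average of two predictors plus a 16-bit residual, clamped to 0..255. The excerpt does not confirm this.

Assuming that reading, a scalar reference the MMX version could be checked against might look like the following. The names, the stride parameters and the truncating `>> 1` average are all assumptions.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar reference (assumed semantics, not the patch's code):
 * dst = clamp255(((src1 + src2) >> 1) + residue) over an 8x8 block. */
static void recon_inter2_ref(uint8_t *dst, int dst_stride,
                             const uint8_t *src1, int src1_stride,
                             const uint8_t *src2, int src2_stride,
                             const int16_t *residue)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            /* Widen before adding, as the punpcklbw-against-zero step does. */
            int avg = ((int)src1[j] + src2[j]) >> 1;
            dst[j] = clamp255(avg + residue[j]);
        }
        dst += dst_stride;
        src1 += src1_stride;
        src2 += src2_stride;
        residue += 8;   /* residual assumed stored as a packed 8x8 array */
    }
}
```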

RTNETLINK answers: Invalid argument

2005 Apr 03

Hi, on this Fedora Core Devel (Rawhide) system, if I boot a distribution kernel (based on 2.6.12rc1-bk2) the network is fine. If I build a custom 2.6.12-rc1-V0.7.43-06 or 2.6.12-rc1-mm4 kernel, the network interface fails to initialise on boot with RTNETLINK answers: Invalid argument. What could possibly cause this? My kernel config should be mostly fine: I used it extensively for a while, and diffing it against Red Hat's does not show any obvious suspects (to me). Though I haven'...

Proposal for replacing asm code with intrinsics

2009 Oct 13

...My proposal is to replace all assembly functions with compiler intrinsics, which compile into 1-2 assembly instructions each and are much easier to maintain. For example, _mm_sad_epu8(__m128i, __m128i) will be compiled into a PSADBW instruction with compiler-allocated registers. And code like: psadbw mm4,mm5 paddw mm0,mm4 can be rewritten as: __m64 mm0, mm4, mm5, mm6, mm7; //of course using meaningful names mm0 = _mm_add_pi16(mm0, _mm_sad_pu8(mm4, mm5)); The compiler will replace the variables with actual registers, allowing better register allocation and scheduling. So, the benefits are: 1) Easier to r...
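The `psadbw mm4,mm5` / `paddw mm0,mm4` pair in this proposal computes a sum of absolute differences over eight unsigned bytes and adds it into a 16-bit accumulator. In intrinsic form that corresponds to `_mm_sad_pu8` (SSE, `<xmmintrin.h>`) followed by `_mm_add_pi16` (MMX, `<mmintrin.h>`).

A portable scalar model of the same computation follows. It may be useful as a reference when checking either the asm or the intrinsic version; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of psadbw: sum of |a[i] - b[i]| over 8 unsigned bytes.
 * The result fits in 16 bits (at most 8 * 255 = 2040). */
static uint16_t sad8_u8(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    return (uint16_t)sum;
}

/* Model of "psadbw mm4,mm5 ; paddw mm0,mm4" on the low word:
 * add the SAD into a 16-bit accumulator with wraparound. */
static uint16_t accumulate_sad(uint16_t acc, const uint8_t a[8], const uint8_t b[8])
{
    return (uint16_t)(acc + sad8_u8(a, b));
}
```

Keeping the accumulator 16 bits wide mirrors `paddw`. A caller summing many blocks would need a wider type if the total can exceed 65535.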

[PATCH]

2005 Mar 23

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...mm regs I >> added, however the calling code did not change at all... > > Look in X86InstrControl.td. The call instructions are all prefixed > by: > > let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, > FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, > XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, > XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], > > This is the fixed list of call-clobbered registers. It should really > be controlled by the calling convention of the called function > instead....

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...wrote: > On 20.10.2010 05:00, Jakob Stoklund Olesen wrote: >> Look in X86InstrControl.td. The call instructions are all prefixed >> by: >> >> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, >> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], >> >> This is the fixed list of call-clobbered registers. It should really >> be controlled by the calling convention of the called fun...

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...10.2010 05:00, Jakob Stoklund Olesen wrote: >>> Look in X86InstrControl.td. The call instructions are all prefixed >>> by: >>> >>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, >>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], >>> >>> This is the fixed list of call-clobbered registers. It should really >>> be controlled by the calling conventio...

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...he xmm regs I > added, however the calling code did not change at all... Look in X86InstrControl.td. The call instructions are all prefixed by: let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], This is the fixed list of call-clobbered registers. It should really be controlled by the calling convention of the called function instead. T...

glm.predict?

2002 Jan 22

...6, 144, 164, 171, 200, 187, 169, 189, 168, 182, 208, 207, 193, 144, 178, 177, 176, 205, 153, 228, 227, 147, 173, 157, 214, 167, 140, 179, 204, 184, 151, 115, 173, 208, 135, 175, 136, 121, 189, 148, 174), .Names = c("Lead1.mm1", "Lead1.mm2", "Lead1.mm3", "Lead1.mm4", "Lead1.mm5", "Lead1.mm6", "Lead1.mm7", "Lead1.mm8", "Lead1.mm9", "Lead1.mm10", "Lead1.mm11", "Lead1.mm12", "Lead1.mm13", "Lead1.mm14", "Lead1.mm15", "Lead1.mm16", "L...

[LLVMdev] TargetRegisterClass for Physical Register

2007 Jun 19

...ally_ the case that it's in multiple classes). Does ValueType have something to do with that? In the same file, the VR64 register class has the following definition: def VR64 : RegisterClass<"X86", [v8i8, v4i16, v2i32, v1i64], 64, [MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7]>; So there are multiple ValueTypes here (the scalar registers each only have one corresponding to the bit size of the register). But still, if I have physical register MM2, that completely determines its register class. Is there some other architecture where the physical regi...

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

Thanks for giving it a look! On 19.10.2010 23:21, Jakob Stoklund Olesen wrote: > On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote: > >> So I saw that the code is doing lots of register >> spilling/reloading. Now I understand that due to calling >> conventions, there's not really a way to avoid this - I tried using >> coldcc but apparently the backend

[LLVMdev] Codegen/Register allocation question.

2008 Sep 03

...ef,dead>, %FP2<imp-def,dead>, %FP3<imp-def,dead>, %FP4<imp-def,dead>, %FP5<imp-def,dead>, %FP6<imp-def,dead>, %ST0<imp-def,dead>, %ST1<imp-def,dead>, %MM0<imp-def,dead>, %MM1<imp-def,dead>, %MM2<imp-def,dead>, %MM3<imp-def,dead>, %MM4<imp-def,dead>, %MM5<imp-def,dead>, %MM6<imp-def,dead>, %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>, %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>, %XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<im...

[LLVMdev] Codegen/Register allocation question.

2008 Sep 04

...<imp-def,dead>, %FP3<imp-def,dead>, > %FP4<imp-def,dead>, %FP5<imp-def,dead>, %FP6<imp-def,dead>, > %ST0<imp-def,dead>, %ST1<imp-def,dead>, %MM0<imp-def,dead>, > %MM1<imp-def,dead>, %MM2<imp-def,dead>, %MM3<imp-def,dead>, > %MM4<imp-def,dead>, %MM5<imp-def,dead>, %MM6<imp-def,dead>, > %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>, > %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>, > %XMM5<imp-def,dead>, %XMM6<imp-def,dead>...

[LLVMdev] TargetRegisterClass for Physical Register

2007 Jun 18

How do I get the TargetRegisterClass for a physical register? SSARegMap::getRegClass only works for virtual registers. -Dave

[LLVMdev] TargetRegisterClass for Physical Register

2007 Jun 19

Take a look at getPhysicalRegisterRegClass( const MRegisterInfo *MRI, MVT::ValueType VT, unsigned reg) in ScheduleDAG.cpp. -- Christopher Lamb On Jun 18, 2007, at 4:52 PM, David A. Greene wrote: > How do I get the TargetRegisterClass for a physical register? > SSARegMap::getRegClass only works for virtual registers. > >
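As we understand it, the helper named in this reply scans the target's register classes for one that both supports the requested value type and contains the physical register. That is why the `VT` argument matters when a register belongs to more than one class.

A C sketch of that search over a hypothetical class table follows. The struct, the function names and the enum values used in the usage check are illustrative only, not LLVM's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target register class; not LLVM's API. */
typedef struct {
    const char *name;
    const int *value_types; size_t num_vts;   /* value types the class can hold */
    const int *regs;        size_t num_regs;  /* physical registers in the class */
} reg_class;

static bool contains(const int *xs, size_t n, int x)
{
    for (size_t i = 0; i < n; i++)
        if (xs[i] == x)
            return true;
    return false;
}

/* Return the first class that both holds value type vt and contains reg,
 * or NULL if no class matches. */
static const reg_class *phys_reg_class(const reg_class *classes, size_t n,
                                       int vt, int reg)
{
    for (size_t i = 0; i < n; i++)
        if (contains(classes[i].value_types, classes[i].num_vts, vt) &&
            contains(classes[i].regs, classes[i].num_regs, reg))
            return &classes[i];
    return NULL;
}
```

Take a table with a VR64-like class holding vector types and MM0..MM7, plus a GR32-like class holding i32. Asking for MM2 with a vector type finds the VR64-like class, while asking for MM2 with i32 finds nothing.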
