Displaying 20 results from an estimated 30 matches for "mm4".
2004 Sep 10
2
An assembly optimization and fix
...total_error_1:total_error_0
- ; mm1 == total_error_3:total_error_2
- ; mm2 == 0:total_error_4
- ; mm3/4 == 0:unpackarea
- ; mm5 == abs(error_1):abs(error_0)
- ; mm5 == abs(error_3):abs(error_2)
+ ; mm1 == total_error_2:total_error_3
+ ; mm2 == :total_error_4
+ ; mm3 == last_error_1:last_error_0
+ ; mm4 == last_error_2:last_error_3
- pxor mm0, mm0 ; total_error_1 = total_error_0 = 0
- pxor mm1, mm1 ; total_error_3 = total_error_2 = 0
- pxor mm2, mm2 ; total_error_4 = 0
- mov ebx, [esp + 36] ; ebx = data[]
- mov ecx, [ebx - 4] ; ecx == data[-1] last_error_0 = data[-1]
- mov eax, [ebx -...
2005 Aug 17
2
MMX loop filter for theora-exp
...p_filter_v_mmx(unsigned char *_pix,int _ystride,int *_bv){
+ int y;
+ _pix-=_ystride*2;
+
+__asm__ __volatile__(
+"pxor %%mm0,%%mm0\n" /* mm0 = 0 */
+"movq (%0),%%mm7\n" /* mm7 = _pix[0..7] */
+"lea (%1,%1,2),%%esi\n" /* esi = _ystride*3 */
+"movq (%0,%%esi),%%mm4\n" /* mm4 = (_pix+_ystride*3)[0..7] */
+"movq %%mm7,%%mm6\n" /* mm6 = _pix[0..7] */
+"punpcklbw %%mm0,%%mm6\n" /* expand unsigned _pix[0..3] to 16 bits */
+"movq %%mm4,%%mm5\n"
+"punpckhbw %%mm0,%%mm7\n" /* expand unsigned _pix[4..7] to 16 bits */
+"...
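The excerpt above widens unsigned bytes to 16 bits by interleaving them with a zeroed register (punpcklbw/punpckhbw against mm0). A minimal sketch of the same idea in C follows. It is not taken from the patch: it assumes the SSE2 intrinsic `_mm_unpacklo_epi8` in place of the MMX instruction, falls back to a scalar loop when SSE2 is unavailable, and the helper name `widen_u8_to_u16` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the patch): zero-extend 8 unsigned bytes
 * to 8 unsigned 16-bit values. */
void widen_u8_to_u16(const uint8_t *src, uint16_t *dst) {
    assert(src != 0 && dst != 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    /* Load 8 bytes into the low half of the register. */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);
    /* Interleave each byte with a zero byte: on little-endian x86 this
     * yields the byte value in the low half of every 16-bit lane. */
    __m128i w = _mm_unpacklo_epi8(v, zero);
    _mm_storeu_si128((__m128i *)dst, w);
#else
    /* Scalar fallback with the same result. */
    for (int i = 0; i < 8; i++) dst[i] = src[i];
#endif
}
```

Interleaving with zero only works for unsigned data; signed bytes would need a sign mask in place of the zero register.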
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed.
attached the updated patch to apply to svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2010 Sep 29
1
Understanding linear contrasts in Anova using R
#I am trying to understand how R fits models for contrasts in a
#simple one-way anova. This is an example, I am not stupid enough to want
#to simultaneously apply all of these contrasts to real data. With a few
#exceptions, the tests that I would compute by hand (or by other software)
#will give the same t or F statistics. It is the contrast estimates that
#R produces that I can't seem to
2009 Aug 30
3
experimental patch for libtheora1.1beta3
...t"
+ /* Not working "lea -32(%[ret],%[ret]),%[ret]\n\t" */
+ /* Like ret = ret+ret-32 */
+ "add %[ret],%[ret]\n\t"
+ "sub $32,%[ret]\n\t"
"movq 0x40(%[buf]),%%mm0\n\t"
"cmp %[ret2],%[ret]\n\t"
"movq 0x48(%[buf]),%%mm4\n\t"
@@ -511,7 +514,11 @@ static unsigned oc_int_frag_satd_thresh_mmxext(const u
"punpckhdq %%mm0,%%mm0\n\t"
"paddd %%mm0,%%mm4\n\t"
"movd %%mm4,%[ret2]\n\t"
- "lea (%[ret],%[ret2],2),%[ret]\n\t"
+ /* Not working "lea (%[ret],%[...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
...st ogg_int16_t *_residue){
+int i;
+ __asm__ __volatile__ (
+" movl $0x7, %7 \n\t" /* 8x loop */
+" pxor %%mm0, %%mm0 \n\t" /* zero mm0 */
+" movq (%4), %%mm2 \n\t" /* load mm2 with _src1 */
+" .balign 16 \n\t"
+"1: movq (%6), %%mm4 \n\t" /* packed SRC2 */
+" movq %%mm2, %%mm3 \n\t" /* copy to mm3 */
+" movq %%mm4, %%mm5 \n\t" /* copy packed src2 to mm5 */
+" mov %3, %%eax \n\t"
+" punpcklbw %%mm0, %%mm2 \n\t" /* expand low part of src1 to mm2 */
+" punpcklbw %%mm0...
2005 Apr 03
2
RTNETLINK answers: Invalid argument
Hi,
On this Fedora Core Devel (Raw Hide) system, if I boot on a distribution
kernel (based on 2.6.12rc1-bk2) the network is fine. If I build a custom
2.6.12-rc1-V0.7.43-06 or 2.6.12-rc1-mm4 kernel the network interface
fails to initialise on boot with RTNETLINK answers: Invalid argument.
What could possibly cause this?
My kernel config should be mostly fine - I used it extensively at a time
and diffing it with Red Hat does not show any obvious suspects (to me).
Though I haven'...
2009 Oct 13
3
Proposal for replacing asm code with intrinsics
...My proposal is to replace all functions written in assembly with compiler intrinsics, each of which compiles into 1-2 assembly instructions and is much easier to maintain.
For example:
_mm_sad_epu8(__m128i, __m128i) compiles to a PSADBW instruction with compiler-allocated registers.
And code like:
psadbw mm4,mm5
paddw mm0,mm4
can be rewritten as
__m64 mm0, mm4, mm5, mm6, mm7; //of course using meaningful names
mm0 = _mm_add_pi16(mm0, _mm_sad_pu8(mm4, mm5));
The compiler will replace the variables with actual registers, ensuring better allocation and scheduling of them.
So, benefits are:
1) Easier to r...
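As a rough illustration of the proposal above, here is a minimal sketch of a sum-of-absolute-differences routine written with intrinsics instead of hand-allocated registers. It is not from the thread: it assumes the SSE2 form `_mm_sad_epu8` (128-bit registers) rather than the MMX form, falls back to a scalar loop when SSE2 is unavailable, and the function name `sad_u8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the thread): sum of absolute differences
 * over n bytes, where n must be a multiple of 16. */
unsigned sad_u8(const uint8_t *a, const uint8_t *b, size_t n) {
    assert(n % 16 == 0);
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PSADBW produces two partial sums, one per 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the high half onto the low half and extract the total. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    /* Scalar fallback with the same result. */
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (a[i] > b[i]) ? (unsigned)(a[i] - b[i]) : (unsigned)(b[i] - a[i]);
    return sum;
#endif
}
```

With a[i] = i and b[i] = 15 - i for i in 0..15, `sad_u8(a, b, 16)` returns 128. No register names appear in the source; the compiler chooses them, which is the maintainability argument the proposal makes.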
2005 Mar 23
0
[PATCH]
...st ogg_int16_t *_residue){
+int i;
+ __asm__ __volatile__ (
+" movl $0x7, %7 \n\t" /* 8x loop */
+" pxor %%mm0, %%mm0 \n\t" /* zero mm0 */
+" movq (%4), %%mm2 \n\t" /* load mm2 with _src1 */
+" .balign 16 \n\t"
+"1: movq (%6), %%mm4 \n\t" /* packed SRC2 */
+" movq %%mm2, %%mm3 \n\t" /* copy to mm3 */
+" movq %%mm4, %%mm5 \n\t" /* copy packed src2 to mm5 */
+" mov %3, %%eax \n\t"
+" punpcklbw %%mm0, %%mm2 \n\t" /* expand low part of src1 to mm2 */
+" punpcklbw %%mm0...
2010 Oct 20
2
[LLVMdev] llvm register reload/spilling around calls
...mm regs I
>> added, however the calling code did not change at all...
>
> Look in X86InstrControl.td. The call instructions are all prefixed
> by:
>
> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>
> This is the fixed list of call-clobbered registers. It should really
> be controlled by the calling convention of the called function
> instead....
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...wrote:
> On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
>> Look in X86InstrControl.td. The call instructions are all prefixed
>> by:
>>
>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>
>> This is the fixed list of call-clobbered registers. It should really
>> be controlled by the calling convention of the called fun...
2010 Oct 20
1
[LLVMdev] llvm register reload/spilling around calls
...10.2010 05:00, Jakob Stoklund Olesen wrote:
>>> Look in X86InstrControl.td. The call instructions are all prefixed
>>> by:
>>>
>>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>>
>>> This is the fixed list of call-clobbered registers. It should really
>>> be controlled by the calling conventio...
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...he xmm regs I
> added, however the calling code did not change at all...
Look in X86InstrControl.td. The call instructions are all prefixed by:
let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1,
MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
This is the fixed list of call-clobbered registers. It should really be controlled by the calling convention of the called function instead.
T...
2002 Jan 22
1
glm.predict?
...6, 144, 164, 171, 200, 187, 169, 189, 168, 182, 208, 207, 193,
144, 178, 177, 176, 205, 153, 228, 227, 147, 173, 157, 214, 167,
140, 179, 204, 184, 151, 115, 173, 208, 135, 175, 136, 121, 189,
148, 174), .Names = c("Lead1.mm1", "Lead1.mm2", "Lead1.mm3",
"Lead1.mm4", "Lead1.mm5", "Lead1.mm6", "Lead1.mm7", "Lead1.mm8",
"Lead1.mm9", "Lead1.mm10", "Lead1.mm11", "Lead1.mm12", "Lead1.mm13",
"Lead1.mm14", "Lead1.mm15", "Lead1.mm16", "L...
2007 Jun 19
3
[LLVMdev] TargetRegisterClass for Physical Register
...ally_ the case that it's in multiple classes). Does ValueType have
something to do with that?
In the same file, the VR64 register class has the following definition:
def VR64 : RegisterClass<"X86", [v8i8, v4i16, v2i32, v1i64], 64,
[MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7]>;
So there are multiple ValueTypes here (the scalar registers each only have
one corresponding to the bit size of the register). But still, if I have
physical register MM2, that completely determines its register class.
Is there some other architecture where the physical regi...
2010 Oct 20
3
[LLVMdev] llvm register reload/spilling around calls
Thanks for giving it a look!
On 19.10.2010 23:21, Jakob Stoklund Olesen wrote:
> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:
>
>> So I saw that the code is doing lots of register
>> spilling/reloading. Now I understand that due to calling
>> conventions, there's not really a way to avoid this - I tried using
>> coldcc but apparently the backend
2008 Sep 03
2
[LLVMdev] Codegen/Register allocation question.
...ef,dead>, %FP2<imp-def,dead>, %FP3<imp-def,dead>,
%FP4<imp-def,dead>, %FP5<imp-def,dead>, %FP6<imp-def,dead>,
%ST0<imp-def,dead>, %ST1<imp-def,dead>, %MM0<imp-def,dead>,
%MM1<imp-def,dead>, %MM2<imp-def,dead>, %MM3<imp-def,dead>,
%MM4<imp-def,dead>, %MM5<imp-def,dead>, %MM6<imp-def,dead>,
%MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>,
%XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>,
%XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<im...
2008 Sep 04
0
[LLVMdev] Codegen/Register allocation question.
...<imp-def,dead>, %FP3<imp-def,dead>,
> %FP4<imp-def,dead>, %FP5<imp-def,dead>, %FP6<imp-def,dead>,
> %ST0<imp-def,dead>, %ST1<imp-def,dead>, %MM0<imp-def,dead>,
> %MM1<imp-def,dead>, %MM2<imp-def,dead>, %MM3<imp-def,dead>,
> %MM4<imp-def,dead>, %MM5<imp-def,dead>, %MM6<imp-def,dead>,
> %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>,
> %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>,
> %XMM5<imp-def,dead>, %XMM6<imp-def,dead>...
2007 Jun 18
2
[LLVMdev] TargetRegisterClass for Physical Register
How do I get the TargetRegisterClass for a physical register?
SSARegMap::getRegClass only works for virtual registers.
-Dave
2007 Jun 19
0
[LLVMdev] TargetRegisterClass for Physical Register
Take a look at getPhysicalRegisterRegClass(
const MRegisterInfo *MRI,
MVT::ValueType VT,
unsigned reg)
in ScheduleDAG.cpp.
--
Christopher Lamb
On Jun 18, 2007, at 4:52 PM, David A. Greene wrote:
> How do I get the TargetRegisterClass for a physical register?
> SSARegMap::getRegClass only works for virtual registers.
>
>