thr3ads.net - search: "perfetta"

Displaying 3 results from an estimated 3 matches for "perfetta".


[LLVMdev] compiler-rt's arm vfp o<= implementation

2010 Apr 09

On 8 April 2010 02:28, Rodolph Perfetta <rodolph.perfetta at arm.com> wrote:
> movhi means mov if unsigned Higher
> movls means mov if unsigned Lower or Same
>
> so depending on the comparison result r0 holds 1 or 0

Thanks. Now that I understand the assembly, I think there's another problem. l...
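The quoted explanation of movhi/movls can be modeled in C as a rough sketch. It assumes an integer compare of two unsigned operands sets the flags, and that movhi writes 1 while movls writes 0; the snippet does not show the operand order or the immediates, so the function name `select_hi_ls` and both of those choices are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:  cmp a, b ; movhi r0, #1 ; movls r0, #0
 * HI = unsigned higher, LS = unsigned lower or same. The two conditions
 * are complementary, so exactly one of the moves takes effect.
 * Which move writes 1 and which writes 0 is an assumption. */
uint32_t select_hi_ls(uint32_t a, uint32_t b) {
    uint32_t r0;
    if (a > b) {
        r0 = 1;   /* movhi: taken when a is unsigned higher than b */
    } else {
        r0 = 0;   /* movls: taken when a is unsigned lower than or same as b */
    }
    assert(r0 == 0 || r0 == 1);  /* "r0 holds 1 or 0" */
    return r0;
}
```

Because the comparison is unsigned, a value such as 0xFFFFFFFF counts as higher than 1 rather than as a negative number.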

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 11

On Nov 11, 2009, at 3:27 AM, Rodolph Perfetta wrote:
> If you know about the alignment, maybe use structured load/store
> (vst1.64/vld1.64 {dn-dm}). You may also want to work on whole cache lines
> (64 bytes on A8). You can find more in this discussion:
> http://groups.google.com/group/beagleboard/browse_thread/thread/1...

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time. Over 15 scalar instructions collapsed down into these 2 Neon instructions.

    fldmiad r3, {d0, d1, d2, d3, d4, d5}  @ SrcLine dhrystone.c 359
    fstmiad r1, {d0, d1, d2, d3, d4, d5}

It seems like this should be faster. But I did
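The load/store pair quoted in this post moves six 64-bit registers (d0 to d5), which is where the 48-byte figure comes from. A portable C model of that data movement might look like the sketch below. It is not the NEON code itself and says nothing about speed; the name `copy48` and the temporary array standing in for the register file are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_DREGS = 6, BLOCK_BYTES = NUM_DREGS * 8 };  /* 6 x 8 = 48 bytes */

/* Hypothetical model: load six doublewords, then store them.
 * d[] stands in for registers d0..d5. */
void copy48(void *dst, const void *src) {
    uint64_t d[NUM_DREGS];
    assert(dst != NULL && src != NULL);
    memcpy(d, src, BLOCK_BYTES);   /* models: fldmiad r3, {d0-d5} */
    memcpy(dst, d, BLOCK_BYTES);   /* models: fstmiad r1, {d0-d5} */
}
```

Since all 48 bytes are read before any are written, the model also behaves correctly when the source and destination blocks overlap.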
