thr3ads.net - search: "fstmiad"

Displaying 3 results from an estimated 3 matches for "fstmiad".

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

...ex-A8 by optimizing the
> memcpy intrinsic. I used the Neon load multiple instruction to move up
> to 48 bytes at a time. Over 15 scalar instructions collapsed down
> into these 2 Neon instructions.
>
> fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 359
> fstmiad r1, {d0, d1, d2, d3, d4, d5}
>
> It seems like this should be faster. But I did not see any
> appreciable speedup.
>
> I think the patch is correct. The code runs fine.
>
> I have attached my patch for "lib/Target/ARM/ARMISelLowering.cpp" to
> this email.
>...

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

...ed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time. Over 15 scalar instructions collapsed down into these 2 Neon instructions.

fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 359
fstmiad r1, {d0, d1, d2, d3, d4, d5}

It seems like this should be faster. But I did not see any appreciable speedup.

I think the patch is correct. The code runs fine.

I have attached my patch for "lib/Target/ARM/ARMISelLowering.cpp" to this email. Does this look like the right modification?...
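As a rough illustration (not the attached patch, which is not shown here), the snippet below models in portable C what the two instructions do: 48 bytes is exactly six 64-bit d registers (6 x 8), so one load-multiple fills d0-d5 from the source address and one store-multiple writes them to the destination address. The function name `copy48_ldm_stm` and its constants are hypothetical; a real backend emits the instructions directly instead of calling anything like this.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the lowering: copy 48 bytes as six 64-bit
 * "d registers" (d0-d5), loading all six first and then storing all six,
 * the way a load-multiple/store-multiple pair moves a block of doublewords. */
enum { D_REG_COUNT = 6, D_REG_BYTES = 8 };  /* 6 x 8 = 48 bytes */

void copy48_ldm_stm(void *dst, const void *src)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint64_t regs[D_REG_COUNT];  /* stand-ins for d0-d5 */

    /* Load phase (like "fldmiad r3, {d0-d5}"): increment-after addressing,
       so register i is filled from src + 8*i. memcpy sidesteps any
       alignment requirements of a direct uint64_t dereference. */
    for (int i = 0; i < D_REG_COUNT; i++)
        memcpy(&regs[i], s + i * D_REG_BYTES, D_REG_BYTES);

    /* Store phase (like "fstmiad r1, {d0-d5}"): same order, to dst. */
    for (int i = 0; i < D_REG_COUNT; i++)
        memcpy(d + i * D_REG_BYTES, &regs[i], D_REG_BYTES);
}
```

Because all six loads finish before any store, this model would also tolerate overlapping buffers, a property memcpy itself does not promise.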

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

...>> up
>> to 48 bytes at a time. Over 15 scalar instructions collapsed down
>> into these 2 Neon instructions.

Nice. Thanks for working on this. It has long been on my todo list.

>>
>> fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 359
>> fstmiad r1, {d0, d1, d2, d3, d4, d5}
>>
>> It seems like this should be faster. But I did not see any
>> appreciable speedup.

Even if it's not faster, it's still a code size win which is also important. Are we generating the right aligned NEON load / stores?

>>
>>...
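The alignment question in the reply comes down to how much alignment the source and destination are known to share. The helper below is a hypothetical sketch of that arithmetic only: it returns the lowest set bit of the two addresses OR-ed together, capped at a chosen maximum (assumed to be a power of two), so a result of 8 would suggest 64-bit aligned accesses are possible and 16 would suggest 128-bit ones. A compiler reasons about statically known alignment rather than run-time addresses; `common_alignment` and its `max_align` parameter are illustrative names, not part of any LLVM API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest power-of-two alignment (capped at
 * max_align, assumed a power of two) satisfied by both pointers.
 * It finds the lowest set bit of the OR-ed addresses. */
size_t common_alignment(const void *a, const void *b, size_t max_align)
{
    uintptr_t bits = (uintptr_t)a | (uintptr_t)b;
    size_t align = 1;

    /* Keep doubling while the next-lower address bit is clear in both. */
    while (align < max_align && (bits & align) == 0)
        align <<= 1;
    return align;
}
```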
