thr3ads.net - search: "1cd1162f"

Displaying 2 results from an estimated 2 matches for "1cd1162f".

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 11

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

...shows that the tail will be in a separate 16- byte block. (And what's up with the 16-byte divisions? I thought the cache lines are 64 bytes....) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091111/1cd1162f/attachment.html>

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time . Over 15 scalar instructions collapsed down into these 2 Neon instructions. fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 359 fstmiad r1, {d0, d1, d2, d3, d4, d5} It seems like this should be faster. But I did

search for: 1cd1162f