2014 Dec 08
[LLVMdev] NEON intrinsics preventing redundant load optimization?
...the best thing to do, since this
> is a common pattern that needs looking into to produce optimal code.
Thanks for the responses. I’ve filed bug #21778 for this:
http://llvm.org/bugs/show_bug.cgi?id=21778
I’ve also tried replacing the vst1.32 with setting the data[i] elements individually with vgetq_lane, which gets at least the single multiply case back to optimal code. There’s still an unneeded temporary when doing res = a * b * c though. Anyway, let’s continue this on the bug tracker :)
Simon
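For readers following along, here is a minimal sketch of the two write-back styles mentioned above: a single vector store (which compiles to vst1.32 on 32-bit ARM) versus assigning each data[i] from vgetq_lane. The Vec4 struct and the function names mul_store and mul_lanes are hypothetical, made up for illustration; they are not the code from the bug report. A scalar fallback is included only so the sketch builds on non-ARM hosts.

```cpp
#include <cassert>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical 4-float wrapper type (an assumption, not the poster's class).
struct Vec4 {
    float data[4];
};

#if defined(__ARM_NEON) || defined(__ARM_NEON__)

// Variant A: write the product back with one vector store.
inline Vec4 mul_store(const Vec4& a, const Vec4& b) {
    Vec4 res;
    float32x4_t v = vmulq_f32(vld1q_f32(a.data), vld1q_f32(b.data));
    vst1q_f32(res.data, v);
    return res;
}

// Variant B: extract each lane and assign the elements one by one.
// The lane index must be a compile-time constant, hence no loop here.
inline Vec4 mul_lanes(const Vec4& a, const Vec4& b) {
    Vec4 res;
    float32x4_t v = vmulq_f32(vld1q_f32(a.data), vld1q_f32(b.data));
    res.data[0] = vgetq_lane_f32(v, 0);
    res.data[1] = vgetq_lane_f32(v, 1);
    res.data[2] = vgetq_lane_f32(v, 2);
    res.data[3] = vgetq_lane_f32(v, 3);
    return res;
}

#else

// Scalar fallback so the sketch compiles without NEON; both variants
// compute the same element-wise product.
inline Vec4 mul_store(const Vec4& a, const Vec4& b) {
    Vec4 res;
    for (int i = 0; i < 4; ++i) res.data[i] = a.data[i] * b.data[i];
    return res;
}

inline Vec4 mul_lanes(const Vec4& a, const Vec4& b) {
    return mul_store(a, b);
}

#endif
```

Both variants return the same values; any difference would only show up in the generated assembly, so comparing the output of clang -O2 -S for each is the way to check whether the temporary is eliminated on a given compiler version.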
2014 Dec 07
[LLVMdev] NEON intrinsics preventing redundant load optimization?
Hi all,
I’m not sure if this is the right list, so apologies if not.
While profiling, I noticed that some of my hand-tuned matrix multiply code using NEON intrinsics was much slower when called through a C++ template wrapper than when calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads