thr3ads.net - search: "vgetq

Displaying 2 results from an estimated 2 matches for "vgetq_lane".

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 08

[LLVMdev] NEON intrinsics preventing redundant load optimization?

...the best thing to do, since this > is a common pattern that needs looking into to produce optimal code. Thanks for the responses. I’ve filed bug #21778 for this: http://llvm.org/bugs/show_bug.cgi?id=21778 I’ve also tried replacing the vst1.32 with setting the data[i] elements individually with vgetq_lane, which gets at least the single multiply case back to optimal code. There’s still an unneeded temporary when doing res = a * b * c though. Anyway, let’s continue this on the bug tracker :) Simon

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 07

[LLVMdev] NEON intrinsics preventing redundant load optimization?

Hi all, I’m not sure if this is the right list, so apologies if not. Doing some profiling I noticed some of my hand-tuned matrix multiply code with NEON intrinsics was much slower through a C++ template wrapper vs calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads

search for: vgetq_lane