Displaying 2 results from an estimated 2 matches for "vloadq_f32".
2015 Jan 05
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote:
>>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest Xcode. Are there any concerns around doing
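The pointer-dereference approach described in this message might look roughly like the sketch below. It is an illustration under stated assumptions, not the poster's code: the names `f32x4_u` and `mul4` are hypothetical, and the vector type is built with the GCC/Clang `vector_size` extension instead of `float32x4_t` so that it compiles without `<arm_neon.h>`. The `aligned(4)` attribute plays the role of the "typedef to get the alignment correct", telling the compiler that the pointer is only guaranteed to have `float` alignment.

```c
#include <stddef.h>

/* Hypothetical type: four floats in one vector, declared with only 4-byte
 * alignment so it can be read from any float-aligned address.
 * may_alias lets it legally overlay a plain float array. */
typedef float f32x4_u
    __attribute__((vector_size(16), aligned(4), may_alias));

/* Multiply four floats from a and b element-wise and store them in dst.
 * The loads and the store are ordinary pointer dereferences, so the
 * compiler sees plain memory accesses rather than opaque intrinsic calls. */
void mul4(float *dst, const float *a, const float *b)
{
    const f32x4_u *va = (const f32x4_u *)a;
    const f32x4_u *vb = (const f32x4_u *)b;
    *(f32x4_u *)dst = *va * *vb;
}
```

Whether this is advisable on a given toolchain is exactly what the thread is discussing; the sketch only shows the shape of the workaround.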
2015 Jan 05
[LLVMdev] NEON intrinsics preventing redundant load optimization?
...et and vload. NEON intrinsic types don't exist
in memory (memory is modelled as a sequence of scalars, as in the C model).
For this reason, Renato, I don't think we should advise people to work around
the API, as who knows what problems that will cause later.
The reason above is why we map a vloadq_f32() into a NEON intrinsic instead
of a generic IR load. Looking at your testcase, even with tip-of-trunk
clang we generate redundant loads and stores:
    vld1.32 {d16, d17}, [r1]
    vld1.32 {d18, d19}, [r0]
    mov r0, sp
    vmul.f32 q8, q9, q8
    vst1.32 {d16, d17}, [r0]
    vld1.64 {d16, d17}, [r0:128]
    vst1.32 {d16,...