thr3ads.net - search: "vloadq

Displaying 2 results from an estimated 2 matches for "vloadq_f32".

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2015 Jan 05

On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote:
>>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2015 Jan 05

...et and vload. NEON intrinsic types don't exist in memory (memory is modelled as a sequence of scalars, as in the C model). For this reason, Renato, I don't think we should advise people to work around the API, as who knows what problems that will cause later. The reason above is why we map a vloadq_f32() into a NEON intrinsic instead of a generic IR load. Looking at your testcase, even with tip-of-trunk clang we generate redundant loads and stores:

    vld1.32 {d16, d17}, [r1]
    vld1.32 {d18, d19}, [r0]
    mov r0, sp
    vmul.f32 q8, q9, q8
    vst1.32 {d16, d17}, [r0]
    vld1.64 {d16, d17}, [r0:128]
    vst1.32 {d16,...
