Renato Golin
2015-Jan-02 15:37 UTC
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 10 December 2014 at 11:13, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing this?

My view is that you should only use intrinsics where the language has no semantics for it. Since this is not the case, using pointers is probably the best way, anyway.

There is still the "bug" where the load/store intrinsics don't map to simple pointer references, but since you found a better work-around, that has lower priority now. I changed the bug to reflect that.

cheers,
--renato
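The original code was not posted, so the following is only a minimal sketch of what the pointer-dereference approach described above might look like. The names `f32x4_u` and `add_f32` are hypothetical. The `aligned` and `may_alias` attributes are GCC/Clang extensions, and the non-NEON branch is there only so the sketch also builds on other hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
/* Hypothetical typedef: float32x4_t with its alignment relaxed to that of
   float, so it can be dereferenced at any float-aligned address. */
typedef float32x4_t __attribute__((aligned(4), may_alias)) f32x4_u;
#else
/* Fallback for non-NEON hosts: a generic GCC/Clang 4 x float vector. */
typedef float __attribute__((vector_size(16), aligned(4), may_alias)) f32x4_u;
#endif

/* dst[i] = a[i] + b[i] for i in [0, n); n must be a multiple of 4.
   Loads and stores are plain pointer dereferences, not vld1q/vst1q. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        f32x4_u va = *(const f32x4_u *)(a + i);
        f32x4_u vb = *(const f32x4_u *)(b + i);
        *(f32x4_u *)(dst + i) = va + vb;  /* element-wise vector add */
    }
}
```

Because the compiler sees ordinary loads and stores here, it is free to apply its usual redundant-load elimination, which is the behaviour the thread subject asks about. This sketch does nothing about the endianness concern raised later in the thread.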
Tim Northover
2015-Jan-04 21:06 UTC
[LLVMdev] NEON intrinsics preventing redundant load optimization?
>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing this?
>
> My view is that you should only use intrinsics where the language has
> no semantics for it. Since this is not the case, using pointers is
> probably the best way, anyway.

I think dereferencing pointers is explicitly discouraged in the documentation for portability reasons. It may well have issues on wrong-endian targets.

Tim.
Simon Taylor
2015-Jan-05 10:14 UTC
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote:
>>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing this?
>>
>> My view is that you should only use intrinsics where the language has
>> no semantics for it. Since this is not the case, using pointers is
>> probably the best way, anyway.
>
> I think dereferencing pointers is explicitly discouraged in the
> documentation for portability reasons. It may well have issues on
> wrong-endian targets.

The ARM ACLE docs recommend against the GCC extension that allows an initializer list because of potential endianness issues:

    float32x4_t values = {1, 2, 3, 4};

I don’t recall seeing anything about pointer dereferencing, but it may have the same issues.

I’m a bit hazy on endianness issues with NEON anyway (in terms of element numbering, casts between types, etc) but it seems like all the smartphone platform ABIs are defined to be little-endian so I haven’t spent too much time worrying about it.

Simon
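One way to avoid the initializer-list extension is to load from an ordinary array with `vld1q_f32`, which places array element i in lane i. The sketch below illustrates that; the helper names `make_f32x4` and `lane0` are hypothetical, and the non-NEON fallback (a generic GCC/Clang vector filled with `memcpy`, valid on little-endian hosts) is an assumption added only so the example builds elsewhere.

```c
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef float32x4_t f32x4;
#else
/* Fallback for non-NEON hosts: a generic GCC/Clang 4 x float vector. */
typedef float f32x4 __attribute__((vector_size(16)));
#endif

/* Build a vector from an array instead of a brace initializer. */
f32x4 make_f32x4(const float src[4])
{
#if defined(__ARM_NEON)
    return vld1q_f32(src);      /* lane i receives src[i] */
#else
    f32x4 v;
    memcpy(&v, src, sizeof v);  /* little-endian host assumed */
    return v;
#endif
}

/* Read lane 0 through the intrinsic where available. */
float lane0(f32x4 v)
{
#if defined(__ARM_NEON)
    return vgetq_lane_f32(v, 0);
#else
    return v[0];
#endif
}
```

Usage would be along the lines of `const float init[4] = {1, 2, 3, 4}; f32x4 values = make_f32x4(init);`, after which lane 0 holds the first array element.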