On 27 September 2010 18:19, Bob Wilson <bob.wilson at apple.com> wrote:> I'm not sure what you mean by this. The llvm intrinsics and built-in vector operations use plain vectors regardless of the front-end. The structures are only relevant for things like argument passing and copying -- you can't do anything else with them. Can you post an example of the 5X IR code size that you're seeing with clang? I'd like to understand the issue that you're seeing.I mean that I could remove all structure boilerplate and it still works, plus you don't have to define any type (as LLVM uses the vector types), as per the discussion we're having about needing to use structures. Make that 2x smaller, I had a special case that was not a fair comparison. But I recently found out that the polyNxN_t vector type can destroy everything, as it appears to LLVM as <8 x i8>, and is identical to a intNxN_t for base instructions, so an "icmp eq <8 x i8>" always become VCEQ.I8 and never a VCEQ.P8, even though that's what Clang generates. Putting them into structures doesn't help because of the type names being irrelevant, both names become %struct.__simd64_int8_t %struct.__simd64_int8_t = type { <8 x i8> } %struct.__simd64_poly8_t = type { <8 x i8> } %struct.__simd64_uint8_t = type { <8 x i8> } @u8d = common global %struct.__simd64_int8_t zeroinitializer, align 8 @i8d = common global %struct.__simd64_int8_t zeroinitializer, align 8 @p8d = common global %struct.__simd64_int8_t zeroinitializer, align 8 The difference between uint8x8 and int8x8 is done via 'nsw' (which, unless it's really generating a trap value, it's a misleading tag), but there's nothing that will flag this type as poly8x8. When I try to compile this with Clang: === comp.c ==#define __ARM_NEON__ #include <arm_neon.h> int8x8_t i8d; uint8x8_t u8d; poly8x8_t p8d; void vceq() { u8d = vceq_s8(i8d, i8d); u8d = vceq_p8(p8d, p8d); } === end == It generates exactly the same instruction for both calls: $ clang -ccc-host-triple armv7a-none-eabi -ccc-gcc-name arm-none-eabi-gcc -mfloat-abi=hard -w -S comp.c -o - | grep vceq .globl vceq .type vceq,%function vceq: vceq.i8 d0, d1, d0 vceq.i8 d0, d1, d0 .size vceq, .Ltmp0-vceq Isn't that a call to use intrinsics? -- cheers, --renato http://systemcall.org/ Reclaim your digital rights, eliminate DRM, learn more at http://www.defectivebydesign.org/what_is_drm
Support for NEON intrinsics in clang is not complete. Poly types in general are known to be an issue, and the vceq_p8 in your example definitely needs an intrinisic. It should work with llvm-gcc. Can you clarify ARM's position on those structure types? It sounds like you are advocating that we get rid of them. The only reason we've been using them in llvm-gcc and clang is for compatibility for ARM's specifications and with ARM's RVCT compiler. If ARM does not care about those things, I'd love to remove the struct wrappers from llvm. On Sep 27, 2010, at 2:51 PM, Renato Golin wrote:> On 27 September 2010 18:19, Bob Wilson <bob.wilson at apple.com> wrote: >> I'm not sure what you mean by this. The llvm intrinsics and built-in vector operations use plain vectors regardless of the front-end. The structures are only relevant for things like argument passing and copying -- you can't do anything else with them. Can you post an example of the 5X IR code size that you're seeing with clang? I'd like to understand the issue that you're seeing. > > > I mean that I could remove all structure boilerplate and it still > works, plus you don't have to define any type (as LLVM uses the vector > types), as per the discussion we're having about needing to use > structures. Make that 2x smaller, I had a special case that was not a > fair comparison. > > But I recently found out that the polyNxN_t vector type can destroy > everything, as it appears to LLVM as <8 x i8>, and is identical to a > intNxN_t for base instructions, so an "icmp eq <8 x i8>" always become > VCEQ.I8 and never a VCEQ.P8, even though that's what Clang generates. > > Putting them into structures doesn't help because of the type names > being irrelevant, both names become %struct.__simd64_int8_t > > %struct.__simd64_int8_t = type { <8 x i8> } > %struct.__simd64_poly8_t = type { <8 x i8> } > %struct.__simd64_uint8_t = type { <8 x i8> } > > @u8d = common global %struct.__simd64_int8_t zeroinitializer, align 8 > @i8d = common global %struct.__simd64_int8_t zeroinitializer, align 8 > @p8d = common global %struct.__simd64_int8_t zeroinitializer, align 8 > > The difference between uint8x8 and int8x8 is done via 'nsw' (which, > unless it's really generating a trap value, it's a misleading tag), > but there's nothing that will flag this type as poly8x8. > > When I try to compile this with Clang: > > === comp.c ==> #define __ARM_NEON__ > #include <arm_neon.h> > > int8x8_t i8d; > uint8x8_t u8d; > poly8x8_t p8d; > > void vceq() { > u8d = vceq_s8(i8d, i8d); > u8d = vceq_p8(p8d, p8d); > } > === end ==> > It generates exactly the same instruction for both calls: > > $ clang -ccc-host-triple armv7a-none-eabi -ccc-gcc-name > arm-none-eabi-gcc -mfloat-abi=hard -w -S comp.c -o - | grep vceq > .globl vceq > .type vceq,%function > vceq: > vceq.i8 d0, d1, d0 > vceq.i8 d0, d1, d0 > .size vceq, .Ltmp0-vceq > > > Isn't that a call to use intrinsics? > > -- > cheers, > --renato > > http://systemcall.org/ > > Reclaim your digital rights, eliminate DRM, learn more at > http://www.defectivebydesign.org/what_is_drm
On 27 September 2010 23:03, Bob Wilson <bob.wilson at apple.com> wrote:> Can you clarify ARM's position on those structure types? It sounds like you are advocating that we get rid of them. The only reason we've been using them in llvm-gcc and clang is for compatibility for ARM's specifications and with ARM's RVCT compiler. If ARM does not care about those things, I'd love to remove the struct wrappers from llvm.As Al said earlier, you definitely don't need the structures for compatibility with armcc. As far as the LLVM back-end is concerned, with or without structures, the instruction selection works a treat and generates correct NEON instructions. If the final object has the correct instructions and follows ARM ABIs, there is no point in keeping IR compatibility. I also noticed that Clang's arm_neon.h is completely different from armcc's, another non-compatible choice that has no impact in the final object code generated. As far as I can see, there is no gain in adding the wrapping structures to the vector types. I'll add the intrinsic to the VCEQ.P8 locally and test. If that works, I'll be sending patches to NEON.td for all ambiguities I find... -- cheers, --renato http://systemcall.org/ Reclaim your digital rights, eliminate DRM, learn more at http://www.defectivebydesign.org/what_is_drm