On 27 September 2010 23:45, Bob Wilson <bob.wilson at apple.com> wrote:
> An implementation, such as in GCC, that does not use structures is compatible with ARM's specification in only one direction. GCC will accept any code written for RVCT, but not the other way around. And, as Al pointed out, there are also compatibility issues with how you can initialize vectors. (In fact, if you stick to the documented interfaces, the only way you can initialize a vector to an arbitrary value is by loading from memory.)

Hi Bob,

Can you clarify what compatibility problems you had with GCC? And that
by using structures in Clang you made it work with armcc?

Is it just a source code compatibility issue?

> Can we get an official position from ARM on this?

I really don't know what you want here. I can't tell you that it will
be safe to remove the structures from Clang, since I don't know enough
about the vector types (and all other back-ends that use it) and what
the problems you had with gcc/armcc compatibility. Maybe, because of
the way vectors are implemented in LLVM, there is no other way...
maybe not.

> Wait a minute.... VCEQ does not have a special polynomial version. There is only VCEQ.I8. What I said about support for polynomial types in Clang is still true, but for this particular case, there is no difference between vceq_s8, vceq_u8, and vceq_p8 (aside from the types of the intrinsic arguments).

Sorry, bad example... (and wrong copy&paste test generation) ;)

--
cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
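A small illustration of the VCEQ point above, written as a host-side sketch rather than real NEON code. It assumes nothing beyond standard C: the helpers demo_ceq_u8, demo_ceq_s8 and demo_count_equal_lanes are hypothetical names invented for this example and are not arm_neon.h intrinsics. The sketch only shows why a single lane-wise 8-bit equality test can serve signed, unsigned and polynomial element types alike: equality depends on the bit pattern, not on how the lane is typed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an arm_neon.h intrinsic): lane-wise equality on
   8 unsigned bytes; equal lanes become 0xFF, unequal lanes become 0x00. */
static void demo_ceq_u8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* The same comparison with the lanes typed as signed bytes. */
static void demo_ceq_s8(const int8_t a[8], const int8_t b[8], uint8_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* Copies the signed inputs into unsigned buffers (same bits), runs both
   comparisons, checks that the two masks are identical, and returns the
   number of equal lanes. */
static int demo_count_equal_lanes(const int8_t a[8], const int8_t b[8]) {
    uint8_t ua[8], ub[8], mask_s[8], mask_u[8];
    memcpy(ua, a, 8);
    memcpy(ub, b, 8);
    demo_ceq_s8(a, b, mask_s);
    demo_ceq_u8(ua, ub, mask_u);
    assert(memcmp(mask_s, mask_u, 8) == 0);
    int equal = 0;
    for (int i = 0; i < 8; ++i)
        equal += (mask_s[i] == 0xFF);
    return equal;
}
```

With real intrinsics the corresponding difference would be confined to the argument types, as stated in the quoted text.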
On Sep 28, 2010, at 2:07 AM, Renato Golin wrote:
> On 27 September 2010 23:45, Bob Wilson <bob.wilson at apple.com> wrote:
>> An implementation, such as in GCC, that does not use structures is compatible with ARM's specification in only one direction. GCC will accept any code written for RVCT, but not the other way around. And, as Al pointed out, there are also compatibility issues with how you can initialize vectors. (In fact, if you stick to the documented interfaces, the only way you can initialize a vector to an arbitrary value is by loading from memory.)
>
> Hi Bob,
>
> Can you clarify what compatibility problems you had with GCC? And that
> by using structures in Clang you made it work with armcc?
>
> Is it just a source code compatibility issue?

Yes, there are multiple issues but they all involve source compatibility. Here is an example:

    #include <arm_neon.h>
    uint32x2_t test(int32x2_t x) {
      return vadd_u32(x, x);
    }

This works fine with GCC because int32x2_t and uint32x2_t are built-in vector types and can be implicitly converted. It is not valid if those types are defined as structs, because C/C++ do not allow distinct struct types to be implicitly converted just because they happen to have the same size. To get this to compile when the NEON types are structs, you need to add vreinterpret intrinsics:

    #include <arm_neon.h>
    uint32x2_t test(int32x2_t x) {
      uint32x2_t ux = vreinterpret_u32_s32(x);
      return vadd_u32(ux, ux);
    }

I do not have access to ARM's compiler(s) but I'm assuming that the first example will not compile because vadd_u32 expects arguments of type uint32x2_t. Using structs in llvm does not "fix" a compatibility problem, but it helps our users write NEON code that will work with ARM's compiler.

>> Can we get an official position from ARM on this?
>
> I really don't know what you want here. I can't tell you that it will
> be safe to remove the structures from Clang, since I don't know enough
> about the vector types (and all other back-ends that use it) and what
> the problems you had with gcc/armcc compatibility. Maybe, because of
> the way vectors are implemented in LLVM, there is no other way...
> maybe not.

We have gone to some lengths to make llvm match ARM's specifications and to help our users write code that will be portable to work with your compiler. If we don't have to worry about compatibility with ARM's specifications and ARM's compilers, we can drop the struct wrappers and make life easier for ourselves.

I am getting requests that we do that regardless of ARM's opinion, but I've resisted based on the notion that portability and compatibility are worth fighting for. It is pretty ironic and frustrating to me to hear that even people at ARM think these wrapper structs are a bad idea. I would still prefer to have ARM publish specifications and guidelines that make it possible to write portable NEON code. Do you guys even care about that?
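The struct-wrapper argument in the message above can be tried on any host, without a NEON target, using stand-in types. The following is a minimal sketch under stated assumptions: demo_int32x2_t, demo_uint32x2_t, demo_vadd_u32, demo_vreinterpret_u32_s32 and demo_test are hypothetical names made up for illustration, and the struct layout is an assumption, not the actual definition in any arm_neon.h. It mirrors the second example from the message: with distinct struct types, the direct call is rejected by the compiler, so an explicit reinterpretation step is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct-wrapped NEON types; the real arm_neon.h
   definitions may look different. */
typedef struct { int32_t  val[2]; } demo_int32x2_t;
typedef struct { uint32_t val[2]; } demo_uint32x2_t;

/* The reinterpretation below copies raw bytes, so the sizes must match. */
static_assert(sizeof(demo_int32x2_t) == sizeof(demo_uint32x2_t),
              "stand-in vector types must have the same size");

/* Stand-in for vadd_u32: lane-wise addition with unsigned wraparound. */
static demo_uint32x2_t demo_vadd_u32(demo_uint32x2_t a, demo_uint32x2_t b) {
    demo_uint32x2_t r;
    for (int i = 0; i < 2; ++i)
        r.val[i] = a.val[i] + b.val[i];
    return r;
}

/* Stand-in for vreinterpret_u32_s32: same bits, different static type. */
static demo_uint32x2_t demo_vreinterpret_u32_s32(demo_int32x2_t x) {
    demo_uint32x2_t r;
    memcpy(&r, &x, sizeof r);
    return r;
}

static demo_uint32x2_t demo_test(demo_int32x2_t x) {
    /* return demo_vadd_u32(x, x);
       would not compile: demo_int32x2_t and demo_uint32x2_t are distinct
       struct types with no implicit conversion between them. */
    demo_uint32x2_t ux = demo_vreinterpret_u32_s32(x);
    return demo_vadd_u32(ux, ux);
}
```

Replacing the two typedefs with a compiler's built-in vector types is what makes the implicit conversion described for GCC possible. That variant is omitted here because it depends on compiler extensions and flags.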
On 28 September 2010 18:20, Bob Wilson <bob.wilson at apple.com> wrote:
> Yes, there are multiple issues but they all involve source compatibility.

Hi Bob, then this is a completely different matter altogether.

> I do not have access to ARM's compiler(s) but I'm assuming that the first example will not compile because vadd_u32 expects arguments of type uint32x2_t. Using structs in llvm does not "fix" a compatibility problem, but it helps our users write NEON code that will work with ARM's compiler.

Indeed, the types are different, you will get an incompatible parameter error.

> I am getting requests that we do that regardless of ARM's opinion, but I've resisted based on the notion that portability and compatibility are worth fighting for. It is pretty ironic and frustrating to me to hear that even people at ARM think these wrapper structs are a bad idea. I would still prefer to have ARM publish specifications and guidelines that make it possible to write portable NEON code. Do you guys even care about that?

Nobody said that the structures are a bad idea, nor that it's not worth fighting for compatibility. What I said was:

1. The use of structures is an implementation choice. GCC chooses not to, we chose to. Simple as that.

2. The use of structures, *in IR*, is not necessary. Even using structures in the source code, you can easily detect NEON types and transform the IR accordingly.

We do concern ourselves with compatibility, more than people normally believe. But there are certain constraints (partners, design issues, integration) that we simply cannot ignore.

My first proposition of making every NEON call an intrinsic could help not only IR generation and codegen, but also make the arm_neon.h header more compatible with ARM's without the need of reinterpreting structures. I still have to think more about it (haven't thought about the header at all, so far), but this is something I can do and am willing to do to help Clang without breaking compatibility with ARM (the last thing I would want).

We could (maybe should) discuss the intrinsic issue off-list, though.

--
cheers,
--renato