Second question:
I was checking NEON instructions this week and the vector types seem
to be inside structures. If vector types are considered proper types
in LLVM, why pack them inside structures?
That results in a lot of boilerplate code for converting and copying
the values (about 20 lines of IR) just to call a NEON instruction
that, in the end, will be converted into three instructions:
VLDR + {whatever} + VSTR
If the load and store are normally performed by one operation (I
assume it's the same on Intel and others), why bother with the
structure passing instead of just using load/store for vector types?
Also, the extra struct { [i8 x 8] } for memcopy seems also redundant.
If you're explicitly telling you want NEON (or whatever vector
instructions), why bother with compatibility?
-- 
cheers,
--renato
http://systemcall.org/
Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
On Sep 21, 2010, at 9:33 AM, Renato Golin wrote:

> Second question:
>
> I was checking NEON instructions this week and the vector types seem
> to be inside structures. If vector types are considered proper types
> in LLVM, why pack them inside structures?

Because that is what ARM has specified? They define the vector types
that are used with their NEON intrinsics as "containerized vectors".
Perhaps someone on the list from ARM can explain why they did it that
way.

The extra structures are irrelevant at the llvm IR level and below. The
NEON intrinsics in llvm use plain old vector types. If you're using
llvm-gcc, you can define the ARM_NEON_GCC_COMPATIBILITY preprocessor
macro, and it will switch to a version of the NEON intrinsics that use
plain vector types instead of the containerized vectors. For clang, we
are planning to do something similar (without requiring the macro) by
overloading the intrinsic functions to take either type of arguments,
but that is not yet implemented.

> That results in a lot of boilerplate code for converting and copying
> the values (about 20 lines of IR) just to call a NEON instruction
> that, in the end, will be converted into three instructions:
>
> VLDR + {whatever} + VSTR
>
> If the load and store are normally performed by one operation (I
> assume it's the same on Intel and others), why bother with the
> structure passing instead of just using load/store for vector types?

As you noted, the struct wrappers produce a lot of extra code but it
should all be optimized away. If you see a case where that is not
happening, please file a bug report.

> Also, the extra struct { [i8 x 8] } for memcopy seems also redundant.
> If you're explicitly telling you want NEON (or whatever vector
> instructions), why bother with compatibility?

I don't know what you're referring to here. Can you give an example?
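For illustration, here is a minimal C sketch of the two styles discussed
above: a plain vector type and the same vector wrapped in a struct. It
assumes the GCC/Clang vector_size extension and is not taken from any
arm_neon.h; the names plain_int8x8, wrapped_int8x8, add_plain and
add_wrapped are hypothetical. It builds on any target those compilers
support, so it shows only the shape of the wrapping, not actual NEON code
generation.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Plain 64-bit vector of eight int8 lanes (GCC/Clang vector extension). */
typedef int8_t plain_int8x8 __attribute__((vector_size(8)));

/* Hypothetical "containerized" form: the same vector inside a struct. */
typedef struct { plain_int8x8 val; } wrapped_int8x8;

/* The wrapper adds no storage, only an extra layer of type. */
static_assert(sizeof(wrapped_int8x8) == sizeof(plain_int8x8),
              "struct wrapper should not change the size");

/* Plain style: the operation is applied to the vector values directly. */
static plain_int8x8 add_plain(plain_int8x8 a, plain_int8x8 b)
{
    return a + b;
}

/* Wrapped style: unwrap the operands, operate, wrap the result again. */
static wrapped_int8x8 add_wrapped(wrapped_int8x8 a, wrapped_int8x8 b)
{
    wrapped_int8x8 r;
    r.val = a.val + b.val;
    return r;
}
```

Both functions compute the same lane-wise sum. The unwrap and rewrap
steps in add_wrapped are the kind of extra code that an optimizer would
be expected to remove.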
On 21 September 2010 18:03, Bob Wilson <bob.wilson at apple.com> wrote:

> Because that is what ARM has specified? They define the vector types
> that are used with their NEON intrinsics as "containerized vectors".
> Perhaps someone on the list from ARM can explain why they did it that
> way.

That's ok, but why do you need to do that in the IR? I mean, in the end,
the boilerplate will be optimized away and all that's left will be the
vector instruction, either compiled or JITed.

> As you noted, the struct wrappers produce a lot of extra code but it
> should all be optimized away. If you see a case where that is not
> happening, please file a bug report.

So far so good, all operations I've tried with Clang are being correctly
generated to a load+op+store triple.

-- 
cheers,
--renato
Bob Wilson writes:

> On Sep 21, 2010, at 9:33 AM, Renato Golin wrote:
> > I was checking NEON instructions this week and the vector types seem
> > to be inside structures. If vector types are considered proper types
> > in LLVM, why pack them inside structures?
>
> Because that is what ARM has specified? They define the vector types
> that are used with their NEON intrinsics as "containerized vectors".
> Perhaps someone on the list from ARM can explain why they did it that
> way.

"Containerized Vector" in the ARM AAPCS refers to fundamental data types
(machine types), it's the class of machine types comprising the 64-bit
and 128-bit NEON machine types.

The AAPCS defines how to pass containerized vectors and it defines how
the NEON user types map on to them. Also it defines how to mangle the
NEON user types. So it defines how to use NEON user types in a public
binary interface.

It also says that arm_neon.h "defines a set of internal structures that
describe the short vector types" which I guess could be read as saying
they are packed inside structures - but I don't think this is the
intention and it doesn't match implementations.

The arm_neon.h implementation in the ARM compiler defines the user types
in terms of C structs called __simd64_int8_t etc. and the mangling
originated as an artifact of this. But the C structs aren't wrapped
vectors; they wrap double or a pair of doubles, to get the size and
alignment. Their only purpose is to be recognized by name by the front
end and turned into a native register type.

In gcc's arm_neon.h the user types aren't structs at all, they're
defined using vector_size and the mangling is done as a special case.

So I think there's no need to wrap these types in LLVM.

Al
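To make the two header styles described above concrete, here is a rough
C sketch, assuming the GCC/Clang vector_size extension. The struct names
simd64_placeholder and simd128_placeholder are hypothetical stand-ins,
not the real definitions from the ARM compiler's arm_neon.h, and the
typedef names v8i8 and v16i8 are likewise made up for this example. The
sketch checks only the sizes; alignment rules depend on the target ABI
and are not asserted here.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/*
 * Placeholder-struct style (hypothetical names): the struct holds a
 * double or a pair of doubles purely to get a 64-bit or 128-bit object.
 * It does not contain a vector.
 */
typedef struct { double d0; } simd64_placeholder;
typedef struct { double d0; double d1; } simd128_placeholder;

/*
 * vector_size style: the user type is a real vector type, with no
 * struct involved.
 */
typedef int8_t v8i8  __attribute__((vector_size(8)));
typedef int8_t v16i8 __attribute__((vector_size(16)));

/* Both styles describe objects of the same size. */
static_assert(sizeof(simd64_placeholder) == sizeof(v8i8),
              "64-bit placeholder and 64-bit vector differ in size");
static_assert(sizeof(simd128_placeholder) == sizeof(v16i8),
              "128-bit placeholder and 128-bit vector differ in size");
```

In the placeholder style a front end would have to recognize the struct
by name and substitute a register type, whereas in the vector_size style
the type already is a vector.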