On 21 September 2010 18:03, Bob Wilson <bob.wilson at apple.com> wrote:
> Because that is what ARM has specified? They define the vector types that are used with their NEON intrinsics as "containerized vectors". Perhaps someone on the list from ARM can explain why they did it that way.

That's OK, but why do you need to do that in the IR? I mean, in the end the boilerplate will be optimized away and all that's left will be the vector instruction, either compiled or JITed.

> As you noted, the struct wrappers produce a lot of extra code but it should all be optimized away. If you see a case where that is not happening, please file a bug report.

So far so good: every operation I've tried with Clang is correctly generated as a load+op+store triple.

--
cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
On Sep 21, 2010, at 10:14 AM, Renato Golin wrote:

> On 21 September 2010 18:03, Bob Wilson <bob.wilson at apple.com> wrote:
>> Because that is what ARM has specified? They define the vector types that are used with their NEON intrinsics as "containerized vectors". Perhaps someone on the list from ARM can explain why they did it that way.
>
> That's ok, but why do you need to do that in the IR? I mean, in the
> end, the boilerplate will be optimized away and all that's left will
> be the vector instruction, either compiled or JITed.

The intrinsics are defined as ordinary C functions in <arm_neon.h>. They use the containerized vector types. So, you've got C code using structures, and at some point we want to remove those structures and expose the underlying vector types. We rely on LLVM's SROA optimizations to do that.

If you're suggesting that the front-end should optimize away the structures before even generating the LLVM IR, that is definitely possible. It would require more code in the front-end. As long as SROA succeeds in optimizing away the cruft, why does it matter? I suppose there might be some effect on compile time, but I'd be surprised if it were significant.
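To make the "containerized vector" idea concrete, here is a minimal C sketch, assuming the GCC/Clang `vector_size` extension. The names `raw_int32x4`, `my_int32x4_t`, `my_vaddq_s32`, `raw_vaddq_s32` and the `val` field are hypothetical stand-ins, not the real definitions from <arm_neon.h>. The wrapped version passes a struct around; once SROA has split that struct apart, what should remain is the same single vector add that the bare version produces.

```c
#include <stdint.h>

/* Bare 128-bit vector of four int32 lanes (GCC/Clang vector extension). */
typedef int32_t raw_int32x4 __attribute__((vector_size(16)));

/* Hypothetical "containerized" vector: the bare vector wrapped in a struct. */
typedef struct {
    raw_int32x4 val;
} my_int32x4_t;

/* An intrinsic written as an ordinary C function on the wrapper type.
   The front-end emits struct copies around the add; SROA is expected
   to remove them, leaving only the vector add. */
static inline my_int32x4_t my_vaddq_s32(my_int32x4_t a, my_int32x4_t b) {
    my_int32x4_t r;
    r.val = a.val + b.val;
    return r;
}

/* The same operation on the bare vector type, with no wrapper at all. */
static inline raw_int32x4 raw_vaddq_s32(raw_int32x4 a, raw_int32x4 b) {
    return a + b;
}
```

Comparing the IR for the two functions before and after optimization is an easy way to check whether the struct cruft really disappears.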
On 21 September 2010 18:24, Bob Wilson <bob.wilson at apple.com> wrote:
> The intrinsics are defined as ordinary C functions in <arm_neon.h>. They use the containerized vector types. So, you've got C code using structures, and at some point we want to remove those structures and expose the underlying vector types. We rely on LLVM's SROA optimizations to do that. If you're suggesting that the front-end should optimize away the structures before even generating the LLVM IR, that is definitely possible. It would require more code in the front-end. As long as SROA succeeds in optimizing away the cruft, why does it matter? I suppose there might be some effect on compile-time, but I'd be surprised if it is significant.

I see your point, and I'm not concerned with compilation time. All that code is reused by casting structures to vectors (or something like that) and gets optimized away automatically.

However, Clang is already doing a lot of work in the front-end: the resulting operations are correct (adds, intrinsics), even though in arm_neon.h the function calls are turned into a series of similar functions with slightly different parameters. That means Clang is, at least, recognizing the correct functions and transforming them into the appropriate instructions. Why not go a step further and minimize what needs optimizing in the back-end?

But as you said, in GCC compatibility mode it's pure vector, so it's good enough. And I agree that it's not necessary; I was just curious...

Thanks! ;)
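The "series of similar functions" pattern Renato describes can be sketched roughly as follows, again assuming the GCC/Clang `vector_size` extension. Every name here (`my_vaddq_s32`, `my_vaddq_u32`, `generic_add_u32`, the `val` field) is hypothetical and not taken from <arm_neon.h>. Each typed wrapper bit-casts its containerized vector to one shared bare vector type, calls a single helper, and wraps the result back up. This cast-and-wrap boilerplate is exactly what is left for SROA and later passes to clean up.

```c
#include <stdint.h>

/* Bare vector types; casts between same-size vectors are bit casts. */
typedef int32_t  v4i32 __attribute__((vector_size(16)));
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Hypothetical containerized types, one per lane type. */
typedef struct { v4i32 val; } my_int32x4_t;
typedef struct { v4u32 val; } my_uint32x4_t;

/* One shared helper. Unsigned addition wraps modulo 2^32, and it is
   the same bit-level operation as two's-complement signed addition. */
static inline v4u32 generic_add_u32(v4u32 a, v4u32 b) {
    return a + b;
}

/* Signed wrapper: bit-cast to the shared type, add, bit-cast back. */
static inline my_int32x4_t my_vaddq_s32(my_int32x4_t a, my_int32x4_t b) {
    my_int32x4_t r = { (v4i32)generic_add_u32((v4u32)a.val, (v4u32)b.val) };
    return r;
}

/* Unsigned wrapper: already the shared type, so only (un)wrapping. */
static inline my_uint32x4_t my_vaddq_u32(my_uint32x4_t a, my_uint32x4_t b) {
    my_uint32x4_t r = { generic_add_u32(a.val, b.val) };
    return r;
}
```

Routing everything through the unsigned helper sidesteps signed-overflow concerns. After optimization, both wrappers should reduce to the same vector add instruction.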