Eric Hein via llvm-dev
2019-Jul-02 14:55 UTC
[llvm-dev] Optimizing pass-by-value structs for le64 target
Consider the following small example:

    struct wrapper {
      long value;
    };

    long read_wrapper(wrapper w) {
      return w.value;
    }

    long read_primitive(long x) {
      return x;
    }

When compiling for x86 at -O1, both functions reduce nicely to a single IR instruction. It looks like -sroa is performing this transformation, but even at -O0 clang has already deduced that the argument is really just an i64.

Before -sroa:

    define dso_local i64 @_Z12read_wrapper7wrapper(i64) #0 {
      %2 = alloca %struct.wrapper, align 8
      %3 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
      store i64 %0, i64* %3, align 8
      %4 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
      %5 = load i64, i64* %4, align 8
      ret i64 %5
    }

After -sroa:

    define dso_local i64 @_Z12read_wrapper7wrapper(i64 returned) local_unnamed_addr #0 {
      ret i64 %0
    }

But when I add -target le64, read_wrapper instead accepts a %struct.wrapper* byval, a pointer to a copy on the caller's stack. No level of optimization is able to make this function look as simple as read_primitive:

    define dso_local i64 @_Z12read_wrapper7wrapper(%struct.wrapper* byval nocapture readonly align 8) local_unnamed_addr #0 {
      %2 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %0, i64 0, i32 0
      %3 = load i64, i64* %2, align 8, !tbaa !2
      ret i64 %3
    }

We're writing our own LLVM backend for a new architecture; we started with the generic little-endian 64-bit target (le64) and made customizations from there. What needs to be done to re-enable this optimization for our target?
Tim Northover via llvm-dev
2019-Jul-04 08:50 UTC
Hi Eric,

On Thu, 4 Jul 2019 at 00:10, Eric Hein via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> We're writing our own LLVM backend for a new architecture; we started with the generic little-endian 64-bit target (le64) and made customizations from there. What needs to be done to re-enable this optimization for our target?

This particular detail is down to ABI handling in Clang's lib/CodeGen/TargetInfo.cpp. There, each target has code to decide how a given C or C++ type gets mapped to LLVM IR at function call boundaries. The main purpose is of course to follow an externally specified ABI, but as you've discovered there's a certain amount of leeway for performance gains if multiple options are equally valid.

Cheers.

Tim.
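To make that concrete: as far as I can tell, the le64 triple has no dedicated ABI class in that file and falls back to the generic default, which passes every aggregate indirectly (hence the byval pointer), whereas the x86-64 rules classify a struct holding a single long as one INTEGER eightbyte and coerce it to i64. The C++ below is a minimal, self-contained model of that per-argument decision under stated assumptions, not Clang's actual ABIInfo API. ArgType, PassKind, ArgLowering, classifyDefault and classifyCustom are hypothetical names, and the model only considers integer-like types and a single 64-bit register for small aggregates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified model of a frontend ABI classifier. This is NOT
// Clang's ABIInfo API; it only illustrates the kind of per-argument decision
// each target makes when lowering C/C++ types to IR at call boundaries.
// Assumption: only integer-like types; floating point is ignored.
struct ArgType {
  bool isAggregate;     // struct/class/union passed by value
  bool trivialCopy;     // simplified: record can be copied bitwise for calls
  uint64_t sizeInBits;  // size of the type in bits
};

enum class PassKind {
  Direct,         // passed as a scalar IR value, e.g. i64
  IndirectByVal,  // caller makes a copy; callee receives a byval pointer
  Indirect        // non-trivial C++ record: passed by address, not byval
};

struct ArgLowering {
  PassKind kind;
  std::string irType;  // textual description of the IR-level parameter
};

// Default-style rule, matching what the le64 output above shows:
// any trivially copyable aggregate is passed in memory via a byval pointer.
ArgLowering classifyDefault(const ArgType &t) {
  assert(t.sizeInBits > 0 && "types passed by value have a non-zero size");
  if (t.isAggregate && !t.trivialCopy)
    return {PassKind::Indirect, "pointer"};
  if (t.isAggregate)
    return {PassKind::IndirectByVal, "byval pointer"};
  return {PassKind::Direct, "i" + std::to_string(t.sizeInBits)};
}

// Hypothetical custom rule: a trivially copyable aggregate that fits in a
// single 64-bit register is coerced to an integer of the same width, the
// way x86-64 passes `wrapper` as a plain i64 that SROA can then forward.
ArgLowering classifyCustom(const ArgType &t) {
  assert(t.sizeInBits > 0 && "types passed by value have a non-zero size");
  if (t.isAggregate && t.trivialCopy && t.sizeInBits <= 64)
    return {PassKind::Direct, "i" + std::to_string(t.sizeInBits)};
  return classifyDefault(t);  // larger or non-trivial types: unchanged
}
```

With ArgType{true, true, 64} describing wrapper, classifyDefault reports IndirectByVal while classifyCustom reports Direct with irType "i64", mirroring the x86 signature in the original post. In a real frontend the equivalent change would presumably live in an ABI class for your target in TargetInfo.cpp (the existing ones typically implement a classifyArgumentType method), and it has to agree with how your backend's calling-convention lowering expects such arguments to arrive.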