David Nadlinger
2013-Oct-27 14:13 UTC
[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?
The following piece of IR is a fixed point for opt -std-compile-opts/-O3:

---
target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind readonly
define i32 @get32Bits(i8* inreg nocapture readonly %x_arg) #0 {
  %tmp1 = getelementptr inbounds i8* %x_arg, i64 3
  %tmp2 = load i8* %tmp1, align 1
  %tmp3 = zext i8 %tmp2 to i32
  %tmp4 = shl nuw nsw i32 %tmp3, 24
  %tmp6 = getelementptr inbounds i8* %x_arg, i64 2
  %tmp7 = load i8* %tmp6, align 1
  %tmp8 = zext i8 %tmp7 to i32
  %tmp9 = shl nuw nsw i32 %tmp8, 16
  %tmp10 = or i32 %tmp9, %tmp4
  %tmp12 = getelementptr inbounds i8* %x_arg, i64 1
  %tmp13 = load i8* %tmp12, align 1
  %tmp14 = zext i8 %tmp13 to i32
  %tmp15 = shl nuw nsw i32 %tmp14, 8
  %tmp16 = or i32 %tmp10, %tmp15
  %tmp19 = load i8* %x_arg, align 4
  %tmp20 = zext i8 %tmp19 to i32
  %tmp21 = or i32 %tmp16, %tmp20
  ret i32 %tmp21
}

attributes #0 = { nounwind readonly }
---

Is there a reason why this can't be optimized down to a single i32 load based on the IR semantics, or is this just a missed optimization opportunity?

Thanks,
David
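For readers more at home in C, the IR above corresponds roughly to the sketch below. It is only an illustration, not code from the thread: the function names get32_bytewise and get32_single are hypothetical, and the claim that the two agree assumes a little-endian target, as the leading "e" in the datalayout string indicates. The first function spells out the four byte loads with zero-extension, shift, and OR; the second is the single 32-bit load the question asks about, written with memcpy so that it stays well defined for any alignment.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form mirroring the IR: four 8-bit loads, each zero-extended
 * to 32 bits, shifted into position, and OR'd together. */
uint32_t get32_bytewise(const unsigned char *x)
{
    return ((uint32_t)x[3] << 24) |
           ((uint32_t)x[2] << 16) |
           ((uint32_t)x[1] << 8)  |
            (uint32_t)x[0];
}

/* The single 32-bit load the byte-wise form could be folded into.
 * Assumption: little-endian target; on a big-endian target this
 * returns the bytes in the opposite order. */
uint32_t get32_single(const unsigned char *x)
{
    uint32_t v;
    memcpy(&v, x, sizeof v);
    return v;
}
```

On a little-endian machine both functions return the same value for any four readable bytes, which is the equivalence an optimizer would have to establish before merging the loads.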
James Courtier-Dutton
2013-Oct-28 09:09 UTC
[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?
On Oct 27, 2013 2:16 PM, "David Nadlinger" <code at klickverbot.at> wrote:
>
> The following piece of IR is a fixed point for opt -std-compile-opts/-O3:
>
> ---
> target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
> target triple = "x86_64-unknown-linux-gnu"
>
> ; Function Attrs: nounwind readonly
> define i32 @get32Bits(i8* inreg nocapture readonly %x_arg) #0 {
>   %tmp1 = getelementptr inbounds i8* %x_arg, i64 3
>   %tmp2 = load i8* %tmp1, align 1
>   %tmp3 = zext i8 %tmp2 to i32
>   %tmp4 = shl nuw nsw i32 %tmp3, 24
>   %tmp6 = getelementptr inbounds i8* %x_arg, i64 2
>   %tmp7 = load i8* %tmp6, align 1
>   %tmp8 = zext i8 %tmp7 to i32
>   %tmp9 = shl nuw nsw i32 %tmp8, 16
>   %tmp10 = or i32 %tmp9, %tmp4
>   %tmp12 = getelementptr inbounds i8* %x_arg, i64 1
>   %tmp13 = load i8* %tmp12, align 1
>   %tmp14 = zext i8 %tmp13 to i32
>   %tmp15 = shl nuw nsw i32 %tmp14, 8
>   %tmp16 = or i32 %tmp10, %tmp15
>   %tmp19 = load i8* %x_arg, align 4
>   %tmp20 = zext i8 %tmp19 to i32
>   %tmp21 = or i32 %tmp16, %tmp20
>   ret i32 %tmp21
> }
>
> attributes #0 = { nounwind readonly }
> ---
>
> Is there a reason why this can't be optimized down to a single i32
> load based on the IR semantics, or is this just a missed optimization
> opportunity?
>

My guess is that this is a missed optimization, but in real life, all projects I have worked on fix this in the C or C++ code, using macros that change which instructions are used based on the target platform and its endianness.

James
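The macro-based approach described above might look something like the following sketch. It is not taken from any particular project: the helper name load_le32 is hypothetical, and the endianness check relies on the __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ macros predefined by GCC and Clang, so other compilers would simply take the portable fallback path.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from four bytes at p.
 * The preprocessor picks the implementation per target. */
static inline uint32_t load_le32(const unsigned char *p)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    /* Little-endian target: memory order already matches, so a plain
     * 32-bit load (expressed via memcpy) is enough. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    /* Unknown or big-endian target: assemble the value byte by byte. */
    return ((uint32_t)p[3] << 24) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |
            (uint32_t)p[0];
#endif
}
```

Either branch yields the same result for the same input bytes; the conditional only does by hand what the optimizer is not doing for the byte-wise form.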
David Nadlinger
2013-Oct-29 23:25 UTC
[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?
On Mon, Oct 28, 2013 at 10:09 AM, James Courtier-Dutton <james.dutton at gmail.com> wrote:
> My guess is that this is a missed optimization, but in real life, all
> projects I have worked on fix this in the C or C++ code, using macros that
> change which instructions are used based on the target platform and its
> endianness.

One reason for writing code like this, i.e. explicitly spelling out the accesses to the individual bytes, would be to allow compile-time evaluation of the fragment in the D programming language, where arbitrarily reinterpreting memory is not supported (although integer->integer pointer casts might be supported at some point).

Would a patch that teaches InstCombine or a similar pass to fold this pattern into a single load have a chance of being accepted, or would that be considered too rare a special case to be worth the added complexity?

David