thr3ads.net - llvm dev - [LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together? [Oct 2013]

David Nadlinger

2013-Oct-27 14:13 UTC

[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?

The following piece of IR is a fixed point for opt -std-compile-opts/-O3:

---
target datalayout =
"e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind readonly
define i32 @get32Bits(i8* inreg nocapture readonly %x_arg) #0 {
  %tmp1 = getelementptr inbounds i8* %x_arg, i64 3
  %tmp2 = load i8* %tmp1, align 1
  %tmp3 = zext i8 %tmp2 to i32
  %tmp4 = shl nuw nsw i32 %tmp3, 24
  %tmp6 = getelementptr inbounds i8* %x_arg, i64 2
  %tmp7 = load i8* %tmp6, align 1
  %tmp8 = zext i8 %tmp7 to i32
  %tmp9 = shl nuw nsw i32 %tmp8, 16
  %tmp10 = or i32 %tmp9, %tmp4
  %tmp12 = getelementptr inbounds i8* %x_arg, i64 1
  %tmp13 = load i8* %tmp12, align 1
  %tmp14 = zext i8 %tmp13 to i32
  %tmp15 = shl nuw nsw i32 %tmp14, 8
  %tmp16 = or i32 %tmp10, %tmp15
  %tmp19 = load i8* %x_arg, align 4
  %tmp20 = zext i8 %tmp19 to i32
  %tmp21 = or i32 %tmp16, %tmp20
  ret i32 %tmp21
}

attributes #0 = { nounwind readonly }
---

Is there a reason why this can't be optimized down to a single i32
load based on the IR semantics, or is this just a missed optimization
opportunity?

Thanks,
David
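
For readers following along at the source level, the IR above roughly corresponds to a byte-wise little-endian read. The C sketch below is only an illustration: the function names are hypothetical and do not come from the thread. It contrasts the shift-and-or form with the single 32-bit load being asked about. The two agree only on a little-endian target, which the leading `e` in the datalayout string indicates here.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise read, mirroring the load/zext/shl/or chain in the IR.
   The result is the little-endian interpretation on any host. */
static uint32_t get32_bytewise(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Single 32-bit read in host byte order. memcpy avoids alignment and
   aliasing problems; it matches get32_bytewise only on a
   little-endian host. */
static uint32_t get32_single(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Runtime check of host byte order (hypothetical helper). */
static int host_is_little_endian(void) {
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

Note that the IR additionally carries `align 4` on the load from `%x_arg`, which is what would let the combined load keep a 4-byte alignment; the C version makes no such assumption.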

James Courtier-Dutton

2013-Oct-28 09:09 UTC

On Oct 27, 2013 2:16 PM, "David Nadlinger" <code at klickverbot.at> wrote:
>
> The following piece of IR is a fixed point for opt -std-compile-opts/-O3:
>
> ---
> target datalayout =
> "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
> target triple = "x86_64-unknown-linux-gnu"
>
> ; Function Attrs: nounwind readonly
> define i32 @get32Bits(i8* inreg nocapture readonly %x_arg) #0 {
>   %tmp1 = getelementptr inbounds i8* %x_arg, i64 3
>   %tmp2 = load i8* %tmp1, align 1
>   %tmp3 = zext i8 %tmp2 to i32
>   %tmp4 = shl nuw nsw i32 %tmp3, 24
>   %tmp6 = getelementptr inbounds i8* %x_arg, i64 2
>   %tmp7 = load i8* %tmp6, align 1
>   %tmp8 = zext i8 %tmp7 to i32
>   %tmp9 = shl nuw nsw i32 %tmp8, 16
>   %tmp10 = or i32 %tmp9, %tmp4
>   %tmp12 = getelementptr inbounds i8* %x_arg, i64 1
>   %tmp13 = load i8* %tmp12, align 1
>   %tmp14 = zext i8 %tmp13 to i32
>   %tmp15 = shl nuw nsw i32 %tmp14, 8
>   %tmp16 = or i32 %tmp10, %tmp15
>   %tmp19 = load i8* %x_arg, align 4
>   %tmp20 = zext i8 %tmp19 to i32
>   %tmp21 = or i32 %tmp16, %tmp20
>   ret i32 %tmp21
> }
>
> attributes #0 = { nounwind readonly }
> ---
>
> Is there a reason why this can't be optimized down to a single i32
> load based on the IR semantics, or is this just a missed optimization
> opportunity?
>

My guess is that this is a missed optimization, but in real life, all
projects I have worked on fix this in the C or C++ code using macros that
change which instructions are used based on the target platform and its
endianness.
James
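
The macro approach described above might look something like the following sketch. `LOAD_LE32` and the two helper functions are hypothetical names chosen for illustration. `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` are predefined by GCC and Clang; other compilers may need a different check, in which case this sketch falls back to the portable path.

```c
#include <stdint.h>
#include <string.h>

/* Portable path: assemble the value byte by byte (works on any host). */
static inline uint32_t load_le32_portable(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Direct path: one 32-bit copy; only correct on little-endian hosts. */
static inline uint32_t load_le32_direct(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Pick the implementation at compile time based on target byte order. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define LOAD_LE32(p) load_le32_direct(p)
#else
#define LOAD_LE32(p) load_le32_portable(p)
#endif
```

Either branch yields the same value for the same bytes; the macro only changes which instructions the compiler is asked to emit.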

David Nadlinger

2013-Oct-29 23:25 UTC

On Mon, Oct 28, 2013 at 10:09 AM, James Courtier-Dutton
<james.dutton at gmail.com> wrote:
> My guess is that this is a missed optimization, but in real life, all
> projects I have worked on fix this in the C or C++ code using macros that
> change which instructions are used based on the target platform and its
> endianness.

One reason for writing code like this, i.e. explicitly spelling out
the accesses to the individual bytes, would be to allow compile-time
evaluation of the fragment in the D programming language, where
arbitrarily reinterpreting memory is not supported (although
integer->integer pointer casts might be supported at some point).

Would a patch teaching InstCombine (or a similar pass) to fold this
pattern into a single load have a chance of being accepted, or would
that be considered too rare a special case to be worth the added
complexity?

David
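
As a rough illustration of what such a transformation would have to verify, the C sketch below checks whether a set of (byte offset, shift amount) pairs describes a contiguous little-endian load. It is not LLVM code: the `byte_term` struct and `is_little_endian_load` function are hypothetical, and a real pass would also need to confirm alignment, the target's byte order, and that no intervening store touches the bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one term in the or-chain:
   zext(load i8 at base + offset) << shift. */
struct byte_term {
    unsigned offset; /* byte offset from the base pointer */
    unsigned shift;  /* left-shift amount in bits */
};

/* Returns true if the n terms cover bytes 0..n-1 exactly once and
   each byte is shifted by 8 * offset, i.e. the chain is equivalent
   to a little-endian load of n bytes. */
static bool is_little_endian_load(const struct byte_term *terms, size_t n) {
    unsigned seen = 0; /* bitmask of offsets encountered so far */
    if (n == 0 || n > 8)
        return false;
    for (size_t i = 0; i < n; ++i) {
        if (terms[i].offset >= n)
            return false; /* byte outside the candidate load */
        if (terms[i].shift != 8u * terms[i].offset)
            return false; /* wrong position for a little-endian read */
        if (seen & (1u << terms[i].offset))
            return false; /* same byte used twice */
        seen |= 1u << terms[i].offset;
    }
    return seen == (1u << n) - 1u;
}
```

For the function in the original post, the terms would be (3, 24), (2, 16), (1, 8), and (0, 0), which this check accepts.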
