thr3ads.net - llvm dev - [llvm-dev] Optimization of successive constant stores [Dec 2015]

If this information is useful, please help other people find it:
Share via:

David Jones via llvm-dev

2015-Dec-11 17:07 UTC

[llvm-dev] Optimization of successive constant stores

Hmm... found an interesting issue:

Given:

    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 8
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
    store i8 2, i8* %3, align 1
    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
    store i8 3, i8* %4, align 2
    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
    store i8 4, i8* %5, align 1
    ret void

llc generates:

        movb    $1, (%rdi)
        movb    $2, 1(%rdi)
        movb    $3, 2(%rdi)
        movb    $4, 3(%rdi)
        retq

But given:

    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 1
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
    store i8 2, i8* %3, align 1
    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
    store i8 3, i8* %4, align 1
    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
    store i8 4, i8* %5, align 1
    ret void

We get:

        movl    $67305985, (%rdi)       # imm = 0x4030201
        retq

Also interesting:

define void @test(i8*) {
    %2 = getelementptr inbounds i8* %0, i32 0
    store i8 1, i8* %2, align 1
    %3 = getelementptr inbounds i8* %0, i32 1
    store i8 2, i8* %3, align 1
    %4 = getelementptr inbounds i8* %0, i32 2
    store i8 3, i8* %4, align 1
    %5 = getelementptr inbounds i8* %0, i32 3
    store i8 4, i8* %5, align 1
    ret void

This code also results in a single store. However, there is no guarantee
that the input pointer is 32-bit aligned. x86_64 tolerates this, but the
ABI mandates aligned loads and stores. My previous example guaranteed
alignment, because I started with a pointer to a structure containing a
member having 8-byte alignment, therefore we could infer the alignment of
some of the stores.

And checking the code in DAGCombiner.cpp indeed shows that all combined
instructions have to have the same alignment. Why?


On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> Hi David,
>
> We generally handle this (early) in the backend where we have more
> information about target capabilities and costs. See MergeConsecutiveStores
> in lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>
>  -Hal
>
> ----- Original Message -----
> > From: "David Jones via llvm-dev" <llvm-dev at
lists.llvm.org>
> > To: llvm-dev at lists.llvm.org
> > Sent: Friday, December 11, 2015 10:32:50 AM
> > Subject: [llvm-dev] Optimization of successive constant stores
> >
> > Consider the following:
> >
> > t arget datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > target triple = "x86_64-unknown-linux-gnu"
> >
> > %UodStructType = type { i8, i8, i8, i8, i32, i8* }
> >
> > define void @test(%UodStructType*) {
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 8
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 2
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> > }
> >
> > If I run this through opt -O3, it passes through unchanged.
> >
> > However, I would think that it would be profitable to combine the
> > stores into a single instruction, e.g.:
> >
> > define void @test(%UodStructType*) {
> > %2 = bitcast %UodStructType* %0 to i32*
> > store i32 0x04030201, i32* %2, align 8
> >
> >
> > ret void
> > }
> >
> >
> > I don't see any optimization that would do this.
> >
> > Interestingly, if I store the same 8-bit constant in all four bytes,
> > then MemCpyOpt will indeed convert this to a 32-bit store.
> >
> >
> > Am I doing something wrong, or is there really no optimization pass
> > that can clean this up?
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151211/cedd512a/attachment.html>

Hal Finkel via llvm-dev

2015-Dec-11 17:13 UTC

head link

[llvm-dev] Optimization of successive constant stores

----- Original Message -----> From: "David Jones" <djones at xtreme-eda.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, December 11, 2015 11:07:28 AM
> Subject: Re: [llvm-dev] Optimization of successive constant stores
> 
> Hmm... found an interesting issue:
> 
> Given:
> 
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> store i8 1, i8* %2, align 8
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> store i8 3, i8* %4, align 2
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> store i8 4, i8* %5, align 1
> ret void
> 
> llc generates:
> 
> movb $1, (%rdi)
> movb $2, 1(%rdi)
> movb $3, 2(%rdi)
> movb $4, 3(%rdi)
> retq
> 
> But given:
> 
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> store i8 1, i8* %2, align 1
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> store i8 3, i8* %4, align 1
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> store i8 4, i8* %5, align 1
> ret void
> 
> We get:
> 
> movl $67305985, (%rdi) # imm = 0x4030201
> retq
> 
> 
> Also interesting:
> 
> define void @test(i8*) {
> %2 = getelementptr inbounds i8* %0, i32 0
> store i8 1, i8* %2, align 1
> %3 = getelementptr inbounds i8* %0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds i8* %0, i32 2
> store i8 3, i8* %4, align 1
> %5 = getelementptr inbounds i8* %0, i32 3
> store i8 4, i8* %5, align 1
> ret void
> 
> 
> This code also results in a single store. However, there is no
> guarantee that the input pointer is 32-bit aligned. x86_64 tolerates
> this, but the ABI mandates aligned loads and stores. My previous
> example guaranteed alignment, because I started with a pointer to a
> structure containing a member having 8-byte alignment, therefore we
> could infer the alignment of some of the stores.
> 
> And checking the code in DAGCombiner.cpp indeed shows that all
> combined instructions have to have the same alignment. Why?
> 
> 
Seems like a bug. Overalignment should be pretty safe to ignore. Can you either
submit a patch or file a bug report?

Thanks,
Hal
> 
> 
> On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> Hi David,
> 
> We generally handle this (early) in the backend where we have more
> information about target capabilities and costs. See
> MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> 
> -Hal
> 
> ----- Original Message -----
> > From: "David Jones via llvm-dev" < llvm-dev at
lists.llvm.org >
> > To: llvm-dev at lists.llvm.org
> > Sent: Friday, December 11, 2015 10:32:50 AM
> > Subject: [llvm-dev] Optimization of successive constant stores
> > 
> > Consider the following:
> > 
> > t arget datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > target triple = "x86_64-unknown-linux-gnu"
> > 
> > %UodStructType = type { i8, i8, i8, i8, i32, i8* }
> > 
> > define void @test(%UodStructType*) {
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 8
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 2
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> > }
> > 
> > If I run this through opt -O3, it passes through unchanged.
> > 
> > However, I would think that it would be profitable to combine the
> > stores into a single instruction, e.g.:
> > 
> > define void @test(%UodStructType*) {
> > %2 = bitcast %UodStructType* %0 to i32*
> > store i32 0x04030201, i32* %2, align 8
> > 
> > 
> > ret void
> > }
> > 
> > 
> > I don't see any optimization that would do this.
> > 
> > Interestingly, if I store the same 8-bit constant in all four
> > bytes,
> > then MemCpyOpt will indeed convert this to a 32-bit store.
> > 
> > 
> > Am I doing something wrong, or is there really no optimization pass
> > that can clean this up?
> > 
> > 
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

James Y Knight via llvm-dev

2015-Dec-11 19:07 UTC

head link

[llvm-dev] Optimization of successive constant stores

That was a bug, but I fixed it already a while ago. Looks like you're using
an old version of llvm.

On Fri, Dec 11, 2015 at 12:13 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> ----- Original Message -----
> > From: "David Jones" <djones at xtreme-eda.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Friday, December 11, 2015 11:07:28 AM
> > Subject: Re: [llvm-dev] Optimization of successive constant stores
> >
> > Hmm... found an interesting issue:
> >
> > Given:
> >
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 8
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 2
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> >
> > llc generates:
> >
> > movb $1, (%rdi)
> > movb $2, 1(%rdi)
> > movb $3, 2(%rdi)
> > movb $4, 3(%rdi)
> > retq
> >
> > But given:
> >
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 1
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 1
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> >
> > We get:
> >
> > movl $67305985, (%rdi) # imm = 0x4030201
> > retq
> >
> >
> > Also interesting:
> >
> > define void @test(i8*) {
> > %2 = getelementptr inbounds i8* %0, i32 0
> > store i8 1, i8* %2, align 1
> > %3 = getelementptr inbounds i8* %0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds i8* %0, i32 2
> > store i8 3, i8* %4, align 1
> > %5 = getelementptr inbounds i8* %0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> >
> >
> > This code also results in a single store. However, there is no
> > guarantee that the input pointer is 32-bit aligned. x86_64 tolerates
> > this, but the ABI mandates aligned loads and stores. My previous
> > example guaranteed alignment, because I started with a pointer to a
> > structure containing a member having 8-byte alignment, therefore we
> > could infer the alignment of some of the stores.
> >
> > And checking the code in DAGCombiner.cpp indeed shows that all
> > combined instructions have to have the same alignment. Why?
> >
> >
>
> Seems like a bug. Overalignment should be pretty safe to ignore. Can you
> either submit a patch or file a bug report?
>
> Thanks,
> Hal
>
> >
> >
> > On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel < hfinkel at anl.gov
>
> > wrote:
> >
> >
> > Hi David,
> >
> > We generally handle this (early) in the backend where we have more
> > information about target capabilities and costs. See
> > MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >
> > -Hal
> >
> > ----- Original Message -----
> > > From: "David Jones via llvm-dev" < llvm-dev at
lists.llvm.org >
> > > To: llvm-dev at lists.llvm.org
> > > Sent: Friday, December 11, 2015 10:32:50 AM
> > > Subject: [llvm-dev] Optimization of successive constant stores
> > >
> > > Consider the following:
> > >
> > > t arget datalayout =
"e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > > target triple = "x86_64-unknown-linux-gnu"
> > >
> > > %UodStructType = type { i8, i8, i8, i8, i32, i8* }
> > >
> > > define void @test(%UodStructType*) {
> > > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > > store i8 1, i8* %2, align 8
> > > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > > store i8 2, i8* %3, align 1
> > > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > > store i8 3, i8* %4, align 2
> > > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > > store i8 4, i8* %5, align 1
> > > ret void
> > > }
> > >
> > > If I run this through opt -O3, it passes through unchanged.
> > >
> > > However, I would think that it would be profitable to combine the
> > > stores into a single instruction, e.g.:
> > >
> > > define void @test(%UodStructType*) {
> > > %2 = bitcast %UodStructType* %0 to i32*
> > > store i32 0x04030201, i32* %2, align 8
> > >
> > >
> > > ret void
> > > }
> > >
> > >
> > > I don't see any optimization that would do this.
> > >
> > > Interestingly, if I store the same 8-bit constant in all four
> > > bytes,
> > > then MemCpyOpt will indeed convert this to a 32-bit store.
> > >
> > >
> > > Am I doing something wrong, or is there really no optimization
pass
> > > that can clean this up?
> > >
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151211/da175445/attachment-0001.html>

llvm dev - Dec 2015 - Optimization of successive constant stores

[llvm-dev] Optimization of successive constant stores

[llvm-dev] Optimization of successive constant stores

[llvm-dev] Optimization of successive constant stores