thr3ads.net - llvm dev - [LLVMdev] Optimizing out redundant alloca involving byval params [Apr 2015]


Mircea Trofin

2015-Mar-08 17:02 UTC

[LLVMdev] Optimizing out redundant alloca involving byval params

Errata: I am on 3.6, full stop. I *thought* there was a 3.7 available, based
on the title of http://llvm.org/docs/ ("LLVM 3.7 documentation"). I suppose
the docs are ahead of the release schedule?

On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <mtrofin at google.com> wrote:
> Sorry, that phase is part of the PNaCl toolchain. This would be LLVM 3.6;
> would your comments still apply?
>
> I tried -O3 to no avail. I suppose I'll get LLVM 3.7, see if I can
> optimize the latest snippet there (the one avoiding load/store), and go
> from there.
>
> Thanks!
>
> On Fri, Mar 6, 2015 at 12:01 PM Philip Reames <listmail at philipreames.com>
> wrote:
>
>>
>> On 03/05/2015 06:16 PM, Mircea Trofin wrote:
>>
>>  Thanks!
>>
>>  Philip, do you mean I should transform the original IR to something like
>> this?
>>
>>
>> Yes.
>>
>> (...which is what -expand-struct-regs can do, when applied to my original
>> input)
>>
>> Sorry, what?  This doesn't appear to be a pass in ToT.  Are you using an
>> older version of LLVM?  If so, none of my comments will apply.
>>
>>
>> define void @main(%struct* byval %ptr) {
>>   %val.index = getelementptr %struct* %ptr, i32 0, i32 0
>>   %val.field = load i32* %val.index
>>   %val.index1 = getelementptr %struct* %ptr, i32 0, i32 1
>>   %val.field2 = load i32* %val.index1
>>   %val.ptr = alloca %struct
>>   %val.ptr.index = getelementptr %struct* %val.ptr, i32 0, i32 0
>>   store i32 %val.field, i32* %val.ptr.index
>>   %val.ptr.index4 = getelementptr %struct* %val.ptr, i32 0, i32 1
>>   store i32 %val.field2, i32* %val.ptr.index4
>>   call void @extern_func(%struct* byval %val.ptr)
>>   ret void
>> }
>>
>>  If so, would you mind pointing me to the phase that would reduce this?
>> (I'm assuming that's what you meant by "for free" - there's an existing
>> phase I could use)
>>
>> I would expect GVN to get this.  If you can run this through a fully -O3
>> pass order and get the right result, isolating the pass in question should
>> be easy.
>>
>>
>>  Thank you.
>> Mircea.
>>
>>
>> On Thu, Mar 5, 2015 at 4:39 PM Philip Reames <listmail at philipreames.com>
>> wrote:
>>
>>>  Reid is right that this would go in memcpyopt, but... there's an
>>> active discussion on the commit list which will solve this through a
>>> different mechanism.  There's an active desire to avoid teaching GVN and
>>> related pieces (of which memcpyopt is one) about first-class aggregates.
>>> We don't have enough active users of the feature to justify and maintain
>>> the complexity.
>>>
>>> If you haven't already seen it, this background may help:
>>> http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type
>>>
>>> The current proposal is to convert such aggregate loads and stores into
>>> their component pieces.  If that happens, your example should come "for
>>> free", provided that the same example works when you break down the FCA
>>> into its component pieces.  If it doesn't, please say so.
>>>
>>> Philip
>>>
>>>
>>> On 03/05/2015 04:21 PM, Reid Kleckner wrote:
>>>
>>> I think lib/Transforms/Scalar/MemCpyOptimizer.cpp might be the right
>>> place for this, considering that most frontends will use memcpy for that
>>> copy anyway. It already has some logic for byval args.
>>>
>>> On Thu, Mar 5, 2015 at 3:51 PM, Mircea Trofin <mtrofin at google.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>>  I'm trying to find the pass that would convert from:
>>>>
>>>>  define void @main(%struct* byval %ptr) {
>>>>   %val = load %struct* %ptr
>>>>   %val.ptr = alloca %struct
>>>>   store %struct %val, %struct* %val.ptr
>>>>   call void @extern_func(%struct* byval %val.ptr)
>>>>   ret void
>>>> }
>>>>
>>>>  to this:
>>>>  define void @main(%struct* byval %ptr) {
>>>>   call void @extern_func(%struct* byval %ptr)
>>>>   ret void
>>>> }
>>>>
>>>>  First, am I missing something? Would this be a correct optimization?
>>>>
>>>>  Thank you,
>>>> Mircea.
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
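On the original question of whether dropping the temporary is a correct optimization, a C analogue may help. A struct parameter passed by value behaves like a byval argument in that the callee receives its own private copy, so copying the incoming parameter into a local temporary first (the alloca plus copy in the IR above) adds nothing the caller or callee can observe, as long as nothing writes the parameter in between. This is a sketch: `callee`, `with_temp_copy` and `direct` are hypothetical names, and whether a compiler actually lowers these calls to byval pointers depends on the target ABI.

```c
/* Same shape as %struct.Str in this thread: six ints (assumes 32-bit int). */
struct Str { int a, b, c, d, e, f; };

/* Hypothetical callee. Mutating the parameter is legal because a by-value
   parameter is the callee's own copy; the caller never sees the change. */
static int callee(struct Str s) {
    s.a += 100;
    return s.a + s.b + s.c + s.d + s.e + s.f;
}

/* Mirrors the "before" IR: copy the parameter into a local temporary
   (the redundant alloca + copy), then pass the temporary. */
static int with_temp_copy(struct Str s) {
    struct Str tmp = s;
    return callee(tmp);
}

/* Mirrors the "after" IR: pass the incoming parameter straight through. */
static int direct(struct Str s) {
    return callee(s);
}
```

Both versions hand the callee an identical private copy, which is why eliminating the intermediate copy is sound in this pattern.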

Mircea Trofin

2015-Apr-02 00:32 UTC


[LLVMdev] Optimizing out redundant alloca involving byval params

I dug a bit more. It appears the succession -memcpyopt -instcombine can
convert this:

%struct.Str = type { i32, i32, i32, i32, i32, i32 }

define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
entry:
  %agg.tmp = alloca %struct.Str, align 8
  %0 = bitcast %struct.Str* %agg.tmp to i8*
  %1 = bitcast %struct.Str* %s to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false)
  call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp)
  ret void
}

declare void @_Z6e_test3Str(%struct.Str* byval align 8)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

Into this:

define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
entry:
  call void @_Z6e_test3Str(%struct.Str* byval align 8 %s)
  ret void
}


Which is great. However, this isn't happening with GEP- and load/store-based
IR (i.e. a total of 6 sets of GEP on %s + load, then GEP on %agg.tmp +
store, like the one discussed earlier in this thread).

I see 2 options:

1) convert the pass I'm working on to produce memcpy instead of load/store
successions, which would allow the resulting IR to fit in the canonical
patterns optimized today, or

2) add support (probably to memcpyopt) for converting load/store
successions into memcpy, then let the current optimizations reduce the
resulting IR.
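Option 2 amounts to recognizing that a run of per-field load/store pairs copies an entire aggregate. A minimal sketch of that check, under simplifying assumptions (source and destination have the same type, so each field sits at the same offset on both sides, and the pairs are listed in offset order): the run can become a single memcpy only if the fields tile the whole struct with no gaps. `CopyOp` and `can_coalesce_to_memcpy` are hypothetical names, not part of memcpyopt.

```c
#include <stdbool.h>
#include <stddef.h>

/* One scalar load/store pair: copy `size` bytes at `offset` from the source
   aggregate to the destination aggregate (same layout on both sides). */
struct CopyOp { size_t offset; size_t size; };

/* True if the ops, in order, tile [0, struct_size) with no gaps or overlaps,
   so the whole succession could be replaced by memcpy(dst, src, struct_size).
   A gap means some destination bytes (e.g. padding) are never written, and a
   memcpy would overwrite them. */
static bool can_coalesce_to_memcpy(const struct CopyOp *ops, size_t n,
                                   size_t struct_size) {
    size_t cursor = 0;
    for (size_t i = 0; i < n; ++i) {
        if (ops[i].offset != cursor)
            return false; /* gap, overlap, or out-of-order op */
        cursor += ops[i].size;
    }
    return cursor == struct_size;
}
```

For `%struct.Str`, the six 4-byte fields cover all 24 bytes, matching the `i64 24` length of the memcpy that -memcpyopt -instcombine already folds.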

I'm looking for feedback as to which path to take. Are there known
instances of successive load/store that would benefit from being replaced
with memcpy (option 2)?

Thank you,
Mircea.



Reid Kleckner

2015-Apr-02 16:21 UTC


[LLVMdev] Optimizing out redundant alloca involving byval params

On Wed, Apr 1, 2015 at 5:32 PM, Mircea Trofin <mtrofin at google.com> wrote:
> I dug a bit more. It appears the succession -memcpyopt -instcombine can
> convert this:
>
> %struct.Str = type { i32, i32, i32, i32, i32, i32 }
>
> define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
> entry:
>   %agg.tmp = alloca %struct.Str, align 8
>   %0 = bitcast %struct.Str* %agg.tmp to i8*
>   %1 = bitcast %struct.Str* %s to i8*
>   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false)
>   call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp)
>   ret void
> }
>
> Into this:
>
> define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
> entry:
>   call void @_Z6e_test3Str(%struct.Str* byval align 8 %s)
>   ret void
> }
>
>
> Which is great. However, this isn't happening with GEP- and load/store-based
> IR (i.e. a total of 6 sets of GEP on %s + load, then GEP on %agg.tmp +
> store, like the one discussed earlier in this thread).
>
> I see 2 options:
>
> 1) convert the pass I'm working on to produce memcpy instead of load/store
> successions, which would allow the resulting IR to fit in the canonical
> patterns optimized today, or

I'd say that if you are copying an object and it requires more than 2 loads
and stores, use memcpy. This is what Clang does for aggregate copies when
there is no copy ctor.
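Reid's rule of thumb for option 1 can be written as a small decision function. The sketch below works on raw byte buffers described by a hypothetical field table; `Field`, `choose_strategy` and `copy_aggregate` are invented names, the threshold of 2 comes from the reply above, and none of this is Clang's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one field: byte offset and size. */
struct Field { size_t offset; size_t size; };

enum CopyStrategy { COPY_FIELDWISE, COPY_MEMCPY };

/* Rule of thumb from the reply: more than 2 load/store pairs -> memcpy. */
static enum CopyStrategy choose_strategy(size_t num_fields) {
    return num_fields > 2 ? COPY_MEMCPY : COPY_FIELDWISE;
}

/* Copy an aggregate of total_size bytes. The field-wise path touches only the
   listed fields; the memcpy path copies every byte, including any padding.
   Returns the strategy that was used. */
static enum CopyStrategy copy_aggregate(unsigned char *dst,
                                        const unsigned char *src,
                                        const struct Field *fields, size_t n,
                                        size_t total_size) {
    enum CopyStrategy s = choose_strategy(n);
    if (s == COPY_MEMCPY) {
        memcpy(dst, src, total_size);
    } else {
        for (size_t i = 0; i < n; ++i)
            memcpy(dst + fields[i].offset, src + fields[i].offset,
                   fields[i].size);
    }
    return s;
}
```

Under this rule, `%struct.Str` (six fields) would be emitted as a memcpy, which is exactly the form the existing memcpyopt/instcombine sequence already reduces.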
> 2) add support (probably to memcpyopt) for converting load/store
> successions into memcpy, then let the current optimizations reduce the
> resulting IR.

We should do this as a separate pass (I thought we did?), but it's hard to
do when there is interior padding in the struct. It's hard to know if the
interior padding of the destination needs to retain the data that was
originally there.
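The padding problem can be made concrete at the byte level. The sketch below models an IR struct such as `{ i8, i32 }` as an 8-byte buffer, assuming the `i32` is 4-byte aligned so that bytes 1 to 3 are interior padding (a typical but not universal layout). Per-field copies leave the destination's padding untouched, while a whole-struct memcpy overwrites it, so rewriting one into the other is only safe when nothing depends on the old padding contents. `copy_fieldwise`, `copy_whole` and `PADDED_SIZE` are hypothetical names.

```c
#include <string.h>

/* Assumed layout of { i8, i32 }: i8 at offset 0, padding at 1..3,
   i32 at offset 4; total size 8 bytes. */
enum { PADDED_SIZE = 8 };

/* What a succession of per-field loads/stores does: only field bytes are
   written, so the destination's padding keeps its previous contents. */
static void copy_fieldwise(unsigned char *dst, const unsigned char *src) {
    memcpy(dst + 0, src + 0, 1); /* i8 field */
    memcpy(dst + 4, src + 4, 4); /* i32 field */
}

/* What a single memcpy of the whole struct does: padding is overwritten too. */
static void copy_whole(unsigned char *dst, const unsigned char *src) {
    memcpy(dst, src, PADDED_SIZE);
}
```

By contrast, `%struct.Str` (six `i32` fields) has no interior padding under the same alignment assumption, so this concern does not arise for the example in this thread.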

> I'm looking for feedback as to which path to take. Are there known
> instances of successive load/store that would benefit from being replaced
> with memcpy (option 2)?
