Mircea Trofin
2015-Mar-08 17:02 UTC
[LLVMdev] Optimizing out redundant alloca involving byval params
errata: I am on 3.6 full stop. I *thought* there was a 3.7 available, based on the title of http://llvm.org/docs/ ("LLVM 3.7 documentation"). I suppose the docs are ahead of the release schedule? On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <mtrofin at google.com> wrote:> Sorry, that phase is part of the PNaCl toolchain. This would be LLVM 3.6, > would your comments still apply? > > I tried -O3 to no avail. I suppose I'll get llvm 3.7, see if I can > optimize the latest snippet there (the one avoiding load/store), and see > from there. > > Thanks! > > On Fri, Mar 6, 2015 at 12:01 PM Philip Reames <listmail at philipreames.com> > wrote: > >> >> On 03/05/2015 06:16 PM, Mircea Trofin wrote: >> >> Thanks! >> >> Philip, do you mean I should transform the original IR to something like >> this? >> >> >> Yes. >> >> (...which is what -expand-struct-regs can do, when applied to my original >> input) >> >> Sorry, what? This doesn't appear to be a pass in ToT. Are you using an >> older version of LLVM? If so, none of my comments will apply. >> >> >> define void @main(%struct* byval %ptr) { >> %val.index = getelementptr %struct* %ptr, i32 0, i32 0 >> %val.field = load i32* %val.index >> %val.index1 = getelementptr %struct* %ptr, i32 0, i32 1 >> %val.field2 = load i32* %val.index1 >> %val.ptr = alloca %struct >> %val.ptr.index = getelementptr %struct* %val.ptr, i32 0, i32 0 >> store i32 %val.field, i32* %val.ptr.index >> %val.ptr.index4 = getelementptr %struct* %val.ptr, i32 0, i32 1 >> store i32 %val.field2, i32* %val.ptr.index4 >> call void @extern_func(%struct* byval %val.ptr) >> ret void >> } >> >> If so, would you mind pointing me to the phase that would reduce this? >> (I'm assuming that's what you meant by "for free" - there's an existing >> phase I could use) >> >> I would expect GVN to get this. If you can run this through a fully -O3 >> pass order and get the right result, isolating the pass in question should >> be easy. >> >> >> Thank you. >> Mircea. >> >> >> On Thu, Mar 5, 2015 at 4:39 PM Philip Reames <listmail at philipreames.com> >> wrote: >> >>> Reid is right that this would go in memcpyopt, but... we there's an >>> active discussion on the commit list which will solve this through a >>> different mechanism. There's an active desire to avoid teaching GVN and >>> related pieces (of which memcpyopt is one) about first class aggregates. >>> We don't have enough active users of the feature to justify and maintain >>> the complexity. >>> >>> If you haven't already seen it, this background may help: >>> http://llvm.org/docs/Frontend/PerformanceTips.html#avoid- >>> loads-and-stores-of-large-aggregate-type >>> >>> The current proposal is to convert such aggregate loads and stores into >>> their component pieces. If that happens, you're example should come "for >>> free" provided that the same example works when you break down the FCA into >>> it's component pieces. If it doesn't, please say so. >>> >>> Philip >>> >>> >>> On 03/05/2015 04:21 PM, Reid Kleckner wrote: >>> >>> I think lib/Transforms/Scalar/MemCpyOptimizer.cpp might be the right >>> place for this, considering that most frontends will use memcpy for that >>> copy anyway. It already has some logic for byval args. >>> >>> On Thu, Mar 5, 2015 at 3:51 PM, Mircea Trofin <mtrofin at google.com> >>> wrote: >>> >>>> Hello all, >>>> >>>> I'm trying to find the pass that would convert from: >>>> >>>> define void @main(%struct* byval %ptr) { >>>> %val = load %struct* %ptr >>>> %val.ptr = alloca %struct >>>> store %struct %val, %struct* %val.ptr >>>> call void @extern_func(%struct* byval %val.ptr) >>>> ret void >>>> } >>>> >>>> to this: >>>> define void @main(%struct* byval %ptr) { >>>> call void @extern_func(%struct* byval %ptr) >>>> ret void >>>> } >>>> >>>> First, am I missing something - would this be a correct optimization? >>>> >>>> Thank you, >>>> Mircea. >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150308/6d2f7db7/attachment.html>
Mircea Trofin
2015-Apr-02 00:32 UTC
[LLVMdev] Optimizing out redundant alloca involving byval params
I dug a bit more. It appears the succession -memcpyopt -instcombine can convert this: %struct.Str = type { i32, i32, i32, i32, i32, i32 } define void @_Z4test3Str(%struct.Str* byval align 8 %s) { entry: %agg.tmp = alloca %struct.Str, align 8 %0 = bitcast %struct.Str* %agg.tmp to i8* %1 = bitcast %struct.Str* %s to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false) call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp) ret void } Into this: define void @_Z4test3Str(%struct.Str* byval align 8 %s) { entry: call void @_Z6e_test3Str(%struct.Str* byval align 8 %s) ret void } Which is great. This isn't however happening with a GEP and load/store - based IR (so a total of 6 sets of GEP on %s, load, then GEP on %agg.tmp + store , like the one discussed earlier in this thread). I see 2 options: 1) convert the pass I'm working on to produce memcpy instead of load/store successions, which would allow the resulting IR to fit in the canonical patterns optimized today, or 2) add support (probably to memcpyopt) for converting load/store successions into memcpy, then let the current optimizations reduce the resulting IR. I'm looking for feedback as to which path to take. Are there known instances of successive load/store that would benefit from being replaced with memcpy (option 2)? Thank you, Mircea. On Sun, Mar 8, 2015 at 10:02 AM Mircea Trofin <mtrofin at google.com> wrote:> errata: I am on 3.6 full stop. I *thought* there was a 3.7 available, > based on the title of http://llvm.org/docs/ ("LLVM 3.7 documentation"). I > suppose the docs are ahead of the release schedule? > > On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <mtrofin at google.com> wrote: > >> Sorry, that phase is part of the PNaCl toolchain. This would be LLVM 3.6, >> would your comments still apply? >> >> I tried -O3 to no avail. I suppose I'll get llvm 3.7, see if I can >> optimize the latest snippet there (the one avoiding load/store), and see >> from there. >> >> Thanks! >> >> On Fri, Mar 6, 2015 at 12:01 PM Philip Reames <listmail at philipreames.com> >> wrote: >> >>> >>> On 03/05/2015 06:16 PM, Mircea Trofin wrote: >>> >>> Thanks! >>> >>> Philip, do you mean I should transform the original IR to something >>> like this? >>> >>> >>> Yes. >>> >>> (...which is what -expand-struct-regs can do, when applied to my >>> original input) >>> >>> Sorry, what? This doesn't appear to be a pass in ToT. Are you using an >>> older version of LLVM? If so, none of my comments will apply. >>> >>> >>> define void @main(%struct* byval %ptr) { >>> %val.index = getelementptr %struct* %ptr, i32 0, i32 0 >>> %val.field = load i32* %val.index >>> %val.index1 = getelementptr %struct* %ptr, i32 0, i32 1 >>> %val.field2 = load i32* %val.index1 >>> %val.ptr = alloca %struct >>> %val.ptr.index = getelementptr %struct* %val.ptr, i32 0, i32 0 >>> store i32 %val.field, i32* %val.ptr.index >>> %val.ptr.index4 = getelementptr %struct* %val.ptr, i32 0, i32 1 >>> store i32 %val.field2, i32* %val.ptr.index4 >>> call void @extern_func(%struct* byval %val.ptr) >>> ret void >>> } >>> >>> If so, would you mind pointing me to the phase that would reduce this? >>> (I'm assuming that's what you meant by "for free" - there's an existing >>> phase I could use) >>> >>> I would expect GVN to get this. If you can run this through a fully -O3 >>> pass order and get the right result, isolating the pass in question should >>> be easy. >>> >>> >>> Thank you. >>> Mircea. >>> >>> >>> On Thu, Mar 5, 2015 at 4:39 PM Philip Reames <listmail at philipreames.com> >>> wrote: >>> >>>> Reid is right that this would go in memcpyopt, but... we there's an >>>> active discussion on the commit list which will solve this through a >>>> different mechanism. There's an active desire to avoid teaching GVN and >>>> related pieces (of which memcpyopt is one) about first class aggregates. >>>> We don't have enough active users of the feature to justify and maintain >>>> the complexity. >>>> >>>> If you haven't already seen it, this background may help: >>>> http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loa >>>> ds-and-stores-of-large-aggregate-type >>>> >>>> The current proposal is to convert such aggregate loads and stores into >>>> their component pieces. If that happens, you're example should come "for >>>> free" provided that the same example works when you break down the FCA into >>>> it's component pieces. If it doesn't, please say so. >>>> >>>> Philip >>>> >>>> >>>> On 03/05/2015 04:21 PM, Reid Kleckner wrote: >>>> >>>> I think lib/Transforms/Scalar/MemCpyOptimizer.cpp might be the right >>>> place for this, considering that most frontends will use memcpy for that >>>> copy anyway. It already has some logic for byval args. >>>> >>>> On Thu, Mar 5, 2015 at 3:51 PM, Mircea Trofin <mtrofin at google.com> >>>> wrote: >>>> >>>>> Hello all, >>>>> >>>>> I'm trying to find the pass that would convert from: >>>>> >>>>> define void @main(%struct* byval %ptr) { >>>>> %val = load %struct* %ptr >>>>> %val.ptr = alloca %struct >>>>> store %struct %val, %struct* %val.ptr >>>>> call void @extern_func(%struct* byval %val.ptr) >>>>> ret void >>>>> } >>>>> >>>>> to this: >>>>> define void @main(%struct* byval %ptr) { >>>>> call void @extern_func(%struct* byval %ptr) >>>>> ret void >>>>> } >>>>> >>>>> First, am I missing something - would this be a correct optimization? >>>>> >>>>> Thank you, >>>>> Mircea. >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150401/c366aedd/attachment.html>
Reid Kleckner
2015-Apr-02 16:21 UTC
[LLVMdev] Optimizing out redundant alloca involving byval params
On Wed, Apr 1, 2015 at 5:32 PM, Mircea Trofin <mtrofin at google.com> wrote:> I dug a bit more. It appears the succession -memcpyopt -instcombine can > convert this: > > %struct.Str = type { i32, i32, i32, i32, i32, i32 } > > define void @_Z4test3Str(%struct.Str* byval align 8 %s) { > > entry: > > %agg.tmp = alloca %struct.Str, align 8 > > %0 = bitcast %struct.Str* %agg.tmp to i8* > > %1 = bitcast %struct.Str* %s to i8* > > call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 > false) > > call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp) > > ret void > > } > > Into this: > > define void @_Z4test3Str(%struct.Str* byval align 8 %s) { > > entry: > > call void @_Z6e_test3Str(%struct.Str* byval align 8 %s) > > ret void > > } > > > Which is great. This isn't however happening with a GEP and load/store - > based IR (so a total of 6 sets of GEP on %s, load, then GEP on %agg.tmp + > store , like the one discussed earlier in this thread). > > I see 2 options: > > 1) convert the pass I'm working on to produce memcpy instead of load/store > successions, which would allow the resulting IR to fit in the canonical > patterns optimized today, or >I'd say that if you are copying an object and it requires more than 2 loads and stores, use memcpy. This is what Clang does for aggregate copies when there is no copy ctor.> 2) add support (probably to memcpyopt) for converting load/store > successions into memcpy, then let the current optimizations reduce the > resulting IR. >We should do this as a separate pass (I thought we did?), but it's hard to do when there is interior padding in the struct. It's hard to know if the interior padding of the destination needs to retain the data that was originally there.> I'm looking for feedback as to which path to take. Are there known > instances of successive load/store that would benefit from being replaced > with memcpy (option 2)? >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150402/b73b4a38/attachment.html>