Amara Emerson via llvm-dev
2018-Aug-17 16:02 UTC
[llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline
> On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 08/15/2018 02:38 PM, Philip Reames via llvm-dev wrote: >> I'm concerned that we're focusing on one side of this. Let me point out a few concerns w/changing the canonical form here: >> >> LICM does not know how to hoist or sink regions. It does know how to hoist and sink selects.I think we should teach LICM to do this eventually.>> InstCombine has limited support for triangles/diamonds, but fairly extensive support for selects. >> EarlyCSE and GVN do not know how to eliminate fully redundant triangles/diamonds. PRE *may* get some of these cases, but that is by no means guaranteed or likely to be robust.Agreed, we’ll need a plan to deal with these issues.>> My personal opinion is that selects are the appropriate canonical form. >> >> For the one of the specific cases mentioned, teaching GVN to do FRE and PRE for loads from selects of pointers just doesn't seem that painful. I would be really tempted to do that instead. Similarly, walking facts back from select uses in CVP seems generally useful since we have use dependent facts in a bunch of contexts, not just selects. (Call arguments for instance, non-null from unconditional deref, etc..) >> >> To be clear, I am raising concerns, not actively objecting to this. If you want to move forward and commit to work through all of the issues identified I wont actively stand in the way. >> > > As I've expressed in the past, I think that not using select as part of our canonical form is potentially a superior design. However, it would be a major change. In addition to the issues that Philip mentions, there's also the fact that we'll just have more basic blocks and that, in itself, could lead to an increase in compile time. However, working through these issues will likely leave us with a more-robust optimizer. > > See also: https://bugs.llvm.org/show_bug.cgi?id=34603#c19 <https://bugs.llvm.org/show_bug.cgi?id=34603#c19>Canonicalizing to phis is my preference too.> > -Hal > >> Philip >> >> >> On 08/14/2018 12:39 PM, Sanjay Patel via llvm-dev wrote: >>> I didn't look closely at the new patch, but this appears to be a small extension to: >>> https://reviews.llvm.org/D38566 <https://reviews.llvm.org/D38566> >>> ...and the GVN-based reasons for delaying transformation to 'select' are discussed in detail in the motivating bug for that patch: >>> https://bugs.llvm.org/show_bug.cgi?id=34603 <https://bugs.llvm.org/show_bug.cgi?id=34603> >>> >>> So this sounds like the right direction to me. Note that there was objection to the implementation (a pile of pass options vs. uniquely named passes). >>> >>> Here's another motivating bug where early transform to select prevents optimization: >>> https://bugs.llvm.org/show_bug.cgi?id=36760 <https://bugs.llvm.org/show_bug.cgi?id=36760> >>> >>> Is that case affected by this patch? >>> >>> >>> On Tue, Aug 14, 2018 at 11:17 AM, John Brawn via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>> Summary >>> ======>>> >>> I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi >>> nodes into selects until later in the pass pipeline, to give passes which can >>> understand phis but not selects more opportunity to optimize. The thing I'm >>> trying to do which made me think of doing this is described below, but from the >>> benchmarking I've done it looks like this is overall a good idea regardless of >>> if I manage to get that done or not. >>> >>> Motivation >>> =========>>> >>> My goal is to get clang to optimize some code containing a call to >>> std::min_element which is dereferenced, so something like: >>> >>> float min_element_example(float *data, int size) >>> { >>> return *std::min_element(data, data+size); >>> } >>> >>> which, after inlining a specialization, looks like: >>> >>> float min_element_example_inlined(float *first, float *last) >>> { >>> for (float *p = first; p != last; ++p) >>> { >>> if (*p < *first) >>> first = p; >>> } >>> return *first; >>> } >>> >>> There are two loads in the loop, *p and *first, but actually the load *p can be >>> eliminated by using either the previous load *p or the previous *first, >>> depending on if the if-condition was taken or not. However the >>> "if (*p < *first) first = p" gets turned by simplifycfg into a select and this >>> makes optimizing this a lot harder because you no longer have distinct paths >>> through the CFG. >>> >>> I have some ideas on how to do the optimization (see my previous RFC "Making GVN >>> able to visit the same block more than once" posted in April, though I've >>> decided that the specific idea presented there isn't the right way to do it), >>> but I think the first step is to make sure we don't have a select when we try >>> to optimise this. >>> >>> Approach >>> =======>>> >>> I've posted a patch to https://reviews.llvm.org/D50723 <https://reviews.llvm.org/D50723> showing what I'm >>> intending to do. An extra parameter is added to SimplifyCFG to control whether >>> two-entry phi nodes are converted into select, and this is set to false in all >>> instances before the end of module simplification. At the end of module >>> simplification we do SimplifyCFG, then Instcombine to optimise the selects that >>> are introduced, then EarlyCSE to eliminate common subexpressions introduced by >>> instcombine.Is scheduling another simplifycfg, instcombine + earlycse pass really necessary? I’m concerned about the compile time impact. Cheers, Amara>>> >>> Benchmark Results >>> ================>>> >>> These are performance differences reported by LNT when running llvm-test-suite, >>> spec2000, and spec2006 at -O3 with and without the patch linked above (using >>> trunk llvm from a week or so ago). >>> >>> AArch64 results on ARM Cortex-A72: >>> >>> Performance Regressions - execution_time Change >>> SingleSource/Benchmarks/Shootout/Shootout-ary3 9.48% >>> MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt 3.79% >>> SingleSource/Benchmarks/CoyoteBench/huffbench 1.40% >>> >>> Performance Improvements - execution_time Change >>> MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -23.74% >>> External/SPEC/CINT2000/256.bzip2/256.bzip2 -9.82% >>> MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt -9.57% >>> MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt -4.38% >>> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -3.94% >>> MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -3.44% >>> External/SPEC/CFP2006/453.povray/453.povray -2.50% >>> SingleSource/Benchmarks/Adobe-C++/stepanov_vector -1.49% >>> >>> X86_64 results on Intel Xeon E5-2690: >>> >>> Performance Regressions - execution_time Change >>> MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 5.62% >>> >>> Performance Improvements - execution_time Change >>> SingleSource/Benchmarks/Misc-C++/Large/sphereflake -4.43% >>> External/SPEC/CINT2006/456.hmmer/456.hmmer -2.50% >>> External/SPEC/CINT2006/464.h264ref/464.h264ref -1.60% >>> MultiSource/Benchmarks/nbench/nbench -1.19% >>> SingleSource/Benchmarks/Adobe-C++/functionobjects -1.07% >>> >>> I had a brief look at the regressions and they all look to be caused by >>> getting bad luck with branch mispredictions: I looked into the Shootout-ary3 and >>> yacr2 cases and in both the hot code path was the same with and without the >>> patch, but with more mispredictions probably caused by changes elsewhere. >>> >>> John >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180817/8f80d71e/attachment.html>
Philip Reames via llvm-dev
2018-Aug-17 20:17 UTC
[llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline
On 08/17/2018 09:02 AM, Amara Emerson wrote:> >> On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >> On 08/15/2018 02:38 PM, Philip Reames via llvm-dev wrote: >>> >>> I'm concerned that we're focusing on one side of this. Let me point >>> out a few concerns w/changing the canonical form here: >>> >>> 1. LICM does not know how to hoist or sink regions. It does know >>> how to hoist and sink selects. >>> > I think we should teach LICM to do this eventually.Agreed. I just don't currently see patches to do so. Once LICM supports region hoisting, much of my concern disappears.>>> >>> 2. InstCombine has limited support for triangles/diamonds, but >>> fairly extensive support for selects. >>> 3. EarlyCSE and GVN do not know how to eliminate fully redundant >>> triangles/diamonds. PRE *may* get some of these cases, but that >>> is by no means guaranteed or likely to be robust. >>> > Agreed, we’ll need a plan to deal with these issues.Again, I'd love to see these issues fixed. Once that's done, the convert about phi vs select as canonical form is much less relevant.>>> >>> My personal opinion is that selects are the appropriate canonical form. >>> >>> For the one of the specific cases mentioned, teaching GVN to do FRE >>> and PRE for loads from selects of pointers just doesn't seem that >>> painful. I would be really tempted to do that instead. Similarly, >>> walking facts back from select uses in CVP seems generally useful >>> since we have use dependent facts in a bunch of contexts, not just >>> selects. (Call arguments for instance, non-null from unconditional >>> deref, etc..) >>> >>> To be clear, I am raising concerns, not actively objecting to this. >>> If you want to move forward and commit to work through all of the >>> issues identified I wont actively stand in the way. >>> >> >> As I've expressed in the past, I think that not using select as part >> of our canonical form is potentially a superior design. However, it >> would be a major change. In addition to the issues that Philip >> mentions, there's also the fact that we'll just have more basic >> blocks and that, in itself, could lead to an increase in compile >> time. However, working through these issues will likely leave us with >> a more-robust optimizer. >> >> See also: https://bugs.llvm.org/show_bug.cgi?id=34603#c19 > Canonicalizing to phis is my preference too. >> >> -Hal >> >>> Philip >>> >>> >>> On 08/14/2018 12:39 PM, Sanjay Patel via llvm-dev wrote: >>>> I didn't look closely at the new patch, but this appears to be a >>>> small extension to: >>>> https://reviews.llvm.org/D38566 >>>> ...and the GVN-based reasons for delaying transformation to >>>> 'select' are discussed in detail in the motivating bug for that patch: >>>> https://bugs.llvm.org/show_bug.cgi?id=34603 >>>> >>>> So this sounds like the right direction to me. Note that there was >>>> objection to the implementation (a pile of pass options vs. >>>> uniquely named passes). >>>> >>>> Here's another motivating bug where early transform to select >>>> prevents optimization: >>>> https://bugs.llvm.org/show_bug.cgi?id=36760 >>>> >>>> Is that case affected by this patch? >>>> >>>> >>>> On Tue, Aug 14, 2018 at 11:17 AM, John Brawn via llvm-dev >>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>> >>>> Summary >>>> ======>>>> >>>> I'm planning on adjusting SimplifyCFG so that it doesn't turn >>>> two-entry phi >>>> nodes into selects until later in the pass pipeline, to give >>>> passes which can >>>> understand phis but not selects more opportunity to optimize. >>>> The thing I'm >>>> trying to do which made me think of doing this is described >>>> below, but from the >>>> benchmarking I've done it looks like this is overall a good >>>> idea regardless of >>>> if I manage to get that done or not. >>>> >>>> Motivation >>>> =========>>>> >>>> My goal is to get clang to optimize some code containing a call to >>>> std::min_element which is dereferenced, so something like: >>>> >>>> float min_element_example(float *data, int size) >>>> { >>>> return *std::min_element(data, data+size); >>>> } >>>> >>>> which, after inlining a specialization, looks like: >>>> >>>> float min_element_example_inlined(float *first, float *last) >>>> { >>>> for (float *p = first; p != last; ++p) >>>> { >>>> if (*p < *first) >>>> first = p; >>>> } >>>> return *first; >>>> } >>>> >>>> There are two loads in the loop, *p and *first, but actually >>>> the load *p can be >>>> eliminated by using either the previous load *p or the previous >>>> *first, >>>> depending on if the if-condition was taken or not. However the >>>> "if (*p < *first) first = p" gets turned by simplifycfg into a >>>> select and this >>>> makes optimizing this a lot harder because you no longer have >>>> distinct paths >>>> through the CFG. >>>> >>>> I have some ideas on how to do the optimization (see my >>>> previous RFC "Making GVN >>>> able to visit the same block more than once" posted in April, >>>> though I've >>>> decided that the specific idea presented there isn't the right >>>> way to do it), >>>> but I think the first step is to make sure we don't have a >>>> select when we try >>>> to optimise this. >>>> >>>> Approach >>>> =======>>>> >>>> I've posted a patch to https://reviews.llvm.org/D50723 >>>> <https://reviews.llvm.org/D50723> showing what I'm >>>> intending to do. An extra parameter is added to SimplifyCFG to >>>> control whether >>>> two-entry phi nodes are converted into select, and this is set >>>> to false in all >>>> instances before the end of module simplification. At the end >>>> of module >>>> simplification we do SimplifyCFG, then Instcombine to optimise >>>> the selects that >>>> are introduced, then EarlyCSE to eliminate common >>>> subexpressions introduced by >>>> instcombine. >>>> > Is scheduling another simplifycfg, instcombine + earlycse pass really > necessary? I’m concerned about the compile time impact. > > Cheers, > Amara >>>> >>>> >>>> Benchmark Results >>>> ================>>>> >>>> These are performance differences reported by LNT when running >>>> llvm-test-suite, >>>> spec2000, and spec2006 at -O3 with and without the patch linked >>>> above (using >>>> trunk llvm from a week or so ago). >>>> >>>> AArch64 results on ARM Cortex-A72: >>>> >>>> Performance Regressions - execution_time >>>> Change >>>> SingleSource/Benchmarks/Shootout/Shootout-ary3 >>>> 9.48% >>>> MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt >>>> 3.79% >>>> SingleSource/Benchmarks/CoyoteBench/huffbench >>>> 1.40% >>>> >>>> Performance Improvements - execution_time >>>> Change >>>> MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl >>>> -23.74% >>>> External/SPEC/CINT2000/256.bzip2/256.bzip2 >>>> -9.82% >>>> MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt >>>> -9.57% >>>> MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt >>>> -4.38% >>>> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt >>>> -3.94% >>>> MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl >>>> -3.44% >>>> External/SPEC/CFP2006/453.povray/453.povray >>>> -2.50% >>>> SingleSource/Benchmarks/Adobe-C++/stepanov_vector >>>> -1.49% >>>> >>>> X86_64 results on Intel Xeon E5-2690: >>>> >>>> Performance Regressions - execution_time Change >>>> MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 5.62% >>>> >>>> Performance Improvements - execution_time Change >>>> SingleSource/Benchmarks/Misc-C++/Large/sphereflake -4.43% >>>> External/SPEC/CINT2006/456.hmmer/456.hmmer -2.50% >>>> External/SPEC/CINT2006/464.h264ref/464.h264ref -1.60% >>>> MultiSource/Benchmarks/nbench/nbench -1.19% >>>> SingleSource/Benchmarks/Adobe-C++/functionobjects -1.07% >>>> >>>> I had a brief look at the regressions and they all look to be >>>> caused by >>>> getting bad luck with branch mispredictions: I looked into the >>>> Shootout-ary3 and >>>> yacr2 cases and in both the hot code path was the same with and >>>> without the >>>> patch, but with more mispredictions probably caused by changes >>>> elsewhere. >>>> >>>> John >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> -- >> Hal Finkel >> Lead, Compiler Technology and Programming Languages >> Leadership Computing Facility >> Argonne National Laboratory >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180817/6456152a/attachment.html>
John Brawn via llvm-dev
2018-Aug-23 17:01 UTC
[llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline
> Here's another motivating bug where early transform to select prevents optimization: > https://bugs.llvm.org/show_bug.cgi?id=36760> Is that case affected by this patch?From a quick reading of that bug it looks like CVP needs to be enhanced to do a certain transformation, but the presence of a select means that even with that enhancement it wouldn’t do anything. With this patch we have a phi and so I think that if the CVP enhancement were done it should work.>>> 1. LICM does not know how to hoist or sink regions. It does know how to hoist and sink selects. >>I think we should teach LICM to do this eventually. >Agreed. I just don't currently see patches to do so. Once LICM supports region hoisting, much of my concern disappears.I’ve looked into what happens with LICM with this change, and it looks like everything except the phi gets hoisted, then later the phi is turned into a select, then a later LICM pass hoists the select. For the very simple tests I tried we end up with the same result in the end, but I wouldn’t be surprised if indeed LICM not doing this makes things worse in more complex cases. I’ve hacked together a modification to LICM to make it able to hoist phis (by essentially hoisting entire blocks instead of moving everything into a single preheader block) and it seemed to work OK, so I’ll get to work on getting that working properly and committed before I touch phi-to-select transformation.> InstCombine has limited support for triangles/diamonds, but fairly extensive support for selects.Currently I have InstCombine running directly after phis have been turned into selects, so this should only be a problem if something before that relies on the select instcombine having already happened. I’ll look into that to see if it is indeed a problem.>>>EarlyCSE and GVN do not know how to eliminate fully redundant triangles/diamonds. PRE *may* get some of these cases, but that is by no means guaranteed or likely to be robust. >>Agreed, we’ll need a plan to deal with these issues. >Again, I'd love to see these issues fixed. Once that's done, the convert about phi vs select as canonical form is much less relevant.I’ll look into this.> For the one of the specific cases mentioned, teaching GVN to do FRE and PRE for loads from selects of pointers just doesn't seem that painful. I would be really tempted to do that > instead. Similarly, walking facts back from select uses in CVP seems generally useful since we have use dependent facts in a bunch of contexts, not just selects. (Call arguments for > instance, non-null from unconditional deref, etc..)Actually making GVN able to handle selects being painful is exactly the reason I’m proposing doing this. GVN has a hard constraint that a single block can only have a single available value, and attempting to change that (which you have to do when selecting between values defined in the same block) causes a ton of problems that have to be resolved. John From: Philip Reames [mailto:listmail at philipreames.com] Sent: 17 August 2018 21:17 To: Amara Emerson; Hal Finkel; John Brawn Cc: Sanjay Patel; llvm-dev at lists.llvm.org; nd Subject: Re: [llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline On 08/17/2018 09:02 AM, Amara Emerson wrote: On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: On 08/15/2018 02:38 PM, Philip Reames via llvm-dev wrote: I'm concerned that we're focusing on one side of this. Let me point out a few concerns w/changing the canonical form here: 1. LICM does not know how to hoist or sink regions. It does know how to hoist and sink selects. I think we should teach LICM to do this eventually. Agreed. I just don't currently see patches to do so. Once LICM supports region hoisting, much of my concern disappears. 1. InstCombine has limited support for triangles/diamonds, but fairly extensive support for selects. 2. EarlyCSE and GVN do not know how to eliminate fully redundant triangles/diamonds. PRE *may* get some of these cases, but that is by no means guaranteed or likely to be robust. Agreed, we’ll need a plan to deal with these issues. Again, I'd love to see these issues fixed. Once that's done, the convert about phi vs select as canonical form is much less relevant. My personal opinion is that selects are the appropriate canonical form. For the one of the specific cases mentioned, teaching GVN to do FRE and PRE for loads from selects of pointers just doesn't seem that painful. I would be really tempted to do that instead. Similarly, walking facts back from select uses in CVP seems generally useful since we have use dependent facts in a bunch of contexts, not just selects. (Call arguments for instance, non-null from unconditional deref, etc..) To be clear, I am raising concerns, not actively objecting to this. If you want to move forward and commit to work through all of the issues identified I wont actively stand in the way. As I've expressed in the past, I think that not using select as part of our canonical form is potentially a superior design. However, it would be a major change. In addition to the issues that Philip mentions, there's also the fact that we'll just have more basic blocks and that, in itself, could lead to an increase in compile time. However, working through these issues will likely leave us with a more-robust optimizer. See also: https://bugs.llvm.org/show_bug.cgi?id=34603#c19 Canonicalizing to phis is my preference too. -Hal Philip On 08/14/2018 12:39 PM, Sanjay Patel via llvm-dev wrote: I didn't look closely at the new patch, but this appears to be a small extension to: https://reviews.llvm.org/D38566 ...and the GVN-based reasons for delaying transformation to 'select' are discussed in detail in the motivating bug for that patch: https://bugs.llvm.org/show_bug.cgi?id=34603 So this sounds like the right direction to me. Note that there was objection to the implementation (a pile of pass options vs. uniquely named passes). Here's another motivating bug where early transform to select prevents optimization: https://bugs.llvm.org/show_bug.cgi?id=36760 Is that case affected by this patch? On Tue, Aug 14, 2018 at 11:17 AM, John Brawn via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Summary ====== I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi nodes into selects until later in the pass pipeline, to give passes which can understand phis but not selects more opportunity to optimize. The thing I'm trying to do which made me think of doing this is described below, but from the benchmarking I've done it looks like this is overall a good idea regardless of if I manage to get that done or not. Motivation ========= My goal is to get clang to optimize some code containing a call to std::min_element which is dereferenced, so something like: float min_element_example(float *data, int size) { return *std::min_element(data, data+size); } which, after inlining a specialization, looks like: float min_element_example_inlined(float *first, float *last) { for (float *p = first; p != last; ++p) { if (*p < *first) first = p; } return *first; } There are two loads in the loop, *p and *first, but actually the load *p can be eliminated by using either the previous load *p or the previous *first, depending on if the if-condition was taken or not. However the "if (*p < *first) first = p" gets turned by simplifycfg into a select and this makes optimizing this a lot harder because you no longer have distinct paths through the CFG. I have some ideas on how to do the optimization (see my previous RFC "Making GVN able to visit the same block more than once" posted in April, though I've decided that the specific idea presented there isn't the right way to do it), but I think the first step is to make sure we don't have a select when we try to optimise this. Approach ======= I've posted a patch to https://reviews.llvm.org/D50723 showing what I'm intending to do. An extra parameter is added to SimplifyCFG to control whether two-entry phi nodes are converted into select, and this is set to false in all instances before the end of module simplification. At the end of module simplification we do SimplifyCFG, then Instcombine to optimise the selects that are introduced, then EarlyCSE to eliminate common subexpressions introduced by instcombine. Is scheduling another simplifycfg, instcombine + earlycse pass really necessary? I’m concerned about the compile time impact. Cheers, Amara Benchmark Results ================ These are performance differences reported by LNT when running llvm-test-suite, spec2000, and spec2006 at -O3 with and without the patch linked above (using trunk llvm from a week or so ago). AArch64 results on ARM Cortex-A72: Performance Regressions - execution_time Change SingleSource/Benchmarks/Shootout/Shootout-ary3 9.48% MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt 3.79% SingleSource/Benchmarks/CoyoteBench/huffbench 1.40% Performance Improvements - execution_time Change MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -23.74% External/SPEC/CINT2000/256.bzip2/256.bzip2 -9.82% MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt -9.57% MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt -4.38% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -3.94% MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -3.44% External/SPEC/CFP2006/453.povray/453.povray -2.50% SingleSource/Benchmarks/Adobe-C++/stepanov_vector -1.49% X86_64 results on Intel Xeon E5-2690: Performance Regressions - execution_time Change MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 5.62% Performance Improvements - execution_time Change SingleSource/Benchmarks/Misc-C++/Large/sphereflake -4.43% External/SPEC/CINT2006/456.hmmer/456.hmmer -2.50% External/SPEC/CINT2006/464.h264ref/464.h264ref -1.60% MultiSource/Benchmarks/nbench/nbench -1.19% SingleSource/Benchmarks/Adobe-C++/functionobjects -1.07% I had a brief look at the regressions and they all look to be caused by getting bad luck with branch mispredictions: I looked into the Shootout-ary3 and yacr2 cases and in both the hot code path was the same with and without the patch, but with more mispredictions probably caused by changes elsewhere. John _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180823/973e5e0d/attachment-0001.html>