Hal Finkel via llvm-dev
2016-Mar-08 17:55 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
----- Original Message -----
> From: "Mehdi Amini via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Rafael Espíndola" <rafael.espindola at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "cfe-dev" <cfe-dev at lists.llvm.org>
> Sent: Tuesday, March 8, 2016 11:40:47 AM
> Subject: Re: [cfe-dev] [llvm-dev] llvm and clang are getting slower
>
> Hi Rafael,
>
> CC: cfe-dev
>
> Thanks for sharing. We also noticed this internally, and I know that
> Bruno and Chris are working on some infrastructure and tooling to
> help track compile-time regressions closely.
>
> We had this conversation internally about the tradeoff between
> compile time and runtime performance, and I had planned to bring up
> the topic on the list in the coming months; this looks like a good
> occasion to plant the seed. Apparently in the past (years/a decade
> ago?) the project was very conservative about adding any optimizations
> that would hurt compile time, but there is no explicit policy
> (that I know of) addressing this tradeoff.
> The closest I could find is what Chandler wrote in
> http://reviews.llvm.org/D12826; for instance, for O2 he stated that
> "if an optimization increases compile time by 5% or increases code
> size by 5% for a particular benchmark, that benchmark should also be
> one which sees a 5% runtime improvement".
>
> My hope is that with better tooling for tracking compile time in the
> future, we'll reach a state where "breaking" the compile-time
> regression test is treated as seriously as breaking any other test:
> i.e., the offending commit should be reverted unless it has been
> shown to significantly (hand-wavy...) improve runtime performance.
>
> <troll>
> With the current trend, the Polly developers don't have to worry
> about improving their compile time; we'll catch up with them ;)
> </troll>

My two largest pet peeves in this area are:

1. We often use functions from ValueTracking (to get known bits, the number of sign bits, etc.) as though they're low cost. They're not really low cost, and the problem is that they *should* be. These functions do bottom-up walks and could cache their results; instead, they do a limited walk and recompute everything on every query. This is expensive: a significant amount of our InstCombine time goes to ValueTracking, and that shouldn't be the case. The more we add to InstCombine (and related passes), and the more we run InstCombine, the worse this gets. On the other hand, fixing this will help both compile time and code quality.

Furthermore, BasicAA has the same problem.

2. We have "cleanup" passes in the pipeline, such as those that run after loop unrolling and/or vectorization, that run regardless of whether the preceding pass actually did anything. We've been adding more of these, and they catch important use cases, but we need better infrastructure for this (either with the new pass manager or otherwise).

Also, I'm very hopeful that as our new MemorySSA and GVN improvements materialize, we'll see large compile-time improvements from that work. We spend a huge amount of time in GVN computing memory-dependency information (this dwarfs the time spent by GVN doing actual value-numbering work by an order of magnitude or more).

-Hal

> --
> Mehdi
>
> > On Mar 8, 2016, at 8:13 AM, Rafael Espíndola via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >
> > I have just benchmarked building trunk llvm and clang in Debug,
> > Release and LTO modes (see the attached script for the cmake lines).
> >
> > The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> > cases I used the system libgcc and libstdc++.
> >
> > For release builds there is a monotonic increase with each version,
> > from 163 minutes with 3.5 to 212 minutes with trunk. For comparison,
> > gcc 5.3.2 takes 205 minutes.
> >
> > Debug and LTO show an improvement in 3.7, but have regressed again
> > in 3.8.
> >
> > Cheers,
> > Rafael
> > <run.sh><LTO.time><Debug.time><Release.time>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
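To make the caching idea in Hal's first point concrete, here is a minimal, self-contained C++ sketch. The Expr type and the knownZeroMask helpers are invented for illustration; this is not LLVM's actual ValueTracking interface, which operates on llvm::Value and APInt.

  #include <cstdint>
  #include <unordered_map>
  #include <vector>

  // Toy IR node standing in for llvm::Value; invented for this sketch.
  struct Expr {
    enum Kind { Const, And, Or } kind;
    uint64_t constVal = 0;               // used when kind == Const
    std::vector<const Expr *> operands;  // used otherwise
  };

  // Uncached bottom-up walk, analogous in spirit to computeKnownBits():
  // every query re-walks the whole expression tree.
  uint64_t knownZeroMaskUncached(const Expr *E) {
    switch (E->kind) {
    case Expr::Const: return ~E->constVal;
    case Expr::And:   // a bit is known zero if known zero in either operand
      return knownZeroMaskUncached(E->operands[0]) |
             knownZeroMaskUncached(E->operands[1]);
    case Expr::Or:    // a bit is known zero only if known zero in both
      return knownZeroMaskUncached(E->operands[0]) &
             knownZeroMaskUncached(E->operands[1]);
    }
    return 0;
  }

  // The cached variant Hal is asking for: memoize per node so repeated
  // queries (e.g. from many InstCombine pattern matchers) are cheap.
  struct KnownBitsCache {
    std::unordered_map<const Expr *, uint64_t> memo;

    uint64_t knownZeroMask(const Expr *E) {
      auto it = memo.find(E);
      if (it != memo.end()) return it->second;
      uint64_t result = 0;
      switch (E->kind) {
      case Expr::Const:
        result = ~E->constVal;
        break;
      case Expr::And:
        result = knownZeroMask(E->operands[0]) | knownZeroMask(E->operands[1]);
        break;
      case Expr::Or:
        result = knownZeroMask(E->operands[0]) & knownZeroMask(E->operands[1]);
        break;
      }
      memo[E] = result;
      return result;
    }
  };

The uncached walk re-traverses shared subtrees on every query, which is roughly the cost profile Hal describes; the memoized variant makes repeated queries cheap, at the price of having to invalidate entries whenever a transformation rewrites a node, and that invalidation is the genuinely hard part in a real compiler.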
Adam Nemet via llvm-dev
2016-Mar-08 18:22 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
> On Mar 8, 2016, at 9:55 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> 2. We have "cleanup" passes in the pipeline, such as those that run
> after loop unrolling and/or vectorization, that run regardless of
> whether the preceding pass actually did anything. We've been adding
> more of these, and they catch important use cases, but we need better
> infrastructure for this (either with the new pass manager or
> otherwise).

A related issue is that if an analysis is not preserved by a pass, it gets invalidated *even if* the pass doesn't end up modifying the code. Because of this we invalidate SCEV's cache unnecessarily, for example. The new pass manager should fix this.

Adam
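Adam's point maps directly onto the interface the new pass manager eventually adopted: a pass reports what it preserved, and a pass that made no changes can preserve everything. A hedged sketch, assuming an LLVM build context; ExampleCleanupPass and simplifySomething are hypothetical names, not real LLVM passes:

  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"

  using namespace llvm;

  // Hypothetical helper standing in for whatever the cleanup does; a real
  // pass would transform F and report whether it changed anything.
  static bool simplifySomething(Function &F) { (void)F; return false; }

  // Hypothetical cleanup pass, shown only to illustrate the idiom.
  struct ExampleCleanupPass : PassInfoMixin<ExampleCleanupPass> {
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
      bool Changed = simplifySomething(F);

      // The key point: if the IR was not modified, declare everything
      // preserved, so cached analysis results (e.g. SCEV) survive intact.
      if (!Changed)
        return PreservedAnalyses::all();

      // Otherwise report only what actually remains valid; none() is the
      // conservative default.
      return PreservedAnalyses::none();
    }
  };

Under the legacy pass manager, by contrast, the not-preserved analyses are dropped based on the pass's static declaration alone, whether or not the run changed anything, which is exactly the unnecessary SCEV invalidation Adam describes.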
Daniel Berlin via llvm-dev
2016-Mar-08 18:22 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 9:55 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
>
> My two largest pet peeves in this area are:

I think you hit on something that I would expand on:

We don't hold the line very well on adding little things to passes and analyses over time. We add 1000 little walkers and pattern matchers to try to get better code, and then often add knobs to try to control their overall compile time. At some point, these all add up. You end up with the same flat profile if you do this everywhere, but your compiler gets slower. At some point, someone has to stop and say "well, wait a minute, are there better algorithms or architectures we should be using to do this", and either do it, or not let it get worse :) I'd suggest that, in most cases, we know better ways to do almost all of these things.

Don't get me wrong, I don't believe there is any theoretically pure way to do everything that we can just implement and never have to tweak. But it's a continuum, and at some point you have to stop and re-evaluate whether the current approach is really the right one if you have to add a billion little things to it to get what you want. We often don't do that. We go *very* far down the path of a billion tweaks and added knobs, and what we have now, compile-time wise, is what you get when you do that :)

I suspect this is because we don't really want to force work on people who are just trying to get crap done. We're all good contributors trying to do the right thing, and saying no often seems obstructionist, etc. The problem is that at some point you end up with the tragedy of the commons.

(Also, not everything in the compiler has to catch every case to get good code.)

> 1. We often use functions from ValueTracking (to get known bits, the
> number of sign bits, etc.) as though they're low cost. They're not
> really low cost. [...]

(LVI is another great example. Fun fact: if you ask for value info for everything, it's no longer lazy...)

> Also, I'm very hopeful that as our new MemorySSA and GVN improvements
> materialize, we'll see large compile-time improvements from that
> work. We spend a huge amount of time in GVN computing
> memory-dependency information [...]

I'm working on it ;)
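One structural answer to Hal's second peeve, in the spirit of Daniel's "better architecture" argument, is to gate cleanups on whether their producer reported a change. A minimal sketch; Function, producer, and cleanups here are invented stand-ins, not LLVM's pipeline types:

  #include <functional>
  #include <vector>

  // Stand-in for the IR unit being transformed.
  struct Function;

  // Pairs a transformation with the cleanups that are only worth running
  // if it actually changed the IR.
  struct GatedPassGroup {
    // Returns true if it modified the function.
    std::function<bool(Function &)> producer;
    std::vector<std::function<bool(Function &)>> cleanups;

    bool run(Function &F) {
      bool Changed = producer(F);
      if (!Changed)
        return false; // nothing happened, so there is nothing to clean up

      for (auto &Cleanup : cleanups)
        Changed |= Cleanup(F);
      return Changed;
    }
  };

A group whose producer is the loop unroller and whose cleanups are InstCombine-style simplifications would then skip the cleanups entirely on functions the unroller never touched, whereas the pipeline being discussed runs them unconditionally.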
Xinliang David Li via llvm-dev
2016-Mar-08 18:49 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 10:22 AM, Daniel Berlin via cfe-dev <cfe-dev at lists.llvm.org> wrote:

> [...]
>
> (LVI is another great example. Fun fact: if you ask for value info for
> everything, it's no longer lazy...)

Yep -- see the bug Wei is working on: https://llvm.org/bugs/show_bug.cgi?id=10584

David
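The trap behind that bug is easy to state in miniature. In the toy below (not LazyValueInfo's real interface), each individual query is cheap thanks to the cache, but a client that eagerly asks about every value forces the full computation anyway, so the "lazy" analysis degenerates into an eager one:

  #include <cstdio>
  #include <unordered_map>
  #include <vector>

  // Toy "lazy" analysis: computes a fact per value on first request only.
  // Invented for illustration; this is not LVI's interface.
  struct LazyFacts {
    std::unordered_map<int, int> cache;
    int computations = 0;

    int factFor(int value) {
      auto it = cache.find(value);
      if (it != cache.end()) return it->second;
      ++computations;            // stand-in for an expensive IR walk
      int fact = value * value;  // placeholder "analysis result"
      cache.emplace(value, fact);
      return fact;
    }
  };

  int main() {
    std::vector<int> allValues(10000);
    for (int i = 0; i < 10000; ++i) allValues[i] = i;

    LazyFacts LF;

    // A targeted client stays cheap:
    LF.factFor(42);
    std::printf("targeted: %d computations\n", LF.computations);

    // An eager client asks about everything, paying the full cost:
    for (int v : allValues) LF.factFor(v);
    std::printf("eager: %d computations\n", LF.computations);
  }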
Renato Golin via llvm-dev
2016-Mar-09 01:15 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
On 9 Mar 2016 1:22 a.m., "Adam Nemet via cfe-dev" <cfe-dev at lists.llvm.org> wrote:

> A related issue is that if an analysis is not preserved by a pass, it
> gets invalidated *even if* the pass doesn't end up modifying the code.
> Because of this we invalidate SCEV's cache unnecessarily, for example.
> The new pass manager should fix this.

+1