Mehdi Amini via llvm-dev
2016-Mar-31 22:34 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
Hi Renato,

> On Mar 31, 2016, at 2:46 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> TLDR: I totally support considering compile time regressions as bugs.
>
> Me too.
>
> I also agree that reverting fresh and reapplying is *much* easier than
> trying to revert late.
>
> But I'd like to avoid dubious metrics.

I'm not sure how "this commit regresses compile time by 2%" is a dubious metric. The metric is not dubious IMO; it is what it is: a measurement. You just have to build a good process around it to exploit this measurement in a way that is useful for the project.

>> The closest I could find would be what Chandler wrote in
>> http://reviews.llvm.org/D12826 ; for instance, for O2 he stated that "if an
>> optimization increases compile time by 5% or increases code size by 5% for a
>> particular benchmark, that benchmark should also be one which sees a 5%
>> runtime improvement".
>
> I think this is a bit limited and can lead to witch hunts, especially
> wrt performance measurements.
>
> Chandler's title is perfect though... "Large" can be vague, but
> "super-linear" is not. We used to have the convention that any
> super-linear (quadratic or worse) compile time introduction had to be
> at O3 or, for really bad cases, behind additional flags. I think we
> should keep that mindset.
>
>> My hope is that with better tooling for tracking compile time in the future,
>> we'll reach a state where we'll be able to consider "breaking" the
>> compile-time regression test as important as breaking any test: i.e. the
>> offending commit should be reverted unless it has been shown to
>> significantly (hand wavy...) improve the runtime performance.
>
> In order to have any kind of threshold, we'd have to monitor with some
> accuracy the performance of both the compiler and the compiled code on
> the main platforms. We do that to a certain extent with the test-suite
> bots, but that's very far from ideal.

I agree. Did you read the part where I mentioned that we're working on the tooling, and that I was waiting for it to be done before starting this thread?

> So, I'd recommend we steer away from any kind of percentage or ratio,
> and keep at least the quadratic changes and beyond behind special flags
> (n log n is ok for most cases).

How do you suggest we address the long trail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)? Because there *is* a problem here, and I'd really like someone to come up with a solution for it.

>> Since you raise the discussion now, I'll take the opportunity to push on the
>> "more aggressive" side: I think the policy should strike a balance between
>> the improvement a commit brings and the compile time slowdown it causes.
>
> This is a fallacy.

Not sure why, or what you mean? The fact that an optimization improves only some targets does not invalidate the point.

> Compile time often regresses across all targets, while execution
> improvements are focused on specific targets and can have negative
> effects on those that were not benchmarked.

Yeah, as usual in LLVM: if you care about something on your platform, set up a bot and track trunk closely; otherwise you're less of a priority.

> Overall, though,
> compile time regressions dilute over the improvements, but not on a
> commit-per-commit basis. That's what I meant by witch hunt.

There is no "witch hunt"; at least that's not my objective.
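[To make the "super-linear" criterion concrete, here is a minimal sketch of how a growth exponent could be estimated empirically. It assumes a "clang" binary on PATH; the synthetic input generator, the sizes, and the flags are illustrative assumptions, not an LLVM tool.]

#!/usr/bin/env python3
"""Rough check for super-linear compile-time growth (illustrative sketch).

Times "clang -O2 -c" on synthetic translation units of increasing size and
estimates the exponent k in time ~ n^k via a log-log least-squares fit.
An exponent near 1 is linear; near 2 is quadratic."""
import math
import subprocess
import tempfile
import time


def make_tu(n_funcs: int) -> str:
    """Generate a C translation unit with n_funcs trivial functions."""
    return "\n".join(f"int f{i}(int x) {{ return x + {i}; }}" for i in range(n_funcs))


def time_compile(source: str) -> float:
    """Wall time for one clang -O2 -c invocation on the given source."""
    with tempfile.NamedTemporaryFile("w", suffix=".c") as f:
        f.write(source)
        f.flush()
        start = time.perf_counter()
        subprocess.run(["clang", "-O2", "-c", f.name, "-o", "/dev/null"], check=True)
        return time.perf_counter() - start


sizes = [1000, 2000, 4000, 8000]
times = [time_compile(make_tu(n)) for n in sizes]

# Least-squares slope of log(time) vs log(size) approximates the exponent.
lx = [math.log(n) for n in sizes]
ly = [math.log(t) for t in times]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
k = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
print(f"estimated growth exponent: {k:.2f} (~1 linear, ~2 quadratic)")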
I think everyone is pretty enthusiastic about every new perf improvement (I am), but just as, without bots (and a policy) in general, we would break platforms all the time unintentionally, here I am talking about chasing and tracking every single commit where a developer would regress compile time *without even being aware* of it. I'd personally love to have a bot (or someone) emailing me about any compile time regression I introduce.

> I think we should keep an eye on those changes, ask for numbers in
> code review, and maybe even do some benchmarking of our own before
> accepting them. Also, we should not commit code that we know hurts
> performance that badly, even if we believe people will replace it in
> the future. It always takes too long. I myself did that last year,
> and I learnt my lesson.

Agreed.

> Metrics are often more dangerous than helpful, as they tend to be used
> as a substitute for thinking.

I can't relate this sentence to anything concrete at stake here. I think this list is full of people who are very good at thinking and won't substitute a metric for it :)

Best,

--
Mehdi
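[A minimal sketch of the kind of per-commit bot check described above: time a fixed corpus with the compiler built before and after a commit and report the delta. The binary paths, the corpus, and the 1% alert threshold are hypothetical; real infrastructure such as LNT does considerably more.]

#!/usr/bin/env python3
"""Sketch of a per-commit compile-time check (illustrative, not a real bot)."""
import glob
import subprocess
import time


def corpus_time(clang: str, files: list[str], reps: int = 3) -> float:
    """Best-of-N wall time to compile the whole corpus with the given clang."""
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        for f in files:
            subprocess.run([clang, "-O2", "-c", f, "-o", "/dev/null"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


files = sorted(glob.glob("corpus/*.c"))  # hypothetical fixed benchmark corpus
before = corpus_time("./clang-before", files)  # hypothetical pre-commit build
after = corpus_time("./clang-after", files)    # hypothetical post-commit build
delta = 100.0 * (after - before) / before
print(f"compile time: {before:.2f}s -> {after:.2f}s ({delta:+.1f}%)")
if delta > 1.0:  # illustrative threshold: notify the author, don't auto-revert
    print("regression above threshold; email the author for discussion")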
Renato Golin via llvm-dev
2016-Mar-31 23:40 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
On 31 March 2016 at 23:34, Mehdi Amini <mehdi.amini at apple.com> wrote:
> I'm not sure how "this commit regresses compile time by 2%" is a dubious metric.
> The metric is not dubious IMO; it is what it is: a measurement.

Ignoring for a moment the slippery slope we recently had on compile time performance, 2% is an acceptable regression for a change that improves execution time by around 2% on most targets; more acceptable than if only one target were affected.

Different people see performance with different eyes, and companies have different expectations about it too, so the same percentages can have a different impact on different people for the same change.

I guess my point is that no threshold will please everybody, and people are more likely to "abuse" the metric if the results are far from what they see as acceptable, even if everyone else is ok with it.

My point about metrics replacing thinking is not aimed at lazy programmers (of which there are very few here), but at how far the encoded threshold falls from your own. Bias is a *very* hard thing to remove, even for extremely smart and experienced people.

So, while "witch hunt" is a very strong term for the mild bias we will all have personally, we have seen recently how some discussions end up in rage when a group of people strongly disagrees with the rest, self-reinforcing their bias to levels they would never reach alone. In those cases, the term stops being strong, and may be fitting... Makes sense?

> I agree. Did you read the part where I mentioned that we're working on the tooling, and that I was waiting for it to be done before starting this thread?

I did, and should have mentioned it in my reply. I think you guys (and ARM) are doing an amazing job at quality measurement. I wasn't trying to diminish your efforts, but IMHO the relationship between effort and bias removal is not linear, i.e. you have to improve quality exponentially to remove bias linearly. So the threshold at which we're prepared to stop might not remove all the problems, and metrics could still play a negative role.

I think I'm just asking for us to be aware of that, not to stop any attempt to introduce metrics. If they remain relevant to the final objective, and we're allowed to break them with enough arguments, it should work fine.

> How do you suggest we address the long trail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)?
> Because there *is* a problem here, and I'd really like someone to come up with a solution for it.

Indeed, we're now slower than GCC, and that's a place that looked impossible two years ago. But I doubt reverting a few patches will help. For this problem, we'll need a task force to hunt down all the dragons and surgically alter them, since at this point all the relevant patches are too far in the past.

For the future, emailing about compile time regressions (as well as run time) is a good thing to have, and I vouch for it. But I don't want it to become a tool that increases stress in the community.

> Not sure why, or what you mean? The fact that an optimization improves only some targets does not invalidate the point.

Sorry, I seem to have misinterpreted your point.

The fallacy is about the measurement of the "benefit" versus the regression "effect". The former is very hard to measure, while the latter is very precise. Comparisons with radically different standard deviations can easily fall into "undefined behaviour" land, and be the seed for rage threads.

> I'm talking about chasing and tracking every single commit where a developer would regress compile time *without even being aware* of it.

That's a goal worth pursuing regardless of the patch's benefit; I agree wholeheartedly. And for that, I'm very grateful for the work you guys are doing.

cheers,
--renato
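[One way to act on the standard-deviation caveat above: before flagging a delta as a regression, compare the two samples with a noise-aware statistic. A minimal sketch, with invented timing samples, computing Welch's t from scratch:]

#!/usr/bin/env python3
"""Sketch: check whether two timing samples with very different variances
actually differ before flagging a regression. Sample data is invented."""
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two samples with unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


baseline = [10.02, 10.05, 9.98, 10.01, 10.04]  # tight: compile times (s)
candidate = [10.9, 9.6, 10.8, 9.7, 10.6]       # noisy: e.g. exec times (s)

t = welch_t(baseline, candidate)
# Here |t| is about 1.1, below the usual ~2 cutoff: the ~3% mean delta is
# statistically indistinguishable from noise, so flagging it would be unfair.
print(f"Welch t = {t:.2f}; mean delta = "
      f"{statistics.mean(candidate) - statistics.mean(baseline):+.2f}s")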
Mehdi Amini via llvm-dev
2016-Apr-01 00:09 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
> On Mar 31, 2016, at 4:40 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 31 March 2016 at 23:34, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> I'm not sure how "this commit regresses compile time by 2%" is a dubious metric.
>> The metric is not dubious IMO; it is what it is: a measurement.
>
> Ignoring for a moment the slippery slope we recently had on compile
> time performance, 2% is an acceptable regression for a change that
> improves execution time by around 2% on most targets; more acceptable
> than if only one target were affected.

Sure, I don't think I have suggested anything else; if I did, it's because I didn't express myself correctly :) I'm excited about runtime performance, and I'm willing to spend compile-time budget to achieve it. I'd even say that, by tracking compile time elsewhere, we'll preserve more compile-time budget for exactly the kind of commit you mention above.

> Different people see performance with different eyes, and companies
> have different expectations about it too, so the same percentages can
> have a different impact on different people for the same change.
>
> I guess my point is that no threshold

I don't suggest a threshold that says "a commit can't regress x%" and that is set in stone. What I have in mind is more: if a commit regresses the build above a threshold (1% on average, for instance), then we should be able to have a discussion about that commit, to evaluate whether it belongs at O2 or should go to O3, for instance. Also, if the commit is a refactoring, or introduces a new feature, the regression might not be intended by the author at all!

> will please everybody, and
> people are more likely to "abuse" the metric if the results are far
> from what they see as acceptable, even if everyone else is ok with it.

The metric is "the commit regressed compile time by 1%". What naturally follows is what usually happens in this community: we look at the data (what is the performance improvement?) and decide case by case whether it is fine as is or not. I feel like you're talking about the "metric" as an automatic threshold that triggers an automatic revert and blocks things; that is not the goal, and that is not what I mean when I use the word "metric" (but hey, I'm not a native speaker!). As I said before, I'm mostly chasing *untracked* and *unintentional* compile time regressions.

> My point about metrics replacing thinking is not aimed at lazy
> programmers (of which there are very few here), but at how far the
> encoded threshold falls from your own. Bias is a *very* hard thing
> to remove, even for extremely smart and experienced people.
>
> So, while "witch hunt" is a very strong term for the mild bias we will
> all have personally, we have seen recently how some discussions end up
> in rage when a group of people strongly disagrees with the rest,
> self-reinforcing their bias to levels they would never reach alone.
> In those cases, the term stops being strong, and may be fitting...
> Makes sense?
>
>> I agree. Did you read the part where I mentioned that we're working on the tooling, and that I was waiting for it to be done before starting this thread?
>
> I did, and should have mentioned it in my reply. I think you guys (and
> ARM) are doing an amazing job at quality measurement. I wasn't trying
> to diminish your efforts, but IMHO the relationship between effort and
> bias removal is not linear, i.e. you have to improve quality
> exponentially to remove bias linearly. So the threshold at which we're
> prepared to stop might not remove all the problems, and metrics could
> still play a negative role.

I'm not sure I totally understand everything you mean here.

> I think I'm just asking for us to be aware of that, not to stop
> any attempt to introduce metrics. If they remain relevant to the final
> objective, and we're allowed to break them with enough arguments, it
> should work fine.
>
>> How do you suggest we address the long trail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)?
>> Because there *is* a problem here, and I'd really like someone to come up with a solution for it.
>
> Indeed, we're now slower than GCC, and that's a place that looked
> impossible two years ago. But I doubt reverting a few patches will
> help. For this problem, we'll need a task force to hunt down all the
> dragons and surgically alter them, since at this point all the
> relevant patches are too far in the past.

Obviously, my immediate concern is "what tools and process will make sure it does not get worse", and starting with "community awareness" is not bad. Improving on and recovering from the current state is valuable, but orthogonal to what I'm trying to achieve. Another thing is the complaints from multiple people who are trying to JIT using LLVM: we know LLVM is not designed in a way that helps with latency and memory consumption, but getting worse is not nice.

> For the future, emailing about compile time regressions (as well as run
> time) is a good thing to have, and I vouch for it. But I don't want
> it to become a tool that increases stress in the community.

Sure, and I'm glad you're stepping up to make sure that does not happen. So please continue to speak up in the future as we try to roll things out. I hope we're on the same track past the initial misunderstanding we had of each other? What I'd really like is a consensus on the goal to pursue (knowing I'm not alone in caring about compile time is a great start!), so that the tooling can be set up to serve that goal in the best way possible (decreasing stress instead of increasing it).

Best,

--
Mehdi

>> Not sure why, or what you mean? The fact that an optimization improves only some targets does not invalidate the point.
>
> Sorry, I seem to have misinterpreted your point.
>
> The fallacy is about the measurement of the "benefit" versus the
> regression "effect". The former is very hard to measure, while the
> latter is very precise. Comparisons with radically different standard
> deviations can easily fall into "undefined behaviour" land, and be the
> seed for rage threads.
>
>> I'm talking about chasing and tracking every single commit where a developer would regress compile time *without even being aware* of it.
>
> That's a goal worth pursuing regardless of the patch's benefit; I
> agree wholeheartedly. And for that, I'm very grateful for the work you
> guys are doing.
>
> cheers,
> --renato
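[A minimal sketch of the flag-for-discussion policy described in this message: aggregate per-benchmark compile-time ratios into a geometric mean and raise a discussion, never an automatic revert, past the 1%-on-average mark. The benchmark names and timings are invented for illustration.]

#!/usr/bin/env python3
"""Sketch of a 1%-on-average compile-time gate (illustrative numbers)."""
import math

# Hypothetical per-benchmark compile times (seconds): before -> after a commit.
results = {
    "sqlite3.c": (31.0, 31.5),
    "gcc-combined.c": (52.0, 52.2),
    "tramp3d.cpp": (88.0, 89.8),
}

ratios = [after / before for before, after in results.values()]
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(f"geomean compile-time ratio: {geomean:.4f}")
if geomean > 1.01:  # the "1% on average" threshold from the thread
    print("above 1%: open a discussion (keep at O2, move to O3, or revert)")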