Mehdi Amini via llvm-dev
2016-Mar-31 20:41 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
Hi,

TLDR: I totally support considering compile time regressions as bugs. I'm glad you brought this topic up.

It is also worth pointing at this recent thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html
and also this blog post comparing the evolution of clang and gcc in this respect:
http://hubicka.blogspot.nl/2016/03/building-libreoffice-with-gcc-6-and-lto.html

I will repeat myself here, since we have also noticed internally that compile time is slowly degrading over time. Bruno and Chris are working on some infrastructure and tooling to help track compile time regressions closely. We had this conversation internally about the tradeoff between compile-time and runtime performance, and I planned to bring up the topic on the list in the coming months, but was waiting for more tooling to be ready.

Apparently in the past (years/a decade ago?) the project was very conservative about adding any optimization that would impact compile time; however, there is no explicit policy (that I know of) addressing this tradeoff. The closest I could find is what Chandler wrote in http://reviews.llvm.org/D12826; for instance, for O2 he stated that "if an optimization increases compile time by 5% or increases code size by 5% for a particular benchmark, that benchmark should also be one which sees a 5% runtime improvement".

My hope is that with better tooling for tracking compile time in the future, we'll reach a state where we'll be able to consider "breaking" the compile-time regression test as important as breaking any test: i.e. the offending commit should be reverted unless it has been shown to significantly (hand wavy...) improve the runtime performance.

Since you raise the discussion now, I'll take the opportunity to push on the "more aggressive" side: I think the policy should be a balance between the improvement a commit brings and the compile time slowdown it introduces, something along the lines of what you wrote in the quote above. You are referring to "large" compile time regressions (aside: what is "large"?), while Bruno has graphs showing that compile time regressions mostly come as many 1-3% regressions spread over tens of commits (and small regressions compound: thirty 2% regressions add up to roughly an 80% slowdown). Also (and this is where we need better tooling), *unexpected* compile-time slowdowns are what worry me: i.e. the author of a commit adds something but doesn't expect compile time to be "significantly" impacted. This is motivated by Bruno's and Chris's data.

Tracking this more closely may also help to triage things between O2 and O3 when a commit introduces a compile-time slowdown but also brings a significant enough runtime improvement.

--
Mehdi

> On Mar 31, 2016, at 1:28 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> LLVM has a wonderful policy regarding broken commits: we revert to green. We ask that a test case be available within a reasonable time frame (preferably before, but some exceptions can be made), but otherwise we revert the offending patch, even if it contains nice features that people want, and keep the tree green. This is an awesome policy.
>
> I would like to suggest we adopt and follow the same policy for compile time regressions that are large, and especially for ones that are super-linear. As an example from the previous thread:
>
> On Mon, Mar 14, 2016 at 12:14 PM Bruno Cardoso Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> > There is a possibility that r259673 could play a role here.
> >
> > For the buildSchedGraph() method, there is the -dag-maps-huge-region that
> > has the default value of 1000. When I committed the patch, I was expecting
> > people to lower this value as needed and also suggested this, but this has
> > not happened. 1000 is very high, basically "unlimited".
> >
> > It would be interesting to see what results you get with e.g. -mllvm
> > -dag-maps-huge-region=50. Of course, since this is a trade-off between
> > compile time and scheduler freedom, some care should be taken before
> > lowering this in trunk.
>
> Indeed we hit this internally, filed a PR:
> https://llvm.org/bugs/show_bug.cgi?id=26940
>
> I think we should have rolled back r259673 as soon as the test case was available.
>
> Thoughts?
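For readers not familiar with the knob quoted above: thresholds like -dag-maps-huge-region are ordinarily defined with LLVM's cl::opt command-line machinery and checked before the expensive (potentially super-linear) work runs, which is why it can be adjusted with -mllvm on the clang command line. A minimal sketch of that pattern follows; the option name and function below are hypothetical placeholders for illustration, not the actual buildSchedGraph()/r259673 code.

#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Hypothetical illustration only: a threshold flag that lets users trade
// analysis/scheduler quality for compile time from the command line,
// e.g. "-mllvm -example-huge-region=50".
static cl::opt<unsigned> HugeRegionThreshold(
    "example-huge-region", cl::Hidden, cl::init(1000),
    cl::desc("Region size above which the expensive analysis is skipped"));

// Guard the potentially super-linear work behind the threshold so that
// pathological inputs fall back to a cheaper, roughly linear path.
static bool shouldRunExpensiveAnalysis(unsigned RegionSize) {
  return RegionSize <= HugeRegionThreshold;
}

Lowering the default (as suggested with -mllvm -dag-maps-huge-region=50 above) simply widens the set of regions that take the cheap path, which is exactly the compile time versus scheduler freedom trade-off described in the quote.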
Renato Golin via llvm-dev
2016-Mar-31 21:46 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> TLDR: I totally support considering compile time regressions as bugs.

Me too.

I also agree that reverting fresh and reapplying is *much* easier than trying to revert late.

But I'd like to avoid dubious metrics.

> The closest I could find is what Chandler wrote in http://reviews.llvm.org/D12826;
> for instance, for O2 he stated that "if an optimization increases compile time
> by 5% or increases code size by 5% for a particular benchmark, that benchmark
> should also be one which sees a 5% runtime improvement".

I think this is a bit limited and can lead to witch hunts, especially wrt performance measurements.

Chandler's title is perfect, though... "Large" can be vague, but "super-linear" is not. We used to have the concept that any large super-linear (quadratic+) compile time introduction had to be in O3 or, for really bad cases, behind additional flags. I think we should keep that mindset.

> My hope is that with better tooling for tracking compile time in the future,
> we'll reach a state where we'll be able to consider "breaking" the
> compile-time regression test as important as breaking any test: i.e. the
> offending commit should be reverted unless it has been shown to
> significantly (hand wavy...) improve the runtime performance.

In order to have any kind of threshold, we'd have to monitor with some accuracy the performance of both the compiler and the compiled code for the main platforms. We do that to a certain extent with the test-suite bots, but that's very far from ideal.

So, I'd recommend we steer away from any kind of percentage or ratio and keep at least the quadratic changes and beyond behind special flags (n log n is OK for most cases).

> Since you raise the discussion now, I'll take the opportunity to push on the
> "more aggressive" side: I think the policy should be a balance between the
> improvement a commit brings and the compile time slowdown it introduces.

This is a fallacy.

Compile time often regresses across all targets, while execution improvements are focused on specific targets and can have negative effects on those that were not benchmarked. Overall, compile time regressions dilute over the improvements, but not on a commit-per-commit basis. That's what I meant by witch hunt.

I think we should keep an eye on those changes, ask for numbers in code review, and maybe even do some benchmarking of our own before accepting them. Also, we should not commit code that we know hurts performance that badly, even if we believe people will replace it in the future. It always takes too long. I did that myself last year, and I learnt my lesson.

Metrics are often more dangerous than helpful, as they tend to be used as a substitute for thinking.

My tuppence.

--renato
Mehdi Amini via llvm-dev
2016-Mar-31 22:34 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
Hi Renato,

> On Mar 31, 2016, at 2:46 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> TLDR: I totally support considering compile time regressions as bugs.
>
> Me too.
>
> I also agree that reverting fresh and reapplying is *much* easier than
> trying to revert late.
>
> But I'd like to avoid dubious metrics.

I'm not sure how "this commit regresses compile time by 2%" is a dubious metric. The metric is not dubious IMO; it is what it is: a measurement. You just have to build a good process around it to exploit this measurement in a useful way for the project.

>> The closest I could find is what Chandler wrote in http://reviews.llvm.org/D12826;
>> for instance, for O2 he stated that "if an optimization increases compile time
>> by 5% or increases code size by 5% for a particular benchmark, that benchmark
>> should also be one which sees a 5% runtime improvement".
>
> I think this is a bit limited and can lead to witch hunts, especially
> wrt performance measurements.
>
> Chandler's title is perfect, though... "Large" can be vague, but
> "super-linear" is not. We used to have the concept that any large
> super-linear (quadratic+) compile time introduction had to be in O3
> or, for really bad cases, behind additional flags. I think we should
> keep that mindset.
>
>> My hope is that with better tooling for tracking compile time in the future,
>> we'll reach a state where we'll be able to consider "breaking" the
>> compile-time regression test as important as breaking any test: i.e. the
>> offending commit should be reverted unless it has been shown to
>> significantly (hand wavy...) improve the runtime performance.
>
> In order to have any kind of threshold, we'd have to monitor with some
> accuracy the performance of both the compiler and the compiled code for the
> main platforms. We do that to a certain extent with the test-suite bots,
> but that's very far from ideal.

I agree. Did you read the part where I mentioned that we're working on the tooling and that I was waiting for it to be ready before starting this thread?

> So, I'd recommend we steer away from any kind of percentage or ratio
> and keep at least the quadratic changes and beyond behind special flags
> (n log n is OK for most cases).

How do you suggest we address the long trail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)? Because there *is* a problem here, and I'd really like someone to come up with a solution for it.

>> Since you raise the discussion now, I'll take the opportunity to push on the
>> "more aggressive" side: I think the policy should be a balance between the
>> improvement a commit brings and the compile time slowdown it introduces.
>
> This is a fallacy.

I'm not sure why, or what you mean? The fact that an optimization improves only some targets does not invalidate the point.

> Compile time often regresses across all targets, while execution
> improvements are focused on specific targets and can have negative
> effects on those that were not benchmarked.

Yeah, as usual in LLVM: if you care about something on your platform, set up a bot and track trunk closely; otherwise you're less of a priority.

> Overall, compile time regressions dilute over the improvements, but not on a
> commit-per-commit basis. That's what I meant by witch hunt.

There is no "witch hunt", at least that's not my objective.

I think everyone is pretty enthusiastic about every new perf improvement (I am), but just as we would unintentionally break things all the time without bots (and a policy) in general, the same is true for compile time. I'm talking about chasing and tracking every single commit where a developer regresses compile time *without even being aware of it*. I'd personally love to have a bot (or someone) emailing me about any compile time regression I introduce.

> I think we should keep an eye on those changes, ask for numbers in
> code review, and maybe even do some benchmarking of our own before
> accepting them. Also, we should not commit code that we know hurts
> performance that badly, even if we believe people will replace it in
> the future. It always takes too long. I did that myself last year,
> and I learnt my lesson.

Agree.

> Metrics are often more dangerous than helpful, as they tend to be used
> as a substitute for thinking.

I can't relate this sentence to anything concrete at stake here. I think this list is full of people who are very good at thinking and won't substitute metrics for it :)

Best,

--
Mehdi
Sean Silva via llvm-dev
2016-Apr-01 02:14 UTC
[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.
On Thu, Mar 31, 2016 at 2:46 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
>
> Metrics are often more dangerous than helpful, as they tend to be used
> as a substitute for thinking.

One of my favorite quotes:

"The results are definitely numbers, but do they have very much at all to do with the problem?" - Forman Acton, "Numerical Methods that (usually) Work"

-- Sean Silva