Hal Finkel via llvm-dev
2016-Mar-08 17:55 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
----- Original Message -----
> From: "Mehdi Amini via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Rafael Espíndola" <rafael.espindola at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "cfe-dev" <cfe-dev at lists.llvm.org>
> Sent: Tuesday, March 8, 2016 11:40:47 AM
> Subject: Re: [cfe-dev] [llvm-dev] llvm and clang are getting slower
>
> Hi Rafael,
>
> CC: cfe-dev
>
> Thanks for sharing. We also noticed this internally, and I know that
> Bruno and Chris are working on some infrastructure and tooling to
> help track compile-time regressions closely.
>
> We had this conversation internally about the tradeoff between
> compile time and runtime performance, and I had planned to bring up
> the topic on the list in the coming months; this looks like a good
> occasion to plant the seed. Apparently in the past (years/a decade
> ago?) the project was very conservative about adding any optimizations
> that would hurt compile time, but there is no explicit policy
> (that I know of) addressing this tradeoff.
> The closest I could find is what Chandler wrote in
> http://reviews.llvm.org/D12826; for instance, for O2 he stated that
> "if an optimization increases compile time by 5% or increases code
> size by 5% for a particular benchmark, that benchmark should also be
> one which sees a 5% runtime improvement".
>
> My hope is that with better tooling for tracking compile time in the
> future, we'll reach a state where "breaking" the compile-time
> regression test is treated as seriously as breaking any other test:
> i.e., the offending commit should be reverted unless it has been
> shown to significantly (hand-wavy...) improve runtime performance.
>
> <troll>
> With the current trend, the Polly developers don't have to worry
> about improving their compile time; we'll catch up with them ;)
> </troll>

My two largest pet peeves in this area are:

1. We often use functions from ValueTracking (to get known bits, the number of sign bits, etc.) as though they're low cost. They're not really low cost, and the problem is that they *should* be. These functions do bottom-up walks and could cache their results; instead, they do a limited walk and recompute everything on every query. This is expensive: a significant amount of our InstCombine time goes to ValueTracking, and that shouldn't be the case. The more we add to InstCombine (and related passes), and the more we run InstCombine, the worse this gets. On the other hand, fixing this will help both compile time and code quality.

Furthermore, BasicAA has the same problem.

2. We have "cleanup" passes in the pipeline, such as those that run after loop unrolling and/or vectorization, that run regardless of whether the preceding pass actually did anything. We've been adding more of these, and they catch important use cases, but we need better infrastructure for this (either with the new pass manager or otherwise).

Also, I'm very hopeful that as our new MemorySSA and GVN improvements materialize, we'll see large compile-time improvements from that work. We spend a huge amount of time in GVN computing memory-dependency information (this dwarfs the time spent by GVN doing actual value-numbering work by an order of magnitude or more).

-Hal

> --
> Mehdi
>
> > On Mar 8, 2016, at 8:13 AM, Rafael Espíndola via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >
> > I have just benchmarked building trunk llvm and clang in Debug,
> > Release and LTO modes (see the attached script for the cmake lines).
> >
> > The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> > cases I used the system libgcc and libstdc++.
> >
> > For release builds there is a monotonic increase with each version,
> > from 163 minutes with 3.5 to 212 minutes with trunk. For comparison,
> > gcc 5.3.2 takes 205 minutes.
> >
> > Debug and LTO show an improvement in 3.7, but have regressed again
> > in 3.8.
> >
> > Cheers,
> > Rafael
> > <run.sh><LTO.time><Debug.time><Release.time>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
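To make the caching idea in Hal's first point concrete, here is a minimal, self-contained C++ sketch. The Expr type and the knownZeroMask helpers are invented for illustration; this is not LLVM's actual ValueTracking interface, which operates on llvm::Value and APInt.

  #include <cstdint>
  #include <unordered_map>
  #include <vector>

  // Toy IR node standing in for llvm::Value; invented for this sketch.
  struct Expr {
    enum Kind { Const, And, Or } kind;
    uint64_t constVal = 0;               // used when kind == Const
    std::vector<const Expr *> operands;  // used otherwise
  };

  // Uncached bottom-up walk, analogous in spirit to computeKnownBits():
  // every query re-walks the whole expression tree.
  uint64_t knownZeroMaskUncached(const Expr *E) {
    switch (E->kind) {
    case Expr::Const: return ~E->constVal;
    case Expr::And:   // a bit is known zero if known zero in either operand
      return knownZeroMaskUncached(E->operands[0]) |
             knownZeroMaskUncached(E->operands[1]);
    case Expr::Or:    // a bit is known zero only if known zero in both
      return knownZeroMaskUncached(E->operands[0]) &
             knownZeroMaskUncached(E->operands[1]);
    }
    return 0;
  }

  // The cached variant Hal is asking for: memoize per node so repeated
  // queries (e.g. from many InstCombine pattern matchers) are cheap.
  struct KnownBitsCache {
    std::unordered_map<const Expr *, uint64_t> memo;

    uint64_t knownZeroMask(const Expr *E) {
      auto it = memo.find(E);
      if (it != memo.end()) return it->second;
      uint64_t result = 0;
      switch (E->kind) {
      case Expr::Const:
        result = ~E->constVal;
        break;
      case Expr::And:
        result = knownZeroMask(E->operands[0]) | knownZeroMask(E->operands[1]);
        break;
      case Expr::Or:
        result = knownZeroMask(E->operands[0]) & knownZeroMask(E->operands[1]);
        break;
      }
      memo[E] = result;
      return result;
    }
  };

The uncached walk re-traverses shared subtrees on every query, which is roughly the cost profile Hal describes; the memoized variant makes repeated queries cheap, at the price of having to invalidate entries whenever a transformation rewrites a node, and that invalidation is the genuinely hard part in a real compiler.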
Adam Nemet via llvm-dev
2016-Mar-08 18:22 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
> On Mar 8, 2016, at 9:55 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> 2. We have "cleanup" passes in the pipeline, such as those that run
> after loop unrolling and/or vectorization, that run regardless of
> whether the preceding pass actually did anything. We've been adding
> more of these, and they catch important use cases, but we need better
> infrastructure for this (either with the new pass manager or
> otherwise).

A related issue is that if an analysis is not preserved by a pass, it gets invalidated *even if* the pass doesn't end up modifying the code. Because of this we invalidate SCEV's cache unnecessarily, for example. The new pass manager should fix this.

Adam
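Adam's point maps directly onto the interface the new pass manager eventually adopted: a pass reports what it preserved, and a pass that made no changes can preserve everything. A hedged sketch, assuming an LLVM build context; ExampleCleanupPass and simplifySomething are hypothetical names, not real LLVM passes:

  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"

  using namespace llvm;

  // Hypothetical helper standing in for whatever the cleanup does; a real
  // pass would transform F and report whether it changed anything.
  static bool simplifySomething(Function &F) { (void)F; return false; }

  // Hypothetical cleanup pass, shown only to illustrate the idiom.
  struct ExampleCleanupPass : PassInfoMixin<ExampleCleanupPass> {
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
      bool Changed = simplifySomething(F);

      // The key point: if the IR was not modified, declare everything
      // preserved, so cached analysis results (e.g. SCEV) survive intact.
      if (!Changed)
        return PreservedAnalyses::all();

      // Otherwise report only what actually remains valid; none() is the
      // conservative default.
      return PreservedAnalyses::none();
    }
  };

Under the legacy pass manager, by contrast, the not-preserved analyses are dropped based on the pass's static declaration alone, whether or not the run changed anything, which is exactly the unnecessary SCEV invalidation Adam describes.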
Daniel Berlin via llvm-dev
2016-Mar-08 18:22 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 9:55 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
>
> My two largest pet peeves in this area are:

I think you hit on something that I would expand on:

We don't hold the line very well on adding little things to passes and analyses over time. We add 1000 little walkers and pattern matchers to try to get better code, and then often add knobs to try to control their overall compile time. At some point, these all add up. You end up with the same flat profile if you do this everywhere, but your compiler gets slower. At some point, someone has to stop and say "well, wait a minute, are there better algorithms or architectures we should be using to do this", and either do it, or not let it get worse :) I'd suggest that, in most cases, we know better ways to do almost all of these things.

Don't get me wrong, I don't believe there is any theoretically pure way to do everything that we can just implement and never have to tweak. But it's a continuum, and at some point you have to stop and re-evaluate whether the current approach is really the right one if you have to add a billion little things to it to get what you want. We often don't do that. We go *very* far down the path of a billion tweaks and added knobs, and what we have now, compile-time wise, is what you get when you do that :)

I suspect this is because we don't really want to force work on people who are just trying to get crap done. We're all good contributors trying to do the right thing, and saying no often seems obstructionist, etc. The problem is that at some point you end up with the tragedy of the commons.

(Also, not everything in the compiler has to catch every case to get good code.)

> 1. We often use functions from ValueTracking (to get known bits, the
> number of sign bits, etc.) as though they're low cost. They're not
> really low cost. [...]

(LVI is another great example. Fun fact: if you ask for value info for everything, it's no longer lazy...)

> Also, I'm very hopeful that as our new MemorySSA and GVN improvements
> materialize, we'll see large compile-time improvements from that
> work. We spend a huge amount of time in GVN computing
> memory-dependency information [...]

I'm working on it ;)
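One structural answer to Hal's second peeve, in the spirit of Daniel's "better architecture" argument, is to gate cleanups on whether their producer reported a change. A minimal sketch; Function, producer, and cleanups here are invented stand-ins, not LLVM's pipeline types:

  #include <functional>
  #include <vector>

  // Stand-in for the IR unit being transformed.
  struct Function;

  // Pairs a transformation with the cleanups that are only worth running
  // if it actually changed the IR.
  struct GatedPassGroup {
    // Returns true if it modified the function.
    std::function<bool(Function &)> producer;
    std::vector<std::function<bool(Function &)>> cleanups;

    bool run(Function &F) {
      bool Changed = producer(F);
      if (!Changed)
        return false; // nothing happened, so there is nothing to clean up

      for (auto &Cleanup : cleanups)
        Changed |= Cleanup(F);
      return Changed;
    }
  };

A group whose producer is the loop unroller and whose cleanups are InstCombine-style simplifications would then skip the cleanups entirely on functions the unroller never touched, whereas the pipeline being discussed runs them unconditionally.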
Xinliang David Li via llvm-dev
2016-Mar-08 18:49 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 10:22 AM, Daniel Berlin via cfe-dev <cfe-dev at lists.llvm.org> wrote:

> [...]
>
> (LVI is another great example. Fun fact: if you ask for value info for
> everything, it's no longer lazy...)

Yep -- see the bug Wei is working on: https://llvm.org/bugs/show_bug.cgi?id=10584

David
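The trap behind that bug is easy to state in miniature. In the toy below (not LazyValueInfo's real interface), each individual query is cheap thanks to the cache, but a client that eagerly asks about every value forces the full computation anyway, so the "lazy" analysis degenerates into an eager one:

  #include <cstdio>
  #include <unordered_map>
  #include <vector>

  // Toy "lazy" analysis: computes a fact per value on first request only.
  // Invented for illustration; this is not LVI's interface.
  struct LazyFacts {
    std::unordered_map<int, int> cache;
    int computations = 0;

    int factFor(int value) {
      auto it = cache.find(value);
      if (it != cache.end()) return it->second;
      ++computations;            // stand-in for an expensive IR walk
      int fact = value * value;  // placeholder "analysis result"
      cache.emplace(value, fact);
      return fact;
    }
  };

  int main() {
    std::vector<int> allValues(10000);
    for (int i = 0; i < 10000; ++i) allValues[i] = i;

    LazyFacts LF;

    // A targeted client stays cheap:
    LF.factFor(42);
    std::printf("targeted: %d computations\n", LF.computations);

    // An eager client asks about everything, paying the full cost:
    for (int v : allValues) LF.factFor(v);
    std::printf("eager: %d computations\n", LF.computations);
  }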
Renato Golin via llvm-dev
2016-Mar-09 01:15 UTC
[llvm-dev] [cfe-dev] llvm and clang are getting slower
On 9 Mar 2016 1:22 a.m., "Adam Nemet via cfe-dev" <cfe-dev at lists.llvm.org> wrote:

> A related issue is that if an analysis is not preserved by a pass, it
> gets invalidated *even if* the pass doesn't end up modifying the code.
> Because of this we invalidate SCEV's cache unnecessarily, for example.
> The new pass manager should fix this.

+1