Sean Silva via llvm-dev
2016-Mar-08 21:09 UTC
[llvm-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 10:42 AM, Richard Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Tue, Mar 8, 2016 at 8:13 AM, Rafael Espíndola
> <llvm-dev at lists.llvm.org> wrote:
> > I have just benchmarked building trunk llvm and clang in Debug,
> > Release and LTO modes (see the attached script for the cmake lines).
> >
> > The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> > cases I used the system libgcc and libstdc++.
> >
> > For release builds there is a monotonic increase in each version. From
> > 163 minutes with 3.5 to 212 minutes with trunk. For comparison, gcc
> > 5.3.2 takes 205 minutes.
> >
> > Debug and LTO show an improvement in 3.7, but have regressed again in 3.8.
>
> I'm curious how these times divide across Clang and various parts of
> LLVM; rerunning with -ftime-report and summing the numbers across all
> compiles could be interesting.

Based on the results I posted upthread about the relative time spent in the
backend for debug vs release, we can estimate this. To summarize:

10% of time spent in LLVM for Debug
33% of time spent in LLVM for Release
(I'll abbreviate "in LLVM" as just "backend"; this is "backend" from clang's
perspective)

Let's look at the difference between 3.5 and trunk.

For debug, the user time jumps from 174m50.251s to 197m9.932s.
That's {10490.3, 11829.9} seconds, respectively.
For release, the corresponding numbers are:
{9826.71, 12714.3} seconds.

debug35 = 10490.251
debugTrunk = 11829.932

debugTrunk/debug35 == 1.12771
debugRatio = 1.12771

release35 = 9826.705
releaseTrunk = 12714.288

releaseTrunk/release35 == 1.29385
releaseRatio = 1.29385

For simplicity, let's use a simple linear model for the distribution of
slowdown between the frontend and backend: a constant factor slowdown for
the backend, and an independent constant factor slowdown for the frontend.
This gives the following linear system:

debugRatio   = .1  * backendRatio + (1 - .1)  * frontendRatio
releaseRatio = .33 * backendRatio + (1 - .33) * frontendRatio

Solving this linear system, we find that under this simple model the
expected slowdown factors are:

backendRatio  = 1.77783
frontendRatio = 1.05547

Intuitively, backendRatio comes out larger in this comparison because we
see the biggest slowdown during release (1.29 vs 1.12), and during release
we are spending a larger fraction of time in the backend (33% vs 10%).

Applying this same model across Rafael's data, we find the following
(numbers have been rounded for clarity):

transition    backendRatio  frontendRatio
3.5->3.6      1.08          1.03
3.6->3.7      1.30          0.95
3.7->3.8      1.34          1.07
3.8->trunk    0.98          1.02

Note that in Rafael's measurements LTO is pretty similar to Release from a
CPU time (user time) standpoint. While the final LTO link takes a large
amount of real time, it is single threaded. Based on the real time numbers,
the LTO link was only spending about 20 minutes single-threaded (i.e. about
20 minutes of CPU time), which is pretty small compared to the 300-400
minutes of total CPU time. It would be interesting to see the numbers for
-O0 or -O1 per-TU together with LTO.

-- Sean Silva
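For anyone who wants to check the algebra, the 2x2 system above can be
solved directly. A minimal sketch in Python (illustrative only; the function
name solve_slowdown is made up here, and the inputs are just the fractions
and 3.5->trunk ratios quoted in the post):

    # Solve the linear model from the post for the two unknowns:
    #   debugRatio   = 0.10 * backendRatio + 0.90 * frontendRatio
    #   releaseRatio = 0.33 * backendRatio + 0.67 * frontendRatio
    def solve_slowdown(debug_ratio, release_ratio,
                       debug_backend_frac=0.10, release_backend_frac=0.33):
        a1, b1 = debug_backend_frac, 1.0 - debug_backend_frac
        a2, b2 = release_backend_frac, 1.0 - release_backend_frac
        # Two equations, two unknowns: solve by Cramer's rule.
        det = a1 * b2 - a2 * b1
        backend = (debug_ratio * b2 - release_ratio * b1) / det
        frontend = (a1 * release_ratio - a2 * debug_ratio) / det
        return backend, frontend

    # 3.5 -> trunk ratios measured above.
    backend, frontend = solve_slowdown(1.12771, 1.29385)
    print(backend, frontend)   # ~1.78 and ~1.06, matching the numbers in the post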
Sean Silva via llvm-dev
2016-Mar-08 21:10 UTC
[llvm-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 1:09 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Based on the results I posted upthread about the relative time spent in
> the backend for debug vs release, we can estimate this.
> To summarize:

That is, to summarize the post upthread that I'm referring to. The summary
of this post is that most of the slowdown seems to be in the backend.

-- Sean Silva
Mehdi Amini via llvm-dev
2016-Mar-08 22:25 UTC
[llvm-dev] llvm and clang are getting slower
> On Mar 8, 2016, at 1:09 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Note that in Rafael's measurements LTO is pretty similar to Release from a
> CPU time (user time) standpoint. While the final LTO link takes a large
> amount of real time, it is single threaded. Based on the real time numbers,
> the LTO link was only spending about 20 minutes single-threaded (i.e. about
> 20 minutes of CPU time), which is pretty small compared to the 300-400
> minutes of total CPU time. It would be interesting to see the numbers for
> -O0 or -O1 per-TU together with LTO.

Just a note about LTO being sequential: Rafael mentioned he was "building
trunk llvm and clang". By default I believe it is ~56 link targets that can
be run in parallel (provided you have enough RAM to avoid swapping).

-- Mehdi
Rafael Espíndola via llvm-dev
2016-Mar-08 22:52 UTC
[llvm-dev] llvm and clang are getting slower
> Just a note about LTO being sequential: Rafael mentioned he was "building
> trunk llvm and clang". By default I believe it is ~56 link targets that can
> be run in parallel (provided you have enough RAM to avoid swapping).

Correct. The machine has no swap :-)

But some targets (clang) are much larger and I have the impression that the
last minute or so of the build is just finishing that one link.

Cheers,
Rafael
Sean Silva via llvm-dev
2016-Mar-09 01:47 UTC
[llvm-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 2:25 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Just a note about LTO being sequential: Rafael mentioned he was "building
> trunk llvm and clang". By default I believe it is ~56 link targets that can
> be run in parallel (provided you have enough RAM to avoid swapping).

D'oh! I was looking at the data wrong, since I broke my Fundamental Rule of
Looking At Data, namely: don't look at raw numbers in a table, since you are
likely to misread things or form biases based on the order in which you look
at the data points; *always* visualize.

There is a significant difference between Release and LTO. About 2x
consistently.

[image: Inline image 3]

This is actually curious, because during the Release build we were spending
33% of CPU time in the backend (as clang sees it, i.e. mid-level optimizer
and codegen). This data is inconsistent with LTO simply being another run
through the backend (which would add just +33% CPU time at worst). There
seems to be something nonlinear happening.

To make it worse, the LTO build has approximately a full Release
optimization running per-TU, so the actual LTO step should be seeing
inlined/"cleaned up" IR, which should be much smaller than what the per-TU
optimizer is seeing; naively, it should take *even less* than "another 33%
CPU time". Yet we see a 1.5x-2x difference:

[image: Inline image 4]

-- Sean Silva
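To spell out the sanity check in the last paragraph, here is a
back-of-the-envelope sketch in Python; the 1.5x-2x range is simply the
observation quoted above, not a new measurement:

    # Check the "LTO is just another backend run" model against what the
    # builds actually show, using only the 33% Release backend fraction.
    release_backend_frac = 0.33

    # If the LTO step merely repeated the per-TU backend work, total CPU
    # time would grow by at most that backend fraction:
    naive_lto_over_release = 1.0 + release_backend_frac   # ~1.33x

    observed_lto_over_release = (1.5, 2.0)  # range read off the charts above

    print(naive_lto_over_release)      # 1.33 -- what the linear model predicts
    print(observed_lto_over_release)   # (1.5, 2.0) -- what the builds show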