Sean Silva via llvm-dev
2016-Mar-08 21:09 UTC
[llvm-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 10:42 AM, Richard Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Tue, Mar 8, 2016 at 8:13 AM, Rafael Espíndola
> <llvm-dev at lists.llvm.org> wrote:
> > I have just benchmarked building trunk llvm and clang in Debug,
> > Release and LTO modes (see the attached script for the cmake lines).
> >
> > The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> > cases I used the system libgcc and libstdc++.
> >
> > For release builds there is a monotonic increase in each version. From
> > 163 minutes with 3.5 to 212 minutes with trunk. For comparison, gcc
> > 5.3.2 takes 205 minutes.
> >
> > Debug and LTO show an improvement in 3.7, but have regressed again in 3.8.
>
> I'm curious how these times divide across Clang and various parts of
> LLVM; rerunning with -ftime-report and summing the numbers across all
> compiles could be interesting.

Based on the results I posted upthread about the relative time spent in the
backend for debug vs release, we can estimate this. To summarize:

10% of time spent in LLVM for Debug
33% of time spent in LLVM for Release
(I'll abbreviate "in LLVM" as just "backend"; this is "backend" from clang's
perspective)

Let's look at the difference between 3.5 and trunk.

For debug, the user time jumps from 174m50.251s to 197m9.932s.
That's {10490.3, 11829.9} seconds, respectively.
For release, the corresponding numbers are:
{9826.71, 12714.3} seconds.

debug35 = 10490.251
debugTrunk = 11829.932

debugTrunk/debug35 == 1.12771
debugRatio = 1.12771

release35 = 9826.705
releaseTrunk = 12714.288

releaseTrunk/release35 == 1.29385
releaseRatio = 1.29385

For simplicity, let's use a simple linear model for the distribution of
slowdown between the frontend and backend: a constant factor slowdown for
the backend, and an independent constant factor slowdown for the frontend.
This gives the following linear system:

debugRatio   = .1  * backendRatio + (1 - .1)  * frontendRatio
releaseRatio = .33 * backendRatio + (1 - .33) * frontendRatio

Solving this linear system, we find that under this simple model the
expected slowdown factors are:

backendRatio  = 1.77783
frontendRatio = 1.05547

Intuitively, backendRatio comes out larger in this comparison because we
see the biggest slowdown during release (1.29 vs 1.12), and during release
we are spending a larger fraction of time in the backend (33% vs 10%).

Applying this same model across Rafael's data, we find the following
(numbers have been rounded for clarity):

transition    backendRatio  frontendRatio
3.5->3.6      1.08          1.03
3.6->3.7      1.30          0.95
3.7->3.8      1.34          1.07
3.8->trunk    0.98          1.02

Note that in Rafael's measurements LTO is pretty similar to Release from a
CPU time (user time) standpoint. While the final LTO link takes a large
amount of real time, it is single threaded. Based on the real time numbers,
the LTO link was only spending about 20 minutes single-threaded (i.e. about
20 minutes of CPU time), which is pretty small compared to the 300-400
minutes of total CPU time. It would be interesting to see the numbers for
-O0 or -O1 per-TU together with LTO.

-- Sean Silva
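For anyone who wants to check the algebra, the 2x2 system above can be
solved directly. A minimal sketch in Python (illustrative only; the function
name solve_slowdown is made up here, and the inputs are just the fractions
and 3.5->trunk ratios quoted in the post):

    # Solve the linear model from the post for the two unknowns:
    #   debugRatio   = 0.10 * backendRatio + 0.90 * frontendRatio
    #   releaseRatio = 0.33 * backendRatio + 0.67 * frontendRatio
    def solve_slowdown(debug_ratio, release_ratio,
                       debug_backend_frac=0.10, release_backend_frac=0.33):
        a1, b1 = debug_backend_frac, 1.0 - debug_backend_frac
        a2, b2 = release_backend_frac, 1.0 - release_backend_frac
        # Two equations, two unknowns: solve by Cramer's rule.
        det = a1 * b2 - a2 * b1
        backend = (debug_ratio * b2 - release_ratio * b1) / det
        frontend = (a1 * release_ratio - a2 * debug_ratio) / det
        return backend, frontend

    # 3.5 -> trunk ratios measured above.
    backend, frontend = solve_slowdown(1.12771, 1.29385)
    print(backend, frontend)   # ~1.78 and ~1.06, matching the numbers in the post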
Sean Silva via llvm-dev
2016-Mar-08 21:10 UTC
[llvm-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 1:09 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Based on the results I posted upthread about the relative time spent in
> the backend for debug vs release, we can estimate this.
> To summarize:

That is, to summarize the post upthread that I'm referring to. The summary
of this post is that most of the slowdown seems to be in the backend.

-- Sean Silva
Mehdi Amini via llvm-dev
2016-Mar-08 22:25 UTC
[llvm-dev] llvm and clang are getting slower
> On Mar 8, 2016, at 1:09 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Note that in Rafael's measurements LTO is pretty similar to Release from a
> CPU time (user time) standpoint. While the final LTO link takes a large
> amount of real time, it is single threaded. Based on the real time numbers,
> the LTO link was only spending about 20 minutes single-threaded (i.e. about
> 20 minutes of CPU time), which is pretty small compared to the 300-400
> minutes of total CPU time. It would be interesting to see the numbers for
> -O0 or -O1 per-TU together with LTO.

Just a note about LTO being sequential: Rafael mentioned he was "building
trunk llvm and clang". By default I believe it is ~56 link targets that can
be run in parallel (provided you have enough RAM to avoid swapping).

-- Mehdi
Rafael Espíndola via llvm-dev
2016-Mar-08 22:52 UTC
[llvm-dev] llvm and clang are getting slower
> Just a note about LTO being sequential: Rafael mentioned he was "building
> trunk llvm and clang". By default I believe it is ~56 link targets that can
> be run in parallel (provided you have enough RAM to avoid swapping).

Correct. The machine has no swap :-)

But some targets (clang) are much larger and I have the impression that the
last minute or so of the build is just finishing that one link.

Cheers,
Rafael
Sean Silva via llvm-dev
2016-Mar-09 01:47 UTC
[llvm-dev] llvm and clang are getting slower
On Tue, Mar 8, 2016 at 2:25 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Just a note about LTO being sequential: Rafael mentioned he was "building
> trunk llvm and clang". By default I believe it is ~56 link targets that can
> be run in parallel (provided you have enough RAM to avoid swapping).

D'oh! I was looking at the data wrong, since I broke my Fundamental Rule of
Looking At Data, namely: don't look at raw numbers in a table, since you are
likely to misread things or form biases based on the order in which you look
at the data points; *always* visualize.

There is a significant difference between Release and LTO. About 2x
consistently.

[image: Inline image 3]

This is actually curious, because during the Release build we were spending
33% of CPU time in the backend (as clang sees it, i.e. mid-level optimizer
and codegen). This data is inconsistent with LTO simply being another run
through the backend (which would add just +33% CPU time at worst). There
seems to be something nonlinear happening.

To make it worse, the LTO build has approximately a full Release
optimization running per-TU, so the actual LTO step should be seeing
inlined/"cleaned up" IR, which should be much smaller than what the per-TU
optimizer is seeing; naively, it should take *even less* than "another 33%
CPU time". Yet we see a 1.5x-2x difference:

[image: Inline image 4]

-- Sean Silva
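To spell out the sanity check in the last paragraph, here is a
back-of-the-envelope sketch in Python; the 1.5x-2x range is simply the
observation quoted above, not a new measurement:

    # Check the "LTO is just another backend run" model against what the
    # builds actually show, using only the 33% Release backend fraction.
    release_backend_frac = 0.33

    # If the LTO step merely repeated the per-TU backend work, total CPU
    # time would grow by at most that backend fraction:
    naive_lto_over_release = 1.0 + release_backend_frac   # ~1.33x

    observed_lto_over_release = (1.5, 2.0)  # range read off the charts above

    print(naive_lto_over_release)      # 1.33 -- what the linear model predicts
    print(observed_lto_over_release)   # (1.5, 2.0) -- what the builds show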