Craig Topper via llvm-dev
2018-Mar-17 23:04 UTC
[llvm-dev] [cfe-dev] Clang executable sizes and build stats
I'm sure the x86 scheduler models are causing bloat. Every time a single instruction appears on a line by itself like this in a scheduler model:

    def: InstRW<[SBWriteResGroup2], (instregex "ANDNPDrr")>;

It causes that instruction to be its own group in the generated output. And it's replicated for each CPU. We should look into making better use of regular expressions, or taking advantage of the fact that InstRW can take a list of instructions. That makes those instructions part of a single group, and the tablegen backend will only split the group if two CPUs have different ports, latency, etc. for instructions within the group.

~Craig

On Sat, Mar 17, 2018 at 6:26 AM, Greg Bedwell via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> Thanks for raising this. This is something we've recently been looking at too at Sony, as over the course of PS4's lifetime so far we've seen our clang executable on Windows approximately double in size, which isn't ideal for things like distributed build systems. A graph of clang.exe size on our internal staging branch matches yours closely, with it being more of a death by a thousand cuts rather than being down to a small number of sudden big-bang changes.
>
> I did spot one range of about 25 upstream commits in our data where the exe size increased by over 1MB. My prime suspect in that range was a new scheduling model being added to the X86 backend, but I've not bisected further to be sure yet. This would be an interesting case for us, as we don't really need to support any models other than Jaguar for our users but don't want to break the LLVM tests, nor introduce loads of private changes to our branch.
>
> I know our test/QA team have been doing some analysis using Bloaty McBloatFace to see exactly where the size is coming from and produced some really nice visualizations of that data. They've also been looking at how the MinSizeRelease config does on Windows. I think the size savings were decent but I'm not sure of performance numbers, if they have any yet.
>
> I'll ask around at what we have to share once back in the office.
>
> Thanks for sharing your data!
>
> -Greg
>
> On Sat, 17 Mar 2018 at 12:36, Dimitry Andric via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> I recently did a run where I built clang executables on FreeBSD 12-CURRENT [1], from trunk r250000 (2015-10-11) all through r327700 (2018-03-16), with increments of 100 revisions. This is mainly meant as an archive, for easily doing bisections, but there are also some interesting statistics.
>>
>> From r250000 through r327700:
>> * the total (stripped) executable size grew by approximately 43%
>> * the size of the text segment grew by approximately 41%
>> * the size of the data segment grew by approximately 61%
>> * the size of the bss segment grew by approximately 185%
>> * real build time (on a 32 core system) grew by approximately 60%
>> * user build time (on a 32 core system) grew by approximately 62%
>> * maximum resident set size (RSS) grew by approximately 32%
>>
>> Google spreadsheet with more numbers and some graphs:
>>
>> https://docs.google.com/spreadsheets/d/e/2PACX-1vSGq1U7j45JNC_bcG4HV3jKOV4WBUPbTSgMMFXd5SD0IEPTAFwWnlU2ysprmnHsNe5WONRCjg8F5mHK/pubhtml
>>
>> -Dimitry
>>
>> [1] These were built using the "ninja clang clang-headers" target, followed by "ninja install-clang install-clang-headers".
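Craig's point about one-instruction-per-line definitions versus a list of instructions can be illustrated with a toy model. This is only a sketch of the grouping rule as described in his message, not the tablegen backend: the CPU names are hypothetical, the instruction names other than ANDNPDrr are chosen for illustration, and each inner list stands in for one InstRW line.

```python
from collections import defaultdict

def count_groups(models):
    """Count instruction groups in a toy model of the scheduler tables.

    models maps a CPU name to a list of definitions; each definition is a
    list of instruction names, standing in for one InstRW line.
    Toy rule (an assumption): two instructions stay in one group only if
    every CPU lists them together in the same definition.
    """
    signature = defaultdict(list)
    for cpu in sorted(models):
        for def_index, instrs in enumerate(models[cpu]):
            for instr in instrs:
                signature[instr].append((cpu, def_index))
    return len({tuple(sig) for sig in signature.values()})

def table_entries(models):
    # Each group is replicated once per CPU model.
    return count_groups(models) * len(models)

INSTRS = ["ANDNPDrr", "ANDNPSrr", "ANDPDrr", "ANDPSrr"]
CPUS = ["cpuA", "cpuB"]  # hypothetical CPU models

# One instruction per definition, on every CPU.
one_per_line = {cpu: [[i] for i in INSTRS] for cpu in CPUS}
# All instructions in a single definition, on every CPU.
one_list = {cpu: [list(INSTRS)] for cpu in CPUS}
# cpuB treats the last two instructions differently, so the group splits.
split_on_b = {"cpuA": [list(INSTRS)], "cpuB": [INSTRS[:2], INSTRS[2:]]}

print(table_entries(one_per_line))  # 4 groups x 2 CPUs = 8
print(table_entries(one_list))      # 1 group x 2 CPUs = 2
print(table_entries(split_on_b))    # 2 groups x 2 CPUs = 4
```

In this toy, the single-list form only grows when some CPU actually distinguishes the instructions, which is the behaviour the message attributes to the backend.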
Andrew Trick via llvm-dev
2018-Mar-21 01:34 UTC
[llvm-dev] [cfe-dev] Clang executable sizes and build stats
> On Mar 17, 2018, at 4:04 PM, Craig Topper via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> I'm sure the x86 scheduler models are causing bloat. Every time a single instruction appears on a line by itself like this in a scheduler model:
>
> def: InstRW<[SBWriteResGroup2], (instregex "ANDNPDrr")>;
>
> It causes that instruction to be its own group in the generated output. And it's replicated for each CPU. We should look into making better use of regular expressions, or taking advantage of the fact that InstRW can take a list of instructions. That makes those instructions part of a single group, and the tablegen backend will only split the group if two CPUs have different ports, latency, etc. for instructions within the group.
>
> ~Craig

The tables themselves are compact. There’s actually a lot of complexity spent on compacting the resource and latency tables. But, yes, there are 5k+ entries per cpu, roughly 28 bytes each. However, if you're looking at a debug build, the tables will be huge. The scheduling class names are much bigger than the data.

-Andy
Craig Topper via llvm-dev
2018-Mar-22 04:28 UTC
[llvm-dev] [cfe-dev] Clang executable sizes and build stats
I just knocked ~400k off the size of the x86 scheduler tables by reducing from 5k+ entries to 2k+ entries per cpu.

~Craig
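A back-of-the-envelope check of the figures in this thread, as a sketch only: it takes "5k+" and "2k+" as exactly 5000 and 2000 entries, uses the "roughly 28 bytes" per entry from Andy's message, and reads "~400k" as 400,000 bytes. All of those roundings are assumptions, so the result is an order-of-magnitude estimate rather than a measurement.

```python
BYTES_PER_ENTRY = 28    # "roughly 28 bytes each" (from the thread)
ENTRIES_BEFORE = 5000   # "5k+" rounded down (assumption)
ENTRIES_AFTER = 2000    # "2k+" rounded down (assumption)
REPORTED_SAVING = 400_000  # "~400k" read as bytes (assumption)

def table_bytes(entries, cpus=1):
    # Size of the per-CPU scheduling class table, replicated per CPU model.
    return entries * BYTES_PER_ENTRY * cpus

saved_per_cpu = table_bytes(ENTRIES_BEFORE) - table_bytes(ENTRIES_AFTER)
print(table_bytes(ENTRIES_BEFORE))   # 140000 bytes per CPU before
print(table_bytes(ENTRIES_AFTER))    # 56000 bytes per CPU after
print(saved_per_cpu)                 # 84000 bytes saved per CPU
print(round(REPORTED_SAVING / saved_per_cpu, 1))  # 4.8
```

At these rounded figures the reported saving is about five CPU models' worth of table entries, which is at least consistent with the tables being replicated per CPU.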