NAKAMURA Takumi via llvm-dev
2017-Jul-10 09:44 UTC
[llvm-dev] FYI: ENABLE_MODULES would make building faster
I have been measuring how efficiently the clang/llvm tree builds with LLVM_ENABLE_MODULES.

* Summary

** The efficiency of Modules increases as the degree of parallelism decreases.
For example, with -j8 a Modules build takes 67% of the elapsed time of a no-modules build (488.59 s vs. 728.88 s).

** With higher parallelism, Modules is inefficient.
For example, with -j72 Modules is only 23 seconds faster than no-modules (161.55 s vs. 184.98 s),
and processor usage with Modules is only about 55%
(taking (user+sys)/72 as the ideal elapsed time).

** If the modules themselves do not have to be rebuilt, rebuilding is efficient enough.
For example, with -j72, rebuilding after removing just the *.o files reaches 84% processor usage.

* Random notes for improvements
- Get rid of -DCLANG_ENABLE_(ARCMT|REWRITER|STATIC_ANALYZER) on the command line
  by moving these settings into clang-config.h.
- Propagate the definitions used in unittests to the whole tree.
  Modules are sensitive to -D options on the command line.
- Teach CMake and Ninja to rebuild the module cache.
  IIRC, there was a discussion about this for Fortran modules.
- Parse modules.cache and issue "module rebuilder" jobs from it in advance of building the tree.
  Currently, Ninja sits idle while each compilation unit waits for a module lock.

I expect developers and users would be happier with Modules.

Thanks,
Takumi

The numbers below are from building clang with "/usr/bin/time ninja -jN clang".
The host compiler is clang with libc++ and lld, -Asserts.
The host is a Xeon with 36 cores and 72 logical processors.
Columns are:
N,user,system,elapsed,Ideal:(u+s)/N,(Ideal/elapsed)

N, number of jobs (-jN)
user, user time (sec)
system, system time (sec)
elapsed, elapsed time (sec)
Ideal:(u+s)/N, ideal elapsed time with no idle processors
(Ideal/elapsed), efficiency, i.e. elapsed processor usage

*ENABLE_MODULES=OFF
96,11959.10,413.57,184.52,128.882,69.8%
80,12000.47,411.62,184.67,155.151,84.0%
72,11952.46,407.66,184.98,171.668,92.8%
64,10970.09,375.14,189.08,177.269,93.8%
48,8716.43,310.69,198.75,188.065,94.6%
41,7651.71,274.48,202.32,193.322,95.6%
40,7496.75,270.23,205.38,194.175,94.5%
39,7377.94,266.18,206.45,196.003,94.9%
38,7227.33,259.33,206.22,197.017,95.5%
37,7068.51,254.84,207.64,197.928,95.3%
36,6914.62,250.31,208.13,199.026,95.6%
35,6815.70,247.86,210.31,201.816,96.0%
34,6728.49,244.93,214.57,205.101,95.6%
33,6608.13,239.37,216.54,207.500,95.8%
32,6585.52,235.59,221.93,213.160,96.0%
28,6502.79,231.50,248.85,240.510,96.6%
24,6451.13,230.06,289.14,278.383,96.3%
20,6386.95,225.27,342.18,330.611,96.6%
16,6183.61,222.80,411.88,400.401,97.2%
8,5558.17,205.07,728.88,720.405,98.8%

*ENABLE_MODULES=ON
96,6396.47,330.73,169.28,70.075,41.4%
88,6249.93,329.12,160.22,74.762,46.7%
80,6259.91,322.27,163.59,82.277,50.3%
72,6092.58,315.70,161.55,89.004,55.1%
64,5727.81,297.64,168.78,94.148,55.8%
56,5421.81,283.95,168.71,101.889,60.4%
48,4896.81,260.07,171.05,107.435,62.8%
40,4375.71,235.90,177.60,115.290,64.9%
32,3959.32,214.67,188.10,130.437,69.3%
24,3892.54,206.40,230.70,170.789,74.0%
16,3690.52,201.41,294.12,243.246,82.7%
8,3298.95,185.68,488.59,435.579,89.2%

*ENABLE_MODULES=ON, rebuilding after removing just *.o
96,6898.51,347.36,120.62,75.478,62.6%
88,6908.61,345.52,121.14,82.433,68.0%
80,6823.66,338.48,118.72,89.527,75.4%
72,6819.25,339.82,118.30,99.432,84.1%
64,6311.53,310.03,120.06,103.462,86.2%
56,5729.12,287.76,123.73,107.444,86.8%
48,5108.16,264.21,127.25,111.924,88.0%
40,4449.20,231.17,131.42,117.009,89.0%
32,3933.69,205.94,142.74,129.363,90.6%
24,3844.17,201.83,181.55,168.583,92.9%
16,3669.73,193.59,251.15,241.458,96.1%
8,3225.63,178.68,434.85,425.539,97.9%

Attachment (scrubbed from the archive text): modules.png
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170710/4c4854db/attachment.png>
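As a sanity check on the derived columns, here is a small Python sketch (not part of the original measurement tooling) that recomputes Ideal = (user+sys)/N and Efficiency = Ideal/elapsed for a few rows copied from the tables, together with the -j8 and -j72 comparisons quoted in the summary.

```python
# Recompute the derived columns of the timing tables:
#   Ideal      = (user + system) / N
#   Efficiency = Ideal / elapsed
from typing import NamedTuple


class Row(NamedTuple):
    jobs: int       # N, as passed to ninja -jN
    user: float     # user time (sec)
    system: float   # system time (sec)
    elapsed: float  # wall-clock time (sec)

    @property
    def ideal(self) -> float:
        """Elapsed time if all N jobs were busy the whole time."""
        return (self.user + self.system) / self.jobs

    @property
    def efficiency(self) -> float:
        """Fraction of the available processor time actually used."""
        return self.ideal / self.elapsed


# A few rows copied from the tables above.
modules_off = {
    72: Row(72, 11952.46, 407.66, 184.98),
    8: Row(8, 5558.17, 205.07, 728.88),
}
modules_on = {
    72: Row(72, 6092.58, 315.70, 161.55),
    8: Row(8, 3298.95, 185.68, 488.59),
}
# ENABLE_MODULES=ON, rebuilding after removing just *.o, at -j72.
objs_only_72 = Row(72, 6819.25, 339.82, 118.30)

for n in (72, 8):
    off, on = modules_off[n], modules_on[n]
    print(f"-j{n}: efficiency off {off.efficiency:.1%}, on {on.efficiency:.1%}; "
          f"on/off elapsed {on.elapsed / off.elapsed:.0%}; "
          f"saved {off.elapsed - on.elapsed:.1f} s")
print(f"-j72 *.o-only rebuild efficiency: {objs_only_72.efficiency:.1%}")
```

The output reproduces the summary figures: about 67% of the elapsed time at -j8, about 23 s saved and about 55% processor usage at -j72, and about 84% for the *.o-only rebuild.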
Vassil Vassilev via llvm-dev
2017-Jul-11 20:52 UTC
[llvm-dev] [cfe-dev] FYI: ENABLE_MODULES would make building faster
On 10/07/17 11:44, NAKAMURA Takumi via cfe-dev wrote:
> I was testing efficiency with LLVM_ENABLE_MODULES to build clang/llvm
> tree.

Thanks for sharing this summary.

>
> * Summary
>
> ** Efficiency of Modules increases as the degree of parallelism decreases.
> For example with -j8, Modules is 67% of elapsed time than no-modules.

Do you have any numbers for the performance when building with libstdc++?

>
> ** With higher parallelism, Modules is inefficient.
> For example with -j72, Modules is just 23 seconds faster than no-modules.
> Then, processor usage of Modules is about 55%.
> (Assuming (user+sys)/72 is ideal)

I assume this is the penalty of using implicit modules. Building a module takes a lock, which might lead to quadratic compile times (we had a bug report describing the problem somewhere in Bugzilla).
In the past, when building with make, I have seen us build modules but then fail to pick them up. I tried to fix the issue but didn't test the fix thoroughly. If you compile with -H you should be able to see which files are still textually included.

>
> ** If each module(s) is not rebuilt, rebuilding is sufficiently efficient.
> For example with -j72 to remove just *.o, processor usage is 84%.

Do you mean we are 84% faster?

>
> * Random notes for improvements
> - Get rid of -DCLANG_ENABLE_(ARCMT|REWRITER|STATIC_ANALYZER), =>
> clang-config.h

+1

Thanks a lot for working on this!
--Vassil
David Blaikie via llvm-dev
2017-Jul-11 21:21 UTC
[llvm-dev] [cfe-dev] FYI: ENABLE_MODULES would make building faster
On Mon, Jul 10, 2017 at 2:44 AM NAKAMURA Takumi via cfe-dev <cfe-dev at lists.llvm.org> wrote:

> I was testing efficiency with LLVM_ENABLE_MODULES to build clang/llvm tree.
>

Awesome - thanks for trying it out & gathering all this data!

>
> * Summary
>
> ** Efficiency of Modules increases as the degree of parallelism decreases.
> For example with -j8, Modules is 67% of elapsed time than no-modules.
>
> ** With higher parallelism, Modules is inefficient.
> For example with -j72, Modules is just 23 seconds faster than no-modules.
> Then, processor usage of Modules is about 55%.
> (Assuming (user+sys)/72 is ideal)
>

As Vassil mentioned, probably implicit modules.

I have some hope/aspirations of implementing explicit modules* support in cmake & so it'll be interesting to compare how much more parallelism can be achieved by that. If anyone else is interested in doing/helping with this work, I'd love any help - I've never touched cmake... so it'll be an adventure. (I assume it'll need changes to cmake itself, but I could be wrong)

* Explicit modules are used at Google & implemented in Clang (though only accessible via cc1 at the moment) - the build system must make an explicit clang invocation to build each .pcm file, and then pass explicit arguments to every clang invocation that compiles a file using those modules, etc.
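To make the explicit-modules workflow concrete, here is a minimal Python sketch of what a build system would have to plan: build each .pcm only after the modules it imports, then pass the resulting files explicitly to every compile that uses them. The module and source names are hypothetical, the "build-pcm" action stands in for the cc1 invocation (not shown here), and passing every transitively needed .pcm via -fmodule-file= is a conservative assumption of this sketch rather than something established in the thread.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical module graph: module -> modules it imports directly.
MODULE_DEPS = {
    "Support": set(),
    "IR": {"Support"},
    "Basic": {"Support"},
    "AST": {"Basic", "IR"},
}
# Hypothetical sources: source file -> modules it imports directly.
SOURCE_DEPS = {
    "Sema.cpp": {"AST"},
    "Verifier.cpp": {"IR"},
}


def closure(mods, graph):
    """All modules reachable from `mods`, including `mods` themselves."""
    seen, stack = set(), list(mods)
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(graph[m])
    return seen


def module_file_flags(mods, graph):
    # Assumption: conservatively pass every transitively needed .pcm.
    return [f"-fmodule-file={m}.pcm" for m in sorted(closure(mods, graph))]


def explicit_build_plan(module_deps, source_deps):
    """Ordered (action, target, flags) steps a build system would run."""
    plan = []
    # 1. Build each .pcm after every module it imports (predecessors first).
    for mod in TopologicalSorter(module_deps).static_order():
        plan.append(("build-pcm", mod, module_file_flags(module_deps[mod], module_deps)))
    # 2. Compile each source with explicit flags for the modules it uses.
    for src in sorted(source_deps):
        plan.append(("compile", src, module_file_flags(source_deps[src], module_deps)))
    return plan


for step in explicit_build_plan(MODULE_DEPS, SOURCE_DEPS):
    print(*step)
```

Because every input is explicit, the build system, not the compiler, decides when each .pcm is built, so no compile ever has to block on a module lock.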
Sean Silva via llvm-dev
2017-Jul-12 02:10 UTC
[llvm-dev] [cfe-dev] FYI: ENABLE_MODULES would make building faster
On Tue, Jul 11, 2017 at 2:21 PM, David Blaikie via cfe-dev <cfe-dev at lists.llvm.org> wrote:

> I have some hope/aspirations of implementing explicit modules* support in
> cmake & so it'll be interesting to compare how much more parallelism can be
> achieved by that. If anyone else is interested in doing/helping with this
> work, I'd love any help - I've never touched cmake... so it'll be an
> adventure. (I assume it'll need changes to cmake itself, but I could be
> wrong)

I vaguely looked at this at one point in the past, though I got sidetracked before I could try it out. Basically, it seemed like something along these lines could be done:

1. use a PRE_BUILD add_custom_command for a given add_library command to build the pcm (https://cmake.org/cmake/help/v3.0/command/add_custom_command.html)
2. use an INTERFACE target_compile_definitions to add the command line flag that dependent code needs to add to the clang invocation (https://cmake.org/cmake/help/v3.0/command/target_compile_definitions.html#command:target_compile_definitions)
3. an IMPORTED target library could be used to model external system dependencies, and you would have one such IMPORTED target for each different libc or C++ standard library

Of course, to have it properly integrated into CMake, ideally add_library would take a list of headers and do 1. and 2. by itself. CMake would also need built-in knowledge of 3.

-- Sean Silva
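A rough Python sketch that emits the kind of CMake wiring outlined above, for one library (steps 1 and 2) and one IMPORTED standard-library target (step 3). Two substitutions are deliberate and worth checking against the CMake documentation: it uses the OUTPUT form of add_custom_command plus add_dependencies instead of a PRE_BUILD hook, because the documentation describes PRE_BUILD as honored only by the Visual Studio generators (other generators run it just before PRE_LINK, i.e. after the library's sources have compiled); and it uses target_compile_options, because target_compile_definitions only adds -D preprocessor definitions, not arbitrary flags such as -fmodule-file=. The target names, the pcm command, and all paths are placeholders, the fragment is untested, and it has to appear after the corresponding add_library call.

```python
def explicit_module_cmake(target: str, pcm_command: str, headers: list[str]) -> str:
    """CMake fragment: build `target`'s .pcm and export a flag to its users.

    `pcm_command` is a placeholder for whatever clang invocation writes the
    .pcm; this sketch does not assume a particular one.
    """
    pcm = "${CMAKE_CURRENT_BINARY_DIR}/" + target + ".pcm"
    return "\n".join([
        # Step 1 (swapped in: an OUTPUT rule instead of a PRE_BUILD hook).
        f"add_custom_command(OUTPUT {pcm}",
        f"  COMMAND {pcm_command}",
        f"  DEPENDS {' '.join(headers)})",
        f"add_custom_target({target}_pcm DEPENDS {pcm})",
        f"add_dependencies({target} {target}_pcm)",
        # Step 2 (swapped in: compile options rather than -D definitions).
        f"target_compile_options({target} INTERFACE -fmodule-file={pcm})",
    ])


def imported_stdlib_cmake(name: str, pcm_path: str) -> str:
    """Step 3: an IMPORTED target standing in for one prebuilt standard library."""
    return "\n".join([
        f"add_library({name} INTERFACE IMPORTED)",
        f"set_target_properties({name} PROPERTIES",
        f"  INTERFACE_COMPILE_OPTIONS -fmodule-file={pcm_path})",
    ])


if __name__ == "__main__":
    # Hypothetical target, command placeholder, and paths.
    print(explicit_module_cmake("MyLib", "<clang invocation that writes the .pcm>",
                                ["include/MyLib/module.modulemap"]))
    print(imported_stdlib_cmake("libcxx_std", "/path/to/libcxx.pcm"))
```

Any target that links MyLib or libcxx_std would then pick up the corresponding -fmodule-file= flag through CMake's usual INTERFACE propagation.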
Boris Kolpackov via llvm-dev
2017-Jul-12 07:18 UTC
[llvm-dev] [cfe-dev] FYI: ENABLE_MODULES would make building faster
David Blaikie via cfe-dev <cfe-dev at lists.llvm.org> writes:

> I have some hope/aspirations of implementing explicit modules* support in
> cmake & so it'll be interesting to compare how much more parallelism can be
> achieved by that.

I think it will be hard to implement this "properly" in CMake until the underlying build systems are module-aware, similar to how (most of) them are header-aware.

Specifically, generating all the .pcm's during some sort of a pre-build step will hinder parallelism, since in a sense you will have a "barrier" between compiling module interfaces and compiling other sources. Ideally, you would want to start compiling each source as soon as all the module interfaces that it actually uses are ready. Especially so if you have a -j72 kind of machine ;-).

Then there is the issue of change detection: you probably don't want to make all your sources depend on all your module interfaces.

FWIW, we have implemented this "proper" module support (though for -fmodules-ts only) in build2[1], if you (or anyone else) would like to try it.

[1] https://build2.org/

Boris
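The barrier argument can be illustrated with a toy list-scheduling simulation in Python. The task durations and project layout are made up for illustration, not measured: a small, widely used module interface, a slow, rarely used one, and sources that import one or the other. A task starts as soon as a worker is free and all of its prerequisites are done; the only difference between the two runs is whether each source waits for just the interfaces it uses or for every interface. The rebuild_set helper applies the same dependency maps to the change-detection point.

```python
import heapq


def makespan(durations, deps, workers):
    """Simulated wall-clock time to run a task DAG on `workers` parallel slots.

    durations: task -> run time; deps: task -> set of prerequisite tasks.
    """
    waiting = {t: len(deps.get(t, ())) for t in durations}
    dependents = {t: [] for t in durations}
    for t, pres in deps.items():
        for p in pres:
            dependents[p].append(t)
    ready = sorted(t for t, n in waiting.items() if n == 0)
    running = []  # min-heap of (finish_time, task)
    now, free = 0.0, workers
    while ready or running:
        while free and ready:  # start as many ready tasks as slots allow
            task = ready.pop(0)
            heapq.heappush(running, (now + durations[task], task))
            free -= 1
        now, done = heapq.heappop(running)  # advance to the next completion
        free += 1
        for d in sorted(dependents[done]):
            waiting[d] -= 1
            if waiting[d] == 0:
                ready.append(d)
    return now


def rebuild_set(changed, deps):
    """Tasks that list `changed` as a prerequisite, i.e. must be recompiled."""
    return sorted(t for t, pres in deps.items() if changed in pres)


# Hypothetical project: a small, widely used module and a slow, rarely used one.
durations = {"core.pcm": 10, "big.pcm": 40, "uses_big.cpp": 10}
durations.update({f"src{i}.cpp": 10 for i in range(6)})

# Dependency-driven: each source waits only for the interfaces it imports.
precise = {f"src{i}.cpp": {"core.pcm"} for i in range(6)}
precise["uses_big.cpp"] = {"big.pcm"}

# Pre-build barrier: every source waits for every .pcm.
pcms = {t for t in durations if t.endswith(".pcm")}
barrier = {t: set(pcms) for t in durations if t.endswith(".cpp")}

print("dependency-driven:", makespan(durations, precise, workers=4))
print("pre-build barrier:", makespan(durations, barrier, workers=4))
print("recompile after big.pcm changes:", len(rebuild_set("big.pcm", precise)),
      "vs", len(rebuild_set("big.pcm", barrier)))
```

With these invented numbers and 4 workers, the dependency-driven schedule finishes at 50 time units and the barrier schedule at 60, and a change to big.pcm forces 1 recompile instead of 7; the size of both gaps depends entirely on the chosen durations and dependencies.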