On Sun, Sep 18, 2016 at 12:32 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Sep 17, 2016, at 3:19 PM, Carsten Mattner <carstenmattner at gmail.com> wrote:
>>
>> So, when I embark on the next ThinLTO try build, probably this Sunday,
>> should I append -Wl,-plugin-opt,jobs=NUM_PHYS_CORES to LDFLAGS
>> and run ninja without -j or -jNUM_PHYS_CORES?
>
> ThinLTO is fairly lean on memory: it should not consume more memory per
> thread than launching the same number of clang processes in parallel to
> process C++ files.
>
> For example, when linking the clang binary itself without debug info, it
> consumes 0.6GB with 8 threads, 0.9GB with 16 threads, and 1.4GB with 32
> threads.
> With full debug info we still have room for improvement; right now it
> consumes 2.3GB with 8 threads, 3.5GB with 16 threads, and 6.5GB with 32
> threads.
>
> So I believe that configuring with -DLLVM_PARALLEL_LINK_JOBS=1 should be
> enough without other constraints, but your mileage may vary.

Sure, I'll try that so as not to introduce too many variables into the
configure changes, though I have to ask whether using lld would make it
possible to have a common -Wl that works across platforms, without having
to care whether the system linker is binutils.

If I really wanted to pass that to cmake, overriding LDFLAGS would work,
right?
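For reference, the combination Carsten is describing would look roughly like
the sketch below. It assumes a gold-plugin-based link, a machine with 8
physical cores, and a sibling ../llvm source directory; none of these values
come from the thread itself.

    # LDFLAGS is only picked up by cmake when the build tree is first configured.
    export LDFLAGS="-Wl,-plugin-opt,jobs=8"      # 8 = assumed physical core count
    cmake -G Ninja -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm
    ninja                                        # no -j: ninja uses all logical cores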
As Mehdi mentioned, the ThinLTO backend processes use very little memory, so
you may get away without any additional flags (neither -Wl,--plugin-opt=jobs=..
nor a -D option for cmake to limit link parallelism) if your build machine has
enough memory. Here is some build-time data for linking (with ThinLTO) the 52
binaries in a clang build in parallel (linking parallelism equals ninja
parallelism). The machine has 32 logical cores and 64GB of memory.

1) Default ninja parallelism: the peak 1-minute load average is 537. The total
   elapsed time is 9m43s.
2) ninja -j16: the peak load is 411. The elapsed time is 8m26s.
3) ninja -j8:  elapsed time is 8m34s.
4) ninja -j4:  elapsed time is 8m50s.
5) ninja -j2:  elapsed time is 9m54s.
6) ninja -j1:  elapsed time is 12m3s.

As you can see, doing serial ThinLTO linking across multiple binaries does not
give you the best performance. The build performance peaked at -j16 in this
configuration. You may need to find your best LLVM_PARALLEL_LINK_JOBS value.

Having said that, there is definitely room for ThinLTO usability improvement,
so that the ThinLTO parallel backend coordinates with the build system's
parallelism and the user does not need to figure out the sweet spot.

thanks,

David
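To reproduce a sweep like this on another machine, a minimal sketch could look
like the following. It assumes an already-configured ThinLTO Ninja build tree
and GNU time; touching one libSupport file is just an arbitrary way to force
most binaries to be recompiled and relinked, not what David actually ran.

    for j in 1 2 4 8 16 32; do
        # Recompiling one widely-linked file forces most binaries to relink.
        touch ../llvm/lib/Support/Signals.cpp
        /usr/bin/time -f "-j$j: elapsed %E, peak RSS %M KB" ninja -j"$j"
    done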
> On Sep 17, 2016, at 4:03 PM, Carsten Mattner <carstenmattner at gmail.com> wrote:
>
> Sure, I'll try that so as not to introduce too many variables into the
> configure changes, though I have to ask whether using lld would make it
> possible to have a common -Wl that works across platforms, without having
> to care whether the system linker is binutils.

I'm not sure I understand the question about lld. Lld will be a different
linker, with its own set of options. Actually, we usually rely on the clang
driver to hide platform-specific options and provide a common interface to
the user.

> If I really wanted to pass that to cmake, overriding LDFLAGS would work,
> right?

I don't believe LDFLAGS is a valid cmake flag. You need to define both
CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS.

—
Mehdi
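A sketch of the variant Mehdi suggests, assuming the gold plugin and an
arbitrary backend-thread count of 8 (both assumptions, not values from the
thread):

    cmake -G Ninja ../llvm \
      -DCMAKE_EXE_LINKER_FLAGS="-Wl,-plugin-opt,jobs=8" \
      -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-plugin-opt,jobs=8"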
Awanish via llvm-dev
2016-Sep-18 10:54 UTC
[llvm-dev] C compiler cannot create executables
I am trying to build httpd.bc, and for this I am configuring as:

    ./configure --disable-shared \
      CC="llvm-gcc -flto -use-gold-plugin -Wl,-plugin-opt=also-emit-llvm" \
      CFLAGS="-g" \
      RANLIB="ar --plugin /home/awanish/llvm-2.9/llvm-gcc-4.2-2.9.source/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/LLVMgold.so -s" \
      AR_FLAGS="--plugin /home/awanish/llvm-2.9/llvm-gcc-4.2-2.9.source/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/LLVMgold.so -cru"

but I am getting an error which states:

    checking for gcc... llvm-gcc -flto -use-gold-plugin -Wl,-plugin-opt=also-emit-llvm
    checking whether the C compiler works... no
    configure: error: in `/home/awanish/PHD/benchmark/httpd-2.2.16/myBuild/srclib/apr':
    configure: error: C compiler cannot create executables

I got the reference for configuring like this from
https://dslabredmine.epfl.ch/embedded/cloud9/user/CompilingLLVM.html.

Can anyone please tell me where I am going wrong, and what the correct
procedure is for generating a .bc for httpd that can be run on klee?

--
Thanks and Regards
Awanish Pandey
PhD, CSE
IIT Kanpur
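The thread does not answer this, but the usual first step for a "C compiler
cannot create executables" failure is to read the config.log that configure
names and to run the quoted CC by hand; a sketch, reusing the paths from the
message above:

    # config.log records the exact failing compile/link command and its output.
    less /home/awanish/PHD/benchmark/httpd-2.2.16/myBuild/srclib/apr/config.log

    # Reproduce the probe manually to see the real error; a missing LLVMgold.so
    # or a gold linker built without plugin support are common causes.
    echo 'int main(void){return 0;}' > conftest.c
    llvm-gcc -flto -use-gold-plugin -Wl,-plugin-opt=also-emit-llvm conftest.c -o conftest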
On Sun, Sep 18, 2016 at 6:30 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> I'm not sure I understand the question about lld. Lld will be a different
> linker, with its own set of options. Actually, we usually rely on the
> clang driver to hide platform-specific options and provide a common
> interface to the user.

I was thinking that if I force lld, then the -Wl param would be the same
across platforms, and I wouldn't have to accommodate different linkers.

> I don't believe LDFLAGS is a valid cmake flag. You need to define both
> CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS.

It respects it, as it should, since otherwise packagers would have to
replicate CFLAGS, CXXFLAGS, etc. via CMAKE_*_FLAGS in package build
descriptions.
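For comparison, the driver-based route Mehdi refers to would look roughly like
this; -fuse-ld=gold (or lld, where installed) and the file names are
assumptions, not something taken from the thread:

    # The clang driver expands -flto=thin into the platform's plugin/linker
    # options, so no raw -Wl,... flags are needed in the common case.
    clang -flto=thin -c foo.c -o foo.o
    clang -flto=thin -fuse-ld=gold foo.o -o foo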
On Sun, Sep 18, 2016 at 5:45 AM, Xinliang David Li <xinliangli at gmail.com> wrote:
> As Mehdi mentioned, the ThinLTO backend processes use very little memory,
> so you may get away without any additional flags (neither
> -Wl,--plugin-opt=jobs=.. nor a -D option for cmake to limit link
> parallelism) if your build machine has enough memory. Here is some
> build-time data for linking (with ThinLTO) the 52 binaries in a clang
> build in parallel (linking parallelism equals ninja parallelism). The
> machine has 32 logical cores and 64GB of memory.
>
> 1) Default ninja parallelism: the peak 1-minute load average is 537. The
>    total elapsed time is 9m43s.
> 2) ninja -j16: the peak load is 411. The elapsed time is 8m26s.
> 3) ninja -j8:  elapsed time is 8m34s.
> 4) ninja -j4:  elapsed time is 8m50s.
> 5) ninja -j2:  elapsed time is 9m54s.
> 6) ninja -j1:  elapsed time is 12m3s.
>
> As you can see, doing serial ThinLTO linking across multiple binaries does
> not give you the best performance. The build performance peaked at -j16 in
> this configuration. You may need to find your best LLVM_PARALLEL_LINK_JOBS
> value.

What did you set LLVM_PARALLEL_LINK_JOBS to? Maybe I should first try to
leave it unset and see if it fits within my machine's hardware limits.

> Having said that, there is definitely room for ThinLTO usability
> improvement, so that the ThinLTO parallel backend coordinates with the
> build system's parallelism and the user does not need to figure out the
> sweet spot.

Definitely. If parallelism can be controlled on multiple layers, an outer
layer's setting ought to influence the inner layers in a reasonable way,
to make it more intuitive to use.
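Until such coordination exists, one rough way to split a machine by hand is to
cap the number of concurrent link jobs and give each link a share of the
cores. The sketch below combines the knobs discussed earlier in the thread;
the numbers are purely illustrative.

    CORES=$(nproc)
    LINK_JOBS=2                                  # arbitrary example
    THINLTO_JOBS=$((CORES / LINK_JOBS))
    cmake -G Ninja ../llvm \
      -DLLVM_PARALLEL_LINK_JOBS=$LINK_JOBS \
      -DCMAKE_EXE_LINKER_FLAGS="-Wl,-plugin-opt,jobs=$THINLTO_JOBS" \
      -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-plugin-opt,jobs=$THINLTO_JOBS"
    ninja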