On Sun, Sep 18, 2016 at 12:32 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Sep 17, 2016, at 3:19 PM, Carsten Mattner <carstenmattner at gmail.com> wrote:
>>
>> So, when I embark on the next ThinLTO try build, probably this Sunday,
>> should I append -Wl,-plugin-opt,jobs=NUM_PHYS_CORES to LDFLAGS
>> and run ninja without -j or -jNUM_PHYS_CORES?
>
> ThinLTO is fairly lean on memory: it should not consume more memory per
> thread than launching the same number of clang processes in parallel to
> process C++ files.
>
> For example, when linking the clang binary itself without debug info, it
> consumes 0.6GB with 8 threads, 0.9GB with 16 threads, and 1.4GB with 32
> threads.
> With full debug info we still have room for improvement; right now it
> consumes 2.3GB with 8 threads, 3.5GB with 16 threads, and 6.5GB with 32
> threads.
>
> So I believe that configuring with -DLLVM_PARALLEL_LINK_JOBS=1 should be
> enough without other constraints, but your mileage may vary.

Sure, I'll try that so as not to introduce too many variables into the
configure changes, though I have to ask whether using lld would make it
possible to have a common -Wl that works across platforms, without having
to care whether the system linker is binutils.

If I really wanted to pass that to cmake, overriding LDFLAGS would work,
right?
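For reference, the combination Carsten is describing would look roughly like
the sketch below. It assumes a gold-plugin-based link, a machine with 8
physical cores, and a sibling ../llvm source directory; none of these values
come from the thread itself.

    # LDFLAGS is only picked up by cmake when the build tree is first configured.
    export LDFLAGS="-Wl,-plugin-opt,jobs=8"      # 8 = assumed physical core count
    cmake -G Ninja -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm
    ninja                                        # no -j: ninja uses all logical cores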
As Mehdi mentioned, the ThinLTO backend processes use very little memory, so
you may get away without any additional flags (neither -Wl,--plugin-opt=jobs=..
nor a -D option for cmake to limit link parallelism) if your build machine has
enough memory. Here is some build-time data for linking (with ThinLTO) the 52
binaries in a clang build in parallel (linking parallelism equals ninja
parallelism). The machine has 32 logical cores and 64GB of memory.

1) Default ninja parallelism: the peak 1-minute load average is 537. The total
   elapsed time is 9m43s.
2) ninja -j16: the peak load is 411. The elapsed time is 8m26s.
3) ninja -j8:  elapsed time is 8m34s.
4) ninja -j4:  elapsed time is 8m50s.
5) ninja -j2:  elapsed time is 9m54s.
6) ninja -j1:  elapsed time is 12m3s.

As you can see, doing serial ThinLTO linking across multiple binaries does not
give you the best performance. The build performance peaked at -j16 in this
configuration. You may need to find your best LLVM_PARALLEL_LINK_JOBS value.

Having said that, there is definitely room for ThinLTO usability improvement,
so that the ThinLTO parallel backend coordinates with the build system's
parallelism and the user does not need to figure out the sweet spot.

thanks,

David
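To reproduce a sweep like this on another machine, a minimal sketch could look
like the following. It assumes an already-configured ThinLTO Ninja build tree
and GNU time; touching one libSupport file is just an arbitrary way to force
most binaries to be recompiled and relinked, not what David actually ran.

    for j in 1 2 4 8 16 32; do
        # Recompiling one widely-linked file forces most binaries to relink.
        touch ../llvm/lib/Support/Signals.cpp
        /usr/bin/time -f "-j$j: elapsed %E, peak RSS %M KB" ninja -j"$j"
    done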
> On Sep 17, 2016, at 4:03 PM, Carsten Mattner <carstenmattner at gmail.com> wrote:
>
> Sure, I'll try that so as not to introduce too many variables into the
> configure changes, though I have to ask whether using lld would make it
> possible to have a common -Wl that works across platforms, without having
> to care whether the system linker is binutils.

I'm not sure I understand the question about lld. Lld will be a different
linker, with its own set of options. Actually, we usually rely on the clang
driver to hide platform-specific options and provide a common interface to
the user.

> If I really wanted to pass that to cmake, overriding LDFLAGS would work,
> right?

I don't believe LDFLAGS is a valid cmake flag. You need to define both
CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS.

—
Mehdi
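A sketch of the variant Mehdi suggests, assuming the gold plugin and an
arbitrary backend-thread count of 8 (both assumptions, not values from the
thread):

    cmake -G Ninja ../llvm \
      -DCMAKE_EXE_LINKER_FLAGS="-Wl,-plugin-opt,jobs=8" \
      -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-plugin-opt,jobs=8"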
Awanish via llvm-dev
2016-Sep-18 10:54 UTC
[llvm-dev] C compiler cannot create executables
I am trying to build httpd.bc, and for this I am configuring as:

    ./configure --disable-shared \
      CC="llvm-gcc -flto -use-gold-plugin -Wl,-plugin-opt=also-emit-llvm" \
      CFLAGS="-g" \
      RANLIB="ar --plugin /home/awanish/llvm-2.9/llvm-gcc-4.2-2.9.source/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/LLVMgold.so -s" \
      AR_FLAGS="--plugin /home/awanish/llvm-2.9/llvm-gcc-4.2-2.9.source/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/LLVMgold.so -cru"

but I am getting an error which states:

    checking for gcc... llvm-gcc -flto -use-gold-plugin -Wl,-plugin-opt=also-emit-llvm
    checking whether the C compiler works... no
    configure: error: in `/home/awanish/PHD/benchmark/httpd-2.2.16/myBuild/srclib/apr':
    configure: error: C compiler cannot create executables

I got the reference for configuring like this from
https://dslabredmine.epfl.ch/embedded/cloud9/user/CompilingLLVM.html.

Can anyone please tell me where I am going wrong, and what the correct
procedure is for generating a .bc for httpd that can be run on klee?

--
Thanks and Regards
Awanish Pandey
PhD, CSE
IIT Kanpur
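The thread does not answer this, but the usual first step for a "C compiler
cannot create executables" failure is to read the config.log that configure
names and to run the quoted CC by hand; a sketch, reusing the paths from the
message above:

    # config.log records the exact failing compile/link command and its output.
    less /home/awanish/PHD/benchmark/httpd-2.2.16/myBuild/srclib/apr/config.log

    # Reproduce the probe manually to see the real error; a missing LLVMgold.so
    # or a gold linker built without plugin support are common causes.
    echo 'int main(void){return 0;}' > conftest.c
    llvm-gcc -flto -use-gold-plugin -Wl,-plugin-opt=also-emit-llvm conftest.c -o conftest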
On Sun, Sep 18, 2016 at 6:30 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> I'm not sure I understand the question about lld. Lld will be a different
> linker, with its own set of options. Actually, we usually rely on the
> clang driver to hide platform-specific options and provide a common
> interface to the user.

I was thinking that if I force lld, then the -Wl param would be the same
across platforms, and I wouldn't have to accommodate different linkers.

> I don't believe LDFLAGS is a valid cmake flag. You need to define both
> CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS.

It respects it, as it should, since otherwise packagers would have to
replicate CFLAGS, CXXFLAGS, etc. via CMAKE_*_FLAGS in package build
descriptions.
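For comparison, the driver-based route Mehdi refers to would look roughly like
this; -fuse-ld=gold (or lld, where installed) and the file names are
assumptions, not something taken from the thread:

    # The clang driver expands -flto=thin into the platform's plugin/linker
    # options, so no raw -Wl,... flags are needed in the common case.
    clang -flto=thin -c foo.c -o foo.o
    clang -flto=thin -fuse-ld=gold foo.o -o foo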
On Sun, Sep 18, 2016 at 5:45 AM, Xinliang David Li <xinliangli at gmail.com> wrote:
> As Mehdi mentioned, the ThinLTO backend processes use very little memory,
> so you may get away without any additional flags (neither
> -Wl,--plugin-opt=jobs=.. nor a -D option for cmake to limit link
> parallelism) if your build machine has enough memory. Here is some
> build-time data for linking (with ThinLTO) the 52 binaries in a clang
> build in parallel (linking parallelism equals ninja parallelism). The
> machine has 32 logical cores and 64GB of memory.
>
> 1) Default ninja parallelism: the peak 1-minute load average is 537. The
>    total elapsed time is 9m43s.
> 2) ninja -j16: the peak load is 411. The elapsed time is 8m26s.
> 3) ninja -j8:  elapsed time is 8m34s.
> 4) ninja -j4:  elapsed time is 8m50s.
> 5) ninja -j2:  elapsed time is 9m54s.
> 6) ninja -j1:  elapsed time is 12m3s.
>
> As you can see, doing serial ThinLTO linking across multiple binaries does
> not give you the best performance. The build performance peaked at -j16 in
> this configuration. You may need to find your best LLVM_PARALLEL_LINK_JOBS
> value.

What did you set LLVM_PARALLEL_LINK_JOBS to? Maybe I should first try to
leave it unset and see if it fits within my machine's hardware limits.

> Having said that, there is definitely room for ThinLTO usability
> improvement, so that the ThinLTO parallel backend coordinates with the
> build system's parallelism and the user does not need to figure out the
> sweet spot.

Definitely. If parallelism can be controlled on multiple layers, an outer
layer's setting ought to influence the inner layers in a reasonable way,
to make it more intuitive to use.
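Until such coordination exists, one rough way to split a machine by hand is to
cap the number of concurrent link jobs and give each link a share of the
cores. The sketch below combines the knobs discussed earlier in the thread;
the numbers are purely illustrative.

    CORES=$(nproc)
    LINK_JOBS=2                                  # arbitrary example
    THINLTO_JOBS=$((CORES / LINK_JOBS))
    cmake -G Ninja ../llvm \
      -DLLVM_PARALLEL_LINK_JOBS=$LINK_JOBS \
      -DCMAKE_EXE_LINKER_FLAGS="-Wl,-plugin-opt,jobs=$THINLTO_JOBS" \
      -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-plugin-opt,jobs=$THINLTO_JOBS"
    ninja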