thr3ads.net - llvm dev - [LLVMdev] LTO question [Dec 2014]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li

2014-Dec-26 23:46 UTC

[LLVMdev] LTO question

(repost the reply using my personal account -- previous reply to the list
got hold up)

On Thu, Dec 25, 2014 at 11:55 PM, Adve, Vikram Sadanand
<vadve at illinois.edu> wrote:> Diego, Teresa, David,
>
> Sorry for my delayed reply; I left for vacation right after sending my
message about this.>
> Diego, it wasn't explicit from your message whether LLVM LTO can handleFirefox-scale programs, which you said GCC can handle.  I assumed that's
what you meant, but could you confirm that?  I understand that neither can
handle the very large Google applications, but that's probably not a
near-term concern for a project like the one Charles is embarking on.

Vikram, LLVM can handle Firefox size programs. Honza wrote two good
articles about LTO.

http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-1-brief.html
http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-2-firefox.html

Comparison with LLVM is described in the second article. It took about
40min to finish building Firefox with llvm using lto and -g. The
following is a quote:

"This graph shows issues with debug info memory use. LLVM goes up to
35GB. LLVM developers are also working on debug info merging
improvements (equivalent to what GCC's type merging is) and the
situation has improved in last two releases until the current shape.
Older LLVM checkouts happily run out of 60GB memory & 60GB swap on my
machine.".
>
> I'd be interested to hear more about the LTO design you folks are
workingon, whenever you're ready to share the details.

We will share the details as soon as we can -- possibly some time in Jan
2015.
> I read the GCC design docs on LTO, and I'm curious how similar ordifferent your approach will be.  For example, the 3-phase approach of
WHOPR is fairly sophisticated (it actually follows closely some research
done at Rice U. and IBM on scalable interprocedural analysis, in the same
group where Preston did his Ph.D.).

In Google, we care mostly about peak optimization performance. Peak
Optimization is basically PGO + CMO. For cross-module optimization
(CMO) to be usable for large applications, small memory footprint is
just one aspect of it,  and fast build time is equally important. Peak
optimization is not only used in release build but  in developer
workflow too. This means build time with CMO should be close to O2
time as much as possible.  It is important to compiler engineers too
-- you don't want to wait for more than 20min to hit a breakpoint in
debugging a compiler problem :)

For this reason, GCC LTO is not used in Google. Instead, the much more
scalable solution called LIPO is widely used for CMO:
https://gcc.gnu.org/wiki/LightweightIpo. LIPO by design requires PGO.

While LIPO is scalable, it has its own limitation that prevents the
compiler from maximizing the benefit of CMO. The new design is
intended to solve the problem with more very aggressive objectives.
The new design is pretty simple and shares the basic principles of
LIPO without requiring PGO (though it still works best with PGO). It
still fits in LTO framework, so that toolchain support change is
minimized. For now, without giving details, I can share  some of the
objectives of the new design:

    * The build should be almost fully parallelizable (at both process
      level and build machine node level)
    * The build should scale to programs with *any/unlimited* size
      (measured in number of TUs). It should handle programs 10x, or 100x
       the size of Firefox.
    * The build time should be very close to non-LTO build, and can be
      considered to be turned on *by default* for O2 or at least O3
      compilations.
    * When turned on the by default, it can eliminate the need for
      users to put inline functions in header files (thus greatly help
      improving parsing time)
    * Most of the benefit of CMO comes from cross module inlining and
      cross module indirect call promotions.  By default, the design
      only enables these two, but it is still compatible with any whole
      program analysis which can be turned on with additional options.

thanks,

David

On Thu, Dec 25, 2014 at 11:55 PM, Adve, Vikram Sadanand <vadve at
illinois.edu>
wrote:>
> Diego, Teresa, David,
>
> Sorry for my delayed reply; I left for vacation right after sending my
> message about this.
>
> Diego, it wasn't explicit from your message whether LLVM LTO can handle
> Firefox-scale programs, which you said GCC can handle.  I assumed
that's
> what you meant, but could you confirm that?  I understand that neither can
> handle the very large Google applications, but that's probably not a
> near-term concern for a project like the one Charles is embarking on.
>
> I'd be interested to hear more about the LTO design you folks are
working
> on, whenever you're ready to share the details.  I read the GCC design
docs
> on LTO, and I'm curious how similar or different your approach will be.
> For example, the 3-phase approach of WHOPR is fairly sophisticated (it
> actually follows closely some research done at Rice U. and IBM on scalable
> interprocedural analysis, in the same group where Preston did his Ph.D.).
>
> For now, I would like to introduce you all to Charles, so that he has
> access to people working on this issue, which will probably continue to be
> a concern for his project.  I have copied you on my reply to him.
>
> Thanks for the information.
>
> --Vikram S. Adve
> Visiting Professor, Computer Science, EPFL
> Professor, Department of Computer Science
> University of Illinois at Urbana-Champaign
> vadve at illinois.edu
> http://llvm.org
>
>
>
>
> On Dec 16, 2014, at 3:48 AM, Teresa Johnson <tejohnson at google.com>
wrote:
>
> > On Fri, Dec 12, 2014 at 1:59 PM, Diego Novillo <dnovillo at
google.com>
> wrote:
> >> On 12/12/14 15:56, Adve, Vikram Sadanand wrote:
> >>>
> >>> I've been asked how LTO in LLVM compares to equivalent
capabilities
> >>> in GCC.  How do the two compare in terms of scalability?  And
> >>> robustness for large applications?
> >>
> >>
> >> Neither GCC nor LLVM can handle our (Google) large applications.
They're
> >> just too massive for the kind of linking done by LTO.
> >>
> >> When we built GCC's LTO, we were trying to address this by
creating a
> >> partitioned model, where the analysis phase and the codegen phase
are
> split
> >> to allow working on partial callgraphs
> >> (http://gcc.gnu.org/wiki/LinkTimeOptimization for details).
> >>
> >> This allows to split and parallelize the initial bytecode
generation
> and the
> >> final optimization/codegen. However, the analysis is still
implemented
> as a
> >> single process. We found that we cannot even load summaries, types
and
> >> symbols in an efficient way.
> >>
> >> It does allow for programs like Firefox to be handled. So, if by
"big"
> you
> >> need to handle something of that size, this model can doit.
> >>
> >> With LLVM, I can't even load the IR for one of our large
programs on a
> box
> >> with 64Gb of RAM.
> >>
> >>> Also, are there any ongoing efforts or plans to improve LTO in
LLVM
> >>> in the near future?
> >>
> >>
> >> Yes. We are going to be investing in this area very soon. David
and
> Teresa
> >> (CC'd) will have details.
> >
> > Still working out the details, but we are investigating a solution
> > that is scalable to very large programs. We'll share the design in
the
> > near future when we have more details worked out so that we can get
> > feedback.
> >
> > Thanks!
> > Teresa
> >
> >>
> >>
> >> Diego.
> >
> >
> >
> > --
> > Teresa Johnson | Software Engineer | tejohnson at google.com |
408-460-2413
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141226/b075fc56/attachment.html>

Eric Christopher

2014-Dec-27 00:45 UTC

head link

[LLVMdev] LTO question

>
>
> Comparison with LLVM is described in the second article. It took about
> 40min to finish building Firefox with llvm using lto and -g. The
> following is a quote:
>
> "This graph shows issues with debug info memory use. LLVM goes up to
> 35GB. LLVM developers are also working on debug info merging
> improvements (equivalent to what GCC's type merging is) and the
> situation has improved in last two releases until the current shape.
> Older LLVM checkouts happily run out of 60GB memory & 60GB swap on my
> machine.".
>
As an aside:

This has been largely alleviated on programs the size of firefox. The
current work that Duncan is doing is designed to reduce memory usage even
more - quite a bit.

That said, debug info quality for types and locations with LTO is quite
good under LLVM if you're willing to spend the memory versus the current
state of gcc. GCC has far better optimized location information than gcc,
but both are, of course, improving rapidly.

-eric

>
>
> >
> > I'd be interested to hear more about the LTO design you folks are
> working on, whenever you're ready to share the details.
>
> We will share the details as soon as we can -- possibly some time in Jan
> 2015.
>
>
> > I read the GCC design docs on LTO, and I'm curious how similar or
> different your approach will be.  For example, the 3-phase approach of
> WHOPR is fairly sophisticated (it actually follows closely some research
> done at Rice U. and IBM on scalable interprocedural analysis, in the same
> group where Preston did his Ph.D.).
>
> In Google, we care mostly about peak optimization performance. Peak
> Optimization is basically PGO + CMO. For cross-module optimization
> (CMO) to be usable for large applications, small memory footprint is
> just one aspect of it,  and fast build time is equally important. Peak
> optimization is not only used in release build but  in developer
> workflow too. This means build time with CMO should be close to O2
> time as much as possible.  It is important to compiler engineers too
> -- you don't want to wait for more than 20min to hit a breakpoint in
> debugging a compiler problem :)
>
> For this reason, GCC LTO is not used in Google. Instead, the much more
> scalable solution called LIPO is widely used for CMO:
> https://gcc.gnu.org/wiki/LightweightIpo. LIPO by design requires PGO.
>
> While LIPO is scalable, it has its own limitation that prevents the
> compiler from maximizing the benefit of CMO. The new design is
> intended to solve the problem with more very aggressive objectives.
> The new design is pretty simple and shares the basic principles of
> LIPO without requiring PGO (though it still works best with PGO). It
> still fits in LTO framework, so that toolchain support change is
> minimized. For now, without giving details, I can share  some of the
> objectives of the new design:
>
>     * The build should be almost fully parallelizable (at both process
>       level and build machine node level)
>     * The build should scale to programs with *any/unlimited* size
>       (measured in number of TUs). It should handle programs 10x, or 100x
>        the size of Firefox.
>     * The build time should be very close to non-LTO build, and can be
>       considered to be turned on *by default* for O2 or at least O3
>       compilations.
>     * When turned on the by default, it can eliminate the need for
>       users to put inline functions in header files (thus greatly help
>       improving parsing time)
>     * Most of the benefit of CMO comes from cross module inlining and
>       cross module indirect call promotions.  By default, the design
>       only enables these two, but it is still compatible with any whole
>       program analysis which can be turned on with additional options.
>
> thanks,
>
> David
>
> On Thu, Dec 25, 2014 at 11:55 PM, Adve, Vikram Sadanand <
> vadve at illinois.edu> wrote:
>>
>> Diego, Teresa, David,
>>
>> Sorry for my delayed reply; I left for vacation right after sending my
>> message about this.
>>
>> Diego, it wasn't explicit from your message whether LLVM LTO can
handle
>> Firefox-scale programs, which you said GCC can handle.  I assumed
that's
>> what you meant, but could you confirm that?  I understand that neither
can
>> handle the very large Google applications, but that's probably not
a
>> near-term concern for a project like the one Charles is embarking on.
>>
>> I'd be interested to hear more about the LTO design you folks are
working
>> on, whenever you're ready to share the details.  I read the GCC
design docs
>> on LTO, and I'm curious how similar or different your approach will
be.
>> For example, the 3-phase approach of WHOPR is fairly sophisticated (it
>> actually follows closely some research done at Rice U. and IBM on
scalable
>> interprocedural analysis, in the same group where Preston did his
Ph.D.).
>>
>> For now, I would like to introduce you all to Charles, so that he has
>> access to people working on this issue, which will probably continue to
be
>> a concern for his project.  I have copied you on my reply to him.
>>
>> Thanks for the information.
>>
>> --Vikram S. Adve
>> Visiting Professor, Computer Science, EPFL
>> Professor, Department of Computer Science
>> University of Illinois at Urbana-Champaign
>> vadve at illinois.edu
>> http://llvm.org
>>
>>
>>
>>
>> On Dec 16, 2014, at 3:48 AM, Teresa Johnson <tejohnson at
google.com> wrote:
>>
>> > On Fri, Dec 12, 2014 at 1:59 PM, Diego Novillo <dnovillo at
google.com>
>> wrote:
>> >> On 12/12/14 15:56, Adve, Vikram Sadanand wrote:
>> >>>
>> >>> I've been asked how LTO in LLVM compares to equivalent
capabilities
>> >>> in GCC.  How do the two compare in terms of scalability? 
And
>> >>> robustness for large applications?
>> >>
>> >>
>> >> Neither GCC nor LLVM can handle our (Google) large
applications.
>> They're
>> >> just too massive for the kind of linking done by LTO.
>> >>
>> >> When we built GCC's LTO, we were trying to address this by
creating a
>> >> partitioned model, where the analysis phase and the codegen
phase are
>> split
>> >> to allow working on partial callgraphs
>> >> (http://gcc.gnu.org/wiki/LinkTimeOptimization for details).
>> >>
>> >> This allows to split and parallelize the initial bytecode
generation
>> and the
>> >> final optimization/codegen. However, the analysis is still
implemented
>> as a
>> >> single process. We found that we cannot even load summaries,
types and
>> >> symbols in an efficient way.
>> >>
>> >> It does allow for programs like Firefox to be handled. So, if
by "big"
>> you
>> >> need to handle something of that size, this model can doit.
>> >>
>> >> With LLVM, I can't even load the IR for one of our large
programs on a
>> box
>> >> with 64Gb of RAM.
>> >>
>> >>> Also, are there any ongoing efforts or plans to improve
LTO in LLVM
>> >>> in the near future?
>> >>
>> >>
>> >> Yes. We are going to be investing in this area very soon.
David and
>> Teresa
>> >> (CC'd) will have details.
>> >
>> > Still working out the details, but we are investigating a solution
>> > that is scalable to very large programs. We'll share the
design in the
>> > near future when we have more details worked out so that we can
get
>> > feedback.
>> >
>> > Thanks!
>> > Teresa
>> >
>> >>
>> >>
>> >> Diego.
>> >
>> >
>> >
>> > --
>> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>> 408-460-2413
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141227/b86d31ab/attachment.html>

Philip Reames

2014-Dec-28 20:13 UTC

head link

[LLVMdev] LTO question

On 12/26/2014 03:46 PM, Xinliang David Li wrote:> Vikram, LLVM can handle Firefox size programs. Honza wrote two good
> articles about LTO.
>
>
http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-1-brief.html
>
http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-2-firefox.html
>
> Comparison with LLVM is described in the second article. It took about
> 40min to finish building Firefox with llvm using lto and -g. The
> following is a quote:
>
> "This graph shows issues with debug info memory use. LLVM goes up to
> 35GB. LLVM developers are also working on debug info merging
> improvements (equivalent to what GCC's type merging is) and the
> situation has improved in last two releases until the current shape.
> Older LLVM checkouts happily run out of 60GB memory & 60GB swap on my
> machine.".One thing I want to highlight in the above is that this is a build with 
debug information.  My understanding from listening in on conversations 
at the developers meeting is that LTO w/o debug information scales 
substantially better.  Most of the discussion of memory usage is 
specific to retaining full debug info while doing LTO.  Either compiling 
without debug info at all, or compiling with only line table information 
results in dramatically reduced memory usage.

Note: I have no direct experience here; I'm just repeating what I've
heard.

Philip



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141228/55e3ffbb/attachment.html>

Eric Christopher

2014-Dec-28 21:08 UTC

head link

[LLVMdev] LTO question

On Sun Dec 28 2014 at 12:17:22 PM Philip Reames <listmail at
philipreames.com>
wrote:
> On 12/26/2014 03:46 PM, Xinliang David Li wrote:
>
> Vikram, LLVM can handle Firefox size programs. Honza wrote two good
> articles about LTO.
>
>
>
http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-1-brief.html
>
>
http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-2-firefox.html
>
> Comparison with LLVM is described in the second article. It took about
> 40min to finish building Firefox with llvm using lto and -g. The
> following is a quote:
>
> "This graph shows issues with debug info memory use. LLVM goes up to
> 35GB. LLVM developers are also working on debug info merging
> improvements (equivalent to what GCC's type merging is) and the
> situation has improved in last two releases until the current shape.
> Older LLVM checkouts happily run out of 60GB memory & 60GB swap on my
> machine.".
>
> One thing I want to highlight in the above is that this is a build with
> debug information.  My understanding from listening in on conversations at
> the developers meeting is that LTO w/o debug information scales
> substantially better.  Most of the discussion of memory usage is specific
> to retaining full debug info while doing LTO.  Either compiling without
> debug info at all, or compiling with only line table information results in
> dramatically reduced memory usage.
>
> Note: I have no direct experience here; I'm just repeating what
I've
> heard.
>
>This is correct, it's also an area of active development. :)

Thanks!

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141228/a487f22b/attachment.html>

llvm dev - Dec 2014 - [LLVMdev] LTO question

[LLVMdev] LTO question

[LLVMdev] LTO question

[LLVMdev] LTO question

[LLVMdev] LTO question