thr3ads.net - llvm dev - [llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units [Apr 2016]

Mehdi Amini via llvm-dev

2016-Apr-07 04:53 UTC

[llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units

> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Wed, Apr 6, 2016 at 5:13 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> On Wed, Apr 6, 2016 at 4:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
>> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>> 
>> Hi all,
>> 
>> I'd like to propose changes to how we do promotion of global values in
>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>> translation unit to native code at compile time. For example, if we know
>> that:
>> 
>> 1) A function is a leaf function, so it will never import any other
>> functions, and
> 
> It still may be imported somewhere else right?
> 
>> 2) The function's instruction count falls above a threshold specified at
>> compile time, so it will never be imported.
> 
> It won’t be imported, but unless it is a “leaf” it may itself import and
> inline other functions.
> 
>> or
>> 3) The compile-time threshold is zero, so there is no possibility of
>> functions being imported (What's the utility of this? Consider a program
>> transformation that requires whole-program information, such as CFI. During
>> development, the import threshold may be set to zero in order to minimize
>> the incremental link time while still providing the same CFI enforcement
>> that would be used in production builds of the application.)
>> 
>> then the function's body will not be affected by link-time decisions, and
>> we might as well produce its object code at compile time.
> 
> Reading this last sentence, it seems exactly the “non-LTO” case?
> 
> Yes, basically the point of this proposal is to be able to split the
> linkage unit into LTO and non-LTO parts.
> 
> 
>> This will also allow the object code to be shared between linkage units
>> (this should hopefully help solve a major scalability problem for Chromium,
>> as that project contains a large number of test binaries based on common
>> libraries).
>> 
>> This can be done with a change to the intermediate object file format.
>> We can represent object files as native code containing statically compiled
>> functions and global data in the .text, .data, .rodata (etc.) sections,
>> with an .llvmbc section (or, I suppose, "__LLVM,__bitcode" when targeting
>> Mach-O) containing bitcode for functions to be compiled at link time.
>> 
>> In order to make this work, we need to make sure that references from
>> link-time compiled functions to statically compiled functions work
>> correctly in the case where the statically compiled function has internal
>> linkage. We can do this by promoting every global value with internal
>> linkage, using a hash of the external names (as I mentioned in [1]).
> 
> 
> Mehdi - I know you were keen to reduce the amount of promotion. Is that
> still an issue for you assuming linker GC (dead stripping)?
Yes: we optimize internal functions better in general. Our benchmarks
showed that it can really make a difference, and in many of the cases where
ThinLTO didn’t perform as well as FullLTO, this promotion was the cause.
(binary size has never been my concern here)
> With this proposal we will need to stick with the current promote
> everything scheme.
I don’t think so: you would need to do it only for “internal functions that
are leaves and aren’t likely to be imported/inlined”.
That said, any function whose binary we emit at compile time instead of link
time will inhibit optimizations for LTO/ThinLTO. The gain in compile time has
to be really significant to make it worth it.
(Of course the CFI use-case is a totally different tradeoff).

Peter: have you thought about debug info by the way?

— 
Mehdi
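
To make the conditions in the quoted RFC concrete, here is a minimal Python sketch of the eligibility check, reading the list as "(1 and 2) or 3". It is an illustration only and not code from LLVM or from this thread: `FunctionInfo`, `can_precompile`, and the `import_threshold` parameter are hypothetical names. It also reflects the two objections raised above: being a leaf is not enough on its own (the function may still be imported elsewhere), and being large is not enough on its own (a non-leaf may itself import).

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical per-function summary available at compile time."""
    name: str
    inst_count: int  # instruction count of the function body
    is_leaf: bool    # True if the function calls no other function


def can_precompile(fn: FunctionInfo, import_threshold: int) -> bool:
    """Return True if link-time importing cannot change fn's body."""
    # Condition 3: a zero threshold disables importing altogether.
    if import_threshold == 0:
        return True
    # Condition 1: a leaf never imports anything into itself.
    never_imports = fn.is_leaf
    # Condition 2: a body above the threshold is never imported elsewhere.
    never_imported = fn.inst_count > import_threshold
    return never_imports and never_imported
```

With a threshold of 100, a 500-instruction leaf qualifies, while a 50-instruction leaf or a 500-instruction non-leaf does not; with a threshold of 0 every function qualifies.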


>  
>> 
>> I imagine that for some linkers, it may not be possible to deal with
>> this scheme. For example, I did some investigation last year and discovered
>> that I could not use the gold plugin interface to load a native object file
>> if we had already claimed it as an IR file. I wouldn't be surprised to learn
>> that ld64 has similar problems.
> 
> I suspect ld64 would need to be updated to handle this scheme. Somehow it
> needs to be able to extract the object from the section.
> 
> Do you mean the bitcode object? There's already support in LLVM for
> this (http://llvm-cs.pcc.me.uk/lib/Object/IRObjectFile.cpp#269). I suspect
> the tricky part (which I was unsuccessful at doing with gold) will be
> convincing the linker that the bitcode object and the regular object are two
> separate things.
> 
> Otherwise this should work, but I suspect the applicability (leaving CFI
> aside) may not concern that many functions, so I’m not sure about the impact.
> 
> I'm also curious about the applicability in the regular ThinLTO case.
> One thing that may help with applicability is that I suspect that not only
> leaf functions but also functions that only call (directly or indirectly)
> functions in the same TU, as well as functions that only make calls via
> function pointers or vtables, may fall under the criteria for non-importing.
> 
> With indirect call profiling the function pointer case should not be a
> blocker for importing.
> 
> Teresa
>  
> 
> Peter
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

Teresa Johnson via llvm-dev

2016-Apr-07 05:11 UTC


On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
>
> Mehdi - I know you were keen to reduce the amount of promotion. Is that
> still an issue for you assuming linker GC (dead stripping)?
>
> Yes: we optimize internal functions better in general. Our benchmarks
> showed that it can really make a difference, and in many of the cases
> where ThinLTO didn’t perform as well as FullLTO, this promotion was the
> cause.
> (binary size has never been my concern here)
>
> With this proposal we will need to stick with the current promote
> everything scheme.
>
> I don’t think so: you would need to do it only for “internal functions
> that are leaves and aren’t likely to be imported/inlined”.
>
I suppose you could do promotion in two different places: during the
compile step for the functions that will be emitted to text, and in the
back ends for the rest if they have references imported elsewhere.

> That said, any function whose binary we emit at compile time instead of
> link time will inhibit optimizations for LTO/ThinLTO. The gain in compile
> time has to be really significant to make it worth it.
> (Of course the CFI use-case is a totally different tradeoff).
>
> Peter: have you thought about debug info by the way?
>
> —
> Mehdi
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
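
The two-place promotion suggested above, combined with the RFC's idea of naming promoted symbols with a hash of the external names, could be sketched as follows. This is a minimal illustration under stated assumptions, not LLVM's implementation: the helper names `module_hash`, `promoted_name`, and `plan_promotion`, the use of SHA-1, the 16-character truncation, and the `.llvm.<hash>` suffix format are all chosen here for the example.

```python
import hashlib
from typing import Iterable, Set, Tuple


def module_hash(external_names: Iterable[str]) -> str:
    """Derive a stable per-module hash from its externally visible names."""
    h = hashlib.sha1()
    for name in sorted(external_names):  # sort so input order does not matter
        h.update(name.encode("utf-8"))
        h.update(b"\0")                  # separator avoids ambiguous joins
    return h.hexdigest()[:16]


def promoted_name(local_name: str, mod_hash: str) -> str:
    """Build a unique global name for a promoted internal symbol (assumed format)."""
    return f"{local_name}.llvm.{mod_hash}"


def plan_promotion(internal_symbols: Set[str],
                   precompiled: Set[str],
                   imported_refs: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split promotion between the compile step and the ThinLTO back ends.

    precompiled:   internal symbols emitted to native code at compile time.
    imported_refs: internal symbols referenced by code imported elsewhere.
    """
    at_compile = internal_symbols & precompiled
    in_backend = (internal_symbols - precompiled) & imported_refs
    return at_compile, in_backend
```

Internal symbols that are neither precompiled nor referenced from imported code appear in neither set, so they keep internal linkage and the optimizations that depend on it.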

Xinliang David Li via llvm-dev

2016-Apr-07 17:58 UTC


On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
>
> Mehdi - I know you were keen to reduce the amount of promotion. Is that
> still an issue for you assuming linker GC (dead stripping)?
>
> Yes: we optimize internal functions better in general.
>
The inliner is one of the affected optimizations, but this sounds like a
matter of tuning: teaching the inliner about promoted static functions.

David



Mehdi Amini via llvm-dev

2016-Apr-07 18:26 UTC


> On Apr 7, 2016, at 10:58 AM, Xinliang David Li <davidxl at google.com> wrote:
> 
> On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
>> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
>> 
>> 
>> 
>> Mehdi - I know you were keen to reduce the amount of promotion. Is that
>> still an issue for you assuming linker GC (dead stripping)?
> 
> Yes: we optimize internal functions better in general.
> 
> The inliner is one of the affected optimizations, but this sounds like a
> matter of tuning: teaching the inliner about promoted static functions.

The inliner computes a tradeoff between pseudo runtime cost and binary size.
The existing bonus for static functions applies when there is a single call
site, because inlining then causes no increase in binary size (the static
function is dropped after inlining). We promote a function because we think
we are likely to introduce a reference to it somewhere else, so “lying” to
the inliner is not necessarily a good idea.
That said, we (actually Bruno) have already prototyped it, with somewhat good
results :)
I’m not convinced yet that it should be independent of whether the function
was promoted, though.

Assuming we solve the inliner issue, there remain the “optimizations other
than the inliner”. We can probably solve most of them, but I suspect it won’t
be “trivial” either.

— 
Mehdi
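
The single-call-site bonus described above, and the tuning option of extending it to promoted functions, can be illustrated with a toy cost model. This is a sketch under stated assumptions and not the LLVM inliner: the `Callee` type, the `inline_cost` function, the `treat_promoted_as_internal` switch, and the bonus value are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical bonus: if the only call to an internal function is inlined,
# the out-of-line copy can be deleted, so inlining does not grow the binary.
SINGLE_CALL_SITE_BONUS = 100


@dataclass
class Callee:
    size_cost: int              # estimated cost of duplicating the body
    is_internal: bool           # internal (static) linkage right now
    num_call_sites: int         # call sites visible in this module
    was_promoted: bool = False  # originally internal, promoted by ThinLTO


def inline_cost(callee: Callee, treat_promoted_as_internal: bool = False) -> int:
    """Return a toy inline cost; lower means more attractive to inline."""
    effectively_internal = callee.is_internal or (
        treat_promoted_as_internal and callee.was_promoted
    )
    cost = callee.size_cost
    if effectively_internal and callee.num_call_sites == 1:
        cost -= SINGLE_CALL_SITE_BONUS
    return cost
```

Setting `treat_promoted_as_internal=True` models the "lying to the inliner" option: a promoted function gets the bonus back even though another module may still reference it, in which case the out-of-line copy cannot actually be dropped.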
>  
> 

Peter Collingbourne via llvm-dev

2016-Apr-07 19:32 UTC


On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
>
> Mehdi - I know you were keen to reduce the amount of promotion. Is that
> still an issue for you assuming linker GC (dead stripping)?
>
> Yes: we optimize internal functions better in general. Our benchmarks
> showed that it can really make a difference, and in many of the cases
> where ThinLTO didn’t perform as well as FullLTO, this promotion was the
> cause.
> (binary size has never been my concern here)
>
> With this proposal we will need to stick with the current promote
> everything scheme.
>
> I don’t think so: you would need to do it only for “internal functions
> that are leaves and aren’t likely to be imported/inlined”.
> That said, any function whose binary we emit at compile time instead of
> link time will inhibit optimizations for LTO/ThinLTO. The gain in compile
> time has to be really significant to make it worth it.
> (Of course the CFI use-case is a totally different tradeoff).
>
> Peter: have you thought about debug info by the way?
>
Yes, I suspect we'll have to duplicate the debug info between the non-LTO
and the LTO part like I was doing with parallel LTO codegen. In practice,
that means we'll end up with two DWARF compile units per TU. That's
probably better than it being a factor of the number of threads, and since
only one of the two compile units will be codegen'd at any one time, we
hopefully shouldn't see the sort of memory consumption we were seeing with
parallel LTO codegen.

Peter
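
As a rough model of the split object file from the RFC and of the debug-info duplication described above, the following Python sketch counts DWARF compile units per translation unit. It is only an illustration: `HybridObject`, `bitcode_section_name`, and `compile_units_parallel_codegen` are hypothetical names, and the format strings "elf" and "macho" are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import List


def bitcode_section_name(object_format: str) -> str:
    """Section holding the bitcode part, per the names given in the RFC."""
    return "__LLVM,__bitcode" if object_format == "macho" else ".llvmbc"


@dataclass
class HybridObject:
    """One TU split into a native (non-LTO) part and a bitcode (LTO) part."""
    object_format: str
    native_functions: List[str] = field(default_factory=list)
    bitcode_functions: List[str] = field(default_factory=list)

    def dwarf_compile_units(self) -> int:
        # Each non-empty part carries its own copy of the debug info.
        return int(bool(self.native_functions)) + int(bool(self.bitcode_functions))


def compile_units_parallel_codegen(num_threads: int) -> int:
    """Parallel LTO codegen duplicated debug info once per codegen thread."""
    return num_threads
```

Under this model a TU with both parts yields two compile units regardless of how many threads the link uses, whereas the parallel-codegen figure grows with the thread count.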
