thr3ads.net - llvm dev - [llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units [Apr 2016]

Peter Collingbourne via llvm-dev

2016-Apr-06 23:41 UTC

[llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units

Hi all,

I'd like to propose changes to how we do promotion of global values in
ThinLTO. The goal here is to make it possible to pre-compile parts of the
translation unit to native code at compile time. For example, if we know
that:

1) A function is a leaf function, so it will never import any other
functions, and
2) The function's instruction count falls above a threshold specified at
compile time, so it will never be imported.
or
3) The compile-time threshold is zero, so there is no possibility of
functions being imported (What's the utility of this? Consider a program
transformation that requires whole-program information, such as CFI. During
development, the import threshold may be set to zero in order to minimize
the incremental link time while still providing the same CFI enforcement
that would be used in production builds of the application.)

then the function's body will not be affected by link-time decisions, and
we might as well produce its object code at compile time. This will also
allow the object code to be shared between linkage units (this should
hopefully help solve a major scalability problem for Chromium, as that
project contains a large number of test binaries based on common libraries).
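
As a rough illustration of the criteria above, the eligibility check might be sketched like this. It is a standalone sketch rather than LLVM code: FunctionFacts, canCompileStatically and importThreshold are hypothetical names introduced here, and a real implementation would derive these facts from the module and its summary.

```cpp
#include <cstdint>

// Hypothetical per-function facts that a compile-time pass might collect.
struct FunctionFacts {
  bool isLeaf;             // makes no calls, so it can never import a callee
  std::uint64_t instCount; // instruction count of the function body
};

// Returns true if, under the criteria listed above, the function's body
// cannot be changed by link-time importing decisions.
// importThreshold is the (assumed) compile-time import threshold; zero
// means that nothing is ever imported.
bool canCompileStatically(const FunctionFacts &f,
                          std::uint64_t importThreshold) {
  if (importThreshold == 0)
    return true; // case 3: importing is disabled entirely
  // cases 1 and 2 together: the function imports nothing, and it is too
  // large to be imported into any other module
  return f.isLeaf && f.instCount > importThreshold;
}
```

Functions for which this returns true would be emitted as native code at compile time; everything else would stay as bitcode for the link step.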

This can be done with a change to the intermediate object file format. We
can represent object files as native code containing statically compiled
functions and global data in the .text, .data, .rodata (etc.) sections,
with an .llvmbc section (or, I suppose, "__LLVM, __bitcode" when targeting
Mach-O) containing bitcode for functions to be compiled at link time.

In order to make this work, we need to make sure that references from
link-time compiled functions to statically compiled functions work
correctly in the case where the statically compiled function has internal
linkage. We can do this by promoting every global value with internal
linkage, using a hash of the external names (as I mentioned in [1]).
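
To make the naming idea concrete, here is a minimal sketch of deriving a promoted name from a hash of a module's external names. Everything in it is an assumption for illustration: FNV-1a stands in for whatever hash would really be used, and moduleId, promotedName and the ".promoted." suffix are hypothetical, not what LLVM emits.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// 64-bit FNV-1a, used here only as a simple deterministic stand-in hash.
std::uint64_t fnv1a(const std::string &s,
                    std::uint64_t h = 14695981039346656037ULL) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hash the module's externally visible names. Sorting first makes the
// result independent of the order in which the names are listed.
std::uint64_t moduleId(std::vector<std::string> externalNames) {
  std::sort(externalNames.begin(), externalNames.end());
  std::uint64_t h = 14695981039346656037ULL;
  for (const std::string &name : externalNames) {
    h = fnv1a(name, h);
    h = fnv1a(std::string(1, '\0'), h); // separator between names
  }
  return h;
}

// Build the external name for a promoted internal-linkage global.
// The ".promoted." suffix is a made-up convention for this sketch.
std::string promotedName(const std::string &internalName, std::uint64_t id) {
  char buf[17];
  std::snprintf(buf, sizeof buf, "%016llx",
                static_cast<unsigned long long>(id));
  return internalName + ".promoted." + buf;
}
```

Because the identifier depends only on the module's own external names, the compile step and the link step can both compute the same promoted name without consulting each other.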

I imagine that for some linkers, it may not be possible to deal with this
scheme. For example, I did some investigation last year and discovered that
I could not use the gold plugin interface to load a native object file if
we had already claimed it as an IR file. I wouldn't be surprised to learn
that ld64 has similar problems.

In cases where we completely control the linker (e.g. lld), we can easily
support this scheme, as the linker can directly do whatever it wants. But
for linkers that cannot support this, I suggest that we promote
consistently under ThinLTO rather than having different promotion schemes
for different linkers, in order to reduce overall complexity.

Thanks for your feedback!

Thanks,
-- 
Peter

[1] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098062.html

Mehdi Amini via llvm-dev

2016-Apr-06 23:53 UTC

> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and

It still may be imported somewhere else, right?

> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.

It won’t be imported, but unless it is a “leaf” it may itself import and
inline other functions.

> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
> then the function's body will not be affected by link-time decisions, and
> we might as well produce its object code at compile time.

Reading this last sentence, it seems to be exactly the “non-LTO” case?

> This will also allow the object code to be shared between linkage units
> (this should hopefully help solve a major scalability problem for Chromium,
> as that project contains a large number of test binaries based on common
> libraries).
>
> This can be done with a change to the intermediate object file format. We
> can represent object files as native code containing statically compiled
> functions and global data in the .text, .data, .rodata (etc.) sections,
> with an .llvmbc section (or, I suppose, "__LLVM, __bitcode" when targeting
> Mach-O) containing bitcode for functions to be compiled at link time.
>
> In order to make this work, we need to make sure that references from
> link-time compiled functions to statically compiled functions work
> correctly in the case where the statically compiled function has internal
> linkage. We can do this by promoting every global value with internal
> linkage, using a hash of the external names (as I mentioned in [1]).
>
> I imagine that for some linkers, it may not be possible to deal with this
> scheme. For example, I did some investigation last year and discovered that
> I could not use the gold plugin interface to load a native object file if
> we had already claimed it as an IR file. I wouldn't be surprised to learn
> that ld64 has similar problems.

I suspect ld64 would need to be updated to handle this scheme. Somehow it
needs to be able to extract the object from the section.

Otherwise this should work, but I suspect the applicability (leaving CFI aside)
may not concern that many functions, so I’m not sure about the impact.

— 
Mehdi

Mehdi Amini via llvm-dev

2016-Apr-06 23:57 UTC

> On Apr 6, 2016, at 4:53 PM, Mehdi Amini via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>> Hi all,
>>
>> I'd like to propose changes to how we do promotion of global values in
>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>> translation unit to native code at compile time. For example, if we know
>> that:
>>
>> 1) A function is a leaf function, so it will never import any other
>> functions, and
>
> It still may be imported somewhere else right?
>
>> 2) The function's instruction count falls above a threshold specified at
>> compile time, so it will never be imported.
>
> It won’t be imported, but unless it is a “leaf” it may import and inline
> itself.

Oh, there was an “and”, so forget about my comments above, since this is
acknowledged.

— 
Mehdi

Peter Collingbourne via llvm-dev

2016-Apr-07 00:13 UTC

On Wed, Apr 6, 2016 at 4:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and
>
>
> It still may be imported somewhere else right?
>
> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.
>
>
> It won’t be imported, but unless it is a “leaf” it may import and inline
> itself.
>
> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
> then the function's body will not be affected by link-time decisions, and
> we might as well produce its object code at compile time.
>
>
> Reading this last sentence, it seems exactly the “non-LTO” case?
>
Yes, basically the point of this proposal is to be able to split the
linkage unit into LTO and non-LTO parts.

> This will also allow the object code to be shared between linkage units
> (this should hopefully help solve a major scalability problem for Chromium,
> as that project contains a large number of test binaries based on common
> libraries).
>
> This can be done with a change to the intermediate object file format. We
> can represent object files as native code containing statically compiled
> functions and global data in the .text, .data, .rodata (etc.) sections,
> with an .llvmbc section (or, I suppose, "__LLVM, __bitcode" when targeting
> Mach-O) containing bitcode for functions to be compiled at link time.
>
> In order to make this work, we need to make sure that references from
> link-time compiled functions to statically compiled functions work
> correctly in the case where the statically compiled function has internal
> linkage. We can do this by promoting every global value with internal
> linkage, using a hash of the external names (as I mentioned in [1]).
>
> I imagine that for some linkers, it may not be possible to deal with this
> scheme. For example, I did some investigation last year and discovered that
> I could not use the gold plugin interface to load a native object file if
> we had already claimed it as an IR file. I wouldn't be surprised to learn
> that ld64 has similar problems.
>
>
> I suspect ld64 would need to be update to handle this scheme. Somehow it
> need to be able to extract the object from the section.
>
Do you mean the bitcode object? There's already support in LLVM for this (
http://llvm-cs.pcc.me.uk/lib/Object/IRObjectFile.cpp#269). I suspect the
tricky part (which I was unsuccessful at doing with gold) will be
convincing the linker that the bitcode object and the regular object are
two separate things.

> Otherwise this should work, but I suspect the applicability (leaving CFI
> aside) may not concern that many functions, so I’m not sure about the
> impact.
>
I'm also curious about the applicability in the regular ThinLTO case. One
thing that may help with applicability is that I suspect that not only leaf
functions but also functions that only call (directly or indirectly)
functions in the same TU, as well as functions that only make calls via
function pointers or vtables may fall under the criteria for non-importing.
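
The broader criterion described above could be sketched as a reachability check over the direct-call graph of one TU. This is only an illustration under stated assumptions: CallGraph and neverImports are hypothetical names, functions defined in the TU are exactly the keys of the map, and calls through function pointers or vtables are assumed to contribute no edges.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Direct-call graph for a single translation unit. Each key is a function
// defined in this TU, mapped to the functions it calls directly. A callee
// that is not a key is assumed to be defined in some other TU.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if every function reachable from root through direct calls
// is defined in this TU, so there is no external callee to import.
bool neverImports(const CallGraph &graph, const std::string &root) {
  std::set<std::string> seen;
  std::vector<std::string> worklist{root};
  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    if (!seen.insert(fn).second)
      continue; // already visited; this also terminates recursion cycles
    auto it = graph.find(fn);
    if (it == graph.end())
      return false; // defined elsewhere, so it is an import candidate
    for (const std::string &callee : it->second)
      worklist.push_back(callee);
  }
  return true;
}
```

A leaf function is the trivial case of this check: it has no outgoing edges, so the walk ends immediately and returns true.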

Peter

Sean Silva via llvm-dev

2016-Apr-07 01:00 UTC

On Wed, Apr 6, 2016 at 4:41 PM, Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and
> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.
> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
Do you know of any use case that is not as an aid for developers? I.e.
would this be a user-visible feature?

-- Sean Silva

Peter Collingbourne via llvm-dev

2016-Apr-07 01:04 UTC

On Wed, Apr 6, 2016 at 6:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Wed, Apr 6, 2016 at 4:41 PM, Peter Collingbourne via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> I'd like to propose changes to how we do promotion of global values in
>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>> translation unit to native code at compile time. For example, if we know
>> that:
>>
>> 1) A function is a leaf function, so it will never import any other
>> functions, and
>> 2) The function's instruction count falls above a threshold specified at
>> compile time, so it will never be imported.
>> or
>> 3) The compile-time threshold is zero, so there is no possibility of
>> functions being imported (What's the utility of this? Consider a program
>> transformation that requires whole-program information, such as CFI. During
>> development, the import threshold may be set to zero in order to minimize
>> the incremental link time while still providing the same CFI enforcement
>> that would be used in production builds of the application.)
>>
>
> Do you know of any use case that is not as an aid for developers? I.e.
> would this be a user-visible feature?
>
It would indeed be a user-visible feature. The developers in this case are
the developers of the program that uses the CFI feature, not LLVM
developers.

Peter

Xinliang David Li via llvm-dev

2016-Apr-07 17:52 UTC

On Wed, Apr 6, 2016 at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and
> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.
> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
> then the function's body will not be affected by link-time decisions, and
> we might as well produce its object code at compile time. This will also
> allow the object code to be shared between linkage units (this should
> hopefully help solve a major scalability problem for Chromium, as that
> project contains a large number of test binaries based on common libraries).
>

3) sounds like a good use case (though one very specific to CFI; generally, a
function body will be affected by link-time decisions, including
whole-program analyses). In this mode, do we really need to emit the bitcode
section at all? There does not seem to be any need for static promotions if
we assume no cross-module transformations will happen in this mode.


David

Rafael Espíndola via llvm-dev

2016-Apr-07 18:42 UTC

> I imagine that for some linkers, it may not be possible to deal with this
> scheme. For example, I did some investigation last year and discovered that
> I could not use the gold plugin interface to load a native object file if we
> had already claimed it as an IR file. I wouldn't be surprised to learn that
> ld64 has similar problems.

Can't you just call add_input_file with the original file in addition
to any .o files you created during linking?

Cheers,
Rafael

Peter Collingbourne via llvm-dev

2016-Apr-07 18:42 UTC

On Thu, Apr 7, 2016 at 10:52 AM, Xinliang David Li <davidxl at google.com> wrote:
>
>
> On Wed, Apr 6, 2016 at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> Hi all,
>>
>> I'd like to propose changes to how we do promotion of global values in
>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>> translation unit to native code at compile time. For example, if we know
>> that:
>>
>> 1) A function is a leaf function, so it will never import any other
>> functions, and
>> 2) The function's instruction count falls above a threshold specified at
>> compile time, so it will never be imported.
>> or
>> 3) The compile-time threshold is zero, so there is no possibility of
>> functions being imported (What's the utility of this? Consider a program
>> transformation that requires whole-program information, such as CFI. During
>> development, the import threshold may be set to zero in order to minimize
>> the incremental link time while still providing the same CFI enforcement
>> that would be used in production builds of the application.)
>>
>
>> then the function's body will not be affected by link-time decisions, and
>> we might as well produce its object code at compile time. This will also
>> allow the object code to be shared between linkage units (this should
>> hopefully help solve a major scalability problem for Chromium, as that
>> project contains a large number of test binaries based on common libraries).
>>
>
>
> 3) sounds like a good use case (though very unique to CFI. Generally, a
> function body will be affected by link time decisions including whole
> program analyses).  In this mode, do we really need to emit the bit code
> section at all? There does not any need for static promotions if we assume
> no cross module transformations will happen in this mode.
>
We still need some way to generate the correct code for the CFI intrinsics,
which rely on cross module information. That information would come
directly from the combined summary, rather than being imported from other
modules.

Peter

Peter Collingbourne via llvm-dev

2016-Apr-07 18:53 UTC

On Thu, Apr 7, 2016 at 11:42 AM, Rafael Espíndola
<rafael.espindola at gmail.com> wrote:
> > I imagine that for some linkers, it may not be possible to deal with this
> > scheme. For example, I did some investigation last year and discovered that
> > I could not use the gold plugin interface to load a native object file if we
> > had already claimed it as an IR file. I wouldn't be surprised to learn that
> > ld64 has similar problems.
>
> Can't you just call add_input_file with the original file in addition
> to any .o files you created during linking?
>
I think that almost worked, but I kept running into issues relating to the
fact that the plugin needed to inform the linker about symbols in the
native object file via add_symbols. What I ended up with was kind of
unsound and still didn't work correctly for (at least) some parts of
Chromium.

I suppose I can try to pull up that work again and go into more detail on
what exactly the issues were, if you're interested.

-- 
-- 
Peter

Mehdi Amini via llvm-dev

2016-May-04 04:01 UTC

> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and
> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.
> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
> then the function's body will not be affected by link-time decisions, and
> we might as well produce its object code at compile time. This will also
> allow the object code to be shared between linkage units (this should
> hopefully help solve a major scalability problem for Chromium, as that
> project contains a large number of test binaries based on common libraries).
>
> This can be done with a change to the intermediate object file format. We
> can represent object files as native code containing statically compiled
> functions and global data in the .text, .data, .rodata (etc.) sections,
> with an .llvmbc section (or, I suppose, "__LLVM, __bitcode" when targeting
> Mach-O) containing bitcode for functions to be compiled at link time.

I was wondering why the "precompiled" function can't be embedded in the IR,
instead of the bitcode being embedded in the object file?
The codegen would still emit a single object file out of this IR file that
contains the code for the IR and the precompiled function.

It seems to me that this way the scheme would work with any existing
LTO implementation.

-- 
Mehdi


> 
> In order to make this work, we need to make sure that references from
link-time compiled functions to statically compiled functions work correctly in
the case where the statically compiled function has internal linkage. We can do
this by promoting every global value with internal linkage, using a hash of the
external names (as I mentioned in [1]).
> 
> I imagine that for some linkers, it may not be possible to deal with this
scheme. For example, I did some investigation last year and discovered that I
could not use the gold plugin interface to load a native object file if we had
already claimed it as an IR file. I wouldn't be surprised to learn that ld64
has similar problems.
> 
> In cases where we completely control the linker (e.g. lld), we can easily
support this scheme, as the linker can directly do whatever it wants. But for
linkers that cannot support this, I suggest that we promote consistently under
ThinLTO rather than having different promotion schemes for different linkers, in
order to reduce overall complexity.
> 
> Thanks for your feedback!
> 
> Thanks,
> -- 
> -- 
> Peter
> 
> [1] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098062.html

Peter Collingbourne via llvm-dev

2016-May-04 05:01 UTC

On Tue, May 3, 2016 at 9:01 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and
> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.
> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
> then the function's body will not be affected by link-time decisions, and
> we might as well produce its object code at compile time. This will also
> allow the object code to be shared between linkage units (this should
> hopefully help solve a major scalability problem for Chromium, as that
> project contains a large number of test binaries based on common libraries).
>
> This can be done with a change to the intermediate object file format. We
> can represent object files as native code containing statically compiled
> functions and global data in the .text, .data, .rodata (etc.) sections,
> with an .llvmbc section (or, I suppose, "__LLVM,__bitcode" when targeting
> Mach-O) containing bitcode for functions to be compiled at link time.
>
>
> I was wondering why the "precompiled" function can't be embedded in the IR,
> instead of the bitcode being embedded in the object file?
> The codegen would still emit a single object file out of this IR file that
> contains the code for the IR and the precompiled function.
>
> It seems to me that this way the scheme would work with any existing
> LTO implementation.
>
You'd still have the same problem. No matter whether you put the native
object inside the IR file or vice versa, you still have a file containing a
native object and some IR. That's the scenario that I found that the gold
plugin interface wouldn't support.

Supporting IR embedded in a native object section inside a linker should be
pretty trivial, if you control the linker. My prototype implementation in
lld is about 10 lines of code.

Peter

> --
> Mehdi
>
>
>
>
> In order to make this work, we need to make sure that references from
> link-time compiled functions to statically compiled functions work
> correctly in the case where the statically compiled function has internal
> linkage. We can do this by promoting every global value with internal
> linkage, using a hash of the external names (as I mentioned in [1]).
>
> I imagine that for some linkers, it may not be possible to deal with this
> scheme. For example, I did some investigation last year and discovered that
> I could not use the gold plugin interface to load a native object file if
> we had already claimed it as an IR file. I wouldn't be surprised to learn
> that ld64 has similar problems.
>
> In cases where we completely control the linker (e.g. lld), we can easily
> support this scheme, as the linker can directly do whatever it wants. But
> for linkers that cannot support this, I suggest that we promote
> consistently under ThinLTO rather than having different promotion schemes
> for different linkers, in order to reduce overall complexity.
>
> Thanks for your feedback!
>
> Thanks,
> --
> --
> Peter
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098062.html
>
>
>

-- 
-- 
Peter

David Blaikie via llvm-dev

2016-May-04 15:19 UTC

On Wed, Apr 6, 2016 at 4:41 PM, Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> I'd like to propose changes to how we do promotion of global values in
> ThinLTO. The goal here is to make it possible to pre-compile parts of the
> translation unit to native code at compile time. For example, if we know
> that:
>
> 1) A function is a leaf function, so it will never import any other
> functions, and
> 2) The function's instruction count falls above a threshold specified at
> compile time, so it will never be imported.
> or
> 3) The compile-time threshold is zero, so there is no possibility of
> functions being imported (What's the utility of this? Consider a program
> transformation that requires whole-program information, such as CFI. During
> development, the import threshold may be set to zero in order to minimize
> the incremental link time while still providing the same CFI enforcement
> that would be used in production builds of the application.)
>
> then the function's body will not be affected by link-time decisions, and
> we might as well produce its object code at compile time. This will also
> allow the object code to be shared between linkage units (this should
> hopefully help solve a major scalability problem for Chromium, as that
> project contains a large number of test binaries based on common libraries).
>
> This can be done with a change to the intermediate object file format. We
> can represent object files as native code containing statically compiled
> functions and global data in the .text, .data, .rodata (etc.) sections,
> with an .llvmbc section (or, I suppose, "__LLVM,__bitcode" when targeting
> Mach-O) containing bitcode for functions to be compiled at link time.
>
> In order to make this work, we need to make sure that references from
> link-time compiled functions to statically compiled functions work
> correctly in the case where the statically compiled function has internal
> linkage. We can do this by promoting every global value with internal
> linkage, using a hash of the external names (as I mentioned in [1]).
>
What about translation units that have no external names? I hit this
problem with DWARF Fission hashing recently, where two files had code
equivalent to this:

  struct foo { foo(); };
  static foo f;

Thus no external symbols, and indeed exactly the same set of symbols for
two instances of this file (& I have seen examples of this in Google's
codebase - though I haven't searched extensively, and it may be that the
linker never actually picks two of these together, but the DWP tool doesn't
have the same kind of "skip this library if no symbols are needed from it"
behavior as the linker).
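
To make the collision concrete, here is a minimal sketch of promotion keyed on a hash of a module's externally defined names. Nothing here is taken from the RFC or from LLVM: `fnv1a`, `moduleHash` and `promotedName` are hypothetical helpers, FNV-1a is used only because it is short and deterministic, and the `name.hash` spelling is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a over a byte string, continuing from a running state h.
std::uint64_t fnv1a(const std::string &s,
                    std::uint64_t h = 14695981039346656037ULL) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical module identifier: a hash over the sorted set of names the
// module defines with external linkage.
std::uint64_t moduleHash(std::vector<std::string> externalNames) {
  std::sort(externalNames.begin(), externalNames.end());
  std::uint64_t h = 14695981039346656037ULL;
  for (const std::string &name : externalNames) {
    h = fnv1a(name, h);
    // NUL separator so that {"ab", "c"} and {"a", "bc"} hash differently.
    h = fnv1a(std::string(1, '\0'), h);
  }
  return h;
}

// Hypothetical promoted name for a symbol that had internal linkage.
std::string promotedName(const std::string &local, std::uint64_t modHash) {
  return local + "." + std::to_string(modHash);
}
```

Two translation units that define no external names both call `moduleHash({})`, so their `static foo f` objects would be promoted to the same name, which is the clash described above.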

Also, (I haven't read the whole thread, but I assume) you're considering
doing this with debug info too? All type information could pretty easily be
emitted up-front and just reduced to declarations (again, on non-LLDB
platforms... :/) for the rest of the debug info. The extra declarations
might make object files a bit bigger, though. (eg: if there were types that
weren't used in any of the ahead-of-time compiled code, but were used in
the ThinLTO'd code - the naive approach would still produce the type info
up front and a declaration in ThinLTO which would make for bigger output
than just putting the type in the ThinLTO'd code - but it would potentially
improve parallelism by reducing the amount of type goo needing to be
imported/exported/emitted during ThinLTO)

>
> I imagine that for some linkers, it may not be possible to deal with this
> scheme. For example, I did some investigation last year and discovered that
> I could not use the gold plugin interface to load a native object file if
> we had already claimed it as an IR file. I wouldn't be surprised to learn
> that ld64 has similar problems.
>
> In cases where we completely control the linker (e.g. lld), we can easily
> support this scheme, as the linker can directly do whatever it wants. But
> for linkers that cannot support this, I suggest that we promote
> consistently under ThinLTO rather than having different promotion schemes
> for different linkers, in order to reduce overall complexity.
>
> Thanks for your feedback!
>
> Thanks,
> --
> --
> Peter
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098062.html
>

Peter Collingbourne via llvm-dev

2016-May-04 15:47 UTC

On Wed, May 4, 2016 at 8:19 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
> On Wed, Apr 6, 2016 at 4:41 PM, Peter Collingbourne via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> I'd like to propose changes to how we do promotion of global values in
>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>> translation unit to native code at compile time. For example, if we know
>> that:
>>
>> 1) A function is a leaf function, so it will never import any other
>> functions, and
>> 2) The function's instruction count falls above a threshold specified at
>> compile time, so it will never be imported.
>> or
>> 3) The compile-time threshold is zero, so there is no possibility of
>> functions being imported (What's the utility of this? Consider a program
>> transformation that requires whole-program information, such as CFI. During
>> development, the import threshold may be set to zero in order to minimize
>> the incremental link time while still providing the same CFI enforcement
>> that would be used in production builds of the application.)
>>
>> then the function's body will not be affected by link-time decisions, and
>> we might as well produce its object code at compile time. This will also
>> allow the object code to be shared between linkage units (this should
>> hopefully help solve a major scalability problem for Chromium, as that
>> project contains a large number of test binaries based on common libraries).
>>
>> This can be done with a change to the intermediate object file format. We
>> can represent object files as native code containing statically compiled
>> functions and global data in the .text, .data, .rodata (etc.) sections,
>> with an .llvmbc section (or, I suppose, "__LLVM,__bitcode" when targeting
>> Mach-O) containing bitcode for functions to be compiled at link time.
>>
>> In order to make this work, we need to make sure that references from
>> link-time compiled functions to statically compiled functions work
>> correctly in the case where the statically compiled function has internal
>> linkage. We can do this by promoting every global value with internal
>> linkage, using a hash of the external names (as I mentioned in [1]).
>>
>
> What about translation units that have no external names? I hit this
> problem with DWARF Fission hashing recently, where two files had code
> equivalent to this:
>
>   struct foo { foo(); };
>   static foo f;
>
> Thus no external symbols, and indeed exactly the same set of symbols for
> two instances of this file (& I have seen examples of this in Google's
> codebase - though I haven't searched extensively, and it may be that the
> linker never actually picks two of these together, but the DWP tool doesn't
> have the same kind of "skip this library if no symbols are needed from it"
> behavior as the linker).
>
Yes, I came across this case in my prototype. This can happen if two such
TUs appear directly as linker inputs (rather than as library members). This
is a rare case, and the code in such a TU is most likely initialization
code that does not require extensive optimization, so the solution I
decided on was to inhibit ThinLTO for such modules. In my prototype, I
caused such modules to be compiled with regular LTO, but there are other
possible solutions, such as compiling to a native object.
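
The fallback described above can be sketched as a per-module decision. This is only an illustration under assumed names (`ModuleSummary`, `CompileMode` and `chooseMode` are hypothetical), not the prototype's actual code:

```cpp
#include <string>
#include <vector>

// Hypothetical compilation strategies for a single module.
enum class CompileMode { ThinLTO, RegularLTO };

// Hypothetical summary: the names a module defines with external linkage.
struct ModuleSummary {
  std::vector<std::string> externalNames;
};

// A module with no external names has nothing to derive a unique promotion
// hash from, so it is kept out of ThinLTO and handled by regular LTO.
CompileMode chooseMode(const ModuleSummary &m) {
  return m.externalNames.empty() ? CompileMode::RegularLTO
                                 : CompileMode::ThinLTO;
}
```

Compiling such a module straight to a native object, the other option mentioned, would only need a third enumerator in the same check.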

> Also, (I haven't read the whole thread, but I assume) you're considering
> doing this with debug info too? All type information could pretty easily be
> emitted up-front and just reduced to declarations (again, on non-LLDB
> platforms... :/) for the rest of the debug info. The extra declarations
> might make object files a bit bigger, though. (eg: if there were types that
> weren't used in any of the ahead-of-time compiled code, but were used in
> the ThinLTO'd code - the naive approach would still produce the type info
> up front and a declaration in ThinLTO which would make for bigger output
> than just putting the type in the ThinLTO'd code - but it would potentially
> improve parallelism by reducing the amount of type goo needing to be
> imported/exported/emitted during ThinLTO)
>
That's an interesting idea. I hadn't thought about just emitting type
declarations in the ThinLTO'd code but yes, that's something we could
consider doing. It would be interesting to see what the tradeoff would be
in terms of the edit/compile/debug cycle time, as we'd be exchanging linker
work for debugger work.

Peter


>
>
>>
>> I imagine that for some linkers, it may not be possible to deal with this
>> scheme. For example, I did some investigation last year and discovered that
>> I could not use the gold plugin interface to load a native object file if
>> we had already claimed it as an IR file. I wouldn't be surprised to learn
>> that ld64 has similar problems.
>>
>> In cases where we completely control the linker (e.g. lld), we can easily
>> support this scheme, as the linker can directly do whatever it wants. But
>> for linkers that cannot support this, I suggest that we promote
>> consistently under ThinLTO rather than having different promotion schemes
>> for different linkers, in order to reduce overall complexity.
>>
>> Thanks for your feedback!
>>
>> Thanks,
>> --
>> --
>> Peter
>>
>> [1] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098062.html
>>
>>
>

-- 
-- 
Peter