thr3ads.net - llvm dev - [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline [Apr 2018]


via llvm-dev

2018-Apr-09 18:06 UTC

[llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline

Hello,

I am exploring the possibility of unifying the BC file generation phase for
ThinLTO and FullLTO. Our third-party library providers prefer to give us only
one version of the BC archives, rather than test and ship both Thin and Full LTO
BC archives. We want to find a way to allow our users to pick either Thin or
Full LTO, while having only one "unified" version of the BC archive.

Note, I am not necessarily proposing to do this work in the upstream compiler.
If there is no interest from other companies, we might have to keep this as a
private patch for Sony.

One of the ideas (not my preference) is to mix and match files in the Thin and
Full BC formats. I'm not sure how well the "mix and match" scenario works in
general. I was wondering whether Apple or Google are doing this in production?

I wrote a toy example, compiled one group of files with ThinLTO and the rest
with FullLTO, and linked them with gold. I saw that, irrespective of whether the
Thin or Full LTO option was used at the link step, files were optimized within
the Thin group and within the Full group separately, with neither group knowing
about the files in the other (which makes sense). Basically, the border between
the Thin and Full LTO bitcode files created an artificial "barrier" that
prevented cross-border optimization.
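
The toy experiment above could be reproduced along these lines. This is a minimal sketch that only builds the command lines and does not run them. The source file names (`a.c`, `b.c`), the output name `app`, and the helper `mixed_lto_commands` are hypothetical, and it assumes a clang driver with the gold plugin installed. The options `-flto=thin`, `-flto=full` and `-fuse-ld=gold` are regular clang driver options.

```python
def mixed_lto_commands(thin_sources, full_sources, link_mode="thin", output="app"):
    """Build compile and link command lines for a mixed Thin/Full LTO experiment.

    Nothing is executed here; pass each list to subprocess.run to try it.
    All file names are placeholders chosen for illustration.
    """
    if link_mode not in ("thin", "full"):
        raise ValueError("link_mode must be 'thin' or 'full'")

    commands = []
    objects = []
    for mode, sources in (("thin", thin_sources), ("full", full_sources)):
        for src in sources:
            obj = src.rsplit(".", 1)[0] + ".o"
            # -flto=thin emits bitcode with a module summary;
            # -flto=full emits regular (full) LTO bitcode.
            commands.append(["clang", "-O2", f"-flto={mode}", "-c", src, "-o", obj])
            objects.append(obj)

    # Link with gold; the mode given here is the LTO option used at the link step.
    commands.append(["clang", "-fuse-ld=gold", f"-flto={link_mode}", *objects, "-o", output])
    return commands


if __name__ == "__main__":
    for cmd in mixed_lto_commands(["a.c"], ["b.c"], link_mode="full"):
        print(" ".join(cmd))
```

Switching `link_mode` between `"thin"` and `"full"` corresponds to the two link-step variants mentioned above.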

Obviously, I am not too fond of this idea. Even if mixing and matching ThinLTO
and FullLTO bitcode files works "as is", I suspect we will see a non-trivial
runtime performance degradation because of the "ThinLTO"/"FullLTO" border. Are
you aware of any potential problems with this solution, other than performance?

Another, hopefully better, idea is to introduce a "unified" BC format, which
could be FullLTO, ThinLTO, or neither (e.g., something in between).

If the user chooses FullLTO at the link step, but some of the files are in the
Thin BC format, the linker will call a special LTO API to convert these files
to the Full LTO BC format (i.e., stripping the module summary section and
potentially running some additional optimizations from the FullLTO pass manager
pipeline).

If the user chooses ThinLTO at the link step, but some of the files are in the
Full BC format, the linker will call an LTO API to convert these files to the
Thin LTO bitcode format (by regenerating the module summary section dynamically
for the Full LTO bitcode files).

I think the most reasonable idea for the unification of the Thin and Full LTO
compilation pipelines is to use Full LTO as the "unified" BC format. If the user
requests FullLTO, no additional work is needed: the linker will perform FullLTO
as usual. If the user requests ThinLTO, the linker will call an API to
regenerate the module summary section for all the files in the FullLTO format
and perform ThinLTO as usual.
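
The conversion rules described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `BitcodeKind`, `LinkMode`, `Action` and `plan_conversions` are hypothetical names, not part of any LLVM or linker API, and the conversions themselves (stripping or regenerating the module summary) are represented by labels rather than performed.

```python
from enum import Enum


class BitcodeKind(Enum):
    THIN = "thin"   # bitcode carrying a module summary
    FULL = "full"   # regular LTO bitcode without a summary


class LinkMode(Enum):
    THIN = "thin"
    FULL = "full"


class Action(Enum):
    NONE = "use as is"
    STRIP_SUMMARY = "strip module summary"
    GENERATE_SUMMARY = "regenerate module summary"


def plan_conversions(inputs, link_mode):
    """Return {file name: Action} for the LTO mode requested at link time.

    `inputs` maps a (hypothetical) file name to its BitcodeKind.
    """
    plan = {}
    for name, kind in inputs.items():
        if link_mode is LinkMode.FULL and kind is BitcodeKind.THIN:
            plan[name] = Action.STRIP_SUMMARY
        elif link_mode is LinkMode.THIN and kind is BitcodeKind.FULL:
            plan[name] = Action.GENERATE_SUMMARY
        else:
            plan[name] = Action.NONE
    return plan


if __name__ == "__main__":
    # An archive shipped entirely in the Full format (the "unified" proposal).
    archive = {"liba.bc": BitcodeKind.FULL, "libb.bc": BitcodeKind.FULL}
    for mode in LinkMode:
        plan = plan_conversions(archive, mode)
        print(mode.value, {name: action.value for name, action in plan.items()})
```

With every file shipped in the Full format, a FullLTO link needs no conversion, while a ThinLTO link regenerates the summary for every file, which is the "Full LTO as the unified format" case.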

In reality I suspect things will be much more complicated. The pipelines for the
Thin and Full LTO compilation phases are quite different. ThinLTO can afford to
do much more optimization in the linking phase (since it has parallel backends
and smaller IR compared to FullLTO), while for FullLTO we are forced to move
some optimizations from the linking phase to the compilation phase.

So, if we pick FullLTO as our unified format, we would increase the build time
for ThinLTO (we will be doing the FullLTO initial optimization pipeline in the
compile phase, which is more than what ThinLTO is currently doing, but the
pipeline of the optimizations in the backend will stay the same). It's not
clear what will happen with the runtime performance: we might improve it
(because we repeat some of the optimizations several times), or we might make it
worse (because we might do an optimization in the early compilation phase,
potentially preventing more aggressive optimization later). What are your
expectations? Will this approach work in general? If so, what do you think will
happen with the runtime performance?

I also noticed that the pass manager pipeline is different for ThinLTO+Sample
PGO (the profile-use case). This might create some additional complications for
unifying the Thin and Full LTO BC generation phases too, but it's too small a
detail to worry about right now. I'm more interested in choosing the right
general direction for solving this problem now.

Please share your thoughts!

Thank you!

Katya.



David Blaikie via llvm-dev

2018-Apr-10 14:38 UTC


Hi Katya,

[+Teresa since this is about ThinLTO & she's the owner there]

I'm not sure how other folks feel, but terminologically I'm not sure I
think of these as different formats (for example you mention the idea of
stripping the summaries from ThinLTO BC files to then feed them in as
FullLTO files - I would imagine it'd be reasonable to modify/fix/improve
the linker integration to have it (perhaps optionally) /ignore/ the
summaries, or use the summaries but in a non-siloed way (so that there's
not that optimization boundary between ThinLTO and FullLTO))

You're dealing with a situation where you are shipped BC files offline and
then do one, or multiple builds with these BC files?

If the scenario was more like a naive build: Multiple BC files generated on
a single (multi-core/threaded) machine (but some Thin, some Full) & then
fed to the linker, I would wonder if it'd be relatively cheap for the LTO
step to support this by computing summaries for FullLTO files on the fly
(without a separate tool/writing the summary to disk, etc). Though I
suppose that'd produce a pretty wildly different behavior in the link when
just a single ThinLTO BC file was added to an otherwise FullLTO build.

Anyway - just some (admittedly fairly uninformed) thoughts. I'm sure Teresa
has more informed ideas about how this might all look.


via llvm-dev

2018-Apr-10 22:00 UTC


Hi David,

Thank you so much for your reply!

>> You're dealing with a situation where you are shipped BC files offline and
>> then do one, or multiple builds with these BC files?

Yes, that's exactly the case.

>> If the scenario was more like a naive build: Multiple BC files generated on
>> a single (multi-core/threaded) machine (but some Thin, some Full) & then fed
>> to the linker, I would wonder if it'd be relatively cheap for the LTO step
>> to support this by computing summaries for FullLTO files on the fly (without
>> a separate tool/writing the summary to disk, etc).

I think so. My understanding is that for FullLTO files it's possible to run the
name-anonymous-globals pass and compute summaries on the fly, which should allow
ThinLTO to be performed at the link phase.
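
A toy model of those two steps is sketched below. It is not LLVM's implementation: the `Global` record, the `anon.<module hash>.<index>` naming scheme, and the summary layout are all assumptions made for illustration. What it tries to show is that a summary keyed by symbol name cannot refer to an unnamed global, so names have to be assigned before the summary is computed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Global:
    name: str                                  # empty string models an anonymous global
    refs: list = field(default_factory=list)   # indices of the globals this one references


def name_anonymous_globals(module_id, globals_):
    """Give every unnamed global a deterministic, module-specific name (toy scheme)."""
    digest = hashlib.md5(module_id.encode()).hexdigest()[:8]
    count = 0
    for g in globals_:
        if not g.name:
            g.name = f"anon.{digest}.{count}"
            count += 1


def compute_summary(globals_):
    """Map each global's name to the names of the globals it references."""
    if any(not g.name for g in globals_):
        raise ValueError("summary requires every global to be named")
    return {g.name: [globals_[i].name for i in g.refs] for g in globals_}


if __name__ == "__main__":
    # One named function that references one anonymous global.
    module = [Global("main", refs=[1]), Global("")]
    name_anonymous_globals("lib/foo.bc", module)
    print(compute_summary(module))
```

Calling `compute_summary` before `name_anonymous_globals` raises an error in this model, which mirrors why the naming step comes first.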

Katya.

From: David Blaikie <dblaikie at gmail.com>
Sent: Tuesday, April 10, 2018 7:38 AM
To: Romanova, Katya <katya.romanova at sony.com>; Teresa Johnson
<tejohnson at google.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO
frontend + initial optimization pipeline

