thr3ads.net - llvm dev - [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline [Apr 2018]


via llvm-dev

2018-Apr-11 18:20 UTC

[llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline

From: Mehdi AMINI <joker.eph at gmail.com>
Sent: Tuesday, April 10, 2018 11:53 PM
To: Romanova, Katya <katya.romanova at sony.com>
Cc: David Blaikie <dblaikie at gmail.com>; Teresa Johnson <tejohnson at
google.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO
frontend + initial optimization pipeline

On Tue, Apr 10, 2018 at 23:18, <katya.romanova at sony.com> wrote:
Hi Mehdi,

Awesome! It’s a very clear design. The only question left is which pipeline to
choose for the unified compile-phase optimization pipeline.

-        ThinLTO compile-phase pipeline? It might severely affect compile time
and the memory footprint of the FullLTO link phase. That was the reason so many
optimizations were moved from the link phase to the parallel compile phase for
FullLTO in the first place.

Just to clarify: "optimizations" were not "moved from the
link-phase to the parallel compile-phase for FullLTO", they have never been
in the link phase for FullLTO. It has always been this way.

I see. What I meant was the following comment from the Phabricator review that
defined the ThinLTO pipeline; I didn’t remember its exact wording.
https://reviews.llvm.org/D17115
“On the contrary to Full LTO, ThinLTO can afford to shift compile time from the
frontend to the linker: both phases are parallel”.

I think that the ThinLTO compile-phase pipeline will only affect FullLTO in the
sense that we need to add more passes during the link phase, is this what you
meant?

Yes, that’s exactly what I meant.

-        FullLTO compile-phase pipeline?  More optimization passes at the
compile phase will obviously increase compile time for ThinLTO, though I suspect
it will be tolerable. It is not very clear how this choice will affect the
overall runtime performance of ThinLTO. Assuming we keep the well-tuned
link-phase/backend optimization pipelines “as is” for ThinLTO and FullLTO, we
will repeat some optimization passes for ThinLTO at the compile phase and again
at the link phase, which could potentially improve performance… or make it
worse, because performing an optimization early at compile time might prevent
a more aggressive optimization at the link phase, when we see a larger scope.
Any prediction of what would happen to ThinLTO runtime performance?

Note: repeating optimizations is not supposed to improve performance; at least,
this isn't the goal of the pipeline.
The pipeline for ThinLTO has been modeled on O3. For better or worse, we felt
there was no reason to really deviate, and any improvement to one could
(should!) be reflected in the other.

The rationale behind the ThinLTO pipeline is not only compile time: it splits
the O3 pipeline at the point where we stop the "function simplification" /
inliner loop and before we get into unrolling/vectorization.
I remember even trying to end the compile phase without inlining, but the
generated IR was too big: the inliner's CGSCC visit actually reduces the size
of the IR considerably in some cases.

Thank you for sharing! It’s very helpful.

Mehdi, it seems that you have spent significant time experimenting with the
ThinLTO pipeline and determining where exactly the compile phase should end and
the link phase should start. How do you envision a unified ThinLTO/FullLTO
compile-phase pipeline? We might tune/improve this pipeline in the future,
but having a good starting point is very important too.

-        New “unified” compile-phase pipeline?

I guess there is no definitive answer and we have to experiment, measure
compile-time/run-time performance and potentially make some adjustments to the
pipeline and to the thresholds. We have a few proprietary tests at Sony that we
could use for the performance measurements, but it would be nicer if there were
some open source benchmarks that we could use. What did you use at Google/Apple
for ThinLTO/FullLTO measurements? Did you also use some proprietary benchmarks?
It’s important to make sure we won’t have run-time/compile-time performance
degradation, but it would be nicer if anyone could run the previously used
ThinLTO/FullLTO benchmarks themselves while making changes to the optimization
pipeline and heuristics.

We benchmarked multiple variants of the pipeline two years ago. There were some
regressions when adopting the ThinLTO pipeline in FullLTO (and some
improvements), but when we investigated, we didn't find any real regressions
that couldn't be solved by fixing the optimizer.

When referring to ThinLTO and FullLTO pipelines here do you mean compile-phase
pipeline, link-phase pipeline or full pipeline (i.e., compile-phase +
link-phase)? The terminology is slightly confusing here.

I.e., these are cases where FullLTO gets it right "by luck" and not by
principle, and fixing such cases helps non-LTO O3 as well (see, for example,
the test case https://bugs.llvm.org/show_bug.cgi?id=27395).

>> # No flag: use the compile-phase preference, perform ThinLTO on a.o and
>> FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group and the
>> ThinLTO objects
>> $ clang a.o b.o c.o
If I understood you correctly, while doing ThinLTO on a.o, we could import from
b.o and c.o (this is possible since the summaries are available), while we won’t
see a.o when doing FullLTO for b.o/c.o. (i.e., the previous non-permeable
barrier between ThinLTO and FullLTO groups will become permeable in one
direction).

It could be permeable in both directions: b.o+c.o become "like a single
ThinLTO object" after they get merged.
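The difference between the siloed behavior (as observed with gold in Katya's original toy experiment) and the permeable behavior described here can be sketched as a small Python model. It is purely illustrative: `FULL_GROUP`, `backend_units` and `import_sources` are hypothetical names, and "import source" only means "may be considered for cross-module importing", not any real LLVM API.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical label for the single unit formed by merging all FullLTO objects.
FULL_GROUP = "<merged LTO group>"


def backend_units(objs: Dict[str, str]) -> Set[str]:
    """One unit per ThinLTO object, plus one merged unit for all FullLTO objects."""
    units = {name for name, kind in objs.items() if kind == "thin"}
    if any(kind == "full" for kind in objs.values()):
        units.add(FULL_GROUP)
    return units


def import_sources(objs: Dict[str, str], permeable: bool) -> Dict[str, FrozenSet[str]]:
    """Map each backend unit to the units it may import from."""
    units = backend_units(objs)
    thin_units = units - {FULL_GROUP}
    sources: Dict[str, FrozenSet[str]] = {}
    for unit in units:
        if permeable:
            # Proposed: the merged group behaves like one more ThinLTO module,
            # so importing may cross the Thin/Full border in both directions.
            sources[unit] = frozenset(units - {unit})
        elif unit == FULL_GROUP:
            # Siloed: the merged FullLTO group never sees the ThinLTO objects.
            sources[unit] = frozenset()
        else:
            # Siloed: ThinLTO objects can only import from each other.
            sources[unit] = frozenset(thin_units - {unit})
    return sources


objs = {"a.o": "thin", "b.o": "full", "c.o": "full"}
siloed = import_sources(objs, permeable=False)
permeable = import_sources(objs, permeable=True)
print(siloed)
print(permeable)
```

With a single ThinLTO object, the siloed model leaves a.o with nothing to import, which is the "barrier" from the toy experiment; the permeable model lets a.o and the merged b.o+c.o unit see each other.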

I see…
However, do you think that by doing this we will achieve better performance than
by running the ThinLTO backend for all of the files (a.o, b.o, c.o)?

Performance is always very much use-case dependent.
One may know that a group of files performs better when they get merged together
with FullLTO while the rest of the app does not?

I don't know but this all needs to be carefully looked at from a
user-interface point of view I think (will it be intuitive for the users? Will
it fit in every (most) scenarios? etc.).

>> # No flag: use the compile-phase preference, perform ThinLTO on a.o and
>> FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group and the
>> ThinLTO objects
>> $ clang a.o b.o c.o

I wonder if we have a use case for the “mix and match compile-phase preference”
situation that you described above? Maybe the linker should simply report an
error in this case? Or do we have to accept this because of backwards
compatibility?

Cheers,

--
Mehdi

Thank you!
Katya.

From: Mehdi AMINI <joker.eph at gmail.com>
Sent: Tuesday, April 10, 2018 5:25 PM
To: Romanova, Katya <katya.romanova at sony.com>
Cc: David Blaikie <dblaikie at gmail.com>; Teresa Johnson <tejohnson at
google.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO
frontend + initial optimization pipeline

Hi,

It is non-trivial to recompute summaries (which is why we have summaries in the
bitcode in the first place, by the way), because bitcode is expensive to load.

I think shipping two different variants of the bitcode, one with and one without
summaries, doesn't provide much benefit while complicating the flow. We could
achieve what you're looking for by revisiting the flow a little.

I would try to consider if we can:

1) always generate summaries.
2) Use the same compile-phase optimization pipeline for ThinLTO and LTO.
3) Decide at link time if you want to do FullLTO or ThinLTO.

We didn't go this route two years ago because during the bringup we
didn't want to affect FullLTO in any way, but it may make sense now to have
`clang -flto=thin` and `clang -flto=full` be identical and change the linker
plugins to operate either in full-LTO mode or in ThinLTO mode, without
differentiating based on the availability of the summaries.

A possible behavior could be:

# The -flto flag in the compile phase does not change the produced bitcode,
# except for a flag that records the preference in the bitcode (FullLTO vs ThinLTO)
$ clang -c -flto=thin a.cpp
$ clang -c -flto=full b.cpp
$ clang -c -flto=full c.cpp

# At link time the behavior depends on the -flto flag passed in.

# No flag: use the compile-phase preference, perform ThinLTO on a.o and FullLTO
# on b.o/c.o, but allow ThinLTO import between the LTO group and the ThinLTO
# objects
$ clang a.o b.o c.o

# Forces full LTO, merges all the objects, no cross-module importing will
# happen.
$ clang a.o b.o c.o -flto=full

# Forces ThinLTO for all objects, FullLTO won't happen, no objects will be
# merged.
$ clang a.o b.o c.o -flto=thin
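The link-time dispatch proposed above can be sketched as a small Python function. This is a hypothetical model only: `BitcodeObject` and `plan_link` are illustrative names, the `preference` field stands in for the flag that would be recorded in the bitcode, and nothing here is an existing linker or LTO API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BitcodeObject:
    name: str
    preference: str  # "thin" or "full", recorded at compile time (hypothetical field)


def plan_link(objs: List[BitcodeObject], link_flag: Optional[str]) -> Tuple[List[str], List[str]]:
    """Return (objects merged for FullLTO, objects given separate ThinLTO backends).

    Every object is assumed to carry a summary, so the decision is driven by
    the link flag (or the recorded preference), never by summary presence.
    """
    if link_flag == "full":
        # -flto=full at link time: merge everything, no cross-module importing.
        return [o.name for o in objs], []
    if link_flag == "thin":
        # -flto=thin at link time: nothing is merged.
        return [], [o.name for o in objs]
    if link_flag is not None:
        raise ValueError(f"unknown LTO mode: {link_flag!r}")
    # No flag: honor each object's recorded preference. Per the proposal, the
    # merged group could still exchange ThinLTO imports with the other objects;
    # this sketch only computes the partition.
    merged = [o.name for o in objs if o.preference == "full"]
    separate = [o.name for o in objs if o.preference != "full"]
    return merged, separate


objs = [BitcodeObject("a.o", "thin"), BitcodeObject("b.o", "full"), BitcodeObject("c.o", "full")]
for flag in (None, "full", "thin"):
    print(flag, plan_link(objs, flag))
```

The point of the design is that the compile phase produces the same bitcode either way, so the partition is a pure link-time decision.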

Cheers,

--
Mehdi

On Tue, Apr 10, 2018 at 15:51, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hi David,
Thank you so much for your reply!
>> You're dealing with a situation where you are shipped BC files offline and
>> then do one, or multiple builds with these BC files?

Yes, that’s exactly the case.

>> If the scenario was more like a naive build: Multiple BC files generated on a
>> single (multi-core/threaded) machine (but some Thin, some Full) & then fed
>> to the linker, I would wonder if it'd be relatively cheap for the LTO step
>> to support this by computing summaries for FullLTO files on the fly (without
>> a separate tool/writing the summary to disk, etc).
I think so. My understanding is that for FullLTO files it’s possible to run the
"name anonymous globals" pass and compute summaries on the fly, which should
allow us to perform ThinLTO at the link phase.

Katya.

From: David Blaikie <dblaikie at gmail.com>
Sent: Tuesday, April 10, 2018 7:38 AM
To: Romanova, Katya <katya.romanova at sony.com>; Teresa Johnson <tejohnson
at google.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO
frontend + initial optimization pipeline

Hi Katya,

[+Teresa since this is about ThinLTO & she's the owner there]

I'm not sure how other folks feel, but terminologically I'm not sure I
think of these as different formats (for example you mention the idea of
stripping the summaries from ThinLTO BC files to then feed them in as FullLTO
files - I would imagine it'd be reasonable to modify/fix/improve the linker
integration to have it (perhaps optionally) /ignore/ the summaries, or use the
summaries but in a non-siloed way (so that there's not that optimization
boundary between ThinLTO and FullLTO))

You're dealing with a situation where you are shipped BC files offline and
then do one, or multiple builds with these BC files?

If the scenario was more like a naive build: Multiple BC files generated on a
single (multi-core/threaded) machine (but some Thin, some Full) & then fed
to the linker, I would wonder if it'd be relatively cheap for the LTO step
to support this by computing summaries for FullLTO files on the fly (without a
separate tool/writing the summary to disk, etc). Though I suppose that'd
produce a pretty wildly different behavior in the link when just a single
ThinLTO BC file was added to an otherwise FullLTO build.

Anyway - just some (admittedly fairly uninformed) thoughts. I'm sure Teresa
has more informed ideas about how this might all look.
On Mon, Apr 9, 2018 at 12:20 PM via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hello,
I am exploring the possibility of unifying the BC file generation phase for
ThinLTO and FullLTO. Our third-party library providers prefer to give us only
one version of the BC archives, rather than testing and shipping both Thin and Full LTO
BC archives. We want to find a way to allow our users to pick either Thin or
Full LTO, while having only one “unified” version of the BC archive.
Note, I am not necessarily proposing to do this work in the upstream compiler.
If there is no interest from other companies, we might have to keep this as a
private patch for Sony.
One of the ideas (not my preference) is to mix and match files in the Thin and
Full BC formats. I'm not sure how well the "mix and match"
scenario works in general. I was wondering if Apple or Google are doing this in
production?
I wrote a toy example, compiled one group of files with ThinLTO and the rest
with FullLTO, linked them with gold. I saw that irrespective of whether the Thin
or Full LTO option was used at the link step, files are optimized within the
Thin group and within the Full group separately, but they don't know about
the files in the other group (which makes sense). Basically, the border between
Thin and Full LTO bitcode files created an artificial "barrier" which
prevented cross-border optimization.
Obviously, I am not too fond of this idea. Even if mixing and matching ThinLTO
and FullLTO bitcode files will work “as is”, I suspect we will see a non-trivial
runtime performance degradation because of the
"ThinLTO"/"FullLTO" border. Are you aware of any potential
problems with this solution, other than performance?

Another, hopefully better, idea is to introduce a "unified" BC format,
which could be FullLTO, ThinLTO, or neither (i.e., something in between).
If the user chooses FullLTO at the link step, but some of the files are in the
Thin BC format, the linker will call a special LTO API to convert these files
to the Full LTO BC format (i.e., stripping the module summary section and
potentially running some additional optimizations from the FullLTO pass manager
pipeline).
If the user chooses ThinLTO at the link step, but some of the files are in the
Full BC format, the linker will call an LTO API to convert these files to the
Thin LTO bitcode format (by regenerating the module summary section dynamically
for the Full LTO bitcode files).
I think the most reasonable idea for the unification of the Thin and Full LTO
compilation pipelines is to use Full LTO as the “unified” BC format. If the user
requests FullLTO, no additional work is needed: the linker will perform FullLTO
as usual. If the user requests ThinLTO, the linker will call an API to regenerate
the module summary section for all the files in the FullLTO format and perform
ThinLTO as usual.
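Under this proposal, the linker's conversion work depends on which mode is requested. A minimal Python sketch, assuming each module is described only by whether it carries a module summary (`prepare_for_link` is a hypothetical name, not an LTO API), shows that with Full LTO as the unified format, requesting ThinLTO means rebuilding a summary for every module, which in practice means loading each module's bitcode.

```python
from typing import Dict, List


def prepare_for_link(has_summary: Dict[str, bool], mode: str) -> List[str]:
    """Return the per-module conversion steps the linker would request.

    Hypothetical model: ThinLTO needs a summary for every module, while
    FullLTO simply does not use summaries.
    """
    if mode not in ("thin", "full"):
        raise ValueError(f"unknown LTO mode: {mode!r}")
    steps = []
    for name in sorted(has_summary):
        if mode == "thin" and not has_summary[name]:
            # Requires loading the module's bitcode to rebuild its summary.
            steps.append(f"regenerate summary: {name}")
        elif mode == "full" and has_summary[name]:
            # The summary is ignored (or stripped) before merging.
            steps.append(f"ignore summary: {name}")
    return steps


unified = {"a.o": False, "b.o": False}  # Full LTO format: no summaries shipped
print(prepare_for_link(unified, "full"))
print(prepare_for_link(unified, "thin"))
```

The cost asymmetry is the crux: the FullLTO path is free, while the ThinLTO path scales with the number of modules.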
In reality I suspect things will be much more complicated. The pipelines for the
Thin and Full LTO compilation phases are quite different. ThinLTO can afford to
do much more optimization in the linking phase (since it has parallel backends
& smaller IR compared to FullLTO), while for FullLTO we are forced to move
some optimizations from linking to the compilation phase.
So, if we pick FullLTO as our unified format, we would increase the build time
for ThinLTO (we will be doing the FullLTO initial optimization pipeline in the
compile phase, which is more than what ThinLTO is currently doing, but the
pipeline of the optimizations in the backend will stay the same). It’s not clear
what will happen with the runtime performance: we might improve it (because we
repeat some of the optimizations several times), or we might make it worse
(because we might do an optimization in the early compilation phase, potentially
preventing more aggressive optimization later). What are your expectations? Will
this approach work in general? If so, what do you think will happen with the
runtime performance?
I also noticed that the pass manager pipeline is different for ThinLTO+Sample
PGO (use profile case). This might create some additional complications for
unifying the Thin and FullLTO BC generation phases too, but it’s too small a
detail to worry about right now. I’m more interested in choosing the right general
direction for solving this problem.
Please share your thoughts!
Thank you!
Katya.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Mehdi AMINI via llvm-dev

2018-Apr-11 19:18 UTC


[llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline

On Wed, Apr 11, 2018 at 11:20, <katya.romanova at sony.com> wrote:
> Mehdi, it seems that you have spent significant time experimenting with the
> ThinLTO pipeline and determining where exactly the compile phase should end
> and the link phase should start. How do you envision a unified ThinLTO/FullLTO
> compile-phase pipeline? We might tune/improve this pipeline in the
> future, but having a good starting point is very important too.
>
I don't know: it is all about tradeoffs :)
I was in favor of using a single pipeline based on ~O3, mainly because it is
easier to maintain/validate/evolve: when folks improve the O3 pipeline, you get
the benefit immediately in the ThinLTO optimization phase, unlike with FullLTO.
The tradeoff is compile time: it can become really long for FullLTO in some
extreme cases. I suggested in the past that such cases could be handled by
running the FullLTO linker optimization phase at O1 to reduce the amount of
optimization.



> We benchmarked multiple variants of the pipeline two years ago. There were
> some regressions when adopting the ThinLTO pipeline in FullLTO (and some
> improvements), but when we investigated, we didn't find any real regressions
> that couldn't be solved by fixing the optimizer.
>
> When referring to ThinLTO and FullLTO pipelines here do you mean
> compile-phase pipeline, link-phase pipeline or full pipeline (i.e.,
> compile-phase + link-phase)? The terminology is slightly confusing here.
>

Here I meant everything: trying to use the exact same pipeline in both
phases.


> >> # No flag: use the compile-phase preference, perform ThinLTO on a.o and
> >> FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group and the
> >> ThinLTO objects
> >> $ clang a.o b.o c.o
>
> I wonder if we have a use case for the “mix and match compile-phase
> preference” situation that you described above? Maybe the linker should
> simply report an error in this case? Or do we have to accept this because
> of backwards compatibility?
>
I don't know :)
We need to consider the case of "old" bitcode that wouldn't have summaries
(maybe it could get merged into the LTO partition but not participate in
cross-module optimizations?)
We should hear from Apple folks as well.
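One possible reading of this suggestion, sketched as a hypothetical Python policy (`Input` and `partition_with_legacy` are illustrative names, not linker APIs): inputs without a summary are merged into the LTO partition but are never offered as cross-module import sources.

```python
from typing import List, NamedTuple, Optional, Tuple


class Input(NamedTuple):
    name: str
    preference: Optional[str]  # "thin", "full", or None for old bitcode
    has_summary: bool


def partition_with_legacy(inputs: List[Input]) -> Tuple[List[str], List[str], List[str]]:
    """Return (merged_lto_partition, thinlto_backends, import_candidates).

    Hypothetical policy: bitcode without a summary is merged into the LTO
    partition but is never offered as a cross-module import source.
    """
    merged, thin, importable = [], [], []
    for inp in inputs:
        if not inp.has_summary or inp.preference == "full":
            merged.append(inp.name)
        else:
            # Has a summary and did not ask for FullLTO: separate ThinLTO backend.
            thin.append(inp.name)
        if inp.has_summary:
            importable.append(inp.name)
    return merged, thin, importable


inputs = [Input("a.o", "thin", True), Input("b.o", "full", True), Input("old.o", None, False)]
merged, thin, importable = partition_with_legacy(inputs)
print(merged, thin, importable)
```

Here old.o still gets optimized as part of the merged partition, but no ThinLTO backend can import from it.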

-- 
Mehdi

>
>
>
>
>
>
>
> Thank you!
>
> Katya.
>
>
>
>
> *From:* Mehdi AMINI <joker.eph at gmail.com>
> *Sent:* Tuesday, April 10, 2018 5:25 PM
> *To:* Romanova, Katya <katya.romanova at sony.com>
> *Cc:* David Blaikie <dblaikie at gmail.com>; Teresa Johnson <
> tejohnson at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] exploring possibilities for unifying ThinLTO
> and FullLTO frontend + initial optimization pipeline
>
>
>
> Hi,
>
>
>
> It is non trivial to recompute summaries (which is why we have summaries
> in the bitcode in the first place by the way), because bitcode is expensive
> to load.
>
>
>
> I think shipping two different variant of the bitcode, one with and one
> without summaries isn't providing much benefit while complicating the
flow.
> We could achieve what you're looking for by revisiting the flow a
little.
>
>
>
> I would try to consider if we can:
>
>
>
> 1) always generate summaries.
>
> 2) Use the same compile-phase optimization pipeline for ThinLTO and LTO.
>
> 3) Decide at link time if you want to do FullLTO or ThinLTO.
>
>
>
> We haven't got this route 2 years ago because during the bringup we
didn't
> want to affect FullLTO in any way, but it may make sense now to have `clang
> -flto=thin` and `clang -flto=full` be identical and change the linker
> plugins to operate either in full-LTO mode or in ThinLTO mode but not
> differentiate based on the availability of the summaries.
>
>
>
> A possible behavior could be:
>
>
>
> # The -flto flag in the compile phase does not change the produced bitcode
> but for a flag that record the preference in the bitcode (FullLTO vs
> ThinLTO)
>
> $ clang -c -flto=thin a.cpp
>
> $ clang -c -flto=full b.cpp
>
> $ clang -c -flto=full c.cpp
>
>
>
> # At link time the behavior depends on the -flto flag passed in.
>
>
>
> # No flag: use the compile-phase preference, perform ThinLTO on a.o and
> FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group and the
> ThinLTO objects
>
> $ clang a.o b.o c.o
>
>
>
> # Forces full LTO, merges all the objects, no cross module importing will
> happen.
>
> clang a.o b.o c.o -flto=full
>
>
>
> # Forces ThinLTO for all objects, FullLTO won't happen, no objects will
be
> merged.
>
> clang a.o b.o c.o -flto=thin
>
>
>
> Cheers,
>
>
>
> --
>
> Mehdi
>
>
>
>
>
>
>
>
>
> Le mar. 10 avr. 2018 à 15:51, via llvm-dev <llvm-dev at
lists.llvm.org> a
> écrit :
>
> Hi David,
>
> Thank you so much for your reply!
>
>
>
> >> You're dealing with a situation where you are shipped BC files
offline
> and then do one, or multiple builds with these BC files?
> Yes, that’s exactly the case.
>
>
>
> >> If the scenario was more like a naive build: Multiple BC files
> generated on a single (multi-core/threaded) machine (but some Thin, some
>
> >> Full) & then fed to the linker, I would wonder if it'd be
relatively
> cheap for the LTO step to support this by computing summaries for
>
> >> FullLTO files on the fly (without a separate tool/writing the
summary
> to disk, etc).
>
>
>
> I think so. My understanding that for FullLTO files, it’s possible to
> perform name anonymous globals pass and compute summaries on the fly, which
> should allow to perform ThinLTO at link phase.
>
>
>
> Katya.
>
>
>
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Tuesday, April 10, 2018 7:38 AM
> *To:* Romanova, Katya <katya.romanova at sony.com>; Teresa Johnson
<
> tejohnson at google.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] exploring possibilities for unifying ThinLTO
> and FullLTO frontend + initial optimization pipeline
>
>
>
> Hi Katya,
>
> [+Teresa since this is about ThinLTO & she's the owner there]
>
> I'm not sure how other folks feel, but terminologically I'm not
sure I
> think of these as different formats (for example you mention the idea of
> stripping the summaries from ThinLTO BC files to then feed them in as
> FullLTO files - I would imagine it'd be reasonable to
modify/fix/improve
> the linker integration to have it (perhaps optionally) /ignore/ the
> summaries, or use the summaries but in a non-siloed way (so that
there's
> not that optimization boundary between ThinLTO and FullLTO))
>
> You're dealing with a situation where you are shipped BC files offline
and
> then do one, or multiple builds with these BC files?
>
> If the scenario was more like a naive build: Multiple BC files generated
> on a single (multi-core/threaded) machine (but some Thin, some Full) &
then
> fed to the linker, I would wonder if it'd be relatively cheap for the
LTO
> step to support this by computing summaries for FullLTO files on the fly
> (without a separate tool/writing the summary to disk, etc). Though I
> suppose that'd produce a pretty wildly different behavior in the link
when
> just a single ThinLTO BC file was added to an otherwise FullLTO build.
>
> Anyway - just some (admittedly fairly uninformed) thoughts. I'm sure
> Teresa has more informed ideas about how this might all look.
>
> On Mon, Apr 9, 2018 at 12:20 PM via llvm-dev <llvm-dev at
lists.llvm.org>
> wrote:
>
> Hello,
>
> I am exploring the possibility of unifying the BC file generation phase
> for ThinLTO and FullLTO. Our third party library providers prefer to give
> us only one version of the BC archives, rather than test and ship both Thin
> and Full LTO BC archives. We want to find a way to allow our users to pick
> either Thin or Full LTO, while having only one “unified” version of the BC
> archive.
>
> Note, I am not necessarily proposing to do this work in the upstream
> compiler. If there is no interest from other companies, we might have to
> keep this as a private patch for Sony.
>
> One of the ideas (not my preference) is to mix and match files in the Thin
> and Full BC formats.  I'm not sure how well the "mix and
match" scenario
> works in general. I was wondering if Apple or Google are doing this for
> production?
>
> I wrote a toy example, compiled one group of files with ThinLTO and the
> rest with FullLTO, linked them with gold. I saw that irrespective of
> whether the Thin or Full LTO option was used at the link step, files are
> optimized within the Thin group and within the Full group separately, but
> they don't know about the files in the other group (which makes sense).
> Basically, the border between Thin and Full LTO bitcode files created an
> artificial "barrier" which prevented cross-border optimization.
>
> Obviously, I am not too fond of this idea. Even if mixing and matching
> ThinLTO and FullLTO bitcode files will work “as is”, I suspect we will see
> a non-trivial runtime performance degradation because of the
> "ThinLTO"/"FullLTO" border. Are you aware of any
potential problems with
> this solution, other than performance?
>
>
>
> Another, hopefully, better idea is to introduce a "unified" BC
format,
> which could either be FullLTO, ThinLTO, or neither (e.g., something in
> between).
>
> If the user chooses FullLTO at the link step, but some of the files are in
> the Thin BC format – the linker will call a special LTO API to convert
> these files to the Full LTO BC format (i.e., stripping the module summary
> section + potentially do some additional optimizations from the FullLTO
> pass manager pipeline).
>
> If the user chooses ThinLTO at the link step, but some of the files are in
> the Full BC format – the linker will call an LTO API to convert these files
> to the Thin LTO bitcode format (by regenerating the module summary section
> dynamically for the Full LTO bitcode files).
>
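Regenerating a summary for Full-style bitcode at link time and then using it for import decisions could look roughly like the following sketch. It is a simplified model under stated assumptions: a "summary" here is only each function's defining module, size and callees; functions are keyed by plain name rather than by a GUID; and `instr_limit` is a hypothetical size threshold. None of these classes or functions are LLVM APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    inst_count: int
    callees: list = field(default_factory=list)


@dataclass
class Module:
    name: str
    functions: list  # functions defined in this module


def build_summary(module):
    """Per-function summary computed directly from in-memory IR:
    defining module, size, and call edges (a stand-in for regenerating
    the summary section of FullLTO-style bitcode)."""
    return {f.name: (module.name, f.inst_count, tuple(f.callees))
            for f in module.functions}


def combined_index(modules):
    """Merge per-module summaries; assumes function names are unique."""
    index = {}
    for m in modules:
        index.update(build_summary(m))
    return index


def import_list(module, index, instr_limit=100):
    """External callees small enough to import into `module`.
    instr_limit is a hypothetical threshold for this sketch."""
    local = {f.name for f in module.functions}
    picked = set()
    for f in module.functions:
        for callee in f.callees:
            if callee in local or callee not in index:
                continue  # defined locally, or no summary available
            _, size, _ = index[callee]
            if size <= instr_limit:
                picked.add(callee)
    return sorted(picked)


a = Module("a.o", [Function("main", 20, ["helper", "big"])])
b = Module("b.o", [Function("helper", 12), Function("big", 500)])
index = combined_index([a, b])
print(import_list(a, index))  # ['helper']
```

A real module summary records much more (linkage, references, flags), and name clashes between modules are resolved through GUIDs, which this sketch ignores. It also assumes the IR is already in memory, sidestepping the bitcode-loading cost that makes recomputing summaries non-trivial.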
> I think the most reasonable idea for the unification of the Thin and Full
> LTO compilation pipelines is to use Full LTO as the “unified” BC format. If
> the user requests FullLTO – no additional work is needed, the linker will
> perform FullLTO as usual. If the user requests ThinLTO, the linker will call
> an API to regenerate the module summary section for all the files in the
> FullLTO format and perform ThinLTO as usual.
>
> In reality, I suspect things will be much more complicated. The pipelines
> for the Thin and Full LTO compilation phases are quite different. ThinLTO
> can afford to do much more optimization in the linking phase (since it has
> parallel backends & smaller IR compared to FullLTO), while for FullLTO we
> are forced to move some optimizations from linking to the compilation phase.
>
> So, if we pick FullLTO as our unified format, we would increase the build
> time for ThinLTO (we will be doing the FullLTO initial optimization
> pipeline in the compile phase, which is more than what ThinLTO is currently
> doing, but the pipeline of the optimizations in the backend will stay the
> same). It’s not clear what will happen with the runtime performance: we
> might improve it (because we repeat some of the optimizations several
> times), or we might make it worse (because we might do an optimization in
> the early compilation phase, potentially preventing more aggressive
> optimization later). What are your expectations? Will this approach work in
> general? If so, what do you think will happen with the runtime performance?
>
> I also noticed that the pass manager pipeline is different for
> ThinLTO+Sample PGO (use profile case). This might create some additional
> complications for unifying the Thin and FullLTO BC generation phase too,
> but it’s too small a detail to worry about right now. I’m more interested in
> choosing the right general direction for solving this problem now.
>
> Please share your thoughts!
>
> Thank you!
>
> Katya.
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Mehdi AMINI via llvm-dev

2018-Apr-11 19:18 UTC

[llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline

See the attached quick slides (backup from the dev meeting talk) about the
pass pipeline.



-- 
Mehdi

On Wed, Apr 11, 2018 at 12:18, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> On Wed, Apr 11, 2018 at 11:20, <katya.romanova at sony.com> wrote:
>
>> *From:* Mehdi AMINI <joker.eph at gmail.com>
>> *Sent:* Tuesday, April 10, 2018 11:53 PM
>> *To:* Romanova, Katya <katya.romanova at sony.com>
>> *Cc:* David Blaikie <dblaikie at gmail.com>; Teresa Johnson
>> <tejohnson at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
>> *Subject:* Re: [llvm-dev] exploring possibilities for unifying ThinLTO
>> and FullLTO frontend + initial optimization pipeline
>>
>> On Tue, Apr 10, 2018 at 23:18, <katya.romanova at sony.com> wrote:
>>
>> Hi Mehdi,
>>
>> Awesome! It’s a very clear design. The only question left is which
>> pipeline to choose as the unified compile-phase optimization pipeline.
>>
>> -        ThinLTO compile-phase pipeline? It might very negatively affect
>> compile-time and the memory footprint for FullLTO link-phase. That was the
>> reason why so many optimizations were moved from the link-phase to the
>> parallel compile-phase for FullLTO in the first place.
>>
>> Just to clarify: "optimizations" were not "moved from the link-phase to
>> the parallel compile-phase for FullLTO", they have never been in the link
>> phase for FullLTO. It has always been this way.
>>
>>
>> I see. What I meant was the following comment from the Phabricator review
>> about defining the ThinLTO pipeline, but I didn’t remember its exact
>> wording.
>>
>> https://reviews.llvm.org/D17115
>>
>> “On the contrary to Full LTO, ThinLTO can afford to shift compile time
>> from the frontend to the linker: both phases are parallel”.
>>
>> I think that the ThinLTO compile-phase pipeline will only affect FullLTO
>> in the sense that we need to add more passes during the link phase, is this
>> what you meant?
>>
>> Yes, that’s exactly what I meant.
>>
>>
>> -        FullLTO compile-phase pipeline? More optimization passes at
>> compile-phase will obviously increase compile time for ThinLTO, though I
>> suspect it will be tolerable. It is not very clear how this choice will
>> affect the overall runtime performance for ThinLTO. Assuming we keep the
>> well-tuned link-phase/backend optimization pipeline “as is” for ThinLTO and
>> FullLTO, we will repeat some optimization passes for ThinLTO at
>> compile-phase and later at link-phase, which potentially could improve the
>> performance… or it could make it worse, because we might perform an
>> optimization early at compile-time, potentially preventing more aggressive
>> optimization at link-phase when we see a larger scope. Any prediction on
>> what would happen to ThinLTO runtime performance?
>>
>> Note: repeating optimizations is not supposed to improve performance; at
>> least, that isn't the goal of the pipeline.
>>
>> The pipeline for ThinLTO has been modeled on O3. For better or worse, we
>> felt there was no reason to really deviate, and any improvement to one
>> could (should!) carry over to the other.
>>
>> The rationale behind the ThinLTO pipeline is not only compile time: it
>> splits the O3 pipeline at the point where we stop the "function
>> simplification" / inliner loop, and before we get into
>> unrolling/vectorization.
>>
>> I remember even trying to stop the compile-phase before inlining, but the
>> generated IR was too big: the inliner CGSCC visit actually reduces the size
>> of the IR considerably in some cases.
>>
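The split described above can be pictured as cutting one ordered O3 pipeline into a compile-phase prefix and a link-phase suffix. The stage names below are a hypothetical, heavily simplified stand-in for the real pass list; the sketch only shows that both phases come from the same sequence, which is why an improvement to O3 also shows up in the split pipeline.

```python
# Hypothetical, heavily simplified stage names standing in for the O3 pass list.
O3_PIPELINE = [
    "early-simplification",
    "inliner-and-function-simplification",  # the CGSCC inliner loop
    "loop-unrolling",
    "loop-vectorization",
    "late-cleanup",
]


def split_pipeline(pipeline, last_compile_stage):
    """Cut one ordered pipeline into (compile phase, link phase).
    Everything up to and including last_compile_stage runs at compile time;
    the remainder runs in the (parallel) ThinLTO backends."""
    cut = pipeline.index(last_compile_stage) + 1
    return pipeline[:cut], pipeline[cut:]


compile_phase, link_phase = split_pipeline(
    O3_PIPELINE, "inliner-and-function-simplification")
print(compile_phase)  # ['early-simplification', 'inliner-and-function-simplification']
print(link_phase)     # ['loop-unrolling', 'loop-vectorization', 'late-cleanup']

# Because both halves come from the same list, concatenating them gives back O3.
assert compile_phase + link_phase == O3_PIPELINE
```

Cutting earlier, for example before the inliner, would be the same call with a different stage name; per the message above, that variant was tried and rejected because the IR emitted at compile time was too big.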
>>
>>
>> Thank you for sharing! It’s very helpful.
>>
>>
>>
>> Mehdi, it seems that you have spent significant time experimenting with
>> the ThinLTO pipeline and determining where exactly the compile-phase should
>> end and the link-phase should start. How do you envision a unified
>> ThinLTO/FullLTO compile-phase pipeline? We might tune/improve this pipeline
>> in the future, but having a good starting point is very important too.
>>
>
> I don't know: it is all about tradeoffs :)
> I was in favor of using a single pipeline based on ~O3, the reason being
> mainly that it is easier to maintain/validate/evolve: when folks improve
> the O3 pipeline you get the benefit immediately in the ThinLTO optimization
> phase, unlike with FullLTO. The tradeoff is about compile-time: it can
> become really long for FullLTO in some extreme cases. I suggested in the
> past that such cases could be handled by running the FullLTO linker
> optimization phase with O1 to reduce the amount of optimization.
>
>
>
>
>>
>>
>> -        New “unified” compile-phase pipeline?
>>
>>
>>
>> I guess there is no definitive answer and we have to experiment,
>> measure compile-time/run-time performance and potentially make some
>> adjustments to the pipeline and to the thresholds. We have a few
>> proprietary tests at Sony that we could use for the performance
>> measurements, but it would be nicer if there were some open source
>> benchmarks that we could use. What did you use at Google/Apple for
>> ThinLTO/FullLTO measurements? Have you also used some proprietary
>> benchmarks? It’s important to make sure we won’t have run-time/compile-time
>> performance degradation, but it would be nicer if anyone could run the
>> previously used ThinLTO/FullLTO benchmarks themselves while making changes
>> to the optimization pipeline and heuristics.
>>
>>
>>
>> We benchmarked multiple variants of the pipeline two years ago. There
>> were some regressions when adopting the ThinLTO pipeline in FullLTO (and
>> some improvements), but when we investigated, we didn't find any real
>> regressions that couldn't be solved by fixing the optimizer.
>>
>>
>>
>> When referring to ThinLTO and FullLTO pipelines here, do you mean the
>> compile-phase pipeline, the link-phase pipeline, or the full pipeline
>> (i.e., compile-phase + link-phase)? The terminology is slightly confusing
>> here.
>>
>
>
> Here I meant everything: trying to use the exact same pipeline in both
> phases.
>
>
>
>>
>>
>> I.e. these are cases where FullLTO gets it right "by luck" and not by
>> principle, and fixing such cases helps the non-LTO O3 (for example this
>> test case https://bugs.llvm.org/show_bug.cgi?id=27395 )
>>
>>
>>
>>
>>
>> >> # No flag: use the compile-phase preference, perform ThinLTO on a.o
>> >> and FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group
>> >> and the ThinLTO objects
>>
>> >> $ clang a.o b.o c.o
>>
>>
>>
>> If I understood you correctly, while doing ThinLTO on a.o, we could
>> import from b.o and c.o (this is possible since the summaries are
>> available), while we won’t see a.o when doing FullLTO for b.o/c.o (i.e.,
>> the previous non-permeable barrier between ThinLTO and FullLTO groups will
>> become permeable in one direction).
>>
>>
>>
>> It could be permeable in both directions: b.o+c.o become "like a single
>> ThinLTO object" after they get merged.
>>
>>
>>
>> I see…
>>
>> However, do you think that by doing this we will achieve better
>> performance than by running the ThinLTO backend for all of the files
>> (a.o, b.o, c.o)?
>>
>>
>>
>> Performance is always very much use-case dependent.
>>
>> One may know that a group of files performs better when they get merged
>> together with FullLTO while the rest of the app does not?
>>
>>
>>
>> I don't know, but I think this all needs to be carefully looked at from a
>> user-interface point of view (will it be intuitive for the users? Will it
>> fit every (or most) scenarios? etc.).
>>
>>
>>
>>
>>
>> >> # No flag: use the compile-phase preference, perform ThinLTO on a.o
>> >> and FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group
>> >> and the ThinLTO objects
>>
>> >> $ clang a.o b.o c.o
>>
>> I wonder if we have a use-case for the “mix and match compile-phase
>> preference” situation that you described above? Maybe the linker should
>> simply report an error in this case? Or do we have to accept this because
>> of backwards compatibility?
>>
>
> I don't know :)
> We need to consider the cases of "old" bitcode that wouldn't have
> summaries (maybe they could get merged in the LTO partition but not
> participate in cross-module optimizations?)
> We should hear from Apple folks as well.
>
> --
> Mehdi
>
>
>>
>> Thank you!
>>
>> Katya.
>>
>>
>>
>>
>> *From:* Mehdi AMINI <joker.eph at gmail.com>
>> *Sent:* Tuesday, April 10, 2018 5:25 PM
>> *To:* Romanova, Katya <katya.romanova at sony.com>
>> *Cc:* David Blaikie <dblaikie at gmail.com>; Teresa Johnson
>> <tejohnson at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
>> *Subject:* Re: [llvm-dev] exploring possibilities for unifying ThinLTO
>> and FullLTO frontend + initial optimization pipeline
>>
>>
>>
>> Hi,
>>
>>
>>
>> It is non-trivial to recompute summaries (which is why we have summaries
>> in the bitcode in the first place, by the way), because bitcode is
>> expensive to load.
>>
>>
>>
>> I think shipping two different variants of the bitcode, one with and one
>> without summaries, isn't providing much benefit while complicating the
>> flow. We could achieve what you're looking for by revisiting the flow a
>> little.
>>
>>
>>
>> I would try to consider if we can:
>>
>>
>>
>> 1) Always generate summaries.
>>
>> 2) Use the same compile-phase optimization pipeline for ThinLTO and
>> FullLTO.
>>
>> 3) Decide at link time if you want to do FullLTO or ThinLTO.
>>
>>
>>
>> We didn't go this route 2 years ago because during the bringup we
>> didn't want to affect FullLTO in any way, but it may make sense now to have
>> `clang -flto=thin` and `clang -flto=full` be identical, and change the
>> linker plugins to operate either in FullLTO mode or in ThinLTO mode but
>> not differentiate based on the availability of the summaries.
>>
>>
>>
>> A possible behavior could be:
>>
>>
>>
>> # The -flto flag in the compile phase does not change the produced
>> bitcode, except for a flag that records the preference in the bitcode
>> (FullLTO vs ThinLTO)
>>
>> $ clang -c -flto=thin a.cpp
>>
>> $ clang -c -flto=full b.cpp
>>
>> $ clang -c -flto=full c.cpp
>>
>>
>>
>> # At link time the behavior depends on the -flto flag passed in.
>>
>>
>>
>> # No flag: use the compile-phase preference, perform ThinLTO on a.o and
>> FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group and the
>> ThinLTO objects
>>
>> $ clang a.o b.o c.o
>>
>>
>>
>> # Forces full LTO, merges all the objects, no cross-module importing will
>> happen.
>>
>> $ clang a.o b.o c.o -flto=full
>>
>>
>>
>> # Forces ThinLTO for all objects, FullLTO won't happen, no objects will
>> be merged.
>>
>> $ clang a.o b.o c.o -flto=thin
>>
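The proposed link-time behavior can be summarized as a small decision function. This is a hypothetical Python model of the proposal, not linker or clang code: `Obj.preference` stands in for the preference flag recorded in the bitcode, `link_flag` mirrors the no-flag, `-flto=full` and `-flto=thin` cases, and `plan_link` / `import_units` are names invented here. In the no-flag case the merged FullLTO group is treated as one more unit for ThinLTO importing, which is what makes the barrier permeable in both directions. A missing or unknown preference defaults to ThinLTO, purely as an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Obj:
    name: str
    preference: str  # "thin" or "full", as recorded in the bitcode (hypothetical)


def plan_link(objects: List[Obj],
              link_flag: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Return (objects merged into one FullLTO module,
    objects kept as separate ThinLTO modules)."""
    if link_flag not in (None, "full", "thin"):
        raise ValueError(f"unknown LTO mode: {link_flag!r}")
    names = [o.name for o in objects]
    if link_flag == "full":
        return names, []  # merge everything, no cross-module importing
    if link_flag == "thin":
        return [], names  # merge nothing, everything goes through ThinLTO
    # No flag: honor the per-object preference; anything not "full" is Thin.
    full = [o.name for o in objects if o.preference == "full"]
    thin = [o.name for o in objects if o.preference != "full"]
    return full, thin


def import_units(objects: List[Obj], link_flag: Optional[str] = None) -> List[str]:
    """Units taking part in ThinLTO importing. With no flag, the merged
    FullLTO group acts like one more ThinLTO module."""
    full, thin = plan_link(objects, link_flag)
    if link_flag == "full":
        return []
    units = list(thin)
    if full:
        units.append("merged(" + "+".join(full) + ")")
    return units


objs = [Obj("a.o", "thin"), Obj("b.o", "full"), Obj("c.o", "full")]
print(plan_link(objs))          # (['b.o', 'c.o'], ['a.o'])
print(plan_link(objs, "full"))  # (['a.o', 'b.o', 'c.o'], [])
print(plan_link(objs, "thin"))  # ([], ['a.o', 'b.o', 'c.o'])
print(import_units(objs))       # ['a.o', 'merged(b.o+c.o)']
```

Because the compile step is uniform (summaries always present, one pipeline), the only input that changes the outcome here is the link-time flag plus the recorded preference.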
>>
>>
>> Cheers,
>>
>>
>>
>> --
>>
>> Mehdi
>>
>> On Tue, Apr 10, 2018 at 15:51, via llvm-dev <llvm-dev at lists.llvm.org>
>> wrote:
>>
>> Hi David,
>>
>> Thank you so much for your reply!
>>
>>
>>
>> >> You're dealing with a situation where you are shipped BC files offline
>> >> and then do one, or multiple builds with these BC files?
>>
>> Yes, that’s exactly the case.
>>
>> >> If the scenario was more like a naive build: Multiple BC files
>> >> generated on a single (multi-core/threaded) machine (but some Thin, some
>> >> Full) & then fed to the linker, I would wonder if it'd be relatively
>> >> cheap for the LTO step to support this by computing summaries for
>> >> FullLTO files on the fly (without a separate tool/writing the summary
>> >> to disk, etc).
>>
>>
>>
>> I think so. My understanding is that for FullLTO files, it’s possible to
>> run the name-anonymous-globals pass and compute summaries on the fly, which
>> should allow us to perform ThinLTO in the link phase.
>>
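Summaries refer to global values through an identifier derived from their names, so an unnamed global has nothing to hash and cannot be referenced or imported across modules. A minimal Python sketch of the idea behind naming anonymous globals before computing a summary follows. The `anon.<hash>.<n>` naming shape and the MD5-based 64-bit ID are assumptions made for illustration; they are loosely inspired by LLVM's approach and not claimed to match its exact output.

```python
import hashlib


def stable_id(identifier: str) -> int:
    """64-bit ID from an MD5 digest of the identifier (illustrative only)."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def name_anonymous_globals(module_id: str, global_names):
    """Give every unnamed global ("" here) a deterministic, module-unique name.

    The "anon.<hash>.<n>" shape is an assumption for this sketch; the hash
    is derived from module_id so different modules get different names.
    """
    module_hash = hashlib.md5(module_id.encode("utf-8")).hexdigest()[:8]
    named, counter = [], 0
    for name in global_names:
        if name:
            named.append(name)
        else:
            named.append(f"anon.{module_hash}.{counter}")
            counter += 1
    return named


names = name_anonymous_globals("b.cpp", ["foo", "", "bar", ""])
summary_ids = {n: stable_id(n) for n in names}  # what a summary could key on
print(names)
```

Determinism matters here: the same module must produce the same names on every run, otherwise IDs recorded in one place would not match the definitions seen at link time.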
>>
>>
>> Katya.
>>
>>
>>
>> *From:* David Blaikie <dblaikie at gmail.com>
>> *Sent:* Tuesday, April 10, 2018 7:38 AM
>> *To:* Romanova, Katya <katya.romanova at sony.com>; Teresa Johnson
>> <tejohnson at google.com>
>> *Cc:* llvm-dev at lists.llvm.org
>> *Subject:* Re: [llvm-dev] exploring possibilities for unifying ThinLTO
>> and FullLTO frontend + initial optimization pipeline
>>
>>
>>
>> Hi Katya,
>>
>> [+Teresa since this is about ThinLTO & she's the owner there]
>>
>> I'm not sure how other folks feel, but terminologically I'm not sure I
>> think of these as different formats (for example, you mention the idea of
>> stripping the summaries from ThinLTO BC files to then feed them in as
>> FullLTO files; I would imagine it'd be reasonable to modify/fix/improve
>> the linker integration to have it (perhaps optionally) /ignore/ the
>> summaries, or use the summaries but in a non-siloed way (so that there's
>> not that optimization boundary between ThinLTO and FullLTO)).
>>
>> You're dealing with a situation where you are shipped BC files offline
>> and then do one, or multiple builds with these BC files?
>>
>> If the scenario was more like a naive build: Multiple BC files generated
>> on a single (multi-core/threaded) machine (but some Thin, some Full) & then
>> fed to the linker, I would wonder if it'd be relatively cheap for the LTO
>> step to support this by computing summaries for FullLTO files on the fly
>> (without a separate tool/writing the summary to disk, etc). Though I
>> suppose that'd produce a pretty wildly different behavior in the link when
>> just a single ThinLTO BC file was added to an otherwise FullLTO build.
>>
>> Anyway, just some (admittedly fairly uninformed) thoughts. I'm sure
>> Teresa has more informed ideas about how this might all look.
>>
>> On Mon, Apr 9, 2018 at 12:20 PM via llvm-dev <llvm-dev at lists.llvm.org>
>> wrote:
>>
>> Hello,
>>
>> I am exploring the possibility of unifying the BC file generation phase
>> for ThinLTO and FullLTO. Our third party library providers prefer to give
>> us only one version of the BC archives, rather than test and ship both Thin
>> and Full LTO BC archives. We want to find a way to allow our users to pick
>> either Thin or Full LTO, while having only one “unified” version of the BC
>> archive.
>>
>> Note, I am not necessarily proposing to do this work in the upstream
>> compiler. If there is no interest from other companies, we might have to
>> keep this as a private patch for Sony.
>>
>> One of the ideas (not my preference) is to mix and match files in the
>> Thin and Full BC formats. I'm not sure how well the "mix and match"
>> scenario works in general. I was wondering if Apple or Google are doing
>> this for production?
>>
>> I wrote a toy example, compiled one group of files with ThinLTO and the
>> rest with FullLTO, linked them with gold. I saw that irrespective of
>> whether the Thin or Full LTO option was used at the link step, files are
>> optimized within the Thin group and within the Full group separately, but
>> they don't know about the files in the other group (which makes sense).
>> Basically, the border between Thin and Full LTO bitcode files created an
>> artificial "barrier" which prevented cross-border optimization.
>>
>> Obviously, I am not too fond of this idea. Even if mixing and matching
>> ThinLTO and FullLTO bitcode files will work “as is”, I suspect we will see
>> a non-trivial runtime performance degradation because of the
>> "ThinLTO"/"FullLTO" border. Are you aware of any potential problems with
>> this solution, other than performance?
>>
>>
>>
>> Another, hopefully better, idea is to introduce a "unified" BC format,
>> which could be either FullLTO, ThinLTO, or neither (i.e., something in
>> between).
>>
>> If the user chooses FullLTO at the link step, but some of the files are
>> in the Thin BC format – the linker will call a special LTO API to convert
>> these files to the Full LTO BC format (i.e., stripping the module summary
>> section + potentially doing some additional optimizations from the FullLTO
>> pass manager pipeline).
>>
>> If the user chooses ThinLTO at the link step, but some of the files are
>> in the Full BC format – the linker will call an LTO API to convert these
>> files to the Thin LTO bitcode format (by regenerating the module summary
>> section dynamically for the Full LTO bitcode files).
>>
>> I think the most reasonable idea for the unification of the Thin and Full
>> LTO compilation pipelines is to use Full LTO as the “unified” BC format. If
>> the user requests FullLTO – no additional work is needed, the linker will
>> perform FullLTO as usual. If the user requests ThinLTO, the linker will call
>> an API to regenerate the module summary section for all the files in the
>> FullLTO format and perform ThinLTO as usual.
>>
>> In reality, I suspect things will be much more complicated. The pipelines
>> for the Thin and Full LTO compilation phases are quite different. ThinLTO
>> can afford to do much more optimization in the linking phase (since it has
>> parallel backends & smaller IR compared to FullLTO), while for FullLTO we
>> are forced to move some optimizations from linking to the compilation phase.
>>
>> So, if we pick FullLTO as our unified format, we would increase the build
>> time for ThinLTO (we will be doing the FullLTO initial optimization
>> pipeline in the compile phase, which is more than what ThinLTO is currently
>> doing, but the pipeline of the optimizations in the backend will stay the
>> same). It’s not clear what will happen with the runtime performance: we
>> might improve it (because we repeat some of the optimizations several
>> times), or we might make it worse (because we might do an optimization in
>> the early compilation phase, potentially preventing more aggressive
>> optimization later). What are your expectations? Will this approach work in
>> general? If so, what do you think will happen with the runtime performance?
>>
>> I also noticed that the pass manager pipeline is different for
>> ThinLTO+Sample PGO (use profile case). This might create some additional
>> complications for unifying the Thin and FullLTO BC generation phase too,
>> but it’s too small a detail to worry about right now. I’m more interested in
>> choosing the right general direction for solving this problem now.
>>
>> Please share your thoughts!
>>
>> Thank you!
>>
>> Katya.
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ThinLTO Pipeline.pdf
Type: application/pdf
Size: 383195 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180411/ccd0c689/attachment-0001.pdf>
