thr3ads.net - llvm dev - [LLVMdev] [Proposal] Parallelize post-IPO stage. [Jul 2013]

If this information is useful, please help other people find it:
Share via:

Shuxin Yang

2013-Jul-17 20:06 UTC

[LLVMdev] [Proposal] Parallelize post-IPO stage.

On 7/17/13 12:35 PM, Diego Novillo wrote:> On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang <shuxin.llvm at
gmail.com> wrote:
>
>> 3. How to parallelize post-IPO stage
>> ===================================>>
>>    From 5k' high, the concept is very simple, just to
>>     step 1).divide the merged IR into small pieces,
>>     step 2).and compile each of this pieces independendly.
>>     step 3) the objects of each piece are fed back to linker to are
linked
>>             into an executable, or a dynamic lib.
> You seem to be describing GCC's strategy for whole program
> optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization).
Hi,  Diego:

   Thank you for the comment. Quite honestly, I'm not very familiar with
gcc's LTO implementation, and now I'm not allowed to read any GPL V3
code.

What I'm describing here is somewhat similar to Open64's IPA 
implementation.
I did port gcc before and of course read its document . Based on my
little knowledge of its internal, I think gcc's "whole-program"
mode is almost identical to Open64's IPA in spirit, I guess gcc 
community likely
borrowed  some idea from Open64.

On the other hand, as far as I can understand of the oldish gcc's 
implementation
before I join Apple, I guess the "whole-program mode" in gcc is
misnomer.
I don't want call this work as "whole-program mode", I'd like
to call
"big program mode"
or something like that, as I don't care the binary being built see the 
whole-program or
not while the analyses/opt
do care the difference between them.
>
> What is a bit confusing in your description is that you seem to be
> wanting to do more work after IPO is *done*.
IPO is difficult to be parallelized.  What I try to do is to parallelize 
the post-lto compilation
stage, including, optionally LNO, and scalar opt and compile-time 
hogging CodeGen.
> If the optimizations are
> done, then there is no need to parallelize anything.
>
> In GCC, we have two parallel stages: the generation of bytecode (which
> is naturally parallelized by your build system) and the final
> optimization phase, which is parallelized by partitioning the
> callgraph into disjoint subsets.  These subsets are then parallelized
> by spawning the compiler on each of the partitions.I call the 1st "parallel stage" pre-ipo, and the 2nd "parallel
stage"
post-IPO:-),
and the IPO (which you call WPA) is sandwiched in the middle.
>
> The only sequential part is the actual analysis (what we call WPA or
> whole program analysis).While not include in the proposal, I was proposing another level 
(earlier) of partition in order
to build outrageously huge programs in order to parallelize the "WPA".
The benefit
is to parallize the IPO/"WPA", however, the cost is not seeing the 
"whole program".

One of my coworker told me we care about the benefit reaped from seeing 
the "whole program"
than the improved compile-time at the cost of compiling bunch of 
"incomplete program"s independently.

> That is a single threaded phase that makes
> all the optimization decisions, annotates the summaries on each
> callgraph node, partitions the callgraph and then spawns all the
> optimizers to work on each section of the callgraph.
>This is what I called "post-IPO" :-)

Eric Christopher

2013-Jul-17 20:16 UTC

head link

[LLVMdev] [Proposal] Parallelize post-IPO stage.

On Wed, Jul 17, 2013 at 1:06 PM, Shuxin Yang <shuxin.llvm at gmail.com>
wrote:>
> On 7/17/13 12:35 PM, Diego Novillo wrote:
>>
>> On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang <shuxin.llvm at
gmail.com>
>> wrote:
>>
>>> 3. How to parallelize post-IPO stage
>>> ===================================>>>
>>>    From 5k' high, the concept is very simple, just to
>>>     step 1).divide the merged IR into small pieces,
>>>     step 2).and compile each of this pieces independendly.
>>>     step 3) the objects of each piece are fed back to linker to are
>>> linked
>>>             into an executable, or a dynamic lib.
>>
>> You seem to be describing GCC's strategy for whole program
>> optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization).
>
>
> Hi,  Diego:
>
>   Thank you for the comment. Quite honestly, I'm not very familiar with
> gcc's LTO implementation, and now I'm not allowed to read any GPL
V3 code.
>
> What I'm describing here is somewhat similar to Open64's IPA
implementation.
> I did port gcc before and of course read its document . Based on my
> little knowledge of its internal, I think gcc's
"whole-program"
> mode is almost identical to Open64's IPA in spirit, I guess gcc
community
> likely
> borrowed  some idea from Open64.
>
> On the other hand, as far as I can understand of the oldish gcc's
> implementation
> before I join Apple, I guess the "whole-program mode" in gcc is
misnomer.
> I don't want call this work as "whole-program mode", I'd
like to call "big
> program mode"
> or something like that, as I don't care the binary being built see the
> whole-program or
> not while the analyses/opt
> do care the difference between them.
>
>
>>
>> What is a bit confusing in your description is that you seem to be
>> wanting to do more work after IPO is *done*.
>
>
> IPO is difficult to be parallelized.  What I try to do is to parallelize
the
> post-lto compilation
> stage, including, optionally LNO, and scalar opt and compile-time hogging
> CodeGen.
>
>
>> If the optimizations are
>> done, then there is no need to parallelize anything.
>>
>> In GCC, we have two parallel stages: the generation of bytecode (which
>> is naturally parallelized by your build system) and the final
>> optimization phase, which is parallelized by partitioning the
>> callgraph into disjoint subsets.  These subsets are then parallelized
>> by spawning the compiler on each of the partitions.
>
> I call the 1st "parallel stage" pre-ipo, and the 2nd
"parallel stage"
> post-IPO:-),
> and the IPO (which you call WPA) is sandwiched in the middle.
>
>
>>
>> The only sequential part is the actual analysis (what we call WPA or
>> whole program analysis).
>
> While not include in the proposal, I was proposing another level (earlier)
> of partition in order
> to build outrageously huge programs in order to parallelize the
"WPA".  The
> benefit
> is to parallize the IPO/"WPA", however, the cost is not seeing
the "whole
> program".
>
> One of my coworker told me we care about the benefit reaped from seeing the
> "whole program"
> than the improved compile-time at the cost of compiling bunch of
"incomplete
> program"s independently.
>
>
>
>> That is a single threaded phase that makes
>> all the optimization decisions, annotates the summaries on each
>> callgraph node, partitions the callgraph and then spawns all the
>> optimizers to work on each section of the callgraph.
>>
> This is what I called "post-IPO" :-)
>
OK, yeah, this was horribly confusing. IPA would probably have been a
better choice of term (or WPA) with the 'A' for analysis and the
'O'
for optimization where we're doing the work.

It makes your entire proposal make some sense now ;)

I'll need to go back and reread it with that in mind.

-eric

Xinliang David Li

2013-Jul-17 20:45 UTC

head link

[LLVMdev] [Proposal] Parallelize post-IPO stage.

There are so many related terminologies that are usually confused or
misused. It should be normalized before discussion. Here is one
version:

o IPA --> interprocedural analysis and optimization
o IPO/CMO --> interprocedural optimization (cross module IPA).   In
different context, IPO may mean slightly different things. For
instance, it may mean the whole compilation pipeline, or just the
serial part when all IR files are merged and analyzed.  IPO/CMO can
operate both whole program mode and non-whole program mode.

Note that IPA is more general than IPO/CMO, as it covers single module
case tool.

o LTO --> one type of implementation of IPO/CMO, which involves
linker/linker plugin. It means the whole IPO pipeline.

The following are a list of GCC specific terms:

o WHOPR --> GCC LTO in whole program mode
o WPA  --> the serial/merge part of the IPO pipelines
o LGEN --> pre-IPO compilation
o LTRANS -> post-IPO compilation.

Shuxin's proposal is about parallelizing 'LTRANS' in LLVM.

thanks,

David



On Wed, Jul 17, 2013 at 1:16 PM, Eric Christopher <echristo at gmail.com>
wrote:> On Wed, Jul 17, 2013 at 1:06 PM, Shuxin Yang <shuxin.llvm at
gmail.com> wrote:
>>
>> On 7/17/13 12:35 PM, Diego Novillo wrote:
>>>
>>> On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang <shuxin.llvm at
gmail.com>
>>> wrote:
>>>
>>>> 3. How to parallelize post-IPO stage
>>>> ===================================>>>>
>>>>    From 5k' high, the concept is very simple, just to
>>>>     step 1).divide the merged IR into small pieces,
>>>>     step 2).and compile each of this pieces independendly.
>>>>     step 3) the objects of each piece are fed back to linker to
are
>>>> linked
>>>>             into an executable, or a dynamic lib.
>>>
>>> You seem to be describing GCC's strategy for whole program
>>> optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization).
>>
>>
>> Hi,  Diego:
>>
>>   Thank you for the comment. Quite honestly, I'm not very familiar
with
>> gcc's LTO implementation, and now I'm not allowed to read any
GPL V3 code.
>>
>> What I'm describing here is somewhat similar to Open64's IPA
implementation.
>> I did port gcc before and of course read its document . Based on my
>> little knowledge of its internal, I think gcc's
"whole-program"
>> mode is almost identical to Open64's IPA in spirit, I guess gcc
community
>> likely
>> borrowed  some idea from Open64.
>>
>> On the other hand, as far as I can understand of the oldish gcc's
>> implementation
>> before I join Apple, I guess the "whole-program mode" in gcc
is misnomer.
>> I don't want call this work as "whole-program mode",
I'd like to call "big
>> program mode"
>> or something like that, as I don't care the binary being built see
the
>> whole-program or
>> not while the analyses/opt
>> do care the difference between them.
>>
>>
>>>
>>> What is a bit confusing in your description is that you seem to be
>>> wanting to do more work after IPO is *done*.
>>
>>
>> IPO is difficult to be parallelized.  What I try to do is to
parallelize the
>> post-lto compilation
>> stage, including, optionally LNO, and scalar opt and compile-time
hogging
>> CodeGen.
>>
>>
>>> If the optimizations are
>>> done, then there is no need to parallelize anything.
>>>
>>> In GCC, we have two parallel stages: the generation of bytecode
(which
>>> is naturally parallelized by your build system) and the final
>>> optimization phase, which is parallelized by partitioning the
>>> callgraph into disjoint subsets.  These subsets are then
parallelized
>>> by spawning the compiler on each of the partitions.
>>
>> I call the 1st "parallel stage" pre-ipo, and the 2nd
"parallel stage"
>> post-IPO:-),
>> and the IPO (which you call WPA) is sandwiched in the middle.
>>
>>
>>>
>>> The only sequential part is the actual analysis (what we call WPA
or
>>> whole program analysis).
>>
>> While not include in the proposal, I was proposing another level
(earlier)
>> of partition in order
>> to build outrageously huge programs in order to parallelize the
"WPA".  The
>> benefit
>> is to parallize the IPO/"WPA", however, the cost is not
seeing the "whole
>> program".
>>
>> One of my coworker told me we care about the benefit reaped from seeing
the
>> "whole program"
>> than the improved compile-time at the cost of compiling bunch of
"incomplete
>> program"s independently.
>>
>>
>>
>>> That is a single threaded phase that makes
>>> all the optimization decisions, annotates the summaries on each
>>> callgraph node, partitions the callgraph and then spawns all the
>>> optimizers to work on each section of the callgraph.
>>>
>> This is what I called "post-IPO" :-)
>>
>
> OK, yeah, this was horribly confusing. IPA would probably have been a
> better choice of term (or WPA) with the 'A' for analysis and the
'O'
> for optimization where we're doing the work.
>
> It makes your entire proposal make some sense now ;)
>
> I'll need to go back and reread it with that in mind.
>
> -eric
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Jul 2013 - [LLVMdev] [Proposal] Parallelize post-IPO stage.

[LLVMdev] [Proposal] Parallelize post-IPO stage.

[LLVMdev] [Proposal] Parallelize post-IPO stage.

[LLVMdev] [Proposal] Parallelize post-IPO stage.

Possibly Parallel Threads