thr3ads.net - llvm dev - [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation [Jul 2013]

Wan, Xiaofei

2013-Jul-16 10:33 UTC

[LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation

Hi, community:

To meet a business need, I would like to enable "function-based
parallel code generation" to speed up the compilation of a single module.
Please see the design details below and give your feedback on the following
points. Thanks!
1. Is this idea the proper solution for my requirement?
2. The new feature will be enabled by llc -thd=N and has no impact on the
original llc when -thd=1.
3. Can this new llc feature be accepted by the community and merged into the
LLVM code tree?

Patches
The patch is divided into four separate parts; the all-in-one patch can be
found here:
http://llvm-reviews.chandlerc.com/D1152

Design
https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing

Background
1. Our business needs to compile C/C++ source files into LLVM IR and link them
into one big BC file; the big BC file is then compiled into binary code on
different arch/target devices.
2. Backend code generation is a time-consuming activity that happens on the
target device, which makes it important for the user experience.
3. make -j or file-based parallelism can't help here since there is only one
big BC file; function-based parallel LLVM backend code generation is a good
way to reduce compilation time by fully utilizing multiple cores.

Overall design strategy and goal
1. Generate exactly the same binary as the single-threaded output
2. No impact on single-thread performance and conformance
3. Little impact on the LLVM code infrastructure

Current status and test result
1. Parallel llc generates the same code as single-threaded llc, as checked
with "objdump -d", and it passes a 10-hour stress test on all performance
benchmarks
2. Parallel llc gives a ~2.9X performance gain on a Xeon server with 4 threads
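
The "objdump -d" check can be scripted. The following is a minimal sketch and is not taken from the patch: it assumes GNU objdump is on PATH, and the helper names (`disassemble`, `strip_header`, `same_code`) are hypothetical. Because objdump prints the input file name in its header line, that line is dropped before the listings are compared.

```python
import subprocess


def disassemble(path):
    """Return the text printed by `objdump -d` for one object file."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def strip_header(disassembly):
    """Drop blank lines and the 'file format' line, which names the input file."""
    return [
        line
        for line in disassembly.splitlines()
        if line.strip() and "file format" not in line
    ]


def same_code(disassembly_a, disassembly_b):
    """True when two disassembly listings match once headers are ignored."""
    return strip_header(disassembly_a) == strip_header(disassembly_b)
```

A call such as `same_code(disassemble("serial.o"), disassemble("parallel.o"))` (file names are placeholders) would compare the two builds. Note that `objdump -d` only disassembles sections expected to contain code, so data sections would need a separate comparison.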

Thanks
Wan Xiaofei
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parallel.CG.7z
Type: application/octet-stream
Size: 24682 bytes
Desc: Parallel.CG.7z
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/9fa272e0/attachment.obj>

Chandler Carruth

2013-Jul-16 10:46 UTC

While I think the end goal you're describing is close to the correct one, I
see the high-level strategy for getting there somewhat differently:

1) The code generators are only one collection of function passes that
might be parallelized. Many others might also be parallelized profitably.
The design for parallelism within LLVM's pass management infrastructure
should be sufficiently generic to express all of these use cases.

2) The idea of having multiple pass managers necessitates (unless I
misunderstand) duplicating a fair amount of state. For example, the caches
in immutable analysis passes would no longer be shared, etc. I think that
is really unfortunate, and would prefer instead to use parallelizing pass
managers that are in fact responsible for the scheduling of passes.

3) It doesn't provide a strategy for parallelizing the leaves of a CGSCC
pass manager which is where a significant portion of the potential
parallelism is available within the middle end.

4) It doesn't deal with the (numerous) parts of LLVM that are not actually
thread safe today. They may happen to work with the code generators you're
happening to test, but there is no guarantee. Notable things to think about
here are computing new types, the use-def lists of globals, commandline
flags, and static state variables. While our intent has been to avoid
problems with the last two that could preclude parallelism, it seems
unlikely that we have succeeded without thorough testing to this point.
Instead, I fear we have leaned heavily on the crutch of
one-thread-per-LLVMContext.

5) It adds more complexity onto the poorly designed pass manager
infrastructure. Personally, I think that cleanups to the design and
architecture of the pass manager should be prioritized above adding new
functionality like parallelism. However, so far no one has really had time
to do this (including myself). While I would like to have time in the
future to do this, as with everything else in OSS, it won't be real until
the patches start flowing.

Wan, Xiaofei

2013-Jul-16 11:37 UTC

Thanks for your comments; please see my replies inline below.

Thanks
Wan Xiaofei
From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Tuesday, July 16, 2013 6:47 PM
To: Wan, Xiaofei
Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM
backend code generation

While I think the end goal you're describing is close to the correct one, I
see the high-level strategy for getting there somewhat differently:

1) The code generators are only one collection of function passes that might be
parallelized. Many others might also be parallelized profitably. The design for
parallelism within LLVM's pass management infrastructure should be
sufficiently generic to express all of these use cases.
[Xiaofei] Yes, only the passes in the function pass manager are parallelized.
That is enough to meet our requirement, since 95% of the time in llc is spent
in function passes.

2) The idea of having multiple pass managers necessitates (unless I
misunderstand) duplicating a fair amount of state. For example, the caches in
immutable analysis passes would no longer be shared, etc. I think that is really
unfortunate, and would prefer instead to use parallelizing pass managers that
are in fact responsible for the scheduling of passes.

[Xiaofei] Immutable passes are not parallelized; only the passes in the
function pass manager are.
The reason I start multiple pass managers is to keep the original code
infrastructure stable: each thread has its own PM and consumes functions
independently.
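
The per-thread arrangement described here can be sketched as follows. This is an illustration only, not the patch's implementation: `FakePassManager`, `compile_function`, and `parallel_codegen` are hypothetical names, and "compiling" a function just uppercases a string. Each worker thread lazily creates its own pass-manager object, and results are collected in the original function order so that the output does not depend on thread scheduling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakePassManager:
    """Stand-in for a per-thread pass manager; it holds no shared state."""

    def run(self, function_ir):
        # Placeholder for running the function passes on one function.
        return function_ir.upper()


_thread_state = threading.local()


def compile_function(function_ir):
    # Lazily create one pass manager per worker thread.
    if not hasattr(_thread_state, "pm"):
        _thread_state.pm = FakePassManager()
    return _thread_state.pm.run(function_ir)


def parallel_codegen(functions, num_threads):
    # Executor.map yields results in input order, whichever thread finishes first.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(compile_function, functions))
```

The sketch shows only the structure (private per-thread state plus ordered collection of results); it says nothing about the speedup, and it sidesteps the shared-state concerns raised in point 4 because the stand-in pass manager touches no global data.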

3) It doesn't provide a strategy for parallelizing the leaves of a CGSCC
pass manager which is where a significant portion of the potential parallelism
is available within the middle end.

4) It doesn't deal with the (numerous) parts of LLVM that are not actually
thread safe today. They may happen to work with the code generators you're
happening to test, but there is no guarantee. Notable things to think about here
are computing new types, the use-def lists of globals, commandline flags, and
static state variables. While our intent has been to avoid problems with the
last two that could preclude parallelism, it seems unlikely that we have
succeeded without thorough testing to this point. Instead, I fear we have leaned
heavily on the crutch of one-thread-per-LLVMContext.

[Xiaofei] We considered all the aspects you list; otherwise it could not pass
any test cases. It now passes all benchmarks and almost all unit test cases,
except some special cases.

5) It adds more complexity onto the poorly designed pass manager infrastructure.
Personally, I think that cleanups to the design and architecture of the pass
manager should be prioritized above adding new functionality like parallelism.
However, so far no one has really had time to do this (including myself). While
I would like to have time in the future to do this, as with everything else in
OSS, it won't be real until the patches start flowing.
[Xiaofei] This feature doesn't rely much on the PM, and it doesn't need to
change the PM infrastructure.

Evan Cheng

2013-Jul-16 12:28 UTC

Please see Shuxin's proposal on "parallelizing post-IPO stage". It
seems the two projects are related.

Evan

Wan, Xiaofei

2013-Jul-16 14:23 UTC

Yes, the purpose is similar; we started this work last year.
But Shuxin's solution is module based (correct me if I am wrong). We tried
that approach and it failed for many reasons, which are described in my design
document:
https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing

We need to discuss the two solutions, compare them, and then adopt one.

The biggest differences between module-based parallelism and function-based
parallelism are:
1. How to partition the module into pieces that take a similar time to
compile; this is a difficult question.
2. How to make sure the generated binary is the same each time.
3. If 2 can't be achieved, it is difficult to validate the correctness of the
parallelism.
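
Point 1 can be made concrete with a small sketch. It is not from the design document: it uses the well-known greedy "longest processing time first" heuristic, and the per-function cost estimates and the name `partition_by_cost` are assumptions for illustration. Even with exact costs the greedy split can come out unbalanced, and real compile times are not known in advance.

```python
import heapq


def partition_by_cost(costs, num_parts):
    """Greedy LPT: assign each item, largest first, to the least-loaded part."""
    parts = [[] for _ in range(num_parts)]
    heap = [(0, index) for index in range(num_parts)]  # (load, part index)
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, index = heapq.heappop(heap)
        parts[index].append(cost)
        heapq.heappush(heap, (load + cost, index))
    return parts


# Hypothetical per-function compile costs, in arbitrary units.
parts = partition_by_cost([8, 7, 6, 5, 4], 2)
loads = sorted(sum(part) for part in parts)
```

For these made-up costs the greedy split gives loads of 13 and 17, although a perfectly balanced split (8+7 and 6+5+4) exists. A function-level work queue avoids having to choose the partition up front.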

Thanks
Wan Xiaofei

Shuxin Yang

2013-Jul-16 18:02 UTC

In addition to the concerns Chandler pointed out,
I'm curious about:
     the execution time of pristine llc vs. "modified llc with -thd=1", and
     the execution time of pristine clang vs. clang linked with the modified llc.

Thanks
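
The comparison asked for above could be measured with a small timing harness. This is a sketch only: `median_wall_time` is a hypothetical helper, and the command lines in the note that follows (binary paths and input file) are placeholders. `-thd` is the flag proposed in this thread, not an option of stock llc.

```python
import statistics
import subprocess
import time


def median_wall_time(command, runs=5):
    """Run `command` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, `median_wall_time(["./llc-pristine", "big.bc", "-o", "out.o"])` could be compared with `median_wall_time(["./llc-modified", "-thd=1", "big.bc", "-o", "out.o"])`. The median is used so that a single slow run does not skew the result.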


Xinliang David Li

2013-Jul-16 20:18 UTC

Ignoring FE time, which can be fully parallelized, and assuming 10% of
compile time is spent in serial module passes and 25% in CGSCC passes,
the maximum speedup that can be gained from function-level parallelism
is less than 3x. Even adding support for parallel compilation of the
leaves of the call graph in the CGSCC pass won't help much: the
percentage of leaf functions is < 30% in the large apps I have seen.
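
The bound follows from Amdahl's law. A minimal sketch of the arithmetic, using only the percentages assumed above (the function names are illustrative): if 10% + 25% = 35% of the time stays serial, then even with unlimited threads the speedup cannot exceed 1 / 0.35.

```python
def max_speedup(serial_fraction):
    """Amdahl's law limit as the number of threads goes to infinity."""
    return 1.0 / serial_fraction


def speedup(serial_fraction, threads):
    """Amdahl's law for a finite number of threads."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


serial = 0.10 + 0.25               # serial module passes + CGSCC passes
limit = max_speedup(serial)        # about 2.86, i.e. less than 3x
four_threads = speedup(serial, 4)  # about 1.95
```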

Module-based parallelism as proposed by Shuxin has a maximum speedup of 10x,
assuming body cloning does not add a lot of overhead and a build farm with
hundreds or thousands of nodes is used.

David

Chandler Carruth

2013-Jul-16 20:33 UTC

On Tue, Jul 16, 2013 at 1:18 PM, Xinliang David Li <xinliangli at
gmail.com>wrote:
> Ignoring FE time which can be fully parallelized and assuming 10%
> compile time is spent in serial module passes, 25% time is spent in
> CGSCC pass, the maximum speed up that can be gained by using function
> level parallelism is less than 3x.  Even adding support for parallel
> compilation for leaves of CG in CGSCC pass won't help too much -- the
> percentage of leaf functions is < 30% in large apps I have seen.
>
Can you clarify what you're basing these assumptions on, or how you derived
your data?

> Module based parallelism proposed by Shuxin has max speed up of 10x,
> assuming body cloning does not add a lot overhead and build farm with
> hundred/thousands of nodes is used.
>
Body cloning does add some overhead, so that actually needs to be measured.
Also, many don't have such a build farm.

Wan, Xiaofei

2013-Jul-17 02:48 UTC

[Xiaofei] In reply to David: for SPEC CPU2006, VTune data shows that function
passes consume >90% of the total time in llc (LTO is not enabled). Here I only
consider llc without LTO; the maximum parallelism depends on how many threads
are started.
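
These figures can be cross-checked against the ~2.9X result reported for 4 threads, again with Amdahl's law. The sketch below is only arithmetic on numbers quoted in this thread; the function names are illustrative.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Predicted speedup when `parallel_fraction` of the time scales with threads."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def implied_parallel_fraction(observed_speedup, threads):
    """Invert Amdahl's law: the parallel fraction that explains a measured speedup."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / threads)


predicted = amdahl_speedup(0.90, 4)          # about 3.08
implied = implied_parallel_fraction(2.9, 4)  # about 0.87
```

With 90% of the time in function passes, the model predicts roughly 3.1x on 4 threads. The measured ~2.9X corresponds to an effective parallel fraction of about 87%, which would be consistent with some threading overhead or load imbalance, though the thread does not say where the difference comes from.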
