thr3ads.net - llvm dev - [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass [Jul 2013]

If this information is useful, please help other people find it:
Share via:

Star Tan

2013-Jun-30 00:04 UTC

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

Hi all,



I have investigated the compile-time overhead of "Polly Scop
Detection" pass based on LNT testing results.
This mail is to share some results I have found.


(1) Analysis of "SCOP Detection Pass" for PolyBench (Attached file
PolyBench_SCoPs.log)
Experimental results show that the "SCOP Detection pass" does not lead
to significant extra compile-time overhead for compiling PolyBench. The percent
of compile-time overhead caused by "SCOP Detection Pass" is usually
less than 4% of total compile-time. Details for each benchmark can be seen in
attached file SCoPs.tgz. I think this is because a lot of other Polly passes,
such as "Cloog code generation" and "Induction Variable
Simplification" are much more expensive than the "SCOP Detection
Pass".


(2) Analysis of "SCOP Detection Pass for two hot benchmarks (tramp3d and
oggenc)
“SCOP Detection Pass" would lead to significant compile-time overhead for
these two benchmarks: tramp3d and oggenc, both of which are included in LLVM
test-suite/MultiSource.


The top 5 passes in compiling tramp3d are: (Attached file tramp3d-SCoPs.log)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---
Name ---
   6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%)  Polly
- Detect static control parts (SCoPs)
   4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%)  X86
DAG->DAG Instruction Selection
   1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%) 
Function Integration/Inlining
   1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%)  Global
Value Numbering
   1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%) 
Combine redundant instructions


and the top 5 passes in compiling oggenc are: (Attached file oggenc-SCoPs.log)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---
Name ---
   0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86
DAG->DAG Instruction Selection
   0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly
- Detect static control parts (SCoPs)
   0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly
- Calculate dependences
   0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%)  Loop
Strength Reduction
   0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%)  Global
Value Numbering


Results show that Polly spends a lot of time on detecting scops, but most of
region scops are proved to be invalid at last. As a result, this pass waste a
lot of compile-time.I think we should improve this pass by detect invalid scop
early.


(3) About detecting scop regions in bottom-up order.
Detecting scop regions in bottom-up order can significantly speed up the scop
detection pass. However, as I have discussed with Sebastian, detecting scops in
bottom-up order and up-bottom order will lead to different results. As a result,
we should not change the detection order.


Do you have any other suggestions that may speed up the scop detection pass?


Best,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PolyBench_SCoPs.log
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SCoPs.tgz
Type: application/octet-stream
Size: 36590 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tramp3d-SCoPs.log
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: oggenc-SCoPs.log
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment-0002.ksh>

Tobias Grosser

2013-Jun-30 00:34 UTC

head link

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

On 06/29/2013 05:04 PM, Star Tan wrote:> Hi all,
>
>
>
> I have investigated the compile-time overhead of "Polly Scop
Detection" pass based on LNT testing results.
> This mail is to share some results I have found.
>
>
> (1) Analysis of "SCOP Detection Pass" for PolyBench (Attached
file PolyBench_SCoPs.log)
> Experimental results show that the "SCOP Detection pass" does not
lead to significant extra compile-time overhead for compiling PolyBench. The
percent of compile-time overhead caused by "SCOP Detection Pass" is
usually less than 4% of total compile-time. Details for each benchmark can be
seen in attached file SCoPs.tgz. I think this is because a lot of other Polly
passes, such as "Cloog code generation" and "Induction Variable
Simplification" are much more expensive than the "SCOP Detection
Pass".
Good.
> (2) Analysis of "SCOP Detection Pass for two hot benchmarks (tramp3d
and oggenc)
> “SCOP Detection Pass" would lead to significant compile-time overhead
for these two benchmarks: tramp3d and oggenc, both of which are included in LLVM
test-suite/MultiSource.
>
>
> The top 5 passes in compiling tramp3d are: (Attached file
tramp3d-SCoPs.log)
>     ---User Time---   --System Time--   --User+System--   ---Wall Time--- 
--- Name ---
>     6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%) 
Polly - Detect static control parts (SCoPs)
>     4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%) 
X86 DAG->DAG Instruction Selection
>     1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%) 
Function Integration/Inlining
>     1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%) 
Global Value Numbering
>     1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%) 
Combine redundant instructions
>
>
> and the top 5 passes in compiling oggenc are: (Attached file
oggenc-SCoPs.log)
>     ---User Time---   --System Time--   --User+System--   ---Wall Time--- 
--- Name ---
>     0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%) 
X86 DAG->DAG Instruction Selection
>     0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%) 
Polly - Detect static control parts (SCoPs)
>     0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%) 
Polly - Calculate dependences
>     0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%) 
Loop Strength Reduction
>     0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%) 
Global Value Numbering
>
>
> Results show that Polly spends a lot of time on detecting scops, but most
of region scops are proved to be invalid at last. As a result, this pass waste a
lot of compile-time.I think we should improve this pass by detect invalid scop
early.
Great. Now we have two test cases we can work with. Can you
upload the LLVM-IR produced by clang -O0 (without Polly)?

The next step is to understand what is going on. Some ideas on how to
understand what is going on:

1) Reduce the amount of input code

At best, we can reduce this to the single function on which the Polly 
scop detection takes more than 20.6% of the overall time. To get there,
I propose to run the timings with 'opt -O3' (instead of clang). As a 
first step you disable inlining to make sure that your results are still 
reproducible. If this is the case, I would (semi-automatically?) try to 
reduce the test case by removing functions from it for which the removal 
does not reduce the Polly overhead.

2) Check why the Polly scop detection is failing

You can use 'opt -polly-detect -analyze' to see the most common reasons 
the scop detection failed. We should verify that we perform the most 
common and cheap tests early.
> (3) About detecting scop regions in bottom-up order.
> Detecting scop regions in bottom-up order can significantly speed up the
scop detection pass. However, as I have discussed with Sebastian, detecting
scops in bottom-up order and up-bottom order will lead to different results. As
a result, we should not change the detection order.
Sebastian had a patch for this. Does his patch improve the scop 
detection time.

I agree with you that we can not just switch the order in which we 
detect scops, but I still believe that a bottom up detection is the 
right way to go. However, to abort early we need to classify the scop
detection failures into failures that will equally hold for larger scops
and ones that may disappear in larger scops. Only if a failure of the 
first class was detected, we can abort early without reducing scop coverage.
> Do you have any other suggestions that may speed up the scop detection
pass?
I believe bottom-up detection may be a good thing, but before drawing 
conclusions we need to understand where time is actually spent. The 
suggestions above should help us there.

Cheers
Tobi

P.S.: Please do not copy llvm-commits. as it is only for patches and 
commit messages.

Star Tan

2013-Jul-01 02:59 UTC

head link

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

At 2013-06-30 08:34:34,"Tobias Grosser" <tobias at grosser.es>
wrote:
>On 06/29/2013 05:04 PM, Star Tan wrote:
>> Hi all,
>>
>>
>>
>> I have investigated the compile-time overhead of "Polly Scop
Detection" pass based on LNT testing results.
>> This mail is to share some results I have found.
>>
>>
>> (1) Analysis of "SCOP Detection Pass" for PolyBench (Attached
file PolyBench_SCoPs.log)
>> Experimental results show that the "SCOP Detection pass" does
not lead to significant extra compile-time overhead for compiling PolyBench. The
percent of compile-time overhead caused by "SCOP Detection Pass" is
usually less than 4% of total compile-time. Details for each benchmark can be
seen in attached file SCoPs.tgz. I think this is because a lot of other Polly
passes, such as "Cloog code generation" and "Induction Variable
Simplification" are much more expensive than the "SCOP Detection
Pass".
>
>Good.
>
>> (2) Analysis of "SCOP Detection Pass for two hot benchmarks
(tramp3d and oggenc)
>> “SCOP Detection Pass" would lead to significant compile-time
overhead for these two benchmarks: tramp3d and oggenc, both of which are
included in LLVM test-suite/MultiSource.
>>
>>
>> The top 5 passes in compiling tramp3d are: (Attached file
tramp3d-SCoPs.log)
>>     ---User Time---   --System Time--   --User+System--   ---Wall
Time---  --- Name ---
>>     6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 (
20.6%)  Polly - Detect static control parts (SCoPs)
>>     4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 (
14.5%)  X86 DAG->DAG Instruction Selection
>>     1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 ( 
7.4%)  Function Integration/Inlining
>>     1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 ( 
6.2%)  Global Value Numbering
>>     1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 ( 
4.3%)  Combine redundant instructions
>>
>>
>> and the top 5 passes in compiling oggenc are: (Attached file
oggenc-SCoPs.log)
>>     ---User Time---   --System Time--   --User+System--   ---Wall
Time---  --- Name ---
>>     0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 (
14.5%)  X86 DAG->DAG Instruction Selection
>>     0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 (
13.0%)  Polly - Detect static control parts (SCoPs)
>>     0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 ( 
7.3%)  Polly - Calculate dependences
>>     0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 ( 
5.2%)  Loop Strength Reduction
>>     0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 ( 
3.5%)  Global Value Numbering
>>
>>
>> Results show that Polly spends a lot of time on detecting scops, but
most of region scops are proved to be invalid at last. As a result, this pass
waste a lot of compile-time.I think we should improve this pass by detect
invalid scop early.
>
>Great. Now we have two test cases we can work with. Can you
>upload the LLVM-IR produced by clang -O0 (without Polly)?
I have attached the LLVM-IR file for tramp3d and oggenc.>
>The next step is to understand what is going on. Some ideas on how to
>understand what is going on:
>
>1) Reduce the amount of input code
>
>At best, we can reduce this to the single function on which the Polly 
>scop detection takes more than 20.6% of the overall time. To get there,
>I propose to run the timings with 'opt -O3' (instead of clang). As a
>first step you disable inlining to make sure that your results are still 
>reproducible. If this is the case, I would (semi-automatically?) try to 
>reduce the test case by removing functions from it for which the removal 
>does not reduce the Polly overhead.Yes, the original LLVM IR code is too complex.
I would try to reduce the code size by investigating some hot functions in these
two testcases.>
>2) Check why the Polly scop detection is failing
>
>You can use 'opt -polly-detect -analyze' to see the most common
reasons
>the scop detection failed. We should verify that we perform the most 
>common and cheap tests early.
I would dig into the details of these two testcases.>
>> (3) About detecting scop regions in bottom-up order.
>> Detecting scop regions in bottom-up order can significantly speed up
the scop detection pass. However, as I have discussed with Sebastian, detecting
scops in bottom-up order and up-bottom order will lead to different results. As
a result, we should not change the detection order.
>
>Sebastian had a patch for this. Does his patch improve the scop 
>detection time.
>
>I agree with you that we can not just switch the order in which we 
>detect scops, but I still believe that a bottom up detection is the 
>right way to go. However, to abort early we need to classify the scop
>detection failures into failures that will equally hold for larger scops
>and ones that may disappear in larger scops. Only if a failure of the 
>first class was detected, we can abort early without reducing scop coverage.
>
>> Do you have any other suggestions that may speed up the scop detection
pass?
>
>I believe bottom-up detection may be a good thing, but before drawing 
>conclusions we need to understand where time is actually spent. The 
>suggestions above should help us there.
>
>Cheers
>Tobi
>
>P.S.: Please do not copy llvm-commits. as it is only for patches and 
>commit messages.
Thanks for your suggestions again.
Best wishes,
Star Tan
从网易yeah邮箱发来的云附件
oggenc_tramp3d_llvm_ir.tgz (2.11M, 2013年7月16日 10:58 到期)
下载
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/2e79a95a/attachment.html>

Star Tan

2013-Jul-01 13:51 UTC

head link

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

>Great. Now we have two test cases we can work with. Can you
>upload the LLVM-IR produced by clang -O0 (without Polly)?Since tramp3d-v4.ll is to large (19M with 267 thousand lines), I would focus on
the oggenc benchmark at firat.
I attached the oggenc.ll (LLVM-IR produced by clang -O0 without Polly), which
compressed into the file oggenc.tgz.
>2) Check why the Polly scop detection is failing
>
>You can use 'opt -polly-detect -analyze' to see the most common
reasons
>the scop detection failed. We should verify that we perform the most 
>common and cheap tests early.
>I also attached the output file oggenc_polly_detect_analyze.log produced by
"polly-opt -O3 -polly-detect -analyze oggenc.ll". Unfortunately, it
only dumps valid scop regions. At first, I thought to dump all debugging
information by "-debug" option, but it will dump too many unrelated
information produced by other passes. Do you know any option that allows me to
dump debugging information for the "-polly-detect" pass, but at the
same time disabling debugging information for other passes?
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/40997612/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oggenc.tgz
Type: application/octet-stream
Size: 657372 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/40997612/attachment.obj>

Star Tan

2013-Jul-01 15:18 UTC

head link

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

>> (3) About detecting scop regions in bottom-up order.
>> Detecting scop regions in bottom-up order can significantly speed up
the scop detection pass. However, as I have discussed with Sebastian, detecting
scops in bottom-up order and up-bottom order will lead to different results. As
a result, we should not change the detection order.
>
>Sebastian had a patch for this. Does his patch improve the scop 
>detection time.
LNT testing results for Sebastian's patch file can be seen on
http://188.40.87.11:8000/db_default/v4/nts/recent_activity (Run Order:
ScopDetect130615). You can compare ScopDetect130615 (Polly with Sebastian's
patch) to pOpt130615 (Polly without Sebastian's patch). The result seems not
show significant performance improvements with the bottom-up patch for LLVM
test-suite benchmarks.
You are right. I think I should better firstly focus on the some simple
examples, such as oggenc, to understand where scop detection pass spend its
time. After that, we can then investigate the scop detection order.
Bests,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/2e5dbee9/attachment.html>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Jul 2013 - [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

Maybe Matching Threads