Star Tan
2013-Jun-30 00:04 UTC
[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass
Hi all,

I have investigated the compile-time overhead of the "Polly Scop Detection" pass based on LNT testing results. This mail shares what I have found.

(1) Analysis of the "SCOP Detection Pass" for PolyBench (attached file PolyBench_SCoPs.log)

Experimental results show that the "SCOP Detection Pass" does not add significant compile-time overhead when compiling PolyBench. The pass usually accounts for less than 4% of total compile time. Details for each benchmark are in the attached file SCoPs.tgz. I think this is because many other Polly passes, such as "Cloog code generation" and "Induction Variable Simplification", are much more expensive than the "SCOP Detection Pass".

(2) Analysis of the "SCOP Detection Pass" for two hot benchmarks (tramp3d and oggenc)

The "SCOP Detection Pass" leads to significant compile-time overhead for these two benchmarks, tramp3d and oggenc, both of which are included in LLVM test-suite/MultiSource.

The top 5 passes in compiling tramp3d are (attached file tramp3d-SCoPs.log):

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%)  Polly - Detect static control parts (SCoPs)
   4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%)  X86 DAG->DAG Instruction Selection
   1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%)  Function Integration/Inlining
   1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%)  Global Value Numbering
   1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%)  Combine redundant instructions

and the top 5 passes in compiling oggenc are (attached file oggenc-SCoPs.log):

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86 DAG->DAG Instruction Selection
   0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly - Detect static control parts (SCoPs)
   0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly - Calculate dependences
   0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%)  Loop Strength Reduction
   0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%)  Global Value Numbering

Results show that Polly spends a lot of time detecting scops, but most of the candidate regions are eventually proved invalid. As a result, this pass wastes a lot of compile time. I think we should improve this pass by detecting invalid scops early.

(3) About detecting scop regions in bottom-up order

Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and in top-down order leads to different results. As a result, we should not change the detection order.

Do you have any other suggestions that may speed up the scop detection pass?

Best,
Star Tan

Attachments:
  PolyBench_SCoPs.log: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment.ksh>
  SCoPs.tgz (application/octet-stream, 36590 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment.obj>
  tramp3d-SCoPs.log: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment-0001.ksh>
  oggenc-SCoPs.log: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130630/c9f428d2/attachment-0002.ksh>
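For readers who want to rank passes from a timing report like the ones quoted in this message, here is a minimal Python sketch. It assumes the report rows look exactly like the rows in the mail (four "seconds (percent%)" columns followed by the pass name). The names `parse_time_report`, `polly_share`, and `SAMPLE` are made up for this illustration and are not part of LLVM or Polly.

```python
import re

# One report row: four "seconds ( percent%)" columns, then the pass name.
ROW = re.compile(r"^\s*((?:\d+\.\d+\s+\(\s*\d+\.\d+%\)\s+){4})(.+?)\s*$")
PAIR = re.compile(r"(\d+\.\d+)\s+\(\s*(\d+\.\d+)%\)")


def parse_time_report(text):
    """Return (pass name, wall seconds, wall percent) tuples, slowest first."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines and anything else are skipped
        pairs = PAIR.findall(match.group(1))
        wall_seconds, wall_percent = (float(x) for x in pairs[3])
        rows.append((match.group(2), wall_seconds, wall_percent))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def polly_share(rows):
    """Sum the wall-time percentages of all passes whose name starts with 'Polly'."""
    return sum(percent for name, _, percent in rows if name.startswith("Polly"))


# First three oggenc rows, copied from the mail above.
SAMPLE = """\
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86 DAG->DAG Instruction Selection
   0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly - Detect static control parts (SCoPs)
   0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly - Calculate dependences
"""

if __name__ == "__main__":
    for name, seconds, percent in parse_time_report(SAMPLE):
        print(f"{seconds:8.4f}s {percent:5.1f}%  {name}")
```

Sorting by wall time rather than user time is a choice made for this sketch; either column could be used.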
Tobias Grosser
2013-Jun-30 00:34 UTC
[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass
On 06/29/2013 05:04 PM, Star Tan wrote:
> Hi all,
>
> I have investigated the compile-time overhead of the "Polly Scop Detection" pass based on LNT testing results. This mail shares what I have found.
>
> (1) Analysis of the "SCOP Detection Pass" for PolyBench (attached file PolyBench_SCoPs.log)
>
> Experimental results show that the "SCOP Detection Pass" does not add significant compile-time overhead when compiling PolyBench. The pass usually accounts for less than 4% of total compile time. Details for each benchmark are in the attached file SCoPs.tgz. I think this is because many other Polly passes, such as "Cloog code generation" and "Induction Variable Simplification", are much more expensive than the "SCOP Detection Pass".

Good.

> (2) Analysis of the "SCOP Detection Pass" for two hot benchmarks (tramp3d and oggenc)
>
> The "SCOP Detection Pass" leads to significant compile-time overhead for these two benchmarks, tramp3d and oggenc, both of which are included in LLVM test-suite/MultiSource.
>
> The top 5 passes in compiling tramp3d are (attached file tramp3d-SCoPs.log):
>
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>    6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%)  Polly - Detect static control parts (SCoPs)
>    4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%)  X86 DAG->DAG Instruction Selection
>    1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%)  Function Integration/Inlining
>    1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%)  Global Value Numbering
>    1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%)  Combine redundant instructions
>
> and the top 5 passes in compiling oggenc are (attached file oggenc-SCoPs.log):
>
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>    0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86 DAG->DAG Instruction Selection
>    0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly - Detect static control parts (SCoPs)
>    0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly - Calculate dependences
>    0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%)  Loop Strength Reduction
>    0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%)  Global Value Numbering
>
> Results show that Polly spends a lot of time detecting scops, but most of the candidate regions are eventually proved invalid. As a result, this pass wastes a lot of compile time. I think we should improve this pass by detecting invalid scops early.

Great. Now we have two test cases we can work with. Can you upload the LLVM-IR produced by clang -O0 (without Polly)?

The next step is to understand what is going on. Some ideas on how to do that:

1) Reduce the amount of input code

At best, we can reduce this to the single function on which the Polly scop detection takes more than 20.6% of the overall time. To get there, I propose to run the timings with 'opt -O3' (instead of clang). As a first step, disable inlining and check that your results are still reproducible. If this is the case, I would (semi-automatically?) try to reduce the test case by removing functions whose removal does not reduce the Polly overhead.

2) Check why the Polly scop detection is failing

You can use 'opt -polly-detect -analyze' to see the most common reasons the scop detection failed. We should verify that we perform the most common and cheap tests early.

> (3) About detecting scop regions in bottom-up order
>
> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and in top-down order leads to different results. As a result, we should not change the detection order.

Sebastian had a patch for this. Does his patch improve the scop detection time?

I agree with you that we cannot just switch the order in which we detect scops, but I still believe that bottom-up detection is the right way to go. However, to abort early we need to classify the scop detection failures into failures that will equally hold for larger scops and ones that may disappear in larger scops. Only if a failure of the first class was detected can we abort early without reducing scop coverage.

> Do you have any other suggestions that may speed up the scop detection pass?

I believe bottom-up detection may be a good thing, but before drawing conclusions we need to understand where time is actually spent. The suggestions above should help us there.

Cheers
Tobi

P.S.: Please do not copy llvm-commits, as it is only for patches and commit messages.
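The classification idea in this message (failures that also hold for every larger region versus failures that may disappear in a larger region) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Polly's implementation: `Region`, `Failure`, `detect_bottom_up`, and the example tree are all hypothetical, and each region's `own_failure` stands in for the result of the expensive validity check on that region. For simplicity it records every valid region rather than only maximal ones.

```python
from dataclasses import dataclass, field
from enum import Enum


class Failure(Enum):
    NONE = 0        # the region is a valid scop
    LOCAL = 1       # may disappear once the region is enlarged
    PERSISTENT = 2  # also holds for every enclosing region


@dataclass
class Region:
    name: str
    own_failure: Failure = Failure.NONE  # stand-in for the expensive check
    children: list = field(default_factory=list)


def detect_bottom_up(region, valid, checked):
    """Visit children first; return True if a persistent failure was found.

    A region whose subtree contains a persistent failure is rejected
    without running its own check, which is the early abort.
    """
    poisoned = False
    for child in region.children:
        # Visit every child, so valid siblings are still discovered.
        if detect_bottom_up(child, valid, checked):
            poisoned = True
    if poisoned:
        return True
    checked.append(region.name)  # the expensive check would run here
    if region.own_failure is Failure.PERSISTENT:
        return True
    if region.own_failure is Failure.NONE:
        valid.append(region.name)
    return False


# Hypothetical region tree: "call" fails persistently, "innerA" only locally.
tree = Region("func", Failure.PERSISTENT, [
    Region("loopA", Failure.NONE, [Region("innerA", Failure.LOCAL)]),
    Region("loopB", Failure.PERSISTENT, [Region("call", Failure.PERSISTENT)]),
])
valid, checked = [], []
detect_bottom_up(tree, valid, checked)
```

In this toy run the checks on `loopB` and `func` are skipped because `call` already failed persistently, while the local failure in `innerA` does not prevent `loopA` from being checked and accepted.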
Star Tan
2013-Jul-01 02:59 UTC
[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass
At 2013-06-30 08:34:34, "Tobias Grosser" <tobias at grosser.es> wrote:
>On 06/29/2013 05:04 PM, Star Tan wrote:
>> Hi all,
>>
>> I have investigated the compile-time overhead of the "Polly Scop Detection" pass based on LNT testing results. This mail shares what I have found.
>>
>> (1) Analysis of the "SCOP Detection Pass" for PolyBench (attached file PolyBench_SCoPs.log)
>>
>> Experimental results show that the "SCOP Detection Pass" does not add significant compile-time overhead when compiling PolyBench. The pass usually accounts for less than 4% of total compile time. Details for each benchmark are in the attached file SCoPs.tgz. I think this is because many other Polly passes, such as "Cloog code generation" and "Induction Variable Simplification", are much more expensive than the "SCOP Detection Pass".
>
>Good.
>
>> (2) Analysis of the "SCOP Detection Pass" for two hot benchmarks (tramp3d and oggenc)
>>
>> The "SCOP Detection Pass" leads to significant compile-time overhead for these two benchmarks, tramp3d and oggenc, both of which are included in LLVM test-suite/MultiSource.
>>
>> The top 5 passes in compiling tramp3d are (attached file tramp3d-SCoPs.log):
>>
>>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>    6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%)  Polly - Detect static control parts (SCoPs)
>>    4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%)  X86 DAG->DAG Instruction Selection
>>    1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%)  Function Integration/Inlining
>>    1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%)  Global Value Numbering
>>    1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%)  Combine redundant instructions
>>
>> and the top 5 passes in compiling oggenc are (attached file oggenc-SCoPs.log):
>>
>>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>    0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86 DAG->DAG Instruction Selection
>>    0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly - Detect static control parts (SCoPs)
>>    0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly - Calculate dependences
>>    0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%)  Loop Strength Reduction
>>    0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%)  Global Value Numbering
>>
>> Results show that Polly spends a lot of time detecting scops, but most of the candidate regions are eventually proved invalid. As a result, this pass wastes a lot of compile time. I think we should improve this pass by detecting invalid scops early.
>
>Great. Now we have two test cases we can work with. Can you
>upload the LLVM-IR produced by clang -O0 (without Polly)?

I have attached the LLVM-IR files for tramp3d and oggenc.

>The next step is to understand what is going on. Some ideas on how to
>understand what is going on:
>
>1) Reduce the amount of input code
>
>At best, we can reduce this to the single function on which the Polly
>scop detection takes more than 20.6% of the overall time. To get there,
>I propose to run the timings with 'opt -O3' (instead of clang). As a
>first step you disable inlining to make sure that your results are still
>reproducible. If this is the case, I would (semi-automatically?) try to
>reduce the test case by removing functions from it for which the removal
>does not reduce the Polly overhead.

Yes, the original LLVM IR code is too complex. I will try to reduce the code size by investigating some hot functions in these two test cases.

>2) Check why the Polly scop detection is failing
>
>You can use 'opt -polly-detect -analyze' to see the most common reasons
>the scop detection failed. We should verify that we perform the most
>common and cheap tests early.

I will dig into the details of these two test cases.

>> (3) About detecting scop regions in bottom-up order
>>
>> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and in top-down order leads to different results. As a result, we should not change the detection order.
>
>Sebastian had a patch for this. Does his patch improve the scop
>detection time?
>
>I agree with you that we cannot just switch the order in which we
>detect scops, but I still believe that a bottom-up detection is the
>right way to go. However, to abort early we need to classify the scop
>detection failures into failures that will equally hold for larger scops
>and ones that may disappear in larger scops. Only if a failure of the
>first class was detected can we abort early without reducing scop coverage.
>
>> Do you have any other suggestions that may speed up the scop detection pass?
>
>I believe bottom-up detection may be a good thing, but before drawing
>conclusions we need to understand where time is actually spent. The
>suggestions above should help us there.
>
>Cheers
>Tobi
>
>P.S.: Please do not copy llvm-commits, as it is only for patches and
>commit messages.

Thanks again for your suggestions.

Best wishes,
Star Tan

Cloud attachment sent from NetEase Yeah Mail: oggenc_tramp3d_llvm_ir.tgz (2.11M, expires 2013-07-16 10:58)
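The reduction step discussed above (remove functions whose removal does not reduce the Polly overhead) can be sketched as a greedy loop in Python. This is only an illustration: `reduce_functions`, `toy_overhead`, and `TOY_COSTS` are hypothetical, and the cost numbers are invented for the example. In a real setup the `overhead` callback would rebuild the module with only the listed functions and measure the share of compile time spent in scop detection.

```python
def reduce_functions(functions, overhead, tolerance=0.05):
    """Greedily drop functions whose removal keeps the measured overhead.

    `overhead(fns)` must return the fraction of compile time spent in the
    pass of interest when only `fns` are compiled. A function is dropped
    if the overhead without it stays within `tolerance` of the baseline.
    """
    baseline = overhead(functions)
    kept = list(functions)
    for fn in list(functions):
        trial = [f for f in kept if f != fn]
        if trial and overhead(trial) >= baseline * (1 - tolerance):
            kept = trial  # removing fn did not reduce the overhead
    return kept


# Invented per-function costs in seconds: (scop detection, everything else).
TOY_COSTS = {"hot": (5.0, 1.0), "a": (0.1, 2.0), "b": (0.1, 2.0)}


def toy_overhead(fns):
    """Toy stand-in for a real timing run over the selected functions."""
    detect = sum(TOY_COSTS[f][0] for f in fns)
    total = sum(sum(TOY_COSTS[f]) for f in fns)
    return detect / total


if __name__ == "__main__":
    print(reduce_functions(["hot", "a", "b"], toy_overhead))
```

A greedy single pass is the simplest choice here; a delta-debugging style search that removes groups of functions at once would need fewer timing runs on large modules.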
Star Tan
2013-Jul-01 13:51 UTC
[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass
>Great. Now we have two test cases we can work with. Can you
>upload the LLVM-IR produced by clang -O0 (without Polly)?

Since tramp3d-v4.ll is too large (19M with 267 thousand lines), I will focus on the oggenc benchmark first. I have attached oggenc.ll (LLVM-IR produced by clang -O0 without Polly), compressed into the file oggenc.tgz.

>2) Check why the Polly scop detection is failing
>
>You can use 'opt -polly-detect -analyze' to see the most common reasons
>the scop detection failed. We should verify that we perform the most
>common and cheap tests early.

I have also attached the output file oggenc_polly_detect_analyze.log produced by "polly-opt -O3 -polly-detect -analyze oggenc.ll". Unfortunately, it only dumps valid scop regions. At first, I thought of dumping all debugging information with the "-debug" option, but that dumps too much unrelated information produced by other passes. Do you know of any option that lets me dump debugging information for the "-polly-detect" pass only, while disabling debugging information for other passes?

Star Tan

Attachment:
  oggenc.tgz (application/octet-stream, 657372 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/40997612/attachment.obj>
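Once rejection reasons are available in some form, finding "the most common reasons the scop detection failed" is a simple tally. The Python sketch below assumes a hypothetical one-line-per-rejected-region format, `<region>: <reason>`. This is not Polly's actual output format, and the region names and reasons in `SAMPLE_LINES` are placeholders invented for the example.

```python
from collections import Counter


def rank_failure_reasons(lines):
    """Count rejection reasons, most frequent first.

    Each line is assumed to look like '<region>: <reason>'; lines that do
    not contain ': ' followed by a reason are ignored.
    """
    reasons = Counter()
    for line in lines:
        _region, separator, reason = line.partition(": ")
        reason = reason.strip()
        if separator and reason:
            reasons[reason] += 1
    return reasons.most_common()


# Placeholder input in the assumed format (not real Polly output).
SAMPLE_LINES = [
    "for.body => for.end: reason A",
    "while.cond => while.end: reason B",
    "for.body12 => for.end30: reason A",
    "entry => return: reason A",
    "a line without a reason",
]

if __name__ == "__main__":
    for reason, count in rank_failure_reasons(SAMPLE_LINES):
        print(f"{count:4d}  {reason}")
```

Ranking by count says which checks reject the most regions; deciding which of them are also cheap enough to run first would need separate timing data.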
Star Tan
2013-Jul-01 15:18 UTC
[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass
>> (3) About detecting scop regions in bottom-up order.
>> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and up-bottom order will lead to different results. As a result, we should not change the detection order.
>
>Sebastian had a patch for this. Does his patch improve the scop
>detection time.

LNT testing results for Sebastian's patch can be seen at http://188.40.87.11:8000/db_default/v4/nts/recent_activity (Run Order: ScopDetect130615). You can compare ScopDetect130615 (Polly with Sebastian's patch) to pOpt130615 (Polly without Sebastian's patch). The results do not seem to show significant performance improvements from the bottom-up patch for the LLVM test-suite benchmarks.

You are right. I think I should first focus on some simpler examples, such as oggenc, to understand where the scop detection pass spends its time. After that, we can investigate the scop detection order.

Best,
Star Tan