At 2013-09-02 17:05:52, "Tobias Grosser" <tobias at grosser.es> wrote:
>On 09/01/2013 08:02 PM, Star Tan wrote:
>> Hi all,
>>
>> It seems that Polly's code generation can lead to high compile-time overhead, especially for PolyBench applications such as 2mm, 3mm, gemm, syrk, etc. Some basic evaluation and analysis of Polly's code generation can be found at http://llvm.org/bugs/show_bug.cgi?id=16898.
>>
>> Currently, we can choose to run -polly-code-generator=cloog or -polly-code-generator=isl for code generation, but both of them almost double the compile time for the 2mm benchmark. Unfortunately, neither Cloog nor ISL improves the execution time compared with -polly-code-generator=none. I think if we could identify in advance that code generation will not improve execution time, then we could skip the expensive Cloog and ISL code generators.
>>
>> Can anyone provide some suggestions or hints on this problem?
>
>OK. I think in this case the problem is actually to figure out why Polly
>does not give a speedup in terms of execution time, because we have seen
>large speedups for 2mm and 3mm.
>
>Here is what I see:
>
>2mm$ polly-clang 2mm.c -O3 -I ../../../utilities/ -DPOLYBENCH_TIME
>-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-ignore-aliasing
>2mm$ time ./a.out
>18.217128
>
>real 0m18.256s
>user 0m18.128s
>sys 0m0.064s
>2mm$ polly-clang 2mm.c -O3 -I ../../../utilities/ -DPOLYBENCH_TIME
>-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-ignore-aliasing -mllvm -polly
>2mm$ time ./a.out
>4.986877
>
>real 0m5.036s
>user 0m4.940s
>sys 0m0.068s
>
>So the reason this does not work is that the polybench kernels in the
>test suite do not annotate the functions called with the 'restrict'
>keyword (that's why we need the ignore-aliasing) as well as that the
>size of the arrays is given as scalars but the corresponding loop bounds
>are not. It would be great to fix up those issues.
>
>The first issue can be fixed by adding run-time alias analysis checks.
>Adding those checks now became very easy with the new isl code
>generation. The basic idea is that we ask isl to generate the necessary
>run-time check and add it into the condition created by
>executeScopConditionally(). In case you are interested in looking into
>this, this would be a great help!
>

Thanks for your helpful reply.

Yes, if we add -polly-ignore-aliasing, which skips the alias check in ScopDetection, then the kernel loop is detected as a valid SCoP and we gain a significant performance improvement.

I tried to follow your hints and looked into executeScopConditionally() in CodeGen/Utils.cpp, but I cannot fully understand how modifying executeScopConditionally() would affect the ScopDetection pass. Do you mean I can add ISL checking information into the Context in executeScopConditionally()? Could you give some more concrete ideas? Are there any code examples of ISL alias analysis?

Thanks,
Star Tan
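For illustration, the first issue named in the quoted message (kernel functions that lack 'restrict') can be sketched in C as below. This is only a minimal sketch and not PolyBench code: the function name matmul_restrict, its signature, and the row-major layout are assumptions made for the example. The restrict qualifiers state in the source what -polly-ignore-aliasing otherwise asks the compiler to assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not taken from PolyBench): C = A * B for
 * n x n row-major matrices. The restrict qualifiers promise that
 * the three arrays do not overlap, so no aliasing has to be assumed
 * between the stores to C and the loads from A and B. */
static void matmul_restrict(size_t n, double *restrict C,
                            const double *restrict A,
                            const double *restrict B)
{
    assert(C != NULL && A != NULL && B != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
    }
}
```

Calling such a function with overlapping arrays is undefined behavior, so the annotation is only correct when the caller can guarantee distinct buffers.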
Tobias Grosser
2013-Sep-08  21:02 UTC
[LLVMdev] [Polly] Compile-time of Polly's code generation
On 09/08/2013 11:46 AM, Star Tan wrote:
> [...]
> Thanks for your helpful reply. Yes, if we add -polly-ignore-aliasing, which skips the alias check in ScopDetection, then we can detect the kernel loop as a valid scop and gain significant performance improvement. I tried to follow your hints to look into the executeScopConditionally() in CodeGen/Utils.cpp, but I cannot fully understand how to affect ScopDetection pass by modifying the executeScopConditionally(). Do you mean I can add ISL checking information into the Context in executeScopConditionally()? Could you give some more concrete ideas? Is there any code examples about ISL alias analysis?

The point is that we cannot just skip the alias analysis check.
However, skipping the alias-analysis check becomes safe in case we can
perform the necessary alias-analysis check at run-time.

So the idea would be to enhance the isl code generation such that it can
emit a run-time check for certain cases of aliasing and to then allow
such cases in the SCoP detection. A simple run-time check is to
take a set of base pointers that are in a may-alias set, and check that
for any two distinct base pointers that are part of this set, none of
their accesses can overlap.

To do this, I propose to take a simple example of two array accesses
with distinct base pointers that may alias and start from there. The
idea would be to collect for each of the base pointers all accesses that
use it, and to create an isl_pw_aff that is 'one' if the pointers do
overlap and 'zero' otherwise. You can use the isl AST generator
(isl_ast_build_expr_from_pw_aff()) to create LLVM IR that performs
exactly this check at run-time and you can use the result of this check
in executeScopConditionally() to only execute the modified SCoP, if we
found it safe to do so.

Cheers,
Tobias
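The run-time check described in the message above can be illustrated at the source level. The following is a hedged sketch in plain C, not Polly or isl code: the helper names ranges_overlap, scale_original, scale_optimized and scale_versioned are hypothetical, and the accessed region of each base pointer is assumed to be the simple byte range [base, base + n * sizeof(double)). It shows the shape of the versioning that a condition in executeScopConditionally() would produce: test the may-alias pair for overlap, run the transformed code only if there is none, and fall back to the original code otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte ranges [a, a + a_bytes) and [b, b + b_bytes)
 * overlap and 0 otherwise, mirroring the 'one'/'zero' convention in
 * the message. The comparison goes through uintptr_t because
 * relational comparison of pointers into different objects is
 * undefined in ISO C. */
static int ranges_overlap(const void *a, size_t a_bytes,
                          const void *b, size_t b_bytes)
{
    uintptr_t a_lo = (uintptr_t)a, a_hi = a_lo + a_bytes;
    uintptr_t b_lo = (uintptr_t)b, b_hi = b_lo + b_bytes;
    return a_lo < b_hi && b_lo < a_hi;
}

/* Original loop: correct even if dst and src alias. */
static void scale_original(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0 * src[i];
}

/* Stand-in for the transformed SCoP: only valid without aliasing,
 * which the restrict qualifiers make explicit. */
static void scale_optimized(double *restrict dst,
                            const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0 * src[i];
}

/* Versioned entry point: returns 1 if the optimized version ran,
 * 0 if the run-time check forced the original version. */
static int scale_versioned(double *dst, const double *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (!ranges_overlap(dst, n * sizeof *dst, src, n * sizeof *src)) {
        scale_optimized(dst, src, n);
        return 1;
    }
    scale_original(dst, src, n);
    return 0;
}
```

With more than two base pointers in a may-alias set, the same test would be repeated for every pair of distinct base pointers and the results combined with a logical AND before choosing the optimized version.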
At 2013-09-09 05:02:14, "Tobias Grosser" <tobias at grosser.es> wrote:
>[...]
>To do this, I propose to take a simple example of two array accesses
>with distinct base pointers that may alias and start from there. The
>idea would be to collect for each of the base pointers all accesses that
>use it, and to create an isl_pw_aff that is 'one' if the pointers do
>overlap and 'zero' otherwise. You can use the isl AST generator
>(isl_ast_build_expr_from_pw_aff()) to create LLVM IR that performs
>exactly this check at run-time and you can use the result of this check
>in executeScopConditionally() to only execute the modified SCoP, if we
>found it safe to do so.

I see, you mean we can generate LLVM code for run-time alias checking to allow more valid SCoPs in polly-detect. In that case, I think it may not be easy to implement such support, since the aliasing may be complex. Of course, we can start with some simple examples.

I have added your suggestion to the original bug 16898 (http://llvm.org/bugs/show_bug.cgi?id=16898) and I will try to move forward.

Thanks,
Star Tan