thr3ads.net - llvm dev - [LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead [May 2013]

If this information is useful, please help other people find it:
Share via:

Tobias Grosser

2013-May-03 11:05 UTC

[LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

On 05/03/2013 11:39 AM, Star Tan wrote:> Dear Tobias,
>
>
> Thank you very much for your very helpful advice.
>
>
> Yes, -debug-pass and -time-passes are two very useful and powerful
> options when evaluating the compile-time of each compiler pass. They
> are exactly what I need! With these options, I can step into details
> of the compile-time overhead of each pass. I have finished some
> preliminary testing based on two randomly selected files from
> PolyBench and MediaBench. Results are listed on
> https://gist.github.com/tanstar/5508153 .
Great. Thanks for the quick update and the nice formatting of your
proposal.
> In this project, I try to reduce Polly compile-time overhead by
> revising Polly passes.  For this purpose, I firstly try to find out
> where the compile-time overhead comes from. When Polly optimizes a
> program, it takes the following steps:
>
> * step1 - Polly canonicalization: prepare basic information and
> complete some transformations.
> * step2 - LLVM IR to Polly description: detect valid scops and
> translates them into polyhedral representation.
> * step3 - Polly optimization: analyze and optimize polyhedral scops.
> * step4 - Polly description to LLVM-IR: translates the polyhedral
> description back into LLVM-IR.
I may make sense to list in the attachments which passes exactly belong
to these different steps.
> Based on these results, I plan to revise all canonicalization passes,
> code generation passes and optimization passes in Polly. Then, I will
> try my best to reduce the compile-time overhead for each part of them.
>
> In the following 12 weeks, my schedule for this project includes the
following stages:
> * Stage1 (1 week): set up a Polly performance tester and integrate the
automation testing environment.
> * Stage2 (1 week): collect and analyze the detailed profiling information
for Polly compile process;
> * Stage3 (3 weeks): revise and improve some expensive Polly passes or LLVM
passes that Polly depends on;
> * Stage4 (3 weeks): revise canonicalization passes of Polly and try to
remove most of canonicalization passes;
> * Stage5 (1 week): revise the pass ordering problem and let Polly bail out
if it cannot benefit programs;
> * Stage6 (1 week): revise and improve Polly code generation passes;
> * Stage7 (1 week): revise and improve Polly optimization passes;
> * Stage8 (1 week): revise other parts of Polly.
I believe that before we should probably move Polly rather early into
the default -O3 optimization chain. You may take some time to profile
Polly and to understand it better, but we should not do extensive
optimizations before moving it. As long as we can ensure the number of
scops detected remains the same, this should be good enough. We can then
spend the time optimizing the Polly integration in the normal -O3 chain.
> ### 3. Project Details
>
> #### Stage1 -- Set up a Polly performance tester to track the compile time.
[Week 1]
>
> The first important work is to pick criteria to evaluate my work in
> this project. This project targets to reduce compile-time overhead,
> but at the same time it must not degrade the performance. To simplify
> the testing, I will use the number of scops that optimized by Polly as
> the performance criteria. As a result, our Polly performance tester
> should contains two parts:
>
> * Code performance: the number of scops optimized by Polly should not
> be degraded.
> * Compile-time Overhead: both the total compile-time overhead and the
> percentage of each Polly pass are both important.
>
> Furthermore, I currently use some scripts to collect the compile-time
> statistics, but it still requires some tricky and manual work. My plan
> is to set up an automation testing environment and integrate it into
> Polly.
Yes, this is nice. It would be great if we could get some hardware to
run such tests regularly. I will check if we can find a sponsor for
this?
> **Week 3**: revise the "Polly - Calculate dependence" pass. Since
this
> pass leads to significant compile-time overhead in both PolyBench and
> MediaBench, I will investigate the details of this pass and try to
> reduce the compile-time overhead.
> **Week 4**: revise the "Polly - Detect static control parts
(SCoPs)"
> and "Polly - Create polyhedral description of Scops" pass. As
shown in
> table 3 and table 4, these two passes are expensive. However, they are
> also very important to Polly because all Polly optimizations are based
> on scops detected in these two passes. I will step into the details of
> the two passes and make sure no performance loss happens no matter how
> much compile-time overhead is reduced.
> **Week 5**: revise other top ten expensive Polly passes. For example,
> "Combine redundant instructions" pass runs several times because
of
> Polly and it can causes a large compile-time overhead. I will try to
> eliminate or reduce the run of this pass.
>
> In this stage, I will also step into some existing bugs related to
> those expensive passes discussed above. For example, the bug
> llvm.org/PR15643 is related to the "Polly - Calculate dependence"
> pass. Although it has brought some discussions in LLVM/Polly mail
> list, there is still no feasible solution. I will try my best to fix
> up such bugs when I step into the details of those passes.
I actually pointed out a solution. We need to change Polly or
recanonicalize the SCEVs we analyse as the current way the SCEV are
canonicalized makes Polly introduce different parameters for expressions
{p + 1, +, 1}_<l> and {p + 2, +, 1}_<l> in case the loop 'l'
is not part
of the scop.
> Our evaluation is based on Intel Pentium Dual CPU T2390(1.86GHz) with
> 2GB DDR2 memory. Each benchmark is run multiple times and data are
> collected using ministat (https://github.com/codahale/ministat).
> Polly is built with Cloog in Release+Assert mode. All Polly, LLVM and
> Cloog source code are checked out from SVN at April 24, 2013.
Can you give the svn revision / git hash for the checkouts of LLVM,
Polly, isl, cloog and polybench.
> Table 1 and table 2 show the compile-time overhead of Polly. Five cases are
tested:
> (alias pollycc="clang -O3 -load LLVMPolly.so -mllvm -polly)
> * **clang**:  clang -O3
> * **pBasic**: clang -O3 -load LLVMPolly.so
> * **pNoGen**: pollycc -O3 -mllvm -polly-optimizer=none -mllvm
-polly-code-generatorr=none
> * **pNoOpt**: pollycc -O3 -mllvm -polly-optimizer=none
> * **pOpt**:   pollycc -O3
You probably want to add a couple of additional parameters here to
ensure we detect all scops in polybench. I would add -mllvm
-polly-ignore-aliasing and -DPOLYBENCH_USE_SCALAR_LB

You can use '-mllvm -debug-only=polly-cloog' to see if the kernels
are properly detected. I get the following output:

alias polly-clang
alias polly-clang='~/Projekte/polly/build/bin/clang -Xclang -load 
-Xclang ~/Projekte/polly/build/lib/LLVMPolly.so'
grosser at tobilaptop:~/Projekte/polybench$ polly-clang -O3 -mllvm -polly 
-mllvm -debug-only=polly-cloog linear-algebra/kernels/gemm/gemm.c -I 
utilities/ utilities/polybench.c  -mllvm -polly-ignore-aliasing 
-DPOLYBENCH_USE_SCALAR_LB
:: init_array : entry.split => for.end56
if ((nj >= 1) && (nk >= 1) && (p_1 >= 1) &&
(p_4 >= 1)) {
   for (c2=0;c2<=p_4-1;c2+=32) {
     for 
(c3=max(-32*floord(p_1-12*p_4+10,32)-32*p_4,-32*c2-32*floord(-12*c2+p_1+10,32)-640);c3<=-20*c2;c3+=32)
{
       for 
(c4=max(ceild(-c3-p_1-30,20),c2);c4<=min(min(floord(-c3,20),c2+31),p_4-1);c4++)
{
         for (c5=max(c3,-20*c4-p_1+1);c5<=min(-20*c4,c3+31);c5++) {
           Stmt_for_body41(c4,-20*c4-c5);
         }
       }
     }
   }
}
if ((ni >= 1) && (nk >= 1) && (p_0 >= 1) &&
(p_4 >= 1)) {
   for (c2=0;c2<=p_0-1;c2+=32) {
     for 
(c3=max(-32*floord(-12*p_0+p_4+10,32)-32*p_0,-32*c2-32*floord(-12*c2+p_4+10,32)-640);c3<=-20*c2;c3+=32)
{
       for 
(c4=max(ceild(-c3-p_4-30,20),c2);c4<=min(min(floord(-c3,20),c2+31),p_0-1);c4++)
{
         for (c5=max(c3,-20*c4-p_4+1);c5<=min(-20*c4,c3+31);c5++) {
           Stmt_for_body18(c4,-20*c4-c5);
         }
       }
     }
   }
}
if ((ni >= 1) && (nj >= 1) && (p_0 >= 1) &&
(p_1 >= 1)) {
   for (c2=0;c2<=p_0-1;c2+=32) {
     for 
(c3=max(-32*floord(-12*p_0+p_1+10,32)-32*p_0,-32*c2-32*floord(-12*c2+p_1+10,32)-640);c3<=-20*c2;c3+=32)
{
       for 
(c4=max(ceild(-c3-p_1-30,20),c2);c4<=min(min(floord(-c3,20),c2+31),p_0-1);c4++)
{
         for (c5=max(c3,-20*c4-p_1+1);c5<=min(-20*c4,c3+31);c5++) {
           Stmt_for_body3(c4,-20*c4-c5);
         }
       }
     }
   }
}
Stmt_entry_split();
:: kernel_gemm : for.cond1.preheader => for.end28
for (c2=0;c2<=1023;c2+=32) {
   for (c3=0;c3<=1023;c3+=32) {
     for (c4=c2;c4<=c2+31;c4++) {
       for (c5=c3;c5<=c3+31;c5++) {
         Stmt_for_body3(c4,c5);
       }
     }
   }
}
for (c2=0;c2<=1023;c2+=32) {
   for (c3=0;c3<=1023;c3+=32) {
     for (c4=0;c4<=1023;c4+=32) {
       for (c5=c2;c5<=c2+31;c5++) {
         for (c6=c3;c6<=c3+31;c6++) {
           for (c7=c4;c7<=c4+31;c7++) {
             Stmt_for_body8(c5,c6,c7);
           }
         }
       }
     }
   }
}
:: polybench_flush_cache : for.inc => for.end
for (c1=0;c1<=4194559;c1++) {
   Stmt_for_inc(c1);
}

As a first step, we should ensure that Polly is fast for these flags. As
subsequent steps we should then ensure that the absence of these flags
does not increase compile-time.
> Table 3 and table 4 show the compile-time overhead of top 15 passes in
> polly-opt. we first generate LLVM IR files by the command "clang -O3
> -emit-llvm XX.c -o XX.ll", then we run opt to collect the compile time
> of each pass using the command "opt -basicaa  -load LLVMPolly.so -O3
> -polly -polly-codegen-scev XX.ll -S -o XX.opt.ll -time-passes"
You want to extract the .ll file with 'clang -O0'. Otherwise you run
polly on code that is already -O3 optimized, making the runs not
comparable to the clang integrated ones and also unrealistic as they do
not reflect what we do when running Polly from within clang.

It would be interesting to understand the doitgen results better. The
time in the optimizer is only 0.408 seconds, whereas the increase from
pBasic to pOpt is 0.897 - 0.151 = 0.746 seconds. This seems surprising.
Is this because of running polly on -O3 optimized code, is Polly
producing bigger .ll files which yield to longer object file emmission
time or is there a measurement problem?

That's it for now. Thanks for the update!

Cheers,
Tobi

Star Tan

2013-May-03 16:30 UTC

head link

[LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

Dear Tobias and all LLVM/Polly developers,


Thank you very much for all your help and advice. 


I have submitted my proposal to GSoC 2013 application system:
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/star/1. 
Some tables and paragraphs are simplified to make it more readable on GSoC
official pages. Any suggestion or comment would be appreciated.


At 2013-05-03 19:05:05,"Tobias Grosser" <tobias at grosser.es>
wrote:>On 05/03/2013 11:39 AM, Star Tan wrote:
>> Dear Tobias,
>>
>>
>> Thank you very much for your very helpful advice.
>>
>>
>> Yes, -debug-pass and -time-passes are two very useful and powerful
>> options when evaluating the compile-time of each compiler pass. They
>> are exactly what I need! With these options, I can step into details
>> of the compile-time overhead of each pass. I have finished some
>> preliminary testing based on two randomly selected files from
>> PolyBench and MediaBench. Results are listed on
>> https://gist.github.com/tanstar/5508153 .
>
>Great. Thanks for the quick update and the nice formatting of your
>proposal.
>
>> In this project, I try to reduce Polly compile-time overhead by
>> revising Polly passes.  For this purpose, I firstly try to find out
>> where the compile-time overhead comes from. When Polly optimizes a
>> program, it takes the following steps:
>>
>> * step1 - Polly canonicalization: prepare basic information and
>> complete some transformations.
>> * step2 - LLVM IR to Polly description: detect valid scops and
>> translates them into polyhedral representation.
>> * step3 - Polly optimization: analyze and optimize polyhedral scops.
>> * step4 - Polly description to LLVM-IR: translates the polyhedral
>> description back into LLVM-IR.
>
>I may make sense to list in the attachments which passes exactly belong
>to these different steps.
OK, I will add further information in Table 5.>
>> Based on these results, I plan to revise all canonicalization passes,
>> code generation passes and optimization passes in Polly. Then, I will
>> try my best to reduce the compile-time overhead for each part of them.
>>
>> In the following 12 weeks, my schedule for this project includes the
following stages:
>> * Stage1 (1 week): set up a Polly performance tester and integrate the
automation testing environment.
>> * Stage2 (1 week): collect and analyze the detailed profiling
information for Polly compile process;
>> * Stage3 (3 weeks): revise and improve some expensive Polly passes or
LLVM passes that Polly depends on;
>> * Stage4 (3 weeks): revise canonicalization passes of Polly and try to
remove most of canonicalization passes;
>> * Stage5 (1 week): revise the pass ordering problem and let Polly bail
out if it cannot benefit programs;
>> * Stage6 (1 week): revise and improve Polly code generation passes;
>> * Stage7 (1 week): revise and improve Polly optimization passes;
>> * Stage8 (1 week): revise other parts of Polly.
>
>I believe that before we should probably move Polly rather early into
>the default -O3 optimization chain. You may take some time to profile
>Polly and to understand it better, but we should not do extensive
>optimizations before moving it. As long as we can ensure the number of
>scops detected remains the same, this should be good enough. We can then
>spend the time optimizing the Polly integration in the normal -O3 chain.I agree with you.  I will try to understand Polly better as soon as possible by
profiling Polly and stepping into the details of some critical Polly passes. It
may take a little time before we can  start optimizing the Polly integration in
the normal -O3 chain.

>
>> ### 3. Project Details
>>
>> #### Stage1 -- Set up a Polly performance tester to track the compile
time. [Week 1]
>>
>> The first important work is to pick criteria to evaluate my work in
>> this project. This project targets to reduce compile-time overhead,
>> but at the same time it must not degrade the performance. To simplify
>> the testing, I will use the number of scops that optimized by Polly as
>> the performance criteria. As a result, our Polly performance tester
>> should contains two parts:
>>
>> * Code performance: the number of scops optimized by Polly should not
>> be degraded.
>> * Compile-time Overhead: both the total compile-time overhead and the
>> percentage of each Polly pass are both important.
>>
>> Furthermore, I currently use some scripts to collect the compile-time
>> statistics, but it still requires some tricky and manual work. My plan
>> is to set up an automation testing environment and integrate it into
>> Polly.
>
>Yes, this is nice. It would be great if we could get some hardware to
>run such tests regularly. I will check if we can find a sponsor for
>this?Thank you. That would be great if we can find a sponsor.
Otherwise, I think I have to run all test on my laptop.

>
>> **Week 3**: revise the "Polly - Calculate dependence" pass.
Since this
>> pass leads to significant compile-time overhead in both PolyBench and
>> MediaBench, I will investigate the details of this pass and try to
>> reduce the compile-time overhead.
>> **Week 4**: revise the "Polly - Detect static control parts
(SCoPs)"
>> and "Polly - Create polyhedral description of Scops" pass. As
shown in
>> table 3 and table 4, these two passes are expensive. However, they are
>> also very important to Polly because all Polly optimizations are based
>> on scops detected in these two passes. I will step into the details of
>> the two passes and make sure no performance loss happens no matter how
>> much compile-time overhead is reduced.
>> **Week 5**: revise other top ten expensive Polly passes. For example,
>> "Combine redundant instructions" pass runs several times
because of
>> Polly and it can causes a large compile-time overhead. I will try to
>> eliminate or reduce the run of this pass.
>>
>> In this stage, I will also step into some existing bugs related to
>> those expensive passes discussed above. For example, the bug
>> llvm.org/PR15643 is related to the "Polly - Calculate
dependence"
>> pass. Although it has brought some discussions in LLVM/Polly mail
>> list, there is still no feasible solution. I will try my best to fix
>> up such bugs when I step into the details of those passes.
>
>I actually pointed out a solution. We need to change Polly or
>recanonicalize the SCEVs we analyse as the current way the SCEV are
>canonicalized makes Polly introduce different parameters for expressions
>{p + 1, +, 1}_<l> and {p + 2, +, 1}_<l> in case the loop
'l' is not part
>of the scop.
>
>> Our evaluation is based on Intel Pentium Dual CPU T2390(1.86GHz) with
>> 2GB DDR2 memory. Each benchmark is run multiple times and data are
>> collected using ministat (https://github.com/codahale/ministat).
>> Polly is built with Cloog in Release+Assert mode. All Polly, LLVM and
>> Cloog source code are checked out from SVN at April 24, 2013.
>
>Can you give the svn revision / git hash for the checkouts of LLVM,
>Polly, isl, cloog and polybench.I use the following git clones: http://llvm.org/git/llvm.git,
http://llvm.org/git/polly.git and http://llvm.org/git/clang.git.
I usually run "git pull" command for all the three repositories before
I start a new evaluation. Of course sometimes the LLVM is not up-to-date because
rebuilding LLVM is really time consuming.  Their git hash value are:
Cloog: 77c44782f4cbf6d60a7dbfe996b0649336ec7205
Polly:   5f3abecad642a6646c53cbd39879091da87c3c37
LLVM: d050e96133fac8565e3bb1eabe9a587dd5a6ac4d>
>> Table 1 and table 2 show the compile-time overhead of Polly. Five cases
are tested:
>> (alias pollycc="clang -O3 -load LLVMPolly.so -mllvm -polly)
>> * **clang**:  clang -O3
>> * **pBasic**: clang -O3 -load LLVMPolly.so
>> * **pNoGen**: pollycc -O3 -mllvm -polly-optimizer=none -mllvm
-polly-code-generatorr=none
>> * **pNoOpt**: pollycc -O3 -mllvm -polly-optimizer=none
>> * **pOpt**:   pollycc -O3
>
>You probably want to add a couple of additional parameters here to
>ensure we detect all scops in polybench. I would add -mllvm
>-polly-ignore-aliasing and -DPOLYBENCH_USE_SCALAR_LB
>
>You can use '-mllvm -debug-only=polly-cloog' to see if the kernels
>are properly detected. I get the following output:
>
>alias polly-clang
>alias polly-clang='~/Projekte/polly/build/bin/clang -Xclang -load 
>-Xclang ~/Projekte/polly/build/lib/LLVMPolly.so'
>grosser at tobilaptop:~/Projekte/polybench$ polly-clang -O3 -mllvm -polly 
>-mllvm -debug-only=polly-cloog linear-algebra/kernels/gemm/gemm.c -I 
>utilities/ utilities/polybench.c  -mllvm -polly-ignore-aliasing 
>-DPOLYBENCH_USE_SCALAR_LB
>:: init_array : entry.split => for.end56
>if ((nj >= 1) && (nk >= 1) && (p_1 >= 1) &&
(p_4 >= 1)) {
>   for (c2=0;c2<=p_4-1;c2+=32) {
>     for 
>(c3=max(-32*floord(p_1-12*p_4+10,32)-32*p_4,-32*c2-32*floord(-12*c2+p_1+10,32)-640);c3<=-20*c2;c3+=32)
>{
>       for 
>(c4=max(ceild(-c3-p_1-30,20),c2);c4<=min(min(floord(-c3,20),c2+31),p_4-1);c4++)
>{
>         for (c5=max(c3,-20*c4-p_1+1);c5<=min(-20*c4,c3+31);c5++) {
>           Stmt_for_body41(c4,-20*c4-c5);
>         }
>       }
>     }
>   }
>}
>if ((ni >= 1) && (nk >= 1) && (p_0 >= 1) &&
(p_4 >= 1)) {
>   for (c2=0;c2<=p_0-1;c2+=32) {
>     for 
>(c3=max(-32*floord(-12*p_0+p_4+10,32)-32*p_0,-32*c2-32*floord(-12*c2+p_4+10,32)-640);c3<=-20*c2;c3+=32)
>{
>       for 
>(c4=max(ceild(-c3-p_4-30,20),c2);c4<=min(min(floord(-c3,20),c2+31),p_0-1);c4++)
>{
>         for (c5=max(c3,-20*c4-p_4+1);c5<=min(-20*c4,c3+31);c5++) {
>           Stmt_for_body18(c4,-20*c4-c5);
>         }
>       }
>     }
>   }
>}
>if ((ni >= 1) && (nj >= 1) && (p_0 >= 1) &&
(p_1 >= 1)) {
>   for (c2=0;c2<=p_0-1;c2+=32) {
>     for 
>(c3=max(-32*floord(-12*p_0+p_1+10,32)-32*p_0,-32*c2-32*floord(-12*c2+p_1+10,32)-640);c3<=-20*c2;c3+=32)
>{
>       for 
>(c4=max(ceild(-c3-p_1-30,20),c2);c4<=min(min(floord(-c3,20),c2+31),p_0-1);c4++)
>{
>         for (c5=max(c3,-20*c4-p_1+1);c5<=min(-20*c4,c3+31);c5++) {
>           Stmt_for_body3(c4,-20*c4-c5);
>         }
>       }
>     }
>   }
>}
>Stmt_entry_split();
>:: kernel_gemm : for.cond1.preheader => for.end28
>for (c2=0;c2<=1023;c2+=32) {
>   for (c3=0;c3<=1023;c3+=32) {
>     for (c4=c2;c4<=c2+31;c4++) {
>       for (c5=c3;c5<=c3+31;c5++) {
>         Stmt_for_body3(c4,c5);
>       }
>     }
>   }
>}
>for (c2=0;c2<=1023;c2+=32) {
>   for (c3=0;c3<=1023;c3+=32) {
>     for (c4=0;c4<=1023;c4+=32) {
>       for (c5=c2;c5<=c2+31;c5++) {
>         for (c6=c3;c6<=c3+31;c6++) {
>           for (c7=c4;c7<=c4+31;c7++) {
>             Stmt_for_body8(c5,c6,c7);
>           }
>         }
>       }
>     }
>   }
>}
>:: polybench_flush_cache : for.inc => for.end
>for (c1=0;c1<=4194559;c1++) {
>   Stmt_for_inc(c1);
>}
>
>As a first step, we should ensure that Polly is fast for these flags. As
>subsequent steps we should then ensure that the absence of these flags
>does not increase compile-time.Thanks for your advice. I will investigate more flags and enrich the content of
table 1 and table 2.

>
>> Table 3 and table 4 show the compile-time overhead of top 15 passes in
>> polly-opt. we first generate LLVM IR files by the command "clang
-O3
>> -emit-llvm XX.c -o XX.ll", then we run opt to collect the compile
time
>> of each pass using the command "opt -basicaa  -load LLVMPolly.so
-O3
>> -polly -polly-codegen-scev XX.ll -S -o XX.opt.ll -time-passes"
>
>You want to extract the .ll file with 'clang -O0'. Otherwise you run
>polly on code that is already -O3 optimized, making the runs not
>comparable to the clang integrated ones and also unrealistic as they do
>not reflect what we do when running Polly from within clang.
>
>It would be interesting to understand the doitgen results better. The
>time in the optimizer is only 0.408 seconds, whereas the increase from
>pBasic to pOpt is 0.897 - 0.151 = 0.746 seconds. This seems surprising.
>Is this because of running polly on -O3 optimized code, is Polly
>producing bigger .ll files which yield to longer object file emmission
>time or is there a measurement problem? I run three groups of commands:
Group 1:  "clang -O0 -emit-llvm" (0.082s) + "polly-opt -O3"
(0.589s)
Group 2:  "clang -O3 -emit-llvm" (0.128s) + "polly-opt -O3"
(0.402s)
Group 3: "clang -O3 -mllvm -polly -emit-llvm" (0.848s)
*Note: I did not use -emit-llvm in previous evaluation, which means it take a
little time to translate LLVM IR to X86 assembler code, so the time in Group 3
(0.848s) is a little less than pOpt  listed in table 1 (0.897)
You are right. I should use "clang -O0 -emit-llvm" to generate LLVM IR
for the subsequent Polly optimization.
However, I notice that there is still a gap beween Group 1 (0.671s in total) and
Group 3 (0.848s in total). I guess this is because Polly generates bigger and
more complicated LLVM IR, which slows down clang passes after Polly in Group 3
commands.
Is there any option like -time-passes when running clang? Such kind of options
can help to find out where the extra compile time spends on.

>
>That's it for now. Thanks for the update!
>
>Cheers,
>Tobi
>
Best wishes,
Star Tan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130504/f2468e86/attachment.html>

Sebastian Pop

2013-May-03 18:24 UTC

head link

[LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

Star Tan wrote:> >> #### Stage1 -- Set up a Polly performance tester to track the
compile time. [Week 1]
> >>
> >> The first important work is to pick criteria to evaluate my work
in
> >> this project. This project targets to reduce compile-time
overhead,
> >> but at the same time it must not degrade the performance. To
simplify
> >> the testing, I will use the number of scops that optimized by
Polly as
> >> the performance criteria. As a result, our Polly performance
tester
> >> should contains two parts:
> >>
> >> * Code performance: the number of scops optimized by Polly should
not
> >> be degraded.
> >> * Compile-time Overhead: both the total compile-time overhead and
the
> >> percentage of each Polly pass are both important.
> >>
> >> Furthermore, I currently use some scripts to collect the
compile-time
> >> statistics, but it still requires some tricky and manual work. My
plan
> >> is to set up an automation testing environment and integrate it
into
> >> Polly.
> >
> >Yes, this is nice. It would be great if we could get some hardware to
> >run such tests regularly. I will check if we can find a sponsor for
> >this?
I will ask if we can have a machine to run a polly perf build-bot.
> Thank you. That would be great if we can find a sponsor.
> Otherwise, I think I have to run all test on my laptop.
> 
An alternative is to set up an automatic tester in the gcc compile farm
http://gcc.gnu.org/wiki/CompileFarm

and run the perf measurements on several different machines: collect all the
data and use a noise filter to discard any hiccups due to the use of the system
by another process, etc.

http://repo.or.cz/w/gcc-perf-regression-tester.git/blob_plain/HEAD:/analyze.R
http://repo.or.cz/w/gcc-perf-regression-tester.git/blob_plain/HEAD:/analyze-core.R

Note that even on a dedicated system you would have the same noise problem.

Sebastian
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - May 2013 - [LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

[LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

[LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

[LLVMdev] [Polly] GSoC Proposal: Reducing LLVM-Polly Compiling overhead

Apparently Analagous Threads