Tobias Grosser
2013-Mar-20 13:06 UTC
[LLVMdev] [Polly]GSoC Proposal: Reducing LLVM-Polly Compiling overhead
On 03/19/2013 11:02 AM, Star Tan wrote:
>
> Dear Tobias Grosser,
>
> Today I have rebuilt LLVM-Polly in Release mode. The configuration of my own testing machine is: Intel Pentium Dual CPU T2390 (1.86GHz) with 2GB DDR2 memory.
> I evaluated Polly using PolyBench and Mediabench. It takes too long to evaluate the whole LLVM-testsuite, so I just chose Mediabench from the LLVM-testsuite.

OK. This is a good baseline.

> The preliminary results for the Polly compiling overhead are listed as follows:
>
> Table 1: Compiling time overhead of Polly for PolyBench.
>
> | | Clang (second) | Polly-load (second) | Polly-optimize (second) | Polly-load penalty | Polly-optimize penalty |
> | 2mm.c | 0.155 | 0.158 | 0.75 | 1.9% | 383.9% |
> | correlation.c | 0.132 | 0.133 | 0.319 | 0.8% | 141.7% |
> | gesummv.c | 0.152 | 0.157 | 0.794 | 3.3% | 422.4% |
> | ludcmp.c | 0.157 | 0.159 | 0.391 | 1.3% | 149.0% |
> | 3mm.c | 0.103 | 0.109 | 0.122 | 5.8% | 18.4% |
> | covariance.c | 0.16 | 0.163 | 1.346 | 1.9% | 741.3% |

This is a very large slowdown. On my system I get

0.06 sec for Polly-load
0.09 sec for Polly-optimize

What exact version of Polybench did you use? What compiler flags did you use to compile the benchmark? Also, did you run the executables several times? How large is the standard deviation of the results? (You can use a tool like ministat to calculate these values [1])

> | gramschmidt.c | 0.159 | 0.167 | 1.023 | 5.0% | 543.4% |
> | seidel.c | 0.125 | 0.13 | 0.285 | 4.0% | 128.0% |
> | adi.c | 0.155 | 0.156 | 0.953 | 0.6% | 514.8% |
> | doitgen.c | 0.124 | 0.128 | 0.298 | 3.2% | 140.3% |
> | instrument.c | 0.149 | 0.151 | 0.837 | 1.3% | 461.7% |

This number is surprising. In your last numbers you reported Polly-optimize as taking 0.495 sec in debug mode. The time you now report for the release mode is almost twice as much. Can you verify this number please?

> | atax.c | 0.135 | 0.136 | 0.917 | 0.7% | 579.3% |
> | gemm.c | 0.161 | 0.162 | 1.839 | 0.6% | 1042.2% |

This number also looks fishy. In debug mode you reported 1.327 seconds for Polly-optimize. This is again faster than in release mode.

> | jacobi-2d-imper.c | 0.16 | 0.161 | 0.649 | 0.6% | 305.6% |
> | bicg.c | 0.149 | 0.152 | 0.444 | 2.0% | 198.0% |
> | gemver.c | 0.135 | 0.136 | 0.416 | 0.7% | 208.1% |
> | lu.c | 0.143 | 0.148 | 0.398 | 3.5% | 178.3% |
> | Average | | | | 2.20% | 362.15% |

Otherwise, those numbers look like a good start. Maybe you can put them on some website/wiki/document where you can extend them as you proceed with benchmarking.

> Table 2: Compiling time overhead of Polly for Mediabench (selected from LLVM-testsuite).
>
> | | Clang (second) | Polly-load (second) | Polly-optimize (second) | Polly-load penalty | Polly-optimize penalty |
> | adpcm | 0.18 | 0.187 | 0.218 | 3.9% | 21.1% |
> | g721 | 0.538 | 0.538 | 0.803 | 0.0% | 49.3% |
> | gsm | 2.869 | 2.936 | 4.789 | 2.3% | 66.9% |
> | mpeg2 | 3.026 | 3.072 | 4.662 | 1.5% | 54.1% |
> | jpeg | 13.083 | 13.248 | 22.488 | 1.3% | 71.9% |
> | Average | | | | 1.80% | 52.65% |

I ran jpeg myself to verify these numbers on my machine. I got:

A: -O3
B: -O3 -load LLVMPolly.so
C: -O3 -load LLVMPolly.so -mllvm -polly
D: -O3 -load LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none
E: -O3 -load LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none

| | A | B | C | D | E |
| jpeg | 5.1 | 5.2 | 8.0 | 7.9 | 5.5 |

The overhead between A and C is similar to the one you report. Hence, the numbers seem to be correct. I also added two more runs, D and E, to figure out where the slowdown comes from.

As you can see, most of the slowdown disappears when we do not do code generation. This means either that the Polly code generation itself is slow or that the LLVM passes afterwards need more time due to the code we generated (it contains many opportunities for scalar simplifications). It would be interesting to see if this holds for the other benchmarks and to investigate the actual reasons for the slowdown.

It is also interesting to see that just running Polly, but without applying optimizations, does not slow down the compilation a lot. Does this also hold for other benchmarks?

> As shown in these two tables, Polly can significantly increase the compiling time when it indeed works for Polybench. On average, Polly will increase the compiling time by 4.5X for Polybench. Even for Mediabench, in which Polly does not actually improve the efficiency of the generated code, it still increases the compiling time by 1.5X.
> Based on the above observation, I think we should not only reduce the Polly analysis and optimization time, but also make it bail out early when it cannot improve the efficiency of the generated code. That is very important when Polly is enabled by default for LLVM users.

Bailing out early is definitely something we can think about.

To get started here, you could e.g. look into the jpeg benchmark and investigate on which files Polly is spending a lot of time, where exactly the time is spent and what kind of SCoPs Polly is optimizing. In case we do not expect any benefit, we may skip code generation entirely.

Thanks again for your interesting analysis.

Cheers,
Tobi

[1] https://github.com/codahale/ministat
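The A to E runs above can be read as a stage-by-stage breakdown: each configuration switches one piece of Polly on or off, so the difference between two neighbouring configurations roughly attributes compile time to one stage. The following is a minimal sketch of that arithmetic using only the jpeg numbers from the message. The unit is assumed to be seconds, and the stage labels are an interpretation of the flags, not terminology from the thread.

```python
# Compile times for jpeg as reported in the message, keyed by configuration.
# Assumption: the unit is seconds; the message does not state it explicitly.
jpeg = {"A": 5.1, "B": 5.2, "C": 8.0, "D": 7.9, "E": 5.5}


def stage_breakdown(t):
    """Attribute compile time to stages by differencing configurations.

    The labels are an interpretation of the flag sets A..E, not names
    used by Polly itself.
    """
    return {
        "loading the plugin (B - A)": round(t["B"] - t["A"], 3),
        "Polly without optimizer or code generation (E - B)": round(t["E"] - t["B"], 3),
        "code generation and later LLVM passes (D - E)": round(t["D"] - t["E"], 3),
        "optimizer (C - D)": round(t["C"] - t["D"], 3),
    }


breakdown = stage_breakdown(jpeg)
for stage, seconds in breakdown.items():
    print(f"{stage}: {seconds:.1f}")
```

With these inputs the D - E difference dominates, which matches the observation that most of the slowdown disappears once code generation is disabled. The differences do not separate Polly's own code generation from the extra work in later LLVM passes; that would need per-pass timing.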
tanmx_star
2013-Mar-23 16:23 UTC
[LLVMdev] [Polly]GSoC Proposal: Reducing LLVM-Polly Compiling overhead
Dear Tobies,

Sorry for the late reply.

I have checked the experiment and found that some of the data was mismatched because of incorrect manual copy and paste, so I have written a shell script to collect the data automatically. The newest data is listed in the attached file.

Tobies, I have made a simple HTML page (attached polly-compiling-overhead.html) to show the experimental data and my plans for this project. I think a public webpage can be helpful for our further discussion. If possible, could you put it on the Polly website (either as a public link or as a temporary webpage)?

I think I will try to remove unnecessary code transformations for canonicalization as the next step.

Thank you very much for your warm help.

Best Regards,
Star Tan

Attachments:
- polly-compiling-overhead.html (application/octet-stream, 8687 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130324/3a85931c/attachment.obj>
- polly_build.sh (application/octet-stream, 1177 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130324/3a85931c/attachment-0001.obj>
- polly_compile.sh (application/octet-stream, 1213 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130324/3a85931c/attachment-0002.obj>
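For readers who want to reproduce this kind of measurement, here is a minimal sketch of an automated timing harness that also reports the standard deviation asked about earlier in the thread. It is not the attached shell script, whose contents are not shown here. The flag sets are copied from the thread; how they must be passed to the compiler driver on a particular setup, the `time_command` and `penalty_percent` helper names, and the stand-in command are all assumptions of this sketch.

```python
import statistics
import subprocess
import sys
import time

# Flag sets copied from the thread. How they are handed to the compiler
# driver on a given installation is an assumption to verify locally.
CONFIGS = {
    "A": ["-O3"],
    "B": ["-O3", "-load", "LLVMPolly.so"],
    "C": ["-O3", "-load", "LLVMPolly.so", "-mllvm", "-polly"],
}


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return (mean, sample standard deviation) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    # statistics.stdev needs at least two samples, so keep runs >= 2.
    return statistics.mean(samples), statistics.stdev(samples)


def penalty_percent(baseline, measured):
    """Relative compile-time overhead of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Replace it with the real
    # compiler invocation built from one of the CONFIGS entries.
    mean, stdev = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"mean={mean:.3f}s stdev={stdev:.3f}s")
```

Computing the penalty from the unrounded means, for example `penalty_percent(0.155, 0.75)` for the 2mm.c row of Table 1, avoids the copy-and-paste mismatches described in the message.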
Tobias Grosser
2013-Mar-23 17:37 UTC
[LLVMdev] [Polly]GSoC Proposal: Reducing LLVM-Polly Compiling overhead
On 03/23/2013 05:23 PM, tanmx_star wrote:
> Dear Tobies,
>
> Sorry for the late reply.
>
> I have checked the experiment and I found some of the data is mismatched because of incorrect manual copy and paste, so I have written a Shell script to automatically collect data. Newest data is listed in the attached file.

Yes, automating those experiments to make them reproducible is a very good idea. I did not yet verify the numbers, but will as soon as your script is online.

Two comments:

o Can you also run with the following flags:

  D: -O3 -load LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none
  E: -O3 -load LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none

o Some numbers are again fishy:

  adi: In your previous report you reported 0.953 seconds, the website now says 1.839 seconds.

  ludcmp: In your previous report you reported 0.391 seconds, the website now says 1.346 seconds.

  instrument: It seems you rounded the previous numbers to one significant digit and calculated the performance difference from the rounded numbers. I would prefer that you use the original numbers and only round when displaying/printing the results.

> Tobies, I have made a simple HTML page (attached polly-compiling-overhead.html) to show the experimental data and my plans for this project. I think a public webpage can be helpful for our further discussion. If possible, could you put it on Polly website (Either a public link or a temporary webpage) ?

Yes, I believe a website is a very good start to illustrate your findings and to organize the information that you got. For now I propose that you host it yourself, as I expect it to change often and waiting for me to add changes just adds overhead (there are plenty of free hosting services). We can later move it to the Polly website.

Some comments on the content:

- Just put your name as the person who runs the project

  I appreciate that you put my name at the top, but this is work you started and that you will use as a Summer of Code project application. So you should be the only person mentioned there.

- Cite properly

  Also, as this will later become an application, I believe it is necessary to make clear which part of the document comes from you and which part you got from reviews or external sources. Specifically, if you copy a larger text from one of my emails, please mark it accordingly.

- Typo 'memeory'

> I think I will try to remove unnecessary code transformations for canonicalization in next step.

Are you referring to the region simplification change I was proposing earlier? I believe this is a good change to work on, as it is simple, self-contained and also a conceptual cleanup. After this patch, I believe it is necessary to get more details about your performance numbers to better understand where your work will be beneficial.

All the best,
Tobi
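The rounding remark can be illustrated with a small calculation. The sketch below takes the instrument.c times from Table 1 of the first message (0.149 and 0.837 seconds) and compares the penalty computed from the original numbers with the penalty computed after rounding each time to one significant digit. The helper names `penalty_percent` and `round_sig` are illustrative and do not come from the thread or from any script attached to it.

```python
def penalty_percent(baseline, measured):
    """Relative compile-time overhead of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


def round_sig(x, digits=1):
    """Round x to the given number of significant digits."""
    return float(f"{x:.{digits}g}")


# instrument.c row of Table 1: Clang and Polly-optimize times in seconds.
clang, polly_opt = 0.149, 0.837

# Preferred: compute from the original numbers, round only when printing.
from_raw = penalty_percent(clang, polly_opt)

# Problematic: round first, then compute the difference.
from_rounded = penalty_percent(round_sig(clang), round_sig(polly_opt))

print(f"from original numbers: {from_raw:.1f}%")     # 461.7%
print(f"from rounded numbers:  {from_rounded:.1f}%")  # 700.0%
```

Rounding 0.149 down to 0.1 changes the baseline by about a third, so the derived percentage moves far more than either rounded time suggests. Keeping full precision in the stored data and formatting only at output time avoids this.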