Mihail Popov via llvm-dev
2017-Nov-28 17:05 UTC
[llvm-dev] Publication LLVM Related Publications Submission
Hello,

I would like to submit two papers that use LLVM to the Related Publications section.

Both papers focus on code isolation applied to perform piecewise compiler optimizations. The code isolation process is performed by CERE, an open source tool based on LLVM.

The second paper is an extended version of the first one.

1) Piecewise Holistic Autotuning of Compiler and Runtime Parameters

@inproceedings{popov2016piecewise,
  title={Piecewise Holistic Autotuning of Compiler and Runtime Parameters},
  author={Popov, Mihail and Akel, Chadi and Jalby, William and de Oliveira Castro, Pablo},
  booktitle={European Conference on Parallel Processing},
  pages={238--250},
  year={2016},
  organization={Springer}
}

2) Piecewise holistic autotuning of parallel programs with CERE

@article{popov2017piecewise,
  title={Piecewise holistic autotuning of parallel programs with CERE},
  author={Popov, Mihail and Akel, Chadi and Chatelain, Yohan and Jalby, William and de Oliveira Castro, Pablo},
  journal={Concurrency and Computation: Practice and Experience},
  volume={29},
  number={15},
  year={2017},
  publisher={Wiley Online Library}
}

Do not hesitate if you have any questions or if you need any additional documents.

Thank you,
Mihail Popov

-----------------------------------------------------------------------------------

PAPERS SUMMARY:

Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Abstract. Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve full potential performance. Autotuning substantially improves default parameters in many scenarios but it is a costly process requiring a long iterative evaluation.
We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program.
Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop.
We demonstrate CERE tuning of compiler optimizations, number of threads and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a speedup of 1.08× after tuning. Tuning a single codelet is 13× cheaper than whole-program evaluation and estimates the tuning impact on the original region with a 94.7% accuracy. On a Reverse Time Migration (RTM) proto-application we achieve a 1.11× speedup with a 200× cheaper exploration.

Piecewise Holistic Autotuning of Parallel Programs with CERE

Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve best performance. Autotuning substantially improves default parameters in many scenarios but it is a costly process requiring long iterative evaluations.
We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program.
Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop.
We demonstrate the CERE tuning of compiler optimizations, number of threads, thread affinity, and scheduling policy on both NUMA and heterogeneous architectures. Over the NAS benchmarks, we achieve an average speedup of 1.08× after tuning. Tuning a codelet is 13× cheaper than whole-program evaluation and predicts the tuning impact with a 94.7% accuracy. Similarly, exploring thread configurations and scheduling policies for a Black-Scholes solver on an heterogeneous big.LITTLE architecture is over 40× faster using CERE.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2016_codelet_tuning_Euro-Par.pdf
Type: application/pdf
Size: 467678 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171128/d31e9c54/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2017_CERE_tuning_Concurrency_and_Computation__Practice_and_Experience.pdf
Type: application/pdf
Size: 868319 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171128/d31e9c54/attachment-0003.pdf>
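The difference between whole-program and piecewise tuning described in the abstracts can be illustrated with a minimal Python sketch. This is not CERE's implementation or API. The region names, configuration labels, and timings below are invented for illustration, and the sketch assumes that each region's time depends only on its own configuration. It contrasts picking one configuration for the whole program with picking the fastest configuration for each region independently.

```python
# Hypothetical timings in seconds: region -> configuration -> time.
# Region names, configuration labels, and numbers are invented for illustration.
timings = {
    "loop_a":       {"O2_4threads": 4.0, "O3_4threads": 2.5, "O3_8threads": 3.5},
    "loop_b":       {"O2_4threads": 2.0, "O3_4threads": 2.5, "O3_8threads": 1.5},
    "omp_region_c": {"O2_4threads": 5.0, "O3_4threads": 5.5, "O3_8threads": 6.0},
}
DEFAULT = "O2_4threads"  # assumed baseline configuration


def total_time(choice_per_region):
    """Sum the time of every region under its chosen configuration."""
    return sum(timings[region][cfg] for region, cfg in choice_per_region.items())


def whole_program_best():
    """Pick the single configuration that minimizes the total time."""
    configs = next(iter(timings.values())).keys()
    return min(configs, key=lambda cfg: total_time({r: cfg for r in timings}))


def piecewise_best():
    """Pick, for each region independently, its fastest configuration."""
    return {region: min(cfgs, key=cfgs.get) for region, cfgs in timings.items()}


if __name__ == "__main__":
    baseline = total_time({r: DEFAULT for r in timings})
    whole = total_time({r: whole_program_best() for r in timings})
    piecewise = total_time(piecewise_best())
    print(f"whole-program speedup: {baseline / whole:.2f}x")
    print(f"piecewise speedup:     {baseline / piecewise:.2f}x")
```

With these made-up numbers no single configuration is best for every region, so the per-region choice yields a lower total time than any whole-program choice. The real tool additionally has to extract and replay the regions, which this sketch does not model.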
John Criswell via llvm-dev
2018-Jan-30 13:30 UTC
[llvm-dev] Publication LLVM Related Publications Submission
Dear Mihail,

I've added these two publications to the publications page. Please review it and let me know if I need to make any changes. In particular, if you have URLs to use for the papers, having those would be greatly appreciated.

Regards,

John Criswell

--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
Mihail Popov via llvm-dev
2018-Jan-30 13:53 UTC
[llvm-dev] Publication LLVM Related Publications Submission
Dear John,

Thank you! The references are good. Here are some links for each paper:

Piecewise Holistic Autotuning of Parallel Programs with CERE
  Official: http://onlinelibrary.wiley.com/doi/10.1002/cpe.4190/full
  HAL open PDF version: https://hal-uvsq.archives-ouvertes.fr/hal-01542912v2/document

Piecewise Holistic Autotuning of Compiler and Runtime Parameters
  Official: https://link.springer.com/chapter/10.1007/978-3-319-43659-3_18
  Open PDF version: https://www.sifflez.org/publications/europar16.pdf

I would suggest using the open URLs, since everyone can access them.

Regards,
Mihail Popov