thr3ads.net - llvm dev - [llvm-dev] Publication LLVM Related Publications Submission [Jan 2018]

Mihail Popov via llvm-dev

2017-Nov-28 17:05 UTC

[llvm-dev] Publication LLVM Related Publications Submission

Hello,

I would like to submit two papers that use LLVM to the Related
Publications section.

Both papers focus on code isolation applied to perform piecewise
compiler optimizations.
The code isolation process is performed by CERE, an open source tool
based on LLVM.

The second paper is an extended version of the first one.

1) Piecewise Holistic Autotuning of Compiler and Runtime Parameters

@inproceedings{popov2016piecewise,
  title={Piecewise Holistic Autotuning of Compiler and Runtime Parameters},
  author={Popov, Mihail and Akel, Chadi and Jalby, William and de Oliveira Castro, Pablo},
  booktitle={European Conference on Parallel Processing},
  pages={238--250},
  year={2016},
  organization={Springer}
}

2) Piecewise holistic autotuning of parallel programs with CERE

@article{popov2017piecewise,
  title={Piecewise holistic autotuning of parallel programs with CERE},
  author={Popov, Mihail and Akel, Chadi and Chatelain, Yohan and Jalby, William and de Oliveira Castro, Pablo},
  journal={Concurrency and Computation: Practice and Experience},
  volume={29},
  number={15},
  year={2017},
  publisher={Wiley Online Library}
}

Do not hesitate to contact me if you have any questions or if you need
any additional documents.

Thank you,
Mihail Popov

-----------------------------------------------------------------------------------

PAPERS SUMMARY:

Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Abstract. Current architecture complexity requires fine tuning of
compiler and runtime parameters to achieve full potential performance.
Autotuning substantially improves default parameters in many scenarios
but it is a costly process requiring a long iterative evaluation.
We propose an automatic piecewise autotuner based on CERE (Codelet
Extractor and REplayer). CERE decomposes applications into small
pieces called codelets: each codelet maps to a loop or to an OpenMP
parallel region and can be replayed as a standalone program.
Codelet autotuning achieves better speedups at a lower tuning cost. By
grouping codelet invocations with the same performance behavior, CERE
reduces the number of loops or OpenMP regions to be evaluated. Moreover,
unlike whole-program tuning, CERE customizes the set of best
parameters for each specific OpenMP region or loop.
We demonstrate CERE tuning of compiler optimizations, number of
threads and thread affinity on a NUMA architecture. On average over the
NAS 3.0 benchmarks, we achieve a speedup of 1.08× after tuning. Tuning
a single codelet is 13× cheaper than whole-program evaluation and
estimates the tuning impact on the original region with a 94.7% accuracy.
On a Reverse Time Migration (RTM) proto-application we achieve
a 1.11× speedup with a 200× cheaper exploration.

Piecewise Holistic Autotuning of Parallel Programs with CERE

Current architecture complexity requires fine tuning of compiler
and runtime parameters to achieve best performance. Autotuning
substantially improves default parameters in many scenarios but it is a
costly process requiring long iterative evaluations.
We propose an automatic piecewise autotuner based on CERE (Codelet
Extractor and REplayer). CERE decomposes applications into small
pieces called codelets: each codelet maps to a loop or to an OpenMP
parallel region and can be replayed as a standalone program.
Codelet autotuning achieves better speedups at a lower tuning cost. By
grouping codelet invocations with the same performance behavior, CERE
reduces the number of loops or OpenMP regions to be evaluated. Moreover,
unlike whole-program tuning, CERE customizes the set of best parameters
for each specific OpenMP region or loop.
We demonstrate the CERE tuning of compiler optimizations, number
of threads, thread affinity, and scheduling policy on both NUMA and
heterogeneous architectures. Over the NAS benchmarks, we achieve an
average speedup of 1.08× after tuning. Tuning a codelet is 13× cheaper
than whole-program evaluation and predicts the tuning impact with a
94.7% accuracy. Similarly, exploring thread configurations and scheduling
policies for a Black-Scholes solver on a heterogeneous big.LITTLE
architecture is over 40× faster using CERE.
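
The contrast drawn in both abstracts between whole-program tuning and
per-region tuning can be sketched with a toy example. This is not CERE's
implementation or interface: the codelet names, configurations and
timings below are all invented for illustration, and the selection logic
is just a minimum over a lookup table standing in for replayed
measurements.

```python
"""Toy illustration of piecewise versus whole-program parameter selection."""

# Hypothetical run times in seconds, keyed by codelet name and then by
# (optimization level, thread count). Every value here is invented.
REPLAY_TIMES = {
    "loop_a": {("-O2", 4): 2.0, ("-O3", 4): 1.5, ("-O2", 8): 1.8, ("-O3", 8): 1.9},
    "region_b": {("-O2", 4): 3.0, ("-O3", 4): 3.4, ("-O2", 8): 2.2, ("-O3", 8): 2.6},
}


def best_per_codelet(times):
    """Pick, for each codelet independently, its fastest configuration."""
    return {name: min(cfgs, key=cfgs.get) for name, cfgs in times.items()}


def best_whole_program(times):
    """Pick the single configuration minimizing the summed time of all codelets."""
    configs = list(next(iter(times.values())))
    return min(configs, key=lambda cfg: sum(t[cfg] for t in times.values()))


def total_time(times, choice):
    """Total time when each codelet runs under the configuration chosen for it."""
    return sum(times[name][cfg] for name, cfg in choice.items())


piecewise = best_per_codelet(REPLAY_TIMES)
single = best_whole_program(REPLAY_TIMES)
piecewise_total = total_time(REPLAY_TIMES, piecewise)
single_total = total_time(REPLAY_TIMES, {name: single for name in REPLAY_TIMES})

print("per-codelet choice:", piecewise, "total:", piecewise_total)
print("whole-program choice:", single, "total:", single_total)
```

With these made-up numbers the two codelets prefer different
configurations, so choosing per codelet can never do worse than the
single best whole-program configuration on the same table.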

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2016_codelet_tuning_Euro-Par.pdf
Type: application/pdf
Size: 467678 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171128/d31e9c54/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2017_CERE_tuning_Concurrency_and_Computation__Practice_and_Experience.pdf
Type: application/pdf
Size: 868319 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171128/d31e9c54/attachment-0003.pdf>

John Criswell via llvm-dev

2018-Jan-30 13:30 UTC

[llvm-dev] Publication LLVM Related Publications Submission

Dear Mihail,

I've added these two publications to the publications page. Please 
review it and let me know if I need to make any changes. In particular, 
if you have URLs to use for the papers, having those would be greatly 
appreciated.

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

Mihail Popov via llvm-dev

2018-Jan-30 13:53 UTC

[llvm-dev] Publication LLVM Related Publications Submission

Dear John,

Thank you! The references are good. 

Here are some links for each paper:

Piecewise Holistic Autotuning of Parallel Programs with CERE

official:
http://onlinelibrary.wiley.com/doi/10.1002/cpe.4190/full

HAL open PDF version:
https://hal-uvsq.archives-ouvertes.fr/hal-01542912v2/document

Piecewise Holistic Autotuning of Compiler and Runtime Parameters

official:
https://link.springer.com/chapter/10.1007/978-3-319-43659-3_18

An open PDF version:
https://www.sifflez.org/publications/europar16.pdf

I would suggest using the open URLs because everyone can access them.

Regards,
Mihail Popov

