Johannes Doerfert via llvm-dev
2018-Jun-06 16:21 UTC
[llvm-dev] [RFC] Abstract Parallel IR Optimizations
This is an RFC to add analyses and transformation passes into LLVM to
optimize programs based on an abstract notion of a parallel region.

  == this is _not_ a proposal to add a new encoding of parallelism ==

We currently perform poorly when it comes to optimizations for parallel
codes. In fact, parallelizing your loops might actually prevent various
optimizations that would have been applied otherwise. One solution to
this problem is to teach the compiler about the semantics of the used
parallel representation. While this sounds tedious at first, it turns
out that we can perform key optimizations with reasonable implementation
effort (and thereby also reasonable maintenance costs). However, we have
various parallel representations that are already in use (KMPC, GOMP,
CILK runtime, ...) or proposed (Tapir, IntelPIR, ...).

Our proposal seeks to introduce parallelism-specific optimizations for
multiple representations while minimizing the implementation overhead.
This is done through an abstract notion of a parallel region which hides
the actual representation from the analysis and optimization passes. In
the schema below, our current five optimizations (described in detail
here [0]) are shown on the left, the abstract parallel IR interface is
in the middle, and the representation-specific implementations are on
the right.

     Optimization          (A)nalysis/(T)ransformation           Impl.
 ---------------------------------------------------------------------------
 CodePlacementOpt     \  /---> ParallelRegionInfo (A) ---------|-> KMPCImpl (A)
 RegionExpander       -\ |                                     |   GOMPImpl (A)
 AttributeAnnotator   -|-|---> ParallelCommunicationInfo (A) --/   ...
 BarrierElimination   -/ |
 VariablePrivatization /  \---> ParallelIR/Builder (T) -----------> KMPCImpl (T)

In our setting, a parallel region can be an outlined function called
through a runtime library but also a fork-join/attach-reattach region
embedded in otherwise sequential code.
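To illustrate the layering in the schema, here is a minimal, self-contained C++ sketch of one abstract interface with two representation-specific implementations behind it. It is not the code from the patch under review and does not use LLVM's APIs: `ToyCall`, `ParallelRegion`, `ParallelRegionImpl`, `ForkCallImpl`, `ToyKMPCImpl`, `ToyGOMPImpl`, and `findParallelRegions` are all hypothetical names made up for this example. The only real-world names are the runtime entry points `__kmpc_fork_call` (LLVM OpenMP runtime) and `GOMP_parallel` (libgomp), and the sketch assumes that a parallel region can be recognized simply as a call to one of them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a call instruction (hypothetical, not llvm::CallInst):
// the callee name and, for fork calls, the outlined function handed to it.
struct ToyCall {
  std::string Callee;
  std::string OutlinedFn;
};

// Representation-independent view of a parallel region (hypothetical).
struct ParallelRegion {
  std::string Body;           // function holding the parallel code
  std::string Representation; // which implementation recognized it
};

// Abstract interface: the only thing an optimization would talk to.
class ParallelRegionImpl {
public:
  virtual ~ParallelRegionImpl() = default;
  // Appends every region this representation recognizes in Calls to Out.
  virtual void collect(const std::vector<ToyCall> &Calls,
                       std::vector<ParallelRegion> &Out) const = 0;
};

// Assumption of this sketch: a region is a call to a known fork entry point.
class ForkCallImpl : public ParallelRegionImpl {
  std::string ForkFn, Name;

public:
  ForkCallImpl(std::string Fork, std::string Repr)
      : ForkFn(std::move(Fork)), Name(std::move(Repr)) {
    assert(!ForkFn.empty() && "need a runtime entry point to look for");
  }
  void collect(const std::vector<ToyCall> &Calls,
               std::vector<ParallelRegion> &Out) const override {
    for (const ToyCall &C : Calls)
      if (C.Callee == ForkFn)
        Out.push_back({C.OutlinedFn, Name});
  }
};

// Representation-specific implementations (right-hand column of the schema).
struct ToyKMPCImpl : ForkCallImpl {
  ToyKMPCImpl() : ForkCallImpl("__kmpc_fork_call", "KMPC") {}
};
struct ToyGOMPImpl : ForkCallImpl {
  ToyGOMPImpl() : ForkCallImpl("GOMP_parallel", "GOMP") {}
};

// Representation-agnostic client (middle column): it never mentions KMPC or
// GOMP by name beyond registering the available implementations.
std::vector<ParallelRegion>
findParallelRegions(const std::vector<ToyCall> &Calls) {
  std::vector<std::unique_ptr<ParallelRegionImpl>> Impls;
  Impls.push_back(std::make_unique<ToyKMPCImpl>());
  Impls.push_back(std::make_unique<ToyGOMPImpl>());

  std::vector<ParallelRegion> Regions;
  for (const auto &Impl : Impls)
    Impl->collect(Calls, Regions);
  return Regions;
}
```

Passing a list of toy calls to `findParallelRegions` returns one `ParallelRegion` per recognized fork call, grouped by implementation in registration order. An embedded fork-join or attach-reattach representation would, under the same assumptions, be one more `ParallelRegionImpl` subclass, and the client code would stay unchanged.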
The new passes will provide parallelism-specific optimizations for all
of these kinds of parallel regions (where applicable).

There are various reasons why we believe this is a worthwhile effort
that belongs in the LLVM codebase, including:

  1) We improve the performance of parallel programs, today.
  2) It serves as a meaningful baseline for future discussions on
     (optimized) parallel representations.
  3) It allows us to determine the pros and cons of the different
     schemes when it comes to actual optimizations and inputs.
  4) It helps to identify problems that might arise once we start to
     transform parallel programs but _before_ we commit to a specific
     representation.

Our prototype for the OpenMP KMPC library (used by clang) already shows
significant speedups for various benchmarks [0]. It also exposed a (to
me) previously unknown problem between restrict/noalias pointers and
(potential) barriers (see Section 3 in [0]).

We are currently in the process of cleaning up the code, extending the
support for OpenMP constructs, and adding a second implementation for
embedded parallel regions. However, a first horizontal prototype
implementation is already available for review [1].

Inputs of any kind are welcome and reviewers are needed!

Cheers,
  Johannes

[0] http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
[1] https://reviews.llvm.org/D47300

P.S.
  Sorry if you received this message multiple times!

--

Johannes Doerfert
PhD Student / Researcher

Compiler Design Lab (Professor Hack) / Argonne National Laboratory
Saarland Informatics Campus, Germany / Lemont, IL 60439, USA
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de / jdoerfert at anl.gov
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert