Hi All, We'd like to make a proposal for OpenMP representation in LLVM IR. Our goal is to reach an agreement in the community on a simple, complete and extensible representation of OpenMP language constructs in LLVM IR. Hopefully, this would serve as a common ground and would enable further development of OpenMP support both in Clang and LLVM compiler toolchain. We seek feedback on the proposed design and ways to improve it. Main authors of the proposal are Andrey Bokhanko and Alexey Bataev. Also, we'd like to acknowledge valuable contributions and advice provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian Minard, Dmitry Babokin and other colleagues from Intel and Diego Novillo from Google. NB, this *does not* automatically imply support of the proposal by said individuals. Please find the proposal in *.pdf (attached to the message, for reading convenience) and plain text (below the message, for quoting convenience) formats. Their content is identical. Full disclosure: both of us are associated with Intel and Intel Compiler team. Yours, Andrey Bokhanko Alexey Bataev ============Software Engineers Intel Compiler Team Intel Corp. OpenMP Representation in LLVM IR ===============================Design Proposal Alexey Bataev, Andrey Bokhanko Introduction ===========This document describes design proposal for OpenMP representation in LLVM IR. Authors assume that readers have basic understanding of: 1) OpenMP principles and language constructs 2) General design of LLVM compiler system Goals ====Our end goal is to provide a simple, complete and extensible support for OpenMP representation in LLVM IR. We aim to extend LLVM IR as little as possible, preferably without adding any new types / language constructs at all. Also, we’d like to keep opportunities for optimization of parallel code as widely opened as possible –obviously, within the boundaries of preserving correct semantics of a user program. And finally, in the spirit of OpenMP Specification ([OpenMP11]), LLVM IR based compilers should be able to easily skip IR extensions related to OpenMP support and still generate correct, albeit sequential, code. Function Outlining =================Eventually, parallel regions should be put into separate routines, a process usually called “function outlining” or “procedurization”. This can happen as early as in front-end, and as late as right before code generation. As can be seen in the following sections, the IR extension we propose doesn’t involve explicit procedurization. Thus, we assume that function outlining should happen somewhere in the LLVM back-end, and usually this should be aligned with how chosen OpenMP runtime library works and what it expects. This is a deliberate decision on our part. We believe it provides the following benefits (when compared with designs involving procedurization done in a front-end): 1) Function outlining doesn’t depend on source language; thus, it can be implemented once and used with any front-ends. 2) Optimizations are usually restricted by a single function boundary. If procedurization is done in a front-end, this effectively kills any optimizations – as simple ones as loop invariant code motion. Refer to [Tian2005] for more information on why this is important for efficient optimization of OpenMP programs. It should be stressed, though, that in order to preserve correct semantics of a user program, optimizations should be made thread-aware (which, to the best of our knowledge, is not the case with LLVM optimizations). By “thread-aware” we mean an optimization that performs legal transformations on parallel programs. Refer to [Novillo00] for more information on correctness of optimizations for parallel programs. Given that LLVM optimizations are currently not thread-aware, initial implementation should call procedurization pass right at the start of the optimizer. In the future, this pass can be moved further and further down the optimizer, as more optimizations will become thread-aware. Essentially, this initial implementation is not that different from implementations with procedurization done in front-end; however, it keeps window of possibility opened. Scope ====Scope of this design is OpenMP representation in LLVM IR. Thus, all other issues related to OpenMP support, like runtime library, ABI, etc are not covered. We also included a set of requirements for front-ends and back-ends, which establish mutual expectations and is an important addition to the design. Our proposal is based on the latest published OpenMP specification, which is version 3.1 ([OpenMP31]) at the time of writing. However, the design approach we employed is general enough to allow easy adaptation for future versions of OpenMP standard. Front-End/Back-End Contract ==========================While not a part of OpenMP representation design per se, the following pre- and post-conditions help to establish a set of mutual expectation between front-ends (that generate LLVM IR with OpenMP support) and back-ends (that consume it). Without this kind of a “contract” ([Meyer92]), a back-end has to verify too much and basically repeat the work already done by a front-end. While it is possible to generate LLVM IR that violates these conditions, we consider it to be non-conformant. Obviously, conformance can be verified, if necessary. Pre- and Post-conditions for Front-Ends --------------------------------------- 1) Guarantee correct semantics of directives and clauses. This includes nomenclature and number of clauses in directives, correct nesting of directives, values of clauses, etc. For example, front-ends should guarantee that omp single directive has at most one private, firstprivate, copyprivate and nowait clause, and nothing else; omp section directive is nested inside omp sections directive; the only possible value of private clause is a list of variables available at the point where this clause is present; etc. 2) Guarantee correct semantics of statements / structured blocks following directives. As an example, it should be guaranteed that only for-loops follow omp parallel for directive. 3) Set _OPENMP macro. This is required in section 2.2 of [OpenMP11]. 4) Support a command-line option that enables/disables OpenMP support. This is required in Chapter 2 of [OpenMP11]. If a user program violates OpenMP semantics and thus, makes compliance with first two conditions impossible, a front-end should report a meaningful error message and stop compilation. It should be noted that front-ends are not required to guarantee full conformance of generated LLVM IR to OpenMP specification. As an example, the specification deems programs that branch into or out of parallel regions to be non-conformant. We believe that verification of such properties is too complex for most front-ends. This matches well with what is written in the specification itself: “compliant implementations are not required to check for code sequences that cause a program to be classified as non-conforming” ([OpenMP11], Section 1.1). Pre- and Post-conditions for Back-Ends -------------------------------------- 1) Intrinsics, builtins, library routines and language implementation should be thread-safe. This is equally applicable to front-ends and is required in Section 1.5 of [OpenMP11]. 2) Optimizations should be thread-aware. 3) Support a command-line option that enables/disables OpenMP support. This is required in Chapter 2 of [OpenMP11]. 4) Rely on post-conditions guaranteed by front-end, and nothing else. Conditions 2)-5) are only applicable to optimizations working on IR before procedurization. As noted in the previous section, LLVM IR is not guaranteed to be fully conformant with OpenMP specification. Back-ends should be able to compile cleanly non-compliant programs, but no promises are made on correct behavior of resulting machine code. Elements of OpenMP =================OpenMP is comprised of four components (as of version 3.1): * Directives (+ Clauses) * Internal Control Variables * Runtime Library Routines * Environment Variables Internal Control Variables, Runtime Library Routines and Environment Variables are provided / handled by OpenMP runtime library. These components of OpenMP don’t require special support in LLVM IR and thus are out of scope of this document. All OpenMP directives in C/C++ are specified with #pragma preprocessing directive and have the following format: #pragma omp directive-name [clause [[,] clause]…]. Next two chapters describe our design proposal for representation of, correspondingly, directives and clauses. Directives =========Directives in LLVM IR are represented as calls to “llvm.omp.directive” intrinsic with a single argument referencing LLVM IR metadata. The metadata contains an identifier of a directive. Almost all OpenMP directives are represented with two intrinsic calls: one for entry and one for exit point of the directive’s context. It is enough to have just one intrinsic call for several directives which are supposed to be enclosed in other directives (like omp section, which must appear only within omp sections context) or specify a single instruction (like omp flush) or are declarative (like omp threadprivate). Exit point for such directives is either non-existent or can be determined by examining other intrinsic calls and thus, not required to be explicitly present. Metadata specify only one thing: type of a directive. Currently LLVM IR does not support integer or enumeration metadata types; thus, we decided to use MDString (metadata string type) to represent the type. The list of directives and their identifiers is shown in Table 1. Table 1. OpenMP Directives Type of directive #pragma omp ... | MDString in metadata ------------------------------------------------------------ parallel | OMP_PARALLEL | OMP_END_PARALLEL ------------------------------------------------------------ [parallel] for | OMP_[PARALLEL_]LOOP | OMP_END_[PARALLEL_]LOOP ------------------------------------------------------------ [parallel] sections | OMP_[PARALLEL_]SECTIONS | OMP_END_[PARALLEL_]SECTIONS ------------------------------------------------------------ section | OMP_SECTION ------------------------------------------------------------ single | OMP_SINGLE | OMP_END_SINGLE ------------------------------------------------------------ task | OMP_PTASK | OMP_END_PTASK ------------------------------------------------------------ taskyield | OMP_PTASKYIELD ------------------------------------------------------------ master | OMP_MASTER | OMP_END_MASTER ------------------------------------------------------------ critical | OMP_CRITICAL | OMP_END_CRITICAL ------------------------------------------------------------ barrier | OMP_BARRIER ------------------------------------------------------------ taskwait | OMP_PTASKWAIT ------------------------------------------------------------ atomic | OMP_ATOMIC | OMP_END_ATOMIC ------------------------------------------------------------ flush | OMP_FLUSH ------------------------------------------------------------ ordered | OMP_ORDERED | OMP_END_ORDERED An example of an OpenMP directive and its representation in LLVM IR: C/C++ ----- #pragma omp parallel LLVM IR ------- call void @llvm.omp.directive(metadata !0) ... call void @llvm.omp.directive(metadata !1) !0 = metadata !{metadata !”OMP_PARALLEL”} !1 = metadata !{metadata !”OMP_END_PARALLEL”} Clauses ======All OpenMP clauses can be divided into four groups: * Ones with predefined values (default, ordered, nowait, untied, read, write, update, capture) * Ones with a list of variables (shared, private, firstprivate, lastprivate, copyin, copyprivate, directives flush, threadprivate) * Ones with a scalar or integer expression (if, num_threads, final, collapse) * Compound ones (reduction, schedule, directive critical) Clauses in LLVM IR are represented as intrinsic calls. Each intrinsic call representing a clause has one mandatory argument and arbitrary number of optional arguments. The mandatory argument references LLVM IR metadata. The metadata contains identifier of the clause (MDString) and, for some compound clauses, additional data represented as another MDString. Optional arguments reference LLVM variables or expressions. Clauses with Predefined Values ------------------------------ Clauses with predefined values are represented as calls to “llvm.omp.simple” intrinsic. For this group of clauses metadata contains an MDString with a clause’s identifier. The list of clauses and their identifiers is shown in Table 2. Table 2. OpenMP Clauses with Predefined Values Clause | MDString in Metadata ------------------------------------------------------------ default(none) | OMP_DEFAULT_NONE ------------------------------------------------------------ default(shared) | OMP_DEFAULT_SHARED ------------------------------------------------------------ ordered | OMP_ORDERED ------------------------------------------------------------ nowait | OMP_NOWAIT ------------------------------------------------------------ untied | OMP_UNTIED ------------------------------------------------------------ read | OMP_READ ------------------------------------------------------------ write | OMP_WRITE ------------------------------------------------------------ update | OMP_UPDATE ------------------------------------------------------------ capture | OMP_CAPTURE Here is an example of OpenMP clause with a predefined value and its representation in LLVM IR: C/C++ ----- #pragma omp atomic read LLVM IR ------- call void @llvm.omp.simple(metadata !1) !1 = metadata !{metadata !”OMP_READ”} Clauses with a List of Variables -------------------------------- Clauses with a list of variables are represented as calls to “llvm.omp.list” intrinsic. For this group of clauses metadata contains an MDString with a clause’s identifier. Additional arguments of the intrinsic reference LLVM variables associated with a clause. It is important to reference variables directly in intrinsic calls and not in metadata, in order to preserve data dependency. Each variable of a non user-defined type is represented with a single argument (referencing the variable). Each variable of a user-defined type is represented with four arguments: one referencing the variable itself, one referencing its default constructor, one referencing its copy constructor and one referencing its destructor. References to constructors / destructors are required to correctly create and destroy private copies of variables. The list of clauses and their identifiers is shown in Table 3. Table 3. OpenMP Clauses with a List of Variables Clause | MDString in Metadata ------------------------------------------------------------ private(list) | OMP_PRIVATE ------------------------------------------------------------ firstprivate(list) | OMP_FIRSTPRIVATE ------------------------------------------------------------ lastprivate(list) | OMP_LASTPRIVATE ------------------------------------------------------------ shared(list) | OMP_SHARED ------------------------------------------------------------ copyin(list) | OMP_COPYIN ------------------------------------------------------------ Directive flush(list) | OMP_FLUSH ------------------------------------------------------------ Directive threadprivate(list) | OMP_THREADPRIVATE Here is an example of OpenMP clause with a list of variables and its representation in LLVM IR: C/C++ ----- #pragma omp parallel private(a,b) LLVM IR ------- call void (metadata, ...)* @llvm.omp.list(metadata !1, i32* @a, i32* @b) !1 = metadata !{metadata !”OMP_PRIVATE”} Clauses with Scalar or Integer Expressions ------------------------------------------ Clauses with scalar or integer expressions are represented as calls to “llvm.omp.expr” intrinsic. For this group of clauses metadata contains an MDString with a clause’s identifier. Second argument of the intrinsic references a scalar or integer LLVM expression associated with a clause. It is important to reference expressions directly in intrinsic calls and not in metadata, in order to preserve data dependency. The list of clauses and their identifiers is shown in Table 4. Table 4. OpenMP Clauses with Scalar or Integer Expressions Clause | MDString in Metadata ------------------------------------------------------------ if(scalar_expr) | OMP_IF ------------------------------------------------------------ num_threads(integer_expr) | OMP_NUM_THREADS ------------------------------------------------------------ final(scalar_expr) | OMP_FINAL ------------------------------------------------------------ collapse(const_integer_expr) | OMP_COLLAPSE Here is an example of OpenMP clause with a scalar expression and its representation in LLVM IR: C/C++ ----- #pragma omp parallel if(a) LLVM IR ------- %4 = load i32* @a, align 4 %5 = icmp ne i32 %4, 0 call void @llvm.omp.expr(metadata !1, i32 %5) ... !1 = metadata !{metadata !”OMP_IF”} Compound Clauses ---------------- Compound clauses are represented as calls to “llvm.omp.compound” intrinsic. For this group of clauses metadata contains an MDString with a clause’s identifier and additional data, also represented as an MDString. Additional arguments of an intrinsic call for reduction clause reference LLVM variables associated with a clause (see format of “Clauses with a List of Variables”). Second argument of an intrinsic call for schedule (static), schedule (dynamic) and schedule (guided) clauses references an integer LLVM expression associated with a clause (see format of “Clauses with Scalar or Integer Expressions”). The list of compound clauses along with their representation in metadata is shown in Table 5. Table 5. OpenMP Compound Clauses Clause | Representation in Metadata ------------------------------------------------------------ reduction(operator : list) | OMP_REDUCTION, <operator> ------------------------------------------------------------ schedule(static [, integer_expr]) | OMP_SCHEDULE, STATIC ------------------------------------------------------------ schedule(dynamic [, integer_expr]) | OMP_SCHEDULE, DYNAMIC ------------------------------------------------------------ schedule(guided [, integer_expr]) | OMP_SCHEDULE, GUIDED ------------------------------------------------------------ schedule(auto) | OMP_SCHEDULE, AUTO ------------------------------------------------------------ schedule(runtime) | OMP_SCHEDULE, RUNTIME ------------------------------------------------------------ Directive critical(name) | OMP_NAME, <name> Here is an example of a compound OpenMP clause and its representation in LLVM IR: C/C++ ----- #pragma omp parallel reduction(+ : a, b) LLVM IR ------- call void (metadata, ...)* @llvm.omp.compound(metadata !1, i32* @a, i32* @b) !1 = metadata !{metadata !”OMP_REDUCTION”, metadata !”+”} An Example ---------- Here is a more complex example, demonstrating LLVM IR representation of a directive with several clauses of different types. C/C++ ----- int a, gVar; int main() { int lVar; #pragma omp parallel default(shared), private(gVar, a), if(lVar) { ... } return (0); } LLVM IR ------- @a = global i32 0, align 4 @gVar = global i32 0, align 4 define i32 @main () nounwind uwtable ssp { %lVar = alloca i32, align 4 call void @llvm.omp.directive(metadata !0) call void @llvm.omp.simple(metadata !1) call void (metadata, ...)* @llvm.omp.list(metadata !2, i32* @gVar, i32* @a) call void @llvm.omp.expr(metadata !3, i32* %lVar) ... call void @llvm.omp.directive(metadata !4) ret i32 0 } !0 = metadata !{metadata !”OMP_PARALLEL”} !1 = metadata !{metadata !”OMP_DEFAULT_SHARED”} !2 = metadata !{metadata !”OMP_PRIVATE”} !3 = metadata !{metadata !”OMP_IF”} !4 = metadata !{metadata !”OMP_END_PARALLEL”} Design Alternatives ==================It is possible to propose several viable alternatives to representing information on OpenMP directives and clauses while keeping general design approach intact. Some things that we considered: a) Represent information on types of directives and clauses as constant expressions instead of MDStrings. b) Place MDStrings with identifiers of directives and clauses directly into intrinsic calls instead of metadata. c) Do not represent clauses as separate intrinsics with references to metadata; instead, put a list of references to metadata for all clauses associated with a directive at the end of metadata describing the directive. As we said, all these are viable alternatives. We decided to choose what we chose and not employ the alternatives listed above in order to keep true the following three design principles: 1) Absolute simplicity and readability (including human readability). This ruled out alternatives a) and [partially] c). 2) Uniformity of representation, for both directives and clauses. This ruled out alternatives b) and c). 3) As much extensibility and openness for future changes in both LLVM IR and OpenMP as possible. This ruled out alternatives b) and c). We believe that the only downside of our choice is larger size of LLVM IR required to represent same number of directives and clauses. However, we consider this as a relatively minor element of our design; one might argue in favor of these or other similar design alternatives. Conclusion =========In this document we described our design proposal for OpenMP representation in LLVM IR. We believe the IR extensions we proposed are: 1) Simple They rely on existing LLVM IR types and language constructs; the only new things are a few new intrinsics. Thus, LLVM IR consumers lacking OpenMP support can simply ignore these two intrinsics and still generate correct, albeit sequential, code. 2) Complete OpenMP 3.1 Specification ([OpenMP31]) is fully covered. 3) Extensible The design approach is general enough to readily incorporate future versions of OpenMP standard. 4) Enables both early and late procedurization and aggressive optimization This is when compared with designs implying explicit procedurization done right in front-ends. Acknowledgements ===============Authors would like to acknowledge valuable contributions and advice provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian Minard, Dmitry Babokin and other colleagues from Intel and Diego Novillo from Google. This *does not* automatically imply approval of this proposal by said individuals. References ========= [Lattner10] Chris Lattner, Devang Patel, “Extensible Metadata in LLVM IR”. Available at: http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html [Meyer92] Bertrand Meyer, “Applying “Design by Contract”, IEEE Computer, October 1992, pp. 40-51. Available at: http://se.ethz.ch/~meyer/publications/computer/contract.pdf [Novillo00] Diego Novillo, “Analysis and Optimization of Explicitly Parallel Programs”, PhD Thesis, University of Alberta, Edmonton, Alberta, Canada, 2000. Available at: http://www.airs.com/dnovillo/Papers/Thesis.pdf [OpenMP11] “OpenMP Application Program Interface”, Version 3.1, July 2011. Available at: http://www.openmp.org/mp-documents/OpenMP3.1.pdf [Tian05] Xinmin Tian, Milind Girkar, Aart Bik and Hideki Saito, “Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs”, The Computer Journal, September 2005, pp.588-601. Available at: http://dl.acm.org/citation.cfm?id=1095017 -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenMP_LLVM_IR.pdf Type: application/pdf Size: 666518 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120928/aaca2b70/attachment.pdf>
Andrey, I am very glad to see that you're interested in working on this! I have a few comments: As you may know, this is the third such proposal over the past two months, one by me (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052472.html) and the other, based somewhat on mine, by Sanjoy (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/053798.html) In order for your proposal to work well, there will be a lot of infrastructure work required (more than with my proposal); many passes will need to be made explicitly aware of how they can, or can't, reorder things with respect to the parallelization intrinsics; loop restructuring may require special care, etc. How this is done depends in part on where the state information is stored: Do we keep the parallelization information in the intrinsics during mid-level optimization, or do we move its state into an analysis pass? In any case, I don't object to this approach so long as we have a good plan for how this work will be done. When we discussed this earlier this year, there seemed to be some consensus that we wanted to avoid, to the extent possible, introducing OpenMP-specific intrinsics into the LLVM layer. Rather, we should define some kind of parallelization API (in the form of metadata, intrinsics, etc.) onto which OpenMP can naturally map along with other paradigms. There is interest in supporting OpenACC, for example, which will require data copying clauses, and it would make sense to share as much of the infrastructure as possible with OpenMP. Are you interested in providing Cilk support as well? We probably don't want to have NxM slightly-different ways of expressing 'this is a parallel region'. There are obviously cases in which things need to be specific to the interface (like runtime loop scheduling in OpenMP which implies a specific interaction with the runtime library), but such cases may be the exception rather than the rule. We don't need 'omp' in the intrinsic names and also 'OMP_' on all of the string specifiers. Maybe, to my previous point, we could call the intrinsics 'parallel' and use 'OMP_' only when something is really OpenMP-specific? You don't seem to want to map thread-private variables onto the existing TLS support. Why? Sincerely, Hal On Fri, 28 Sep 2012 22:42:37 +0400 Andrey Bokhanko <andreybokhanko at gmail.com> wrote:> Hi All, > > We'd like to make a proposal for OpenMP representation in LLVM IR. > > Our goal is to reach an agreement in the community on a simple, > complete and extensible representation of OpenMP language constructs > in LLVM IR. Hopefully, this would serve as a common ground and would > enable further development of OpenMP support both in Clang and LLVM > compiler toolchain. > > We seek feedback on the proposed design and ways to improve it. > > Main authors of the proposal are Andrey Bokhanko and Alexey Bataev. > Also, we'd like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. NB, this *does not* automatically imply support > of the proposal by said individuals. > > Please find the proposal in *.pdf (attached to the message, for > reading convenience) and plain text (below the message, for quoting > convenience) formats. Their content is identical. > > Full disclosure: both of us are associated with Intel and Intel > Compiler team. > > Yours, > Andrey Bokhanko > Alexey Bataev > ============> Software Engineers > Intel Compiler Team > Intel Corp. > > > OpenMP Representation in LLVM IR > ===============================> Design Proposal > > Alexey Bataev, Andrey Bokhanko > > > Introduction > ===========> This document describes design proposal for OpenMP representation in > LLVM IR. > > Authors assume that readers have basic understanding of: > 1) OpenMP principles and language constructs > 2) General design of LLVM compiler system > > > Goals > ====> Our end goal is to provide a simple, complete and extensible support > for OpenMP representation in LLVM IR. > > We aim to extend LLVM IR as little as possible, preferably without > adding any new types / language constructs at all. > > Also, we’d like to keep opportunities for optimization of parallel > code as widely opened as possible –obviously, within the boundaries of > preserving correct semantics of a user program. > > And finally, in the spirit of OpenMP Specification ([OpenMP11]), LLVM > IR based compilers should be able to easily skip IR extensions related > to OpenMP support and still generate correct, albeit sequential, code. > > > Function Outlining > =================> Eventually, parallel regions should be put into separate routines, a > process usually called “function outlining” or “procedurization”. This > can happen as early as in front-end, and as late as right before code > generation. > > As can be seen in the following sections, the IR extension we propose > doesn’t involve explicit procedurization. Thus, we assume that > function outlining should happen somewhere in the LLVM back-end, and > usually this should be aligned with how chosen OpenMP runtime library > works and what it expects. This is a deliberate decision on our part. > We believe it provides the following benefits (when compared with > designs involving procedurization done in a front-end): > > 1) Function outlining doesn’t depend on source language; thus, it can > be implemented once and used with any front-ends. > > 2) Optimizations are usually restricted by a single function boundary. > If procedurization is done in a front-end, this effectively kills any > optimizations – as simple ones as loop invariant code motion. Refer to > [Tian2005] for more information on why this is important for efficient > optimization of OpenMP programs. > > It should be stressed, though, that in order to preserve correct > semantics of a user program, optimizations should be made thread-aware > (which, to the best of our knowledge, is not the case with LLVM > optimizations). > > By “thread-aware” we mean an optimization that performs legal > transformations on parallel programs. Refer to [Novillo00] for more > information on correctness of optimizations for parallel programs. > > Given that LLVM optimizations are currently not thread-aware, initial > implementation should call procedurization pass right at the start of > the optimizer. In the future, this pass can be moved further and > further down the optimizer, as more optimizations will become > thread-aware. > > Essentially, this initial implementation is not that different from > implementations with procedurization done in front-end; however, it > keeps window of possibility opened. > > > Scope > ====> Scope of this design is OpenMP representation in LLVM IR. Thus, all > other issues related to OpenMP support, like runtime library, ABI, etc > are not covered. > > We also included a set of requirements for front-ends and back-ends, > which establish mutual expectations and is an important addition to > the design. > > Our proposal is based on the latest published OpenMP specification, > which is version 3.1 ([OpenMP31]) at the time of writing. However, the > design approach we employed is general enough to allow easy adaptation > for future versions of OpenMP standard. > > > Front-End/Back-End Contract > ==========================> While not a part of OpenMP representation design per se, the following > pre- and post-conditions help to establish a set of mutual expectation > between front-ends (that generate LLVM IR with OpenMP support) and > back-ends (that consume it). Without this kind of a “contract” > ([Meyer92]), a back-end has to verify too much and basically repeat > the work already done by a front-end. > > While it is possible to generate LLVM IR that violates these > conditions, we consider it to be non-conformant. Obviously, > conformance can be verified, if necessary. > > Pre- and Post-conditions for Front-Ends > --------------------------------------- > 1) Guarantee correct semantics of directives and clauses. This > includes nomenclature and number of clauses in directives, correct > nesting of directives, values of clauses, etc. > > For example, front-ends should guarantee that omp single directive has > at most one private, firstprivate, copyprivate and nowait clause, and > nothing else; omp section directive is nested inside omp sections > directive; the only possible value of private clause is a list of > variables available at the point where this clause is present; etc. > > 2) Guarantee correct semantics of statements / structured blocks > following directives. > As an example, it should be guaranteed that only for-loops follow omp > parallel for directive. > > 3) Set _OPENMP macro. This is required in section 2.2 of [OpenMP11]. > > 4) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > If a user program violates OpenMP semantics and thus, makes compliance > with first two conditions impossible, a front-end should report a > meaningful error message and stop compilation. > > It should be noted that front-ends are not required to guarantee full > conformance of generated LLVM IR to OpenMP specification. As an > example, the specification deems programs that branch into or out of > parallel regions to be non-conformant. We believe that verification of > such properties is too complex for most front-ends. This matches well > with what is written in the specification itself: “compliant > implementations are not required to check for code sequences that > cause a program to be classified as non-conforming” ([OpenMP11], > Section 1.1). > > Pre- and Post-conditions for Back-Ends > -------------------------------------- > 1) Intrinsics, builtins, library routines and language implementation > should be thread-safe. This is equally applicable to front-ends and is > required in Section 1.5 of [OpenMP11]. > > 2) Optimizations should be thread-aware. > > 3) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > 4) Rely on post-conditions guaranteed by front-end, and nothing else. > > Conditions 2)-5) are only applicable to optimizations working on IR > before procedurization. > > As noted in the previous section, LLVM IR is not guaranteed to be > fully conformant with OpenMP specification. Back-ends should be able > to compile cleanly non-compliant programs, but no promises are made on > correct behavior of resulting machine code. > > > Elements of OpenMP > =================> OpenMP is comprised of four components (as of version 3.1): > * Directives (+ Clauses) > * Internal Control Variables > * Runtime Library Routines > * Environment Variables > > Internal Control Variables, Runtime Library Routines and Environment > Variables are provided / handled by OpenMP runtime library. These > components of OpenMP don’t require special support in LLVM IR and thus > are out of scope of this document. > > All OpenMP directives in C/C++ are specified with #pragma > preprocessing directive and have the following format: > > #pragma omp directive-name [clause [[,] clause]…]. > > Next two chapters describe our design proposal for representation of, > correspondingly, directives and clauses. > > > Directives > =========> Directives in LLVM IR are represented as calls to “llvm.omp.directive” > intrinsic with a single argument referencing LLVM IR metadata. The > metadata contains an identifier of a directive. > > Almost all OpenMP directives are represented with two intrinsic calls: > one for entry and one for exit point of the directive’s context. It is > enough to have just one intrinsic call for several directives which > are supposed to be enclosed in other directives (like omp section, > which must appear only within omp sections context) or specify a > single instruction (like omp flush) or are declarative (like omp > threadprivate). Exit point for such directives is either non-existent > or can be determined by examining other intrinsic calls and thus, not > required to be explicitly present. > > Metadata specify only one thing: type of a directive. Currently LLVM > IR does not support integer or enumeration metadata types; thus, we > decided to use MDString (metadata string type) to represent the type. > The list of directives and their identifiers is shown in Table 1. > > Table 1. OpenMP Directives > > Type of directive #pragma omp ... | MDString in metadata > ------------------------------------------------------------ > parallel | OMP_PARALLEL > | OMP_END_PARALLEL > ------------------------------------------------------------ > [parallel] for | OMP_[PARALLEL_]LOOP > | OMP_END_[PARALLEL_]LOOP > ------------------------------------------------------------ > [parallel] sections | OMP_[PARALLEL_]SECTIONS > | OMP_END_[PARALLEL_]SECTIONS > ------------------------------------------------------------ > section | OMP_SECTION > ------------------------------------------------------------ > single | OMP_SINGLE > | OMP_END_SINGLE > ------------------------------------------------------------ > task | OMP_PTASK > | OMP_END_PTASK > ------------------------------------------------------------ > taskyield | OMP_PTASKYIELD > ------------------------------------------------------------ > master | OMP_MASTER > | OMP_END_MASTER > ------------------------------------------------------------ > critical | OMP_CRITICAL > | OMP_END_CRITICAL > ------------------------------------------------------------ > barrier | OMP_BARRIER > ------------------------------------------------------------ > taskwait | OMP_PTASKWAIT > ------------------------------------------------------------ > atomic | OMP_ATOMIC > | OMP_END_ATOMIC > ------------------------------------------------------------ > flush | OMP_FLUSH > ------------------------------------------------------------ > ordered | OMP_ORDERED > | OMP_END_ORDERED > > An example of an OpenMP directive and its representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel > > LLVM IR > ------- > call void @llvm.omp.directive(metadata !0) > ... > call void @llvm.omp.directive(metadata !1) > > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Clauses > ======> All OpenMP clauses can be divided into four groups: > * Ones with predefined values (default, ordered, nowait, untied, read, > write, update, capture) > * Ones with a list of variables (shared, private, firstprivate, > lastprivate, copyin, copyprivate, directives flush, threadprivate) > * Ones with a scalar or integer expression (if, num_threads, final, > collapse) > * Compound ones (reduction, schedule, directive critical) > > Clauses in LLVM IR are represented as intrinsic calls. Each intrinsic > call representing a clause has one mandatory argument and arbitrary > number of optional arguments. The mandatory argument references LLVM > IR metadata. The metadata contains identifier of the clause (MDString) > and, for some compound clauses, additional data represented as another > MDString. Optional arguments reference LLVM variables or expressions. > > > Clauses with Predefined Values > ------------------------------ > Clauses with predefined values are represented as calls to > “llvm.omp.simple” intrinsic. For this group of clauses metadata > contains an MDString with a clause’s identifier. > > The list of clauses and their identifiers is shown in Table 2. > > Table 2. OpenMP Clauses with Predefined Values > > Clause | MDString in Metadata > ------------------------------------------------------------ > default(none) | OMP_DEFAULT_NONE > ------------------------------------------------------------ > default(shared) | OMP_DEFAULT_SHARED > ------------------------------------------------------------ > ordered | OMP_ORDERED > ------------------------------------------------------------ > nowait | OMP_NOWAIT > ------------------------------------------------------------ > untied | OMP_UNTIED > ------------------------------------------------------------ > read | OMP_READ > ------------------------------------------------------------ > write | OMP_WRITE > ------------------------------------------------------------ > update | OMP_UPDATE > ------------------------------------------------------------ > capture | OMP_CAPTURE > > Here is an example of OpenMP clause with a predefined value and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp atomic read > > LLVM IR > ------- > call void @llvm.omp.simple(metadata !1) > > !1 = metadata !{metadata !”OMP_READ”} > > > Clauses with a List of Variables > -------------------------------- > Clauses with a list of variables are represented as calls to > “llvm.omp.list” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Additional arguments of the intrinsic reference LLVM variables > associated with a clause. It is important to reference variables > directly in intrinsic calls and not in metadata, in order to preserve > data dependency. > > Each variable of a non user-defined type is represented with a single > argument (referencing the variable). > > Each variable of a user-defined type is represented with four > arguments: one referencing the variable itself, one referencing its > default constructor, one referencing its copy constructor and one > referencing its destructor. References to constructors / destructors > are required to correctly create and destroy private copies of > variables. > > The list of clauses and their identifiers is shown in Table 3. > > Table 3. OpenMP Clauses with a List of Variables > > Clause | MDString in Metadata > ------------------------------------------------------------ > private(list) | OMP_PRIVATE > ------------------------------------------------------------ > firstprivate(list) | OMP_FIRSTPRIVATE > ------------------------------------------------------------ > lastprivate(list) | OMP_LASTPRIVATE > ------------------------------------------------------------ > shared(list) | OMP_SHARED > ------------------------------------------------------------ > copyin(list) | OMP_COPYIN > ------------------------------------------------------------ > Directive flush(list) | OMP_FLUSH > ------------------------------------------------------------ > Directive threadprivate(list) | OMP_THREADPRIVATE > > Here is an example of OpenMP clause with a list of variables and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel private(a,b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.list(metadata !1, i32* @a, i32* > @b) > > !1 = metadata !{metadata !”OMP_PRIVATE”} > > > Clauses with Scalar or Integer Expressions > ------------------------------------------ > Clauses with scalar or integer expressions are represented as calls to > “llvm.omp.expr” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Second argument of the intrinsic references a scalar or integer LLVM > expression associated with a clause. It is important to reference > expressions directly in intrinsic calls and not in metadata, in order > to preserve data dependency. > > The list of clauses and their identifiers is shown in Table 4. > > Table 4. OpenMP Clauses with Scalar or Integer Expressions > > Clause | MDString in Metadata > ------------------------------------------------------------ > if(scalar_expr) | OMP_IF > ------------------------------------------------------------ > num_threads(integer_expr) | OMP_NUM_THREADS > ------------------------------------------------------------ > final(scalar_expr) | OMP_FINAL > ------------------------------------------------------------ > collapse(const_integer_expr) | OMP_COLLAPSE > > Here is an example of OpenMP clause with a scalar expression and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel if(a) > > LLVM IR > ------- > %4 = load i32* @a, align 4 > %5 = icmp ne i32 %4, 0 > call void @llvm.omp.expr(metadata !1, i32 %5) > ... > !1 = metadata !{metadata !”OMP_IF”} > > > Compound Clauses > ---------------- > Compound clauses are represented as calls to “llvm.omp.compound” > intrinsic. For this group of clauses metadata contains an MDString > with a clause’s identifier and additional data, also represented as an > MDString. > > Additional arguments of an intrinsic call for reduction clause > reference LLVM variables associated with a clause (see format of > “Clauses with a List of Variables”). > > Second argument of an intrinsic call for schedule (static), schedule > (dynamic) and schedule (guided) clauses references an integer LLVM > expression associated with a clause (see format of “Clauses with > Scalar or Integer Expressions”). > > The list of compound clauses along with their representation in > metadata is shown in Table 5. > > Table 5. OpenMP Compound Clauses > > Clause | Representation in Metadata > ------------------------------------------------------------ > reduction(operator : list) | OMP_REDUCTION, <operator> > ------------------------------------------------------------ > schedule(static [, integer_expr]) | OMP_SCHEDULE, STATIC > ------------------------------------------------------------ > schedule(dynamic [, integer_expr]) | OMP_SCHEDULE, DYNAMIC > ------------------------------------------------------------ > schedule(guided [, integer_expr]) | OMP_SCHEDULE, GUIDED > ------------------------------------------------------------ > schedule(auto) | OMP_SCHEDULE, AUTO > ------------------------------------------------------------ > schedule(runtime) | OMP_SCHEDULE, RUNTIME > ------------------------------------------------------------ > Directive critical(name) | OMP_NAME, <name> > > Here is an example of a compound OpenMP clause and its representation > in LLVM IR: > > C/C++ > ----- > #pragma omp parallel reduction(+ : a, b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.compound(metadata !1, i32* @a, > i32* @b) > > !1 = metadata !{metadata !”OMP_REDUCTION”, metadata !”+”} > > > An Example > ---------- > Here is a more complex example, demonstrating LLVM IR representation > of a directive with several clauses of different types. > > C/C++ > ----- > int a, gVar; > int main() { > int lVar; > #pragma omp parallel default(shared), private(gVar, a), if(lVar) > { > ... > } > return (0); > } > > LLVM IR > ------- > @a = global i32 0, align 4 > @gVar = global i32 0, align 4 > define i32 @main () nounwind uwtable ssp { > %lVar = alloca i32, align 4 > call void @llvm.omp.directive(metadata !0) > call void @llvm.omp.simple(metadata !1) > call void (metadata, ...)* @llvm.omp.list(metadata !2, i32* @gVar, > i32* @a) call void @llvm.omp.expr(metadata !3, i32* %lVar) > ... > call void @llvm.omp.directive(metadata !4) > ret i32 0 > } > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_DEFAULT_SHARED”} > !2 = metadata !{metadata !”OMP_PRIVATE”} > !3 = metadata !{metadata !”OMP_IF”} > !4 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Design Alternatives > ==================> It is possible to propose several viable alternatives to representing > information on OpenMP directives and clauses while keeping general > design approach intact. > > Some things that we considered: > > a) Represent information on types of directives and clauses as > constant expressions instead of MDStrings. > b) Place MDStrings with identifiers of directives and clauses directly > into intrinsic calls instead of metadata. > c) Do not represent clauses as separate intrinsics with references to > metadata; instead, put a list of references to metadata for all > clauses associated with a directive at the end of metadata describing > the directive. > > As we said, all these are viable alternatives. We decided to choose > what we chose and not employ the alternatives listed above in order to > keep true the following three design principles: > > 1) Absolute simplicity and readability (including human readability). > This ruled out alternatives a) and [partially] c). > 2) Uniformity of representation, for both directives and clauses. This > ruled out alternatives b) and c). > 3) As much extensibility and openness for future changes in both LLVM > IR and OpenMP as possible. This ruled out alternatives b) and c). > > We believe that the only downside of our choice is larger size of LLVM > IR required to represent same number of directives and clauses. > > However, we consider this as a relatively minor element of our design; > one might argue in favor of these or other similar design > alternatives. > > > Conclusion > =========> In this document we described our design proposal for OpenMP > representation in LLVM IR. We believe the IR extensions we proposed > are: > > 1) Simple > > They rely on existing LLVM IR types and language constructs; the only > new things are a few new intrinsics. Thus, LLVM IR consumers lacking > OpenMP support can simply ignore these two intrinsics and still > generate correct, albeit sequential, code. > > 2) Complete > > OpenMP 3.1 Specification ([OpenMP31]) is fully covered. > > 3) Extensible > > The design approach is general enough to readily incorporate future > versions of OpenMP standard. > > 4) Enables both early and late procedurization and aggressive > optimization > > This is when compared with designs implying explicit procedurization > done right in front-ends. > > > Acknowledgements > ===============> Authors would like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. This *does not* automatically imply approval of > this proposal by said individuals. > > > References > =========> > [Lattner10] Chris Lattner, Devang Patel, “Extensible Metadata in LLVM > IR”. Available at: > http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html > > [Meyer92] Bertrand Meyer, “Applying “Design by Contract”, IEEE > Computer, October 1992, pp. 40-51. Available at: > http://se.ethz.ch/~meyer/publications/computer/contract.pdf > > [Novillo00] Diego Novillo, “Analysis and Optimization of Explicitly > Parallel Programs”, PhD Thesis, University of Alberta, Edmonton, > Alberta, Canada, 2000. Available at: > http://www.airs.com/dnovillo/Papers/Thesis.pdf > > [OpenMP11] “OpenMP Application Program Interface”, Version 3.1, July > 2011. Available at: http://www.openmp.org/mp-documents/OpenMP3.1.pdf > > [Tian05] Xinmin Tian, Milind Girkar, Aart Bik and Hideki Saito, > “Practical Compiler Techniques on Efficient Multithreaded Code > Generation for OpenMP Programs”, The Computer Journal, September 2005, > pp.588-601. Available at: http://dl.acm.org/citation.cfm?id=1095017-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
On 09/29/2012 12:56 AM, Hal Finkel wrote:> There is interest in supporting OpenACC, for example, which > will require data copying clauses, and it would make sense to share > as much of the infrastructure as possible with OpenMP. Are you > interested in providing Cilk support as well?I'd like to remind that more generic parallelism constructs in LLVM would also help the Portable OpenCL implementation cause. The SPMD multi work-item work-group functions (produced from OpenCL C kernels) should be easily mappable to such constructs after which the actual parallelization (DLP, TLP, ILP, or a combination of them) can be target-specific and done in generic LLVM passes which, e.g., OpenMP benefits from. -- --Pekka
Hal, Thank you for the reply!> As you may know, this is the third such proposal over the past two > months, one by me > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052472.html) > and the other, based somewhat on mine, by Sanjoy > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/053798.html)Yes, I was aware of your proposal. I hesitated to make any comments or criticism -- as I am, obviously, biased. In my opinion, two most important differences between our proposals are: 1) Your design employs explicit procedurization done in front-end, while our design allows both early (right after front-end) and late (later in the back-end) procedurization. 2) You aim to provide a general support for all (or at least most) parallel standards, while our aim is more modest -- just OpenMP. Please see discussion of 1) in "Function Outlining" section of our proposal. As for 2), there are many arguments one might use in favor of more general or more specialized solution. What is easier to implement? What is better for LLVM IR development? Are we sure what we see as necessary and sufficient today would be suitable for future parallel standards -- given all the developments happening in this area as we speak? Whatever one answers, it would be quite subjective. My personal preference is for simplest and most focused solution -- but then again this is subjective.> In order for your proposal to work well, there will be a lot of > infrastructure work required (more than with my proposal); many passes > will need to be made explicitly aware of how they can, or can't, reorder > things with respect to the parallelization intrinsics; loop > restructuring may require special care, etc. How this is done depends > in part on where the state information is stored: Do we keep the > parallelization information in the intrinsics during mid-level > optimization, or do we move its state into an analysis pass? In any > case, I don't object to this approach so long as we have a good plan > for how this work will be done.No -- only passes than happen before procedurization should be aware of these intrinsics. I agree that it is not so easy to make optimizations "thread-aware". But the problem essentially the same, no matter how parallel extension is manifested in the IR.> When we discussed this earlier this year, there seemed to be some > consensus that we wanted to avoid, to the extent possible, introducing > OpenMP-specific intrinsics into the LLVM layer. Rather, we should > define some kind of parallelization API (in the form of metadata, > intrinsics, etc.) onto which OpenMP can naturally map along with other > paradigms. There is interest in supporting OpenACC, for example, which > will require data copying clauses, and it would make sense to share > as much of the infrastructure as possible with OpenMP. Are you > interested in providing Cilk support as well? We probably don't want to > have NxM slightly-different ways of expressing 'this is a parallel > region'. There are obviously cases in which things need to be specific > to the interface (like runtime loop scheduling in OpenMP which implies > a specific interaction with the runtime library), but such cases may be > the exception rather than the rule. > > We don't need 'omp' in the intrinsic names and also 'OMP_' on all of > the string specifiers. Maybe, to my previous point, we could call the > intrinsics 'parallel' and use 'OMP_' only when something is really > OpenMP-specific?As I said before, our aim was quite simple -- OpenMP support only. Can the design be extended to allow more general form of parallel extensions support? Probably... but this is definitely more than what we intended.> You don't seem to want to map thread-private variables onto the > existing TLS support. Why?Because we don't employ explicit procedurization. What happens after procedurization (including how thread-private variables are manifested in the IR) is heavily dependent on OpenMP runtime library one relies upon and out of scope of our proposal. Yours, Andrey Bokhanko --- Software Engineer Intel Compiler Team Intel Corp.
greened at obbligato.org
2012-Oct-02 01:06 UTC
[LLVMdev] [RFC] OpenMP Representation in LLVM IR
Andrey Bokhanko <andreybokhanko at gmail.com> writes:> Hi All, > > We'd like to make a proposal for OpenMP representation in LLVM IR.I'm providing some brief comments after a skim of this..> Our goal is to reach an agreement in the community on a simple, > complete and extensible representation of OpenMP language constructs > in LLVM IR.I think this is a bad idea. OpenMP is not Low Level and I can't think of a good reason to start putting OpenMP support in the IR. Cray has complete functioning OpenMP stack that performs very well without any LLVM IR support at all.> As can be seen in the following sections, the IR extension we propose > doesn’t involve explicit procedurization. Thus, we assume that > function outlining should happen somewhere in the LLVM back-end, and > usually this should be aligned with how chosen OpenMP runtime library > works and what it expects. This is a deliberate decision on our part. > We believe it provides the following benefits (when compared with > designs involving procedurization done in a front-end):This is a very high-level transformation. I don't think it belongs in a low-level backend.> 1) Function outlining doesn’t depend on source language; thus, it can > be implemented once and used with any front-ends.A higher-level IR would be more appropriate for this, either something provided by Clang or another frontend or a some other mid-level IR.> 2) Optimizations are usually restricted by a single function boundary. > If procedurization is done in a front-end, this effectively kills any > optimizations – as simple ones as loop invariant code motion. Refer to > [Tian2005] for more information on why this is important for efficient > optimization of OpenMP programs.You're assuming all optimization is done by LLVM. That's not true in general.> It should be stressed, though, that in order to preserve correct > semantics of a user program, optimizations should be made thread-aware > (which, to the best of our knowledge, is not the case with LLVM > optimizations).Another reason not to do this in LLVM.> We also included a set of requirements for front-ends and back-ends, > which establish mutual expectations and is an important addition to > the design.This will increase coupling between the "front ends" and LLVM. That would be very unfortunate. One of LLVM's great strength is its flexibility. I didn't look at the details of how you map OMP directives to LLVM IR. I think this is really the wrong way to go. -David
greened at obbligato.org
2012-Oct-02 01:15 UTC
[LLVMdev] [RFC] OpenMP Representation in LLVM IR
Hal Finkel <hfinkel at anl.gov> writes: Hi Hal,> As you may know, this is the third such proposal over the past two > months, one by me > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052472.html)This link seems to be broken. I missed your earlier proposal and would like to read it. As with this proposal, I fear any direct parallelization support in LLVM is going to take us out of the "low level" feature of LLVM which is a huge strength.> and the other, based somewhat on mine, by Sanjoy > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/053798.html)I read this proposal quickly. I don't understand why we need intrinsics. Won't calls to runtime routines work just fine? Ah, Sanjoy had a link to your proposal in his message. Again, I only skimmed the document, but I was left with the question, "why not just make calls to runtime routines?" What is the reason for the "paralleliation metadata?" It seems to me this implies/requires that LLVM have knowledge of parallel semantics. That would be very unfortunate. -David
On Mon, 01 Oct 2012 20:06:20 -0500 greened at obbligato.org wrote:> Andrey Bokhanko <andreybokhanko at gmail.com> writes: > > > Hi All, > > > > We'd like to make a proposal for OpenMP representation in LLVM IR. > > I'm providing some brief comments after a skim of this.. > > > Our goal is to reach an agreement in the community on a simple, > > complete and extensible representation of OpenMP language constructs > > in LLVM IR. > > I think this is a bad idea. OpenMP is not Low Level and I can't think > of a good reason to start putting OpenMP support in the IR. Cray has > complete functioning OpenMP stack that performs very well without any > LLVM IR support at all.OpenMP provides a mechanism to expresses parallel semantics, and the best way to implement those semantics is highly target-dependent. On some targets early lowering into a runtime library will perform well, and optimization opportunities lost by doing so will prove fairly insignificant in many cases. I can believe that this is true on those systems that Cray targets. However, that will not be true everywhere.> > > As can be seen in the following sections, the IR extension we > > propose doesn’t involve explicit procedurization. Thus, we assume > > that function outlining should happen somewhere in the LLVM > > back-end, and usually this should be aligned with how chosen OpenMP > > runtime library works and what it expects. This is a deliberate > > decision on our part. We believe it provides the following benefits > > (when compared with designs involving procedurization done in a > > front-end): > > This is a very high-level transformation. I don't think it belongs > in a low-level backend. > > > 1) Function outlining doesn’t depend on source language; thus, it > > can be implemented once and used with any front-ends. > > A higher-level IR would be more appropriate for this, either something > provided by Clang or another frontend or a some other mid-level IR.For some things, yes, but at the moment we don't have anything else besides the LLVM IR. The LLVM IR is currently where vectorization is done, loop-restructuring is done, aliasing analysis is performed, etc. and so is where the parallelization should be done as well. For other things, like atomics, lowering later may be better.> > > 2) Optimizations are usually restricted by a single function > > boundary. If procedurization is done in a front-end, this > > effectively kills any optimizations – as simple ones as loop > > invariant code motion. Refer to [Tian2005] for more information on > > why this is important for efficient optimization of OpenMP programs. > > You're assuming all optimization is done by LLVM. That's not true in > general.Even if LLVM has support for parallelization, no customer is required to use it. If you'd like to lower parallelization semantics into runtime calls before lowering in LLVM, you're free to do that. Nevertheless, LLVM is where, for example, loop-invariant code motion is performed. We don't want to procedurize parallel loops before that happens.> > > It should be stressed, though, that in order to preserve correct > > semantics of a user program, optimizations should be made > > thread-aware (which, to the best of our knowledge, is not the case > > with LLVM optimizations). > > Another reason not to do this in LLVM. > > > We also included a set of requirements for front-ends and back-ends, > > which establish mutual expectations and is an important addition to > > the design. > > This will increase coupling between the "front ends" and LLVM. That > would be very unfortunate. One of LLVM's great strength is its > flexibility.Most users do not use every feature of a programming language, and LLVM is no different.> > I didn't look at the details of how you map OMP directives to LLVM IR. > I think this is really the wrong way to go.Respectfully, I disagree. -Hal> > -David > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Not to distract, but the word, `procedurization' is not an English word. It's just leaping out at me when it is either procedure(s) (noun) or proceduralize (verb). Even processes would make sense. I couldn't help myself because the word was distracting. - Marc P.S. Not that my vote counts, but I'm more in the camp of Hal whose approach to tackling the parallelization foundation within LLVM [OpenMP just one aspect] makes more sense hitting that now by providing a clean and generic container without being hardwired to any one particular third party extension, ala the `complete and extensible representation of OpenMP.' Speaking from not having the necessary foundation to really sway either direction it just seems more the spirit of Hal's project would fall in-line, with what has been historically how Lattner and company prefer to keep LLVM as clean a design and extensible as possible without being interdependent upon other projects hooks into extending value to the project. As a former Apple/NeXT alum I would have already expected Enderby and company to have designed LLVM/Clang with OpenMP hooks if it were intended to have an intimate relationship with the OpenMP project. It seems to me that OpenCL, OpenMP and other solutions are best served as add-ons to the project freeing the project from being design compromised from it's original goals. I'd make the same observations whether AMD, ARM or anyone else made the proposal attempting to interweave OpenMP as you hope to sway the community to allow. I prefer Hal's approach to the who problem space. On 09/28/2012 11:42 AM, Andrey Bokhanko wrote:> Hi All, > > We'd like to make a proposal for OpenMP representation in LLVM IR. > > Our goal is to reach an agreement in the community on a simple, > complete and extensible representation of OpenMP language constructs > in LLVM IR. Hopefully, this would serve as a common ground and would > enable further development of OpenMP support both in Clang and LLVM > compiler toolchain. > > We seek feedback on the proposed design and ways to improve it. > > Main authors of the proposal are Andrey Bokhanko and Alexey Bataev. > Also, we'd like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. NB, this *does not* automatically imply support > of the proposal by said individuals. > > Please find the proposal in *.pdf (attached to the message, for > reading convenience) and plain text (below the message, for quoting > convenience) formats. Their content is identical. > > Full disclosure: both of us are associated with Intel and Intel Compiler team. > > Yours, > Andrey Bokhanko > Alexey Bataev > ============> Software Engineers > Intel Compiler Team > Intel Corp. > > > OpenMP Representation in LLVM IR > ===============================> Design Proposal > > Alexey Bataev, Andrey Bokhanko > > > Introduction > ===========> This document describes design proposal for OpenMP representation in LLVM IR. > > Authors assume that readers have basic understanding of: > 1) OpenMP principles and language constructs > 2) General design of LLVM compiler system > > > Goals > ====> Our end goal is to provide a simple, complete and extensible support > for OpenMP representation in LLVM IR. > > We aim to extend LLVM IR as little as possible, preferably without > adding any new types / language constructs at all. > > Also, we’d like to keep opportunities for optimization of parallel > code as widely opened as possible –obviously, within the boundaries of > preserving correct semantics of a user program. > > And finally, in the spirit of OpenMP Specification ([OpenMP11]), LLVM > IR based compilers should be able to easily skip IR extensions related > to OpenMP support and still generate correct, albeit sequential, code. > > > Function Outlining > =================> Eventually, parallel regions should be put into separate routines, a > process usually called “function outlining” or “procedurization”. This > can happen as early as in front-end, and as late as right before code > generation. > > As can be seen in the following sections, the IR extension we propose > doesn’t involve explicit procedurization. Thus, we assume that > function outlining should happen somewhere in the LLVM back-end, and > usually this should be aligned with how chosen OpenMP runtime library > works and what it expects. This is a deliberate decision on our part. > We believe it provides the following benefits (when compared with > designs involving procedurization done in a front-end): > > 1) Function outlining doesn’t depend on source language; thus, it can > be implemented once and used with any front-ends. > > 2) Optimizations are usually restricted by a single function boundary. > If procedurization is done in a front-end, this effectively kills any > optimizations – as simple ones as loop invariant code motion. Refer to > [Tian2005] for more information on why this is important for efficient > optimization of OpenMP programs. > > It should be stressed, though, that in order to preserve correct > semantics of a user program, optimizations should be made thread-aware > (which, to the best of our knowledge, is not the case with LLVM > optimizations). > > By “thread-aware” we mean an optimization that performs legal > transformations on parallel programs. Refer to [Novillo00] for more > information on correctness of optimizations for parallel programs. > > Given that LLVM optimizations are currently not thread-aware, initial > implementation should call procedurization pass right at the start of > the optimizer. In the future, this pass can be moved further and > further down the optimizer, as more optimizations will become > thread-aware. > > Essentially, this initial implementation is not that different from > implementations with procedurization done in front-end; however, it > keeps window of possibility opened. > > > Scope > ====> Scope of this design is OpenMP representation in LLVM IR. Thus, all > other issues related to OpenMP support, like runtime library, ABI, etc > are not covered. > > We also included a set of requirements for front-ends and back-ends, > which establish mutual expectations and is an important addition to > the design. > > Our proposal is based on the latest published OpenMP specification, > which is version 3.1 ([OpenMP31]) at the time of writing. However, the > design approach we employed is general enough to allow easy adaptation > for future versions of OpenMP standard. > > > Front-End/Back-End Contract > ==========================> While not a part of OpenMP representation design per se, the following > pre- and post-conditions help to establish a set of mutual expectation > between front-ends (that generate LLVM IR with OpenMP support) and > back-ends (that consume it). Without this kind of a “contract” > ([Meyer92]), a back-end has to verify too much and basically repeat > the work already done by a front-end. > > While it is possible to generate LLVM IR that violates these > conditions, we consider it to be non-conformant. Obviously, > conformance can be verified, if necessary. > > Pre- and Post-conditions for Front-Ends > --------------------------------------- > 1) Guarantee correct semantics of directives and clauses. This > includes nomenclature and number of clauses in directives, correct > nesting of directives, values of clauses, etc. > > For example, front-ends should guarantee that omp single directive has > at most one private, firstprivate, copyprivate and nowait clause, and > nothing else; omp section directive is nested inside omp sections > directive; the only possible value of private clause is a list of > variables available at the point where this clause is present; etc. > > 2) Guarantee correct semantics of statements / structured blocks > following directives. > As an example, it should be guaranteed that only for-loops follow omp > parallel for directive. > > 3) Set _OPENMP macro. This is required in section 2.2 of [OpenMP11]. > > 4) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > If a user program violates OpenMP semantics and thus, makes compliance > with first two conditions impossible, a front-end should report a > meaningful error message and stop compilation. > > It should be noted that front-ends are not required to guarantee full > conformance of generated LLVM IR to OpenMP specification. As an > example, the specification deems programs that branch into or out of > parallel regions to be non-conformant. We believe that verification of > such properties is too complex for most front-ends. This matches well > with what is written in the specification itself: “compliant > implementations are not required to check for code sequences that > cause a program to be classified as non-conforming” ([OpenMP11], > Section 1.1). > > Pre- and Post-conditions for Back-Ends > -------------------------------------- > 1) Intrinsics, builtins, library routines and language implementation > should be thread-safe. This is equally applicable to front-ends and is > required in Section 1.5 of [OpenMP11]. > > 2) Optimizations should be thread-aware. > > 3) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > 4) Rely on post-conditions guaranteed by front-end, and nothing else. > > Conditions 2)-5) are only applicable to optimizations working on IR > before procedurization. > > As noted in the previous section, LLVM IR is not guaranteed to be > fully conformant with OpenMP specification. Back-ends should be able > to compile cleanly non-compliant programs, but no promises are made on > correct behavior of resulting machine code. > > > Elements of OpenMP > =================> OpenMP is comprised of four components (as of version 3.1): > * Directives (+ Clauses) > * Internal Control Variables > * Runtime Library Routines > * Environment Variables > > Internal Control Variables, Runtime Library Routines and Environment > Variables are provided / handled by OpenMP runtime library. These > components of OpenMP don’t require special support in LLVM IR and thus > are out of scope of this document. > > All OpenMP directives in C/C++ are specified with #pragma > preprocessing directive and have the following format: > > #pragma omp directive-name [clause [[,] clause]…]. > > Next two chapters describe our design proposal for representation of, > correspondingly, directives and clauses. > > > Directives > =========> Directives in LLVM IR are represented as calls to “llvm.omp.directive” > intrinsic with a single argument referencing LLVM IR metadata. The > metadata contains an identifier of a directive. > > Almost all OpenMP directives are represented with two intrinsic calls: > one for entry and one for exit point of the directive’s context. It is > enough to have just one intrinsic call for several directives which > are supposed to be enclosed in other directives (like omp section, > which must appear only within omp sections context) or specify a > single instruction (like omp flush) or are declarative (like omp > threadprivate). Exit point for such directives is either non-existent > or can be determined by examining other intrinsic calls and thus, not > required to be explicitly present. > > Metadata specify only one thing: type of a directive. Currently LLVM > IR does not support integer or enumeration metadata types; thus, we > decided to use MDString (metadata string type) to represent the type. > The list of directives and their identifiers is shown in Table 1. > > Table 1. OpenMP Directives > > Type of directive #pragma omp ... | MDString in metadata > ------------------------------------------------------------ > parallel | OMP_PARALLEL > | OMP_END_PARALLEL > ------------------------------------------------------------ > [parallel] for | OMP_[PARALLEL_]LOOP > | OMP_END_[PARALLEL_]LOOP > ------------------------------------------------------------ > [parallel] sections | OMP_[PARALLEL_]SECTIONS > | OMP_END_[PARALLEL_]SECTIONS > ------------------------------------------------------------ > section | OMP_SECTION > ------------------------------------------------------------ > single | OMP_SINGLE > | OMP_END_SINGLE > ------------------------------------------------------------ > task | OMP_PTASK > | OMP_END_PTASK > ------------------------------------------------------------ > taskyield | OMP_PTASKYIELD > ------------------------------------------------------------ > master | OMP_MASTER > | OMP_END_MASTER > ------------------------------------------------------------ > critical | OMP_CRITICAL > | OMP_END_CRITICAL > ------------------------------------------------------------ > barrier | OMP_BARRIER > ------------------------------------------------------------ > taskwait | OMP_PTASKWAIT > ------------------------------------------------------------ > atomic | OMP_ATOMIC > | OMP_END_ATOMIC > ------------------------------------------------------------ > flush | OMP_FLUSH > ------------------------------------------------------------ > ordered | OMP_ORDERED > | OMP_END_ORDERED > > An example of an OpenMP directive and its representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel > > LLVM IR > ------- > call void @llvm.omp.directive(metadata !0) > ... > call void @llvm.omp.directive(metadata !1) > > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Clauses > ======> All OpenMP clauses can be divided into four groups: > * Ones with predefined values (default, ordered, nowait, untied, read, > write, update, capture) > * Ones with a list of variables (shared, private, firstprivate, > lastprivate, copyin, copyprivate, directives flush, threadprivate) > * Ones with a scalar or integer expression (if, num_threads, final, collapse) > * Compound ones (reduction, schedule, directive critical) > > Clauses in LLVM IR are represented as intrinsic calls. Each intrinsic > call representing a clause has one mandatory argument and arbitrary > number of optional arguments. The mandatory argument references LLVM > IR metadata. The metadata contains identifier of the clause (MDString) > and, for some compound clauses, additional data represented as another > MDString. Optional arguments reference LLVM variables or expressions. > > > Clauses with Predefined Values > ------------------------------ > Clauses with predefined values are represented as calls to > “llvm.omp.simple” intrinsic. For this group of clauses metadata > contains an MDString with a clause’s identifier. > > The list of clauses and their identifiers is shown in Table 2. > > Table 2. OpenMP Clauses with Predefined Values > > Clause | MDString in Metadata > ------------------------------------------------------------ > default(none) | OMP_DEFAULT_NONE > ------------------------------------------------------------ > default(shared) | OMP_DEFAULT_SHARED > ------------------------------------------------------------ > ordered | OMP_ORDERED > ------------------------------------------------------------ > nowait | OMP_NOWAIT > ------------------------------------------------------------ > untied | OMP_UNTIED > ------------------------------------------------------------ > read | OMP_READ > ------------------------------------------------------------ > write | OMP_WRITE > ------------------------------------------------------------ > update | OMP_UPDATE > ------------------------------------------------------------ > capture | OMP_CAPTURE > > Here is an example of OpenMP clause with a predefined value and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp atomic read > > LLVM IR > ------- > call void @llvm.omp.simple(metadata !1) > > !1 = metadata !{metadata !”OMP_READ”} > > > Clauses with a List of Variables > -------------------------------- > Clauses with a list of variables are represented as calls to > “llvm.omp.list” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Additional arguments of the intrinsic reference LLVM variables > associated with a clause. It is important to reference variables > directly in intrinsic calls and not in metadata, in order to preserve > data dependency. > > Each variable of a non user-defined type is represented with a single > argument (referencing the variable). > > Each variable of a user-defined type is represented with four > arguments: one referencing the variable itself, one referencing its > default constructor, one referencing its copy constructor and one > referencing its destructor. References to constructors / destructors > are required to correctly create and destroy private copies of > variables. > > The list of clauses and their identifiers is shown in Table 3. > > Table 3. OpenMP Clauses with a List of Variables > > Clause | MDString in Metadata > ------------------------------------------------------------ > private(list) | OMP_PRIVATE > ------------------------------------------------------------ > firstprivate(list) | OMP_FIRSTPRIVATE > ------------------------------------------------------------ > lastprivate(list) | OMP_LASTPRIVATE > ------------------------------------------------------------ > shared(list) | OMP_SHARED > ------------------------------------------------------------ > copyin(list) | OMP_COPYIN > ------------------------------------------------------------ > Directive flush(list) | OMP_FLUSH > ------------------------------------------------------------ > Directive threadprivate(list) | OMP_THREADPRIVATE > > Here is an example of OpenMP clause with a list of variables and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel private(a,b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.list(metadata !1, i32* @a, i32* @b) > > !1 = metadata !{metadata !”OMP_PRIVATE”} > > > Clauses with Scalar or Integer Expressions > ------------------------------------------ > Clauses with scalar or integer expressions are represented as calls to > “llvm.omp.expr” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Second argument of the intrinsic references a scalar or integer LLVM > expression associated with a clause. It is important to reference > expressions directly in intrinsic calls and not in metadata, in order > to preserve data dependency. > > The list of clauses and their identifiers is shown in Table 4. > > Table 4. OpenMP Clauses with Scalar or Integer Expressions > > Clause | MDString in Metadata > ------------------------------------------------------------ > if(scalar_expr) | OMP_IF > ------------------------------------------------------------ > num_threads(integer_expr) | OMP_NUM_THREADS > ------------------------------------------------------------ > final(scalar_expr) | OMP_FINAL > ------------------------------------------------------------ > collapse(const_integer_expr) | OMP_COLLAPSE > > Here is an example of OpenMP clause with a scalar expression and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel if(a) > > LLVM IR > ------- > %4 = load i32* @a, align 4 > %5 = icmp ne i32 %4, 0 > call void @llvm.omp.expr(metadata !1, i32 %5) > ... > !1 = metadata !{metadata !”OMP_IF”} > > > Compound Clauses > ---------------- > Compound clauses are represented as calls to “llvm.omp.compound” > intrinsic. For this group of clauses metadata contains an MDString > with a clause’s identifier and additional data, also represented as an > MDString. > > Additional arguments of an intrinsic call for reduction clause > reference LLVM variables associated with a clause (see format of > “Clauses with a List of Variables”). > > Second argument of an intrinsic call for schedule (static), schedule > (dynamic) and schedule (guided) clauses references an integer LLVM > expression associated with a clause (see format of “Clauses with > Scalar or Integer Expressions”). > > The list of compound clauses along with their representation in > metadata is shown in Table 5. > > Table 5. OpenMP Compound Clauses > > Clause | Representation in Metadata > ------------------------------------------------------------ > reduction(operator : list) | OMP_REDUCTION,<operator> > ------------------------------------------------------------ > schedule(static [, integer_expr]) | OMP_SCHEDULE, STATIC > ------------------------------------------------------------ > schedule(dynamic [, integer_expr]) | OMP_SCHEDULE, DYNAMIC > ------------------------------------------------------------ > schedule(guided [, integer_expr]) | OMP_SCHEDULE, GUIDED > ------------------------------------------------------------ > schedule(auto) | OMP_SCHEDULE, AUTO > ------------------------------------------------------------ > schedule(runtime) | OMP_SCHEDULE, RUNTIME > ------------------------------------------------------------ > Directive critical(name) | OMP_NAME,<name> > > Here is an example of a compound OpenMP clause and its representation > in LLVM IR: > > C/C++ > ----- > #pragma omp parallel reduction(+ : a, b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.compound(metadata !1, i32* @a, i32* @b) > > !1 = metadata !{metadata !”OMP_REDUCTION”, metadata !”+”} > > > An Example > ---------- > Here is a more complex example, demonstrating LLVM IR representation > of a directive with several clauses of different types. > > C/C++ > ----- > int a, gVar; > int main() { > int lVar; > #pragma omp parallel default(shared), private(gVar, a), if(lVar) > { > ... > } > return (0); > } > > LLVM IR > ------- > @a = global i32 0, align 4 > @gVar = global i32 0, align 4 > define i32 @main () nounwind uwtable ssp { > %lVar = alloca i32, align 4 > call void @llvm.omp.directive(metadata !0) > call void @llvm.omp.simple(metadata !1) > call void (metadata, ...)* @llvm.omp.list(metadata !2, i32* @gVar, i32* @a) > call void @llvm.omp.expr(metadata !3, i32* %lVar) > ... > call void @llvm.omp.directive(metadata !4) > ret i32 0 > } > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_DEFAULT_SHARED”} > !2 = metadata !{metadata !”OMP_PRIVATE”} > !3 = metadata !{metadata !”OMP_IF”} > !4 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Design Alternatives > ==================> It is possible to propose several viable alternatives to representing > information on OpenMP directives and clauses while keeping general > design approach intact. > > Some things that we considered: > > a) Represent information on types of directives and clauses as > constant expressions instead of MDStrings. > b) Place MDStrings with identifiers of directives and clauses directly > into intrinsic calls instead of metadata. > c) Do not represent clauses as separate intrinsics with references to > metadata; instead, put a list of references to metadata for all > clauses associated with a directive at the end of metadata describing > the directive. > > As we said, all these are viable alternatives. We decided to choose > what we chose and not employ the alternatives listed above in order to > keep true the following three design principles: > > 1) Absolute simplicity and readability (including human readability). > This ruled out alternatives a) and [partially] c). > 2) Uniformity of representation, for both directives and clauses. This > ruled out alternatives b) and c). > 3) As much extensibility and openness for future changes in both LLVM > IR and OpenMP as possible. This ruled out alternatives b) and c). > > We believe that the only downside of our choice is larger size of LLVM > IR required to represent same number of directives and clauses. > > However, we consider this as a relatively minor element of our design; > one might argue in favor of these or other similar design > alternatives. > > > Conclusion > =========> In this document we described our design proposal for OpenMP > representation in LLVM IR. We believe the IR extensions we proposed > are: > > 1) Simple > > They rely on existing LLVM IR types and language constructs; the only > new things are a few new intrinsics. Thus, LLVM IR consumers lacking > OpenMP support can simply ignore these two intrinsics and still > generate correct, albeit sequential, code. > > 2) Complete > > OpenMP 3.1 Specification ([OpenMP31]) is fully covered. > > 3) Extensible > > The design approach is general enough to readily incorporate future > versions of OpenMP standard. > > 4) Enables both early and late procedurization and aggressive optimization > > This is when compared with designs implying explicit procedurization > done right in front-ends. > > > Acknowledgements > ===============> Authors would like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. This *does not* automatically imply approval of > this proposal by said individuals. > > > References > =========> > [Lattner10] Chris Lattner, Devang Patel, “Extensible Metadata in LLVM > IR”. Available at: > http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html > > [Meyer92] Bertrand Meyer, “Applying “Design by Contract”, IEEE > Computer, October 1992, pp. 40-51. Available at: > http://se.ethz.ch/~meyer/publications/computer/contract.pdf > > [Novillo00] Diego Novillo, “Analysis and Optimization of Explicitly > Parallel Programs”, PhD Thesis, University of Alberta, Edmonton, > Alberta, Canada, 2000. Available at: > http://www.airs.com/dnovillo/Papers/Thesis.pdf > > [OpenMP11] “OpenMP Application Program Interface”, Version 3.1, July > 2011. Available at: http://www.openmp.org/mp-documents/OpenMP3.1.pdf > > [Tian05] Xinmin Tian, Milind Girkar, Aart Bik and Hideki Saito, > “Practical Compiler Techniques on Efficient Multithreaded Code > Generation for OpenMP Programs”, The Computer Journal, September 2005, > pp.588-601. Available at: http://dl.acm.org/citation.cfm?id=1095017 > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Marc J. Driftmeyer Email :: mjd at reanimality.com <mailto:mjd at reanimality.com> Web :: http://www.reanimality.com Cell :: (509) 435-5212 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121002/3104b480/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: mjd.vcf Type: text/x-vcard Size: 317 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121002/3104b480/attachment.vcf>
Mark,> Speaking from not having the necessary foundation to really sway either > direction it just seems more the spirit of Hal's project would fall in-line, > with what has been historically how Lattner and company prefer to keep LLVM > as clean a design and extensible as possible without being interdependent > upon other projects hooks into extending value to the project.I have hard time understanding why one proposal is different from another in this regard. Both rely on intrinsics and metadata... why intrinsics used in one proposal are better (or worse) than ones used in another? Yours, Andrey --- Software Engineer Intel Compiler Team Intel Corp. P.S.: I tried diligently not to use "parallelization" word in this message. :-) On Tue, Oct 2, 2012 at 12:22 PM, Marc J. Driftmeyer <mjd at reanimality.com> wrote:> Not to distract, but the word, `procedurization' is not an English word. > It's just leaping out at me when it is either procedure(s) (noun) or > proceduralize (verb). Even processes would make sense. I couldn't help > myself because the word was distracting. > > - Marc > > P.S. Not that my vote counts, but I'm more in the camp of Hal whose approach > to tackling the parallelization foundation within LLVM [OpenMP just one > aspect] makes more sense hitting that now by providing a clean and generic > container without being hardwired to any one particular third party > extension, ala the `complete and extensible representation of OpenMP.' > > Speaking from not having the necessary foundation to really sway either > direction it just seems more the spirit of Hal's project would fall in-line, > with what has been historically how Lattner and company prefer to keep LLVM > as clean a design and extensible as possible without being interdependent > upon other projects hooks into extending value to the project. > > As a former Apple/NeXT alum I would have already expected Enderby and > company to have designed LLVM/Clang with OpenMP hooks if it were intended to > have an intimate relationship with the OpenMP project. It seems to me that > OpenCL, OpenMP and other solutions are best served as add-ons to the project > freeing the project from being design compromised from it's original goals. > > I'd make the same observations whether AMD, ARM or anyone else made the > proposal attempting to interweave OpenMP as you hope to sway the community > to allow. I prefer Hal's approach to the who problem space. > > > On 09/28/2012 11:42 AM, Andrey Bokhanko wrote: > > Hi All, > > We'd like to make a proposal for OpenMP representation in LLVM IR. > > Our goal is to reach an agreement in the community on a simple, > complete and extensible representation of OpenMP language constructs > in LLVM IR. Hopefully, this would serve as a common ground and would > enable further development of OpenMP support both in Clang and LLVM > compiler toolchain. > > We seek feedback on the proposed design and ways to improve it. > > Main authors of the proposal are Andrey Bokhanko and Alexey Bataev. > Also, we'd like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. NB, this *does not* automatically imply support > of the proposal by said individuals. > > Please find the proposal in *.pdf (attached to the message, for > reading convenience) and plain text (below the message, for quoting > convenience) formats. Their content is identical. > > Full disclosure: both of us are associated with Intel and Intel Compiler > team. > > Yours, > Andrey Bokhanko > Alexey Bataev > ============> Software Engineers > Intel Compiler Team > Intel Corp. > > > OpenMP Representation in LLVM IR > ===============================> Design Proposal > > Alexey Bataev, Andrey Bokhanko > > > Introduction > ===========> This document describes design proposal for OpenMP representation in LLVM > IR. > > Authors assume that readers have basic understanding of: > 1) OpenMP principles and language constructs > 2) General design of LLVM compiler system > > > Goals > ====> Our end goal is to provide a simple, complete and extensible support > for OpenMP representation in LLVM IR. > > We aim to extend LLVM IR as little as possible, preferably without > adding any new types / language constructs at all. > > Also, we’d like to keep opportunities for optimization of parallel > code as widely opened as possible –obviously, within the boundaries of > preserving correct semantics of a user program. > > And finally, in the spirit of OpenMP Specification ([OpenMP11]), LLVM > IR based compilers should be able to easily skip IR extensions related > to OpenMP support and still generate correct, albeit sequential, code. > > > Function Outlining > =================> Eventually, parallel regions should be put into separate routines, a > process usually called “function outlining” or “procedurization”. This > can happen as early as in front-end, and as late as right before code > generation. > > As can be seen in the following sections, the IR extension we propose > doesn’t involve explicit procedurization. Thus, we assume that > function outlining should happen somewhere in the LLVM back-end, and > usually this should be aligned with how chosen OpenMP runtime library > works and what it expects. This is a deliberate decision on our part. > We believe it provides the following benefits (when compared with > designs involving procedurization done in a front-end): > > 1) Function outlining doesn’t depend on source language; thus, it can > be implemented once and used with any front-ends. > > 2) Optimizations are usually restricted by a single function boundary. > If procedurization is done in a front-end, this effectively kills any > optimizations – as simple ones as loop invariant code motion. Refer to > [Tian2005] for more information on why this is important for efficient > optimization of OpenMP programs. > > It should be stressed, though, that in order to preserve correct > semantics of a user program, optimizations should be made thread-aware > (which, to the best of our knowledge, is not the case with LLVM > optimizations). > > By “thread-aware” we mean an optimization that performs legal > transformations on parallel programs. Refer to [Novillo00] for more > information on correctness of optimizations for parallel programs. > > Given that LLVM optimizations are currently not thread-aware, initial > implementation should call procedurization pass right at the start of > the optimizer. In the future, this pass can be moved further and > further down the optimizer, as more optimizations will become > thread-aware. > > Essentially, this initial implementation is not that different from > implementations with procedurization done in front-end; however, it > keeps window of possibility opened. > > > Scope > ====> Scope of this design is OpenMP representation in LLVM IR. Thus, all > other issues related to OpenMP support, like runtime library, ABI, etc > are not covered. > > We also included a set of requirements for front-ends and back-ends, > which establish mutual expectations and is an important addition to > the design. > > Our proposal is based on the latest published OpenMP specification, > which is version 3.1 ([OpenMP31]) at the time of writing. However, the > design approach we employed is general enough to allow easy adaptation > for future versions of OpenMP standard. > > > Front-End/Back-End Contract > ==========================> While not a part of OpenMP representation design per se, the following > pre- and post-conditions help to establish a set of mutual expectation > between front-ends (that generate LLVM IR with OpenMP support) and > back-ends (that consume it). Without this kind of a “contract” > ([Meyer92]), a back-end has to verify too much and basically repeat > the work already done by a front-end. > > While it is possible to generate LLVM IR that violates these > conditions, we consider it to be non-conformant. Obviously, > conformance can be verified, if necessary. > > Pre- and Post-conditions for Front-Ends > --------------------------------------- > 1) Guarantee correct semantics of directives and clauses. This > includes nomenclature and number of clauses in directives, correct > nesting of directives, values of clauses, etc. > > For example, front-ends should guarantee that omp single directive has > at most one private, firstprivate, copyprivate and nowait clause, and > nothing else; omp section directive is nested inside omp sections > directive; the only possible value of private clause is a list of > variables available at the point where this clause is present; etc. > > 2) Guarantee correct semantics of statements / structured blocks > following directives. > As an example, it should be guaranteed that only for-loops follow omp > parallel for directive. > > 3) Set _OPENMP macro. This is required in section 2.2 of [OpenMP11]. > > 4) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > If a user program violates OpenMP semantics and thus, makes compliance > with first two conditions impossible, a front-end should report a > meaningful error message and stop compilation. > > It should be noted that front-ends are not required to guarantee full > conformance of generated LLVM IR to OpenMP specification. As an > example, the specification deems programs that branch into or out of > parallel regions to be non-conformant. We believe that verification of > such properties is too complex for most front-ends. This matches well > with what is written in the specification itself: “compliant > implementations are not required to check for code sequences that > cause a program to be classified as non-conforming” ([OpenMP11], > Section 1.1). > > Pre- and Post-conditions for Back-Ends > -------------------------------------- > 1) Intrinsics, builtins, library routines and language implementation > should be thread-safe. This is equally applicable to front-ends and is > required in Section 1.5 of [OpenMP11]. > > 2) Optimizations should be thread-aware. > > 3) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > 4) Rely on post-conditions guaranteed by front-end, and nothing else. > > Conditions 2)-5) are only applicable to optimizations working on IR > before procedurization. > > As noted in the previous section, LLVM IR is not guaranteed to be > fully conformant with OpenMP specification. Back-ends should be able > to compile cleanly non-compliant programs, but no promises are made on > correct behavior of resulting machine code. > > > Elements of OpenMP > =================> OpenMP is comprised of four components (as of version 3.1): > * Directives (+ Clauses) > * Internal Control Variables > * Runtime Library Routines > * Environment Variables > > Internal Control Variables, Runtime Library Routines and Environment > Variables are provided / handled by OpenMP runtime library. These > components of OpenMP don’t require special support in LLVM IR and thus > are out of scope of this document. > > All OpenMP directives in C/C++ are specified with #pragma > preprocessing directive and have the following format: > > #pragma omp directive-name [clause [[,] clause]…]. > > Next two chapters describe our design proposal for representation of, > correspondingly, directives and clauses. > > > Directives > =========> Directives in LLVM IR are represented as calls to “llvm.omp.directive” > intrinsic with a single argument referencing LLVM IR metadata. The > metadata contains an identifier of a directive. > > Almost all OpenMP directives are represented with two intrinsic calls: > one for entry and one for exit point of the directive’s context. It is > enough to have just one intrinsic call for several directives which > are supposed to be enclosed in other directives (like omp section, > which must appear only within omp sections context) or specify a > single instruction (like omp flush) or are declarative (like omp > threadprivate). Exit point for such directives is either non-existent > or can be determined by examining other intrinsic calls and thus, not > required to be explicitly present. > > Metadata specify only one thing: type of a directive. Currently LLVM > IR does not support integer or enumeration metadata types; thus, we > decided to use MDString (metadata string type) to represent the type. > The list of directives and their identifiers is shown in Table 1. > > Table 1. OpenMP Directives > > Type of directive #pragma omp ... | MDString in metadata > ------------------------------------------------------------ > parallel | OMP_PARALLEL > | OMP_END_PARALLEL > ------------------------------------------------------------ > [parallel] for | OMP_[PARALLEL_]LOOP > | OMP_END_[PARALLEL_]LOOP > ------------------------------------------------------------ > [parallel] sections | OMP_[PARALLEL_]SECTIONS > | OMP_END_[PARALLEL_]SECTIONS > ------------------------------------------------------------ > section | OMP_SECTION > ------------------------------------------------------------ > single | OMP_SINGLE > | OMP_END_SINGLE > ------------------------------------------------------------ > task | OMP_PTASK > | OMP_END_PTASK > ------------------------------------------------------------ > taskyield | OMP_PTASKYIELD > ------------------------------------------------------------ > master | OMP_MASTER > | OMP_END_MASTER > ------------------------------------------------------------ > critical | OMP_CRITICAL > | OMP_END_CRITICAL > ------------------------------------------------------------ > barrier | OMP_BARRIER > ------------------------------------------------------------ > taskwait | OMP_PTASKWAIT > ------------------------------------------------------------ > atomic | OMP_ATOMIC > | OMP_END_ATOMIC > ------------------------------------------------------------ > flush | OMP_FLUSH > ------------------------------------------------------------ > ordered | OMP_ORDERED > | OMP_END_ORDERED > > An example of an OpenMP directive and its representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel > > LLVM IR > ------- > call void @llvm.omp.directive(metadata !0) > ... > call void @llvm.omp.directive(metadata !1) > > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Clauses > ======> All OpenMP clauses can be divided into four groups: > * Ones with predefined values (default, ordered, nowait, untied, read, > write, update, capture) > * Ones with a list of variables (shared, private, firstprivate, > lastprivate, copyin, copyprivate, directives flush, threadprivate) > * Ones with a scalar or integer expression (if, num_threads, final, > collapse) > * Compound ones (reduction, schedule, directive critical) > > Clauses in LLVM IR are represented as intrinsic calls. Each intrinsic > call representing a clause has one mandatory argument and arbitrary > number of optional arguments. The mandatory argument references LLVM > IR metadata. The metadata contains identifier of the clause (MDString) > and, for some compound clauses, additional data represented as another > MDString. Optional arguments reference LLVM variables or expressions. > > > Clauses with Predefined Values > ------------------------------ > Clauses with predefined values are represented as calls to > “llvm.omp.simple” intrinsic. For this group of clauses metadata > contains an MDString with a clause’s identifier. > > The list of clauses and their identifiers is shown in Table 2. > > Table 2. OpenMP Clauses with Predefined Values > > Clause | MDString in Metadata > ------------------------------------------------------------ > default(none) | OMP_DEFAULT_NONE > ------------------------------------------------------------ > default(shared) | OMP_DEFAULT_SHARED > ------------------------------------------------------------ > ordered | OMP_ORDERED > ------------------------------------------------------------ > nowait | OMP_NOWAIT > ------------------------------------------------------------ > untied | OMP_UNTIED > ------------------------------------------------------------ > read | OMP_READ > ------------------------------------------------------------ > write | OMP_WRITE > ------------------------------------------------------------ > update | OMP_UPDATE > ------------------------------------------------------------ > capture | OMP_CAPTURE > > Here is an example of OpenMP clause with a predefined value and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp atomic read > > LLVM IR > ------- > call void @llvm.omp.simple(metadata !1) > > !1 = metadata !{metadata !”OMP_READ”} > > > Clauses with a List of Variables > -------------------------------- > Clauses with a list of variables are represented as calls to > “llvm.omp.list” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Additional arguments of the intrinsic reference LLVM variables > associated with a clause. It is important to reference variables > directly in intrinsic calls and not in metadata, in order to preserve > data dependency. > > Each variable of a non user-defined type is represented with a single > argument (referencing the variable). > > Each variable of a user-defined type is represented with four > arguments: one referencing the variable itself, one referencing its > default constructor, one referencing its copy constructor and one > referencing its destructor. References to constructors / destructors > are required to correctly create and destroy private copies of > variables. > > The list of clauses and their identifiers is shown in Table 3. > > Table 3. OpenMP Clauses with a List of Variables > > Clause | MDString in Metadata > ------------------------------------------------------------ > private(list) | OMP_PRIVATE > ------------------------------------------------------------ > firstprivate(list) | OMP_FIRSTPRIVATE > ------------------------------------------------------------ > lastprivate(list) | OMP_LASTPRIVATE > ------------------------------------------------------------ > shared(list) | OMP_SHARED > ------------------------------------------------------------ > copyin(list) | OMP_COPYIN > ------------------------------------------------------------ > Directive flush(list) | OMP_FLUSH > ------------------------------------------------------------ > Directive threadprivate(list) | OMP_THREADPRIVATE > > Here is an example of OpenMP clause with a list of variables and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel private(a,b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.list(metadata !1, i32* @a, i32* @b) > > !1 = metadata !{metadata !”OMP_PRIVATE”} > > > Clauses with Scalar or Integer Expressions > ------------------------------------------ > Clauses with scalar or integer expressions are represented as calls to > “llvm.omp.expr” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Second argument of the intrinsic references a scalar or integer LLVM > expression associated with a clause. It is important to reference > expressions directly in intrinsic calls and not in metadata, in order > to preserve data dependency. > > The list of clauses and their identifiers is shown in Table 4. > > Table 4. OpenMP Clauses with Scalar or Integer Expressions > > Clause | MDString in Metadata > ------------------------------------------------------------ > if(scalar_expr) | OMP_IF > ------------------------------------------------------------ > num_threads(integer_expr) | OMP_NUM_THREADS > ------------------------------------------------------------ > final(scalar_expr) | OMP_FINAL > ------------------------------------------------------------ > collapse(const_integer_expr) | OMP_COLLAPSE > > Here is an example of OpenMP clause with a scalar expression and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel if(a) > > LLVM IR > ------- > %4 = load i32* @a, align 4 > %5 = icmp ne i32 %4, 0 > call void @llvm.omp.expr(metadata !1, i32 %5) > ... > !1 = metadata !{metadata !”OMP_IF”} > > > Compound Clauses > ---------------- > Compound clauses are represented as calls to “llvm.omp.compound” > intrinsic. For this group of clauses metadata contains an MDString > with a clause’s identifier and additional data, also represented as an > MDString. > > Additional arguments of an intrinsic call for reduction clause > reference LLVM variables associated with a clause (see format of > “Clauses with a List of Variables”). > > Second argument of an intrinsic call for schedule (static), schedule > (dynamic) and schedule (guided) clauses references an integer LLVM > expression associated with a clause (see format of “Clauses with > Scalar or Integer Expressions”). > > The list of compound clauses along with their representation in > metadata is shown in Table 5. > > Table 5. OpenMP Compound Clauses > > Clause | Representation in Metadata > ------------------------------------------------------------ > reduction(operator : list) | OMP_REDUCTION, <operator> > ------------------------------------------------------------ > schedule(static [, integer_expr]) | OMP_SCHEDULE, STATIC > ------------------------------------------------------------ > schedule(dynamic [, integer_expr]) | OMP_SCHEDULE, DYNAMIC > ------------------------------------------------------------ > schedule(guided [, integer_expr]) | OMP_SCHEDULE, GUIDED > ------------------------------------------------------------ > schedule(auto) | OMP_SCHEDULE, AUTO > ------------------------------------------------------------ > schedule(runtime) | OMP_SCHEDULE, RUNTIME > ------------------------------------------------------------ > Directive critical(name) | OMP_NAME, <name> > > Here is an example of a compound OpenMP clause and its representation > in LLVM IR: > > C/C++ > ----- > #pragma omp parallel reduction(+ : a, b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.compound(metadata !1, i32* @a, i32* @b) > > !1 = metadata !{metadata !”OMP_REDUCTION”, metadata !”+”} > > > An Example > ---------- > Here is a more complex example, demonstrating LLVM IR representation > of a directive with several clauses of different types. > > C/C++ > ----- > int a, gVar; > int main() { > int lVar; > #pragma omp parallel default(shared), private(gVar, a), if(lVar) > { > ... > } > return (0); > } > > LLVM IR > ------- > @a = global i32 0, align 4 > @gVar = global i32 0, align 4 > define i32 @main () nounwind uwtable ssp { > %lVar = alloca i32, align 4 > call void @llvm.omp.directive(metadata !0) > call void @llvm.omp.simple(metadata !1) > call void (metadata, ...)* @llvm.omp.list(metadata !2, i32* @gVar, i32* > @a) > call void @llvm.omp.expr(metadata !3, i32* %lVar) > ... > call void @llvm.omp.directive(metadata !4) > ret i32 0 > } > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_DEFAULT_SHARED”} > !2 = metadata !{metadata !”OMP_PRIVATE”} > !3 = metadata !{metadata !”OMP_IF”} > !4 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Design Alternatives > ==================> It is possible to propose several viable alternatives to representing > information on OpenMP directives and clauses while keeping general > design approach intact. > > Some things that we considered: > > a) Represent information on types of directives and clauses as > constant expressions instead of MDStrings. > b) Place MDStrings with identifiers of directives and clauses directly > into intrinsic calls instead of metadata. > c) Do not represent clauses as separate intrinsics with references to > metadata; instead, put a list of references to metadata for all > clauses associated with a directive at the end of metadata describing > the directive. > > As we said, all these are viable alternatives. We decided to choose > what we chose and not employ the alternatives listed above in order to > keep true the following three design principles: > > 1) Absolute simplicity and readability (including human readability). > This ruled out alternatives a) and [partially] c). > 2) Uniformity of representation, for both directives and clauses. This > ruled out alternatives b) and c). > 3) As much extensibility and openness for future changes in both LLVM > IR and OpenMP as possible. This ruled out alternatives b) and c). > > We believe that the only downside of our choice is larger size of LLVM > IR required to represent same number of directives and clauses. > > However, we consider this as a relatively minor element of our design; > one might argue in favor of these or other similar design > alternatives. > > > Conclusion > =========> In this document we described our design proposal for OpenMP > representation in LLVM IR. We believe the IR extensions we proposed > are: > > 1) Simple > > They rely on existing LLVM IR types and language constructs; the only > new things are a few new intrinsics. Thus, LLVM IR consumers lacking > OpenMP support can simply ignore these two intrinsics and still > generate correct, albeit sequential, code. > > 2) Complete > > OpenMP 3.1 Specification ([OpenMP31]) is fully covered. > > 3) Extensible > > The design approach is general enough to readily incorporate future > versions of OpenMP standard. > > 4) Enables both early and late procedurization and aggressive optimization > > This is when compared with designs implying explicit procedurization > done right in front-ends. > > > Acknowledgements > ===============> Authors would like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. This *does not* automatically imply approval of > this proposal by said individuals. > > > References > =========> > [Lattner10] Chris Lattner, Devang Patel, “Extensible Metadata in LLVM > IR”. Available at: > http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html > > [Meyer92] Bertrand Meyer, “Applying “Design by Contract”, IEEE > Computer, October 1992, pp. 40-51. Available at: > http://se.ethz.ch/~meyer/publications/computer/contract.pdf > > [Novillo00] Diego Novillo, “Analysis and Optimization of Explicitly > Parallel Programs”, PhD Thesis, University of Alberta, Edmonton, > Alberta, Canada, 2000. Available at: > http://www.airs.com/dnovillo/Papers/Thesis.pdf > > [OpenMP11] “OpenMP Application Program Interface”, Version 3.1, July > 2011. Available at: http://www.openmp.org/mp-documents/OpenMP3.1.pdf > > [Tian05] Xinmin Tian, Milind Girkar, Aart Bik and Hideki Saito, > “Practical Compiler Techniques on Efficient Multithreaded Code > Generation for OpenMP Programs”, The Computer Journal, September 2005, > pp.588-601. Available at: http://dl.acm.org/citation.cfm?id=1095017 > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -- > Marc J. Driftmeyer > Email :: mjd at reanimality.com > Web :: http://www.reanimality.com > Cell :: (509) 435-5212 > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Andrey, While I think that it will be relatively easy to have the intrinsics serve as code-motion barriers for other code that might be threads sensitive (like other external function calls), we would need to think through exactly how this would work. The easiest thing would be to make the intrinsics have having unmodeled side effects, although we might want to do something more intelligent. Where do you propose placing the parallel loop intrinsics calls relative to the loop code? Will this inhibit restructuring (like loop interchange), fusion, etc. if necessary? -Hal On Fri, 28 Sep 2012 22:42:37 +0400 Andrey Bokhanko <andreybokhanko at gmail.com> wrote:> Hi All, > > We'd like to make a proposal for OpenMP representation in LLVM IR. > > Our goal is to reach an agreement in the community on a simple, > complete and extensible representation of OpenMP language constructs > in LLVM IR. Hopefully, this would serve as a common ground and would > enable further development of OpenMP support both in Clang and LLVM > compiler toolchain. > > We seek feedback on the proposed design and ways to improve it. > > Main authors of the proposal are Andrey Bokhanko and Alexey Bataev. > Also, we'd like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. NB, this *does not* automatically imply support > of the proposal by said individuals. > > Please find the proposal in *.pdf (attached to the message, for > reading convenience) and plain text (below the message, for quoting > convenience) formats. Their content is identical. > > Full disclosure: both of us are associated with Intel and Intel > Compiler team. > > Yours, > Andrey Bokhanko > Alexey Bataev > ============> Software Engineers > Intel Compiler Team > Intel Corp. > > > OpenMP Representation in LLVM IR > ===============================> Design Proposal > > Alexey Bataev, Andrey Bokhanko > > > Introduction > ===========> This document describes design proposal for OpenMP representation in > LLVM IR. > > Authors assume that readers have basic understanding of: > 1) OpenMP principles and language constructs > 2) General design of LLVM compiler system > > > Goals > ====> Our end goal is to provide a simple, complete and extensible support > for OpenMP representation in LLVM IR. > > We aim to extend LLVM IR as little as possible, preferably without > adding any new types / language constructs at all. > > Also, we’d like to keep opportunities for optimization of parallel > code as widely opened as possible –obviously, within the boundaries of > preserving correct semantics of a user program. > > And finally, in the spirit of OpenMP Specification ([OpenMP11]), LLVM > IR based compilers should be able to easily skip IR extensions related > to OpenMP support and still generate correct, albeit sequential, code. > > > Function Outlining > =================> Eventually, parallel regions should be put into separate routines, a > process usually called “function outlining” or “procedurization”. This > can happen as early as in front-end, and as late as right before code > generation. > > As can be seen in the following sections, the IR extension we propose > doesn’t involve explicit procedurization. Thus, we assume that > function outlining should happen somewhere in the LLVM back-end, and > usually this should be aligned with how chosen OpenMP runtime library > works and what it expects. This is a deliberate decision on our part. > We believe it provides the following benefits (when compared with > designs involving procedurization done in a front-end): > > 1) Function outlining doesn’t depend on source language; thus, it can > be implemented once and used with any front-ends. > > 2) Optimizations are usually restricted by a single function boundary. > If procedurization is done in a front-end, this effectively kills any > optimizations – as simple ones as loop invariant code motion. Refer to > [Tian2005] for more information on why this is important for efficient > optimization of OpenMP programs. > > It should be stressed, though, that in order to preserve correct > semantics of a user program, optimizations should be made thread-aware > (which, to the best of our knowledge, is not the case with LLVM > optimizations). > > By “thread-aware” we mean an optimization that performs legal > transformations on parallel programs. Refer to [Novillo00] for more > information on correctness of optimizations for parallel programs. > > Given that LLVM optimizations are currently not thread-aware, initial > implementation should call procedurization pass right at the start of > the optimizer. In the future, this pass can be moved further and > further down the optimizer, as more optimizations will become > thread-aware. > > Essentially, this initial implementation is not that different from > implementations with procedurization done in front-end; however, it > keeps window of possibility opened. > > > Scope > ====> Scope of this design is OpenMP representation in LLVM IR. Thus, all > other issues related to OpenMP support, like runtime library, ABI, etc > are not covered. > > We also included a set of requirements for front-ends and back-ends, > which establish mutual expectations and is an important addition to > the design. > > Our proposal is based on the latest published OpenMP specification, > which is version 3.1 ([OpenMP31]) at the time of writing. However, the > design approach we employed is general enough to allow easy adaptation > for future versions of OpenMP standard. > > > Front-End/Back-End Contract > ==========================> While not a part of OpenMP representation design per se, the following > pre- and post-conditions help to establish a set of mutual expectation > between front-ends (that generate LLVM IR with OpenMP support) and > back-ends (that consume it). Without this kind of a “contract” > ([Meyer92]), a back-end has to verify too much and basically repeat > the work already done by a front-end. > > While it is possible to generate LLVM IR that violates these > conditions, we consider it to be non-conformant. Obviously, > conformance can be verified, if necessary. > > Pre- and Post-conditions for Front-Ends > --------------------------------------- > 1) Guarantee correct semantics of directives and clauses. This > includes nomenclature and number of clauses in directives, correct > nesting of directives, values of clauses, etc. > > For example, front-ends should guarantee that omp single directive has > at most one private, firstprivate, copyprivate and nowait clause, and > nothing else; omp section directive is nested inside omp sections > directive; the only possible value of private clause is a list of > variables available at the point where this clause is present; etc. > > 2) Guarantee correct semantics of statements / structured blocks > following directives. > As an example, it should be guaranteed that only for-loops follow omp > parallel for directive. > > 3) Set _OPENMP macro. This is required in section 2.2 of [OpenMP11]. > > 4) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > If a user program violates OpenMP semantics and thus, makes compliance > with first two conditions impossible, a front-end should report a > meaningful error message and stop compilation. > > It should be noted that front-ends are not required to guarantee full > conformance of generated LLVM IR to OpenMP specification. As an > example, the specification deems programs that branch into or out of > parallel regions to be non-conformant. We believe that verification of > such properties is too complex for most front-ends. This matches well > with what is written in the specification itself: “compliant > implementations are not required to check for code sequences that > cause a program to be classified as non-conforming” ([OpenMP11], > Section 1.1). > > Pre- and Post-conditions for Back-Ends > -------------------------------------- > 1) Intrinsics, builtins, library routines and language implementation > should be thread-safe. This is equally applicable to front-ends and is > required in Section 1.5 of [OpenMP11]. > > 2) Optimizations should be thread-aware. > > 3) Support a command-line option that enables/disables OpenMP support. > This is required in Chapter 2 of [OpenMP11]. > > 4) Rely on post-conditions guaranteed by front-end, and nothing else. > > Conditions 2)-5) are only applicable to optimizations working on IR > before procedurization. > > As noted in the previous section, LLVM IR is not guaranteed to be > fully conformant with OpenMP specification. Back-ends should be able > to compile cleanly non-compliant programs, but no promises are made on > correct behavior of resulting machine code. > > > Elements of OpenMP > =================> OpenMP is comprised of four components (as of version 3.1): > * Directives (+ Clauses) > * Internal Control Variables > * Runtime Library Routines > * Environment Variables > > Internal Control Variables, Runtime Library Routines and Environment > Variables are provided / handled by OpenMP runtime library. These > components of OpenMP don’t require special support in LLVM IR and thus > are out of scope of this document. > > All OpenMP directives in C/C++ are specified with #pragma > preprocessing directive and have the following format: > > #pragma omp directive-name [clause [[,] clause]…]. > > Next two chapters describe our design proposal for representation of, > correspondingly, directives and clauses. > > > Directives > =========> Directives in LLVM IR are represented as calls to “llvm.omp.directive” > intrinsic with a single argument referencing LLVM IR metadata. The > metadata contains an identifier of a directive. > > Almost all OpenMP directives are represented with two intrinsic calls: > one for entry and one for exit point of the directive’s context. It is > enough to have just one intrinsic call for several directives which > are supposed to be enclosed in other directives (like omp section, > which must appear only within omp sections context) or specify a > single instruction (like omp flush) or are declarative (like omp > threadprivate). Exit point for such directives is either non-existent > or can be determined by examining other intrinsic calls and thus, not > required to be explicitly present. > > Metadata specify only one thing: type of a directive. Currently LLVM > IR does not support integer or enumeration metadata types; thus, we > decided to use MDString (metadata string type) to represent the type. > The list of directives and their identifiers is shown in Table 1. > > Table 1. OpenMP Directives > > Type of directive #pragma omp ... | MDString in metadata > ------------------------------------------------------------ > parallel | OMP_PARALLEL > | OMP_END_PARALLEL > ------------------------------------------------------------ > [parallel] for | OMP_[PARALLEL_]LOOP > | OMP_END_[PARALLEL_]LOOP > ------------------------------------------------------------ > [parallel] sections | OMP_[PARALLEL_]SECTIONS > | OMP_END_[PARALLEL_]SECTIONS > ------------------------------------------------------------ > section | OMP_SECTION > ------------------------------------------------------------ > single | OMP_SINGLE > | OMP_END_SINGLE > ------------------------------------------------------------ > task | OMP_PTASK > | OMP_END_PTASK > ------------------------------------------------------------ > taskyield | OMP_PTASKYIELD > ------------------------------------------------------------ > master | OMP_MASTER > | OMP_END_MASTER > ------------------------------------------------------------ > critical | OMP_CRITICAL > | OMP_END_CRITICAL > ------------------------------------------------------------ > barrier | OMP_BARRIER > ------------------------------------------------------------ > taskwait | OMP_PTASKWAIT > ------------------------------------------------------------ > atomic | OMP_ATOMIC > | OMP_END_ATOMIC > ------------------------------------------------------------ > flush | OMP_FLUSH > ------------------------------------------------------------ > ordered | OMP_ORDERED > | OMP_END_ORDERED > > An example of an OpenMP directive and its representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel > > LLVM IR > ------- > call void @llvm.omp.directive(metadata !0) > ... > call void @llvm.omp.directive(metadata !1) > > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Clauses > ======> All OpenMP clauses can be divided into four groups: > * Ones with predefined values (default, ordered, nowait, untied, read, > write, update, capture) > * Ones with a list of variables (shared, private, firstprivate, > lastprivate, copyin, copyprivate, directives flush, threadprivate) > * Ones with a scalar or integer expression (if, num_threads, final, > collapse) > * Compound ones (reduction, schedule, directive critical) > > Clauses in LLVM IR are represented as intrinsic calls. Each intrinsic > call representing a clause has one mandatory argument and arbitrary > number of optional arguments. The mandatory argument references LLVM > IR metadata. The metadata contains identifier of the clause (MDString) > and, for some compound clauses, additional data represented as another > MDString. Optional arguments reference LLVM variables or expressions. > > > Clauses with Predefined Values > ------------------------------ > Clauses with predefined values are represented as calls to > “llvm.omp.simple” intrinsic. For this group of clauses metadata > contains an MDString with a clause’s identifier. > > The list of clauses and their identifiers is shown in Table 2. > > Table 2. OpenMP Clauses with Predefined Values > > Clause | MDString in Metadata > ------------------------------------------------------------ > default(none) | OMP_DEFAULT_NONE > ------------------------------------------------------------ > default(shared) | OMP_DEFAULT_SHARED > ------------------------------------------------------------ > ordered | OMP_ORDERED > ------------------------------------------------------------ > nowait | OMP_NOWAIT > ------------------------------------------------------------ > untied | OMP_UNTIED > ------------------------------------------------------------ > read | OMP_READ > ------------------------------------------------------------ > write | OMP_WRITE > ------------------------------------------------------------ > update | OMP_UPDATE > ------------------------------------------------------------ > capture | OMP_CAPTURE > > Here is an example of OpenMP clause with a predefined value and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp atomic read > > LLVM IR > ------- > call void @llvm.omp.simple(metadata !1) > > !1 = metadata !{metadata !”OMP_READ”} > > > Clauses with a List of Variables > -------------------------------- > Clauses with a list of variables are represented as calls to > “llvm.omp.list” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Additional arguments of the intrinsic reference LLVM variables > associated with a clause. It is important to reference variables > directly in intrinsic calls and not in metadata, in order to preserve > data dependency. > > Each variable of a non user-defined type is represented with a single > argument (referencing the variable). > > Each variable of a user-defined type is represented with four > arguments: one referencing the variable itself, one referencing its > default constructor, one referencing its copy constructor and one > referencing its destructor. References to constructors / destructors > are required to correctly create and destroy private copies of > variables. > > The list of clauses and their identifiers is shown in Table 3. > > Table 3. OpenMP Clauses with a List of Variables > > Clause | MDString in Metadata > ------------------------------------------------------------ > private(list) | OMP_PRIVATE > ------------------------------------------------------------ > firstprivate(list) | OMP_FIRSTPRIVATE > ------------------------------------------------------------ > lastprivate(list) | OMP_LASTPRIVATE > ------------------------------------------------------------ > shared(list) | OMP_SHARED > ------------------------------------------------------------ > copyin(list) | OMP_COPYIN > ------------------------------------------------------------ > Directive flush(list) | OMP_FLUSH > ------------------------------------------------------------ > Directive threadprivate(list) | OMP_THREADPRIVATE > > Here is an example of OpenMP clause with a list of variables and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel private(a,b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.list(metadata !1, i32* @a, i32* > @b) > > !1 = metadata !{metadata !”OMP_PRIVATE”} > > > Clauses with Scalar or Integer Expressions > ------------------------------------------ > Clauses with scalar or integer expressions are represented as calls to > “llvm.omp.expr” intrinsic. For this group of clauses metadata contains > an MDString with a clause’s identifier. > > Second argument of the intrinsic references a scalar or integer LLVM > expression associated with a clause. It is important to reference > expressions directly in intrinsic calls and not in metadata, in order > to preserve data dependency. > > The list of clauses and their identifiers is shown in Table 4. > > Table 4. OpenMP Clauses with Scalar or Integer Expressions > > Clause | MDString in Metadata > ------------------------------------------------------------ > if(scalar_expr) | OMP_IF > ------------------------------------------------------------ > num_threads(integer_expr) | OMP_NUM_THREADS > ------------------------------------------------------------ > final(scalar_expr) | OMP_FINAL > ------------------------------------------------------------ > collapse(const_integer_expr) | OMP_COLLAPSE > > Here is an example of OpenMP clause with a scalar expression and its > representation in LLVM IR: > > C/C++ > ----- > #pragma omp parallel if(a) > > LLVM IR > ------- > %4 = load i32* @a, align 4 > %5 = icmp ne i32 %4, 0 > call void @llvm.omp.expr(metadata !1, i32 %5) > ... > !1 = metadata !{metadata !”OMP_IF”} > > > Compound Clauses > ---------------- > Compound clauses are represented as calls to “llvm.omp.compound” > intrinsic. For this group of clauses metadata contains an MDString > with a clause’s identifier and additional data, also represented as an > MDString. > > Additional arguments of an intrinsic call for reduction clause > reference LLVM variables associated with a clause (see format of > “Clauses with a List of Variables”). > > Second argument of an intrinsic call for schedule (static), schedule > (dynamic) and schedule (guided) clauses references an integer LLVM > expression associated with a clause (see format of “Clauses with > Scalar or Integer Expressions”). > > The list of compound clauses along with their representation in > metadata is shown in Table 5. > > Table 5. OpenMP Compound Clauses > > Clause | Representation in Metadata > ------------------------------------------------------------ > reduction(operator : list) | OMP_REDUCTION, <operator> > ------------------------------------------------------------ > schedule(static [, integer_expr]) | OMP_SCHEDULE, STATIC > ------------------------------------------------------------ > schedule(dynamic [, integer_expr]) | OMP_SCHEDULE, DYNAMIC > ------------------------------------------------------------ > schedule(guided [, integer_expr]) | OMP_SCHEDULE, GUIDED > ------------------------------------------------------------ > schedule(auto) | OMP_SCHEDULE, AUTO > ------------------------------------------------------------ > schedule(runtime) | OMP_SCHEDULE, RUNTIME > ------------------------------------------------------------ > Directive critical(name) | OMP_NAME, <name> > > Here is an example of a compound OpenMP clause and its representation > in LLVM IR: > > C/C++ > ----- > #pragma omp parallel reduction(+ : a, b) > > LLVM IR > ------- > call void (metadata, ...)* @llvm.omp.compound(metadata !1, i32* @a, > i32* @b) > > !1 = metadata !{metadata !”OMP_REDUCTION”, metadata !”+”} > > > An Example > ---------- > Here is a more complex example, demonstrating LLVM IR representation > of a directive with several clauses of different types. > > C/C++ > ----- > int a, gVar; > int main() { > int lVar; > #pragma omp parallel default(shared), private(gVar, a), if(lVar) > { > ... > } > return (0); > } > > LLVM IR > ------- > @a = global i32 0, align 4 > @gVar = global i32 0, align 4 > define i32 @main () nounwind uwtable ssp { > %lVar = alloca i32, align 4 > call void @llvm.omp.directive(metadata !0) > call void @llvm.omp.simple(metadata !1) > call void (metadata, ...)* @llvm.omp.list(metadata !2, i32* @gVar, > i32* @a) call void @llvm.omp.expr(metadata !3, i32* %lVar) > ... > call void @llvm.omp.directive(metadata !4) > ret i32 0 > } > !0 = metadata !{metadata !”OMP_PARALLEL”} > !1 = metadata !{metadata !”OMP_DEFAULT_SHARED”} > !2 = metadata !{metadata !”OMP_PRIVATE”} > !3 = metadata !{metadata !”OMP_IF”} > !4 = metadata !{metadata !”OMP_END_PARALLEL”} > > > Design Alternatives > ==================> It is possible to propose several viable alternatives to representing > information on OpenMP directives and clauses while keeping general > design approach intact. > > Some things that we considered: > > a) Represent information on types of directives and clauses as > constant expressions instead of MDStrings. > b) Place MDStrings with identifiers of directives and clauses directly > into intrinsic calls instead of metadata. > c) Do not represent clauses as separate intrinsics with references to > metadata; instead, put a list of references to metadata for all > clauses associated with a directive at the end of metadata describing > the directive. > > As we said, all these are viable alternatives. We decided to choose > what we chose and not employ the alternatives listed above in order to > keep true the following three design principles: > > 1) Absolute simplicity and readability (including human readability). > This ruled out alternatives a) and [partially] c). > 2) Uniformity of representation, for both directives and clauses. This > ruled out alternatives b) and c). > 3) As much extensibility and openness for future changes in both LLVM > IR and OpenMP as possible. This ruled out alternatives b) and c). > > We believe that the only downside of our choice is larger size of LLVM > IR required to represent same number of directives and clauses. > > However, we consider this as a relatively minor element of our design; > one might argue in favor of these or other similar design > alternatives. > > > Conclusion > =========> In this document we described our design proposal for OpenMP > representation in LLVM IR. We believe the IR extensions we proposed > are: > > 1) Simple > > They rely on existing LLVM IR types and language constructs; the only > new things are a few new intrinsics. Thus, LLVM IR consumers lacking > OpenMP support can simply ignore these two intrinsics and still > generate correct, albeit sequential, code. > > 2) Complete > > OpenMP 3.1 Specification ([OpenMP31]) is fully covered. > > 3) Extensible > > The design approach is general enough to readily incorporate future > versions of OpenMP standard. > > 4) Enables both early and late procedurization and aggressive > optimization > > This is when compared with designs implying explicit procedurization > done right in front-ends. > > > Acknowledgements > ===============> Authors would like to acknowledge valuable contributions and advice > provided by Kevin B Smith, Xinmin Tian, Stefanus Du Toit, Brian > Minard, Dmitry Babokin and other colleagues from Intel and Diego > Novillo from Google. This *does not* automatically imply approval of > this proposal by said individuals. > > > References > =========> > [Lattner10] Chris Lattner, Devang Patel, “Extensible Metadata in LLVM > IR”. Available at: > http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html > > [Meyer92] Bertrand Meyer, “Applying “Design by Contract”, IEEE > Computer, October 1992, pp. 40-51. Available at: > http://se.ethz.ch/~meyer/publications/computer/contract.pdf > > [Novillo00] Diego Novillo, “Analysis and Optimization of Explicitly > Parallel Programs”, PhD Thesis, University of Alberta, Edmonton, > Alberta, Canada, 2000. Available at: > http://www.airs.com/dnovillo/Papers/Thesis.pdf > > [OpenMP11] “OpenMP Application Program Interface”, Version 3.1, July > 2011. Available at: http://www.openmp.org/mp-documents/OpenMP3.1.pdf > > [Tian05] Xinmin Tian, Milind Girkar, Aart Bik and Hideki Saito, > “Practical Compiler Techniques on Efficient Multithreaded Code > Generation for OpenMP Programs”, The Computer Journal, September 2005, > pp.588-601. Available at: http://dl.acm.org/citation.cfm?id=1095017-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Hal,> While I think that it will be relatively easy to have the intrinsics > serve as code-motion barriers for other code that might be threads > sensitive (like other external function calls), we would need to think > through exactly how this would work. The easiest thing would be to make > the intrinsics have having unmodeled side effects, although we might > want to do something more intelligent.Yes, that's exactly the idea.> Where do you propose placing the parallel loop intrinsics calls > relative to the loop code?In preloop ("opening" intrinsic) and postloop ("closing" one).> Will this inhibit restructuring (like loop > interchange), fusion, etc. if necessary?I guess so... Loops usually deal with reading/writing memory, and if an intrinsic is marked as "modifies everything", this hardly leaves any possibility for [at least] the optimizations you mentioned. But this is different from what I have in mind. Basically, the plan is to perform analysis and some optimizations before procedurization, and do the rest (including loop restructuring) after it. This is not mentioned in the proposal (we tried to be succint -- only 20 pages long! :-)), but explained in detail in [Tian05] (sorry, the link in the proposal doesn't lead you directly to pdf file; use this one instead: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.3763&rep=rep1&type=pdf). Yours, Andrey
Krzysztof Parzyszek
2012-Nov-07 17:31 UTC
[LLVMdev] [RFC] OpenMP Representation in LLVM IR
On 10/1/2012 8:06 PM, greened at obbligato.org wrote:> > I think this is a bad idea. OpenMP is not Low Level and I can't think > of a good reason to start putting OpenMP support in the IR.OpenMP is as low-level as loop nest optimizations or inlining. There is nothing about it that would make it not fit in the LLVM's model. As for reasons, here are two: 1. Implementing parallelization directives in the IR would allow multiple front-ends to use the same interface to express SMP semantics without having each one of them to do the actual implementation. 2. Having the paralellization infrastructure implemented in LLVM would allow auto-parallelization to be added at some point without adding a new parallelization infrastructure to the optimizer (on top of what the front-ends would have already implemented). -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation