Vinay Madhusudan via llvm-dev
2020-Feb-17 18:28 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
Please find the reply inline below.

On Sun, Feb 16, 2020 at 12:59 AM Mehdi AMINI <joker.eph at gmail.com> wrote:

> On Sat, Feb 15, 2020 at 10:42 AM Vinay Madhusudan via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Reply to Kiran Chandramohan:
>>
>> > You are welcome to participate, provide feedback and criticism to change the design as well as to contribute to the implementation.
>>
>> Thank you Kiran.
>>
>> > But the latest is what is there in the RFC in discourse.
>>
>> I have used this as the reference for my response.
>>
>> > We did a study of a few constructs and clauses which was shared as mails to flang-dev and the RFC. As we make progress and before implementation, we will share further details.
>>
>> > “Yes, parallel and flush would be the next two constructs that we will do.” -- from a comment in the latest RFC
>>
>> For the above-mentioned reasons, I will try to restrict my reply to how the “parallel (do)” construct would be lowered.
>>
>> > If it is OK we can have further discussions in discourse as River points out.
>>
>> Given that multiple components of the LLVM project, namely clang, flang, MLIR and LLVM, are involved, llvm-dev is probably a better place, with a much wider audience,
>
> Possibly wider, but maybe less focused on discussing MLIR dialect design. In particular there is an RFC thread for this particular dialect on Discourse, which is the canonical place to discuss its design.
>
>> until it is clear how the different components must interact.
>
> They don't need to interact so closely: they are very loosely related. flang will use MLIR but clang won't (in the foreseeable future), and LLVM has many other frontends.
>
>> > It is the review for translation to LLVM IR that is in progress.
>>
>> > “If we decide that the OpenMP construct (for e.g. collapse) can be handled fully in MLIR and that is the best place to do it (based on experiments) then we will not use the OpenMP IRBuilder for these constructs.” -- latest RFC in discourse
>>
>> If it is not finalized that the OpenMPIRBuilder will be used for all the constructs, wouldn’t it be better to delay the submission of the “translation to LLVM IR” patch in MLIR? The lowering code will become inconsistent if the OpenMPIRBuilder is used only for a few constructs and not for others.
>>
>> Also, the patch does OpenMP dialect lowering *alongside* LLVM Dialect to LLVM IR. This is different from most dialects, which get lowered directly to the LLVM Dialect. I think lowering to the LLVM Dialect would be a cleaner way if the OpenMPIRBuilder is not being considered for all constructs.
>
> I don't disagree, but there is a lot of speculation here: your quote starts with "If we decide that the OpenMP construct (for e.g. collapse) can be handled fully in MLIR" -- are you thinking that we need to first decide this once and for all before making progress on building this path? What disadvantages do you perceive in an approach where we would bring up this dialect using the OpenMPIRBuilders for exporting to LLVM IR until we gain enough experience? Do you think starting like this will make it significantly harder to transition away from the builders if this is what we want?
> It seemed to me like it wouldn't, and that's why I'm supportive of this path: the omp dialect design, implementation, and the transformation/analysis that will be performed there seem entirely disjoint from the LLVM lowering; I'd hope we can swap the LLVM lowering at a later time (if this is what we'd want).

The statement you quoted is from the RFC in discourse by Kiran. It is actually unclear to whom you are referring here; I am assuming that it is for him to answer. The details below should cover some of your questions as well.

About Clang / MLIR / LLVM being loosely related and not being relevant to llvm-dev:

With the introduction of the OpenMPIRBuilder in MLIR (from this review: https://reviews.llvm.org/D72962), Clang and MLIR would now share common code for building OpenMP constructs. I do not think they are so loosely related anymore. Note that the MLIR and Clang frontends for LLVM are very different: Clang emits LLVM IR with almost no optimizations, while MLIR already supports a considerable number of optimizations.

The decision to use the OpenMPIRBuilder for MLIR was discussed in the following flang-dev threads (please correct me if I am missing some newer discussions on the topics below):

1. [May 2019] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

2. [June 2019] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-June/000251.html

However, I could not find any conclusions for the concerns raised by Kiran:

1. Early outlining (in MLIR) vs. late outlining (in LLVM)

2. Handling of target constructs: high-level transformations for GPUs and CPUs (offloading in LLVM vs. MLIR?)

Kiran seems to suggest that early outlining (version 2) would be better (http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000224.html). But currently, late outlining has been implemented in LLVM (version 1) (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp).

Early outlining in MLIR would have the following benefits, as suggested in the thread:

1. It enables more optimization in MLIR (intra-procedural, because of regions).

2. Offloading in MLIR (which is designed for heterogeneous hardware compilation support).

3. Direct LLVM Dialect lowering of OpenMP operations (no LLVM IR lowering).

The MLIR google groups discussion (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw) regarding the use of the OpenMPIRBuilder doesn’t seem to discuss the above concerns, nor how the various design decisions in the OpenMPIRBuilder affect MLIR in general.

Also,

> “The point here is that we do not want to use MLIR just as a pass-through layer because MLIR has a lot of strengths”
....
> “The point here is that if we lower to LLVM dialect, we will not be able to reuse OpenMP codegen & optimisation code from Clang/LLVM.”

--- by Kiran in https://lists.llvm.org/pipermail/llvm-dev/2020-February/139181.html

In the latest reply from Kiran (quoted above) to this thread, Kiran seems to suggest that lowering to the LLVM Dialect (instead of LLVM IR) would restrict the use of the OpenMP optimization code from LLVM, and also that MLIR would just be a pass-through to the OpenMPIRBuilder.
Because of the above reasons, it seems to me that the design considerations of using the OpenMPIRBuilder for MLIR should also be mentioned (and discussed) before committing the LLVM IR lowering part for the OpenMP dialect in https://reviews.llvm.org/D72962.

>> Mehdi also seems to have the same suggestion: “I agree that having dialect lowering would be cleaner” in https://reviews.llvm.org/D72962
>
> Since you're calling me out: yes, it would be cleaner from a pure MLIR point of view; I don't think there is much disagreement on this (I think?). However, we already have the OpenMP builders available and they will continue to be maintained/evolved to support OpenMP in clang. Duplicating them entirely in MLIR for the sake of purity seems like a lack of pragmatism here, so I support the current approach with the current tradeoffs.
>
>> > Yes, the design has mildly changed over time to incorporate feedback. But the latest is what is there in the RFC in discourse.
>>
>> The RFC fails to discuss the following (I have also mentioned some of them in my reply to Johannes):
>>
>> > The proposed plan involves a) lowering F18 AST with OpenMP directly to a mix of OpenMP and FIR dialects. b) converting this finally to a mix of OpenMP and LLVM dialects.
>>
>> It is unclear in the RFC what other dialects are considered as supported for the OpenMP dialect (std, affine, vector, loop, etc.) and how they would be transformed, used and lowered from FIR to LLVM.
>>
>> It becomes important to list the various dialects / operations / types supported for OpenMP (which is mainly defined for C, C++ and Fortran programs; MLIR has a much wider scope).
>>
>> It wouldn’t add much value for the proposed OpenMP dialect to be in the MLIR tree if it cannot support at least the relevant standard dialect types / operations.
>
> I agree, and I think this was something I called out as important in the RFC: "It seems that the dialect can be orthogonal to FIR and its type system, which is the most important thing to me to integrate MLIR (favor reusability across other frontends / compiler frameworks)". If you don't think that this is the case, then please raise this in the RFC!
>
> I think it is perfectly fair to ask for more examples from the author and to dig a bit deeper if you're unconvinced that the proposed modeling can be applicable outside of FIR. This is exactly why we ask such proposals to go through an RFC, by the way: to allow people like you to point at the blind spots and ask the right questions.
>
> Best,
>
> --
> Mehdi
>
>> > We would like to take advantage of the transformations in cases that are possible. FIR loops will be converted to affine/loop dialect. So the loop inside an omp.do can be in these dialects as clarified in the discussion in discourse and also shown in slide 20 of the fosdem presentation (links to both below).
>>
>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>>
>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>>
>> Although it is mentioned that affine/loop.for is used, the following things are unclear:
>>
>> I am assuming that there will be lowering / conversion code in the f18 repo from fir.do to loop.for / affine.for. Is that the case? If so, I think it is worth mentioning it in the “sequential code flow representation” in the RFC.
>>
>> This raises the following questions.
>> 1. Which types are supported? Standard dialect types and FIR types?
>>
>> For example, what types are used for Fortran arrays used inside OpenMP regions? Is it std.memref, OR the Fortran array representation in the FIR dialect (fir.array?), OR both? Note that Fortran has support for column-major arrays. std.memref supports custom memory layouts. What custom layouts are supported?
>>
>> How would different non-scalar types in the standard dialect be lowered to LLVM IR and passed to OpenMP runtime calls? Can you please elaborate on this?
>>
>> The example provided in slide 20 of the fosdem presentation contains
>>
>> “loop.for %j = %lb2 to %ub2 : !integer {“
>>
>> But loop.for accepts the “index” type. I am not sure what type “!integer” represents here.
>>
>> 2. What are the different memory access operations which are supported inside the OpenMP region and lowered to proper OpenMP runtime calls in LLVM IR?
>>
>> The possibilities are:
>>
>> 1. affine.load / affine.store
>> 2. std.load / std.store
>> 3. FIR dialect memory access operations.
>>
>> > I must also point out that the question of where to do loop transformations is a topic we have not fully converged on. See the following thread for discussions. http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>>
>> It looks like the placement (MLIR / LLVM) of the various transformations related to OpenMP has not been finalized, from what I could infer from Johannes’s reply and the below text in the latest RFC in discourse:
>>
>> “So there exist some questions regarding where the optimisations should be carried out. We will decide on which framework to choose only after some experimentation.”
>>
>> > i) we need to keep the loops separately so as to take advantage of transformations that other dialects like affine/loop would provide.
>>
>> 1) Keeping the loops separate from the OpenMP operations will expose them to the “regular” transformation passes in MLIR inside the OpenMP region. Most of these are invalid or inefficient for OpenMP operations. Examples:
>>
>> 1. The constant propagation example mentioned by Johannes in this thread (omp task shared(x)).
>> 2. Loop (nest) transformations (permute / split / fuse / tile, etc.) will happen ignoring the surrounding OpenMP operations.
>> 3. Hoisting and sinking of various memory / SSA values inside the OpenMP region. This goes against the likes of the “map”, “firstprivate”, “shared”, etc. clauses and more.
>>
>> 2) Various loop operations (loop.for, affine.for, fir.do) have (or will have) transformation / optimization passes which are different from one another. Examples:
>>
>> 1. AffineLoopInvariantCodeMotion.cpp is different from LoopInvariantCodeMotion.cpp.
>> 2. Other loop transformation passes for affine.for.
>>
>> These loops also use different types and memory access operations in general for transformations. For example, most Affine dialect transformations (if not all) work on affine.load and affine.store operations.
>>
>> Supporting different loop operations means that there would be *OpenMP specific transformations* for each one of them, and it also requires a way to restrict each of them from existing transformations (when nested in OpenMP constructs).
>>
>> There would be different lowerings for different loop operations as well. For example, affine.for and loop.for would have to be lowered to omp.do in different ways.
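To make the last point concrete, here is a minimal editor's sketch of the same copy loop written against the two loop dialects as they existed in early 2020 (loop.for was later renamed scf.for); the function name and values are invented for illustration. Wrapping either form in the proposed omp.do would require a separate conversion, since the bound representations and the memory access operations differ:

    // The same memref copy, once per loop dialect. An omp.do conversion
    // would need distinct handling for each form.
    func @copy(%A: memref<100xf32>, %B: memref<100xf32>) {
      %lb = constant 0 : index
      %ub = constant 100 : index
      %step = constant 1 : index
      // loop dialect: bounds are ordinary SSA values of 'index' type.
      loop.for %i = %lb to %ub step %step {
        %v = load %A[%i] : memref<100xf32>
        store %v, %B[%i] : memref<100xf32>
      }
      // affine dialect: bounds are affine expressions, and the body must
      // use affine.load/affine.store for the affine passes to apply.
      affine.for %j = 0 to 100 {
        %w = affine.load %A[%j] : memref<100xf32>
        affine.store %w, %B[%j] : memref<100xf32>
      }
      return
    }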
>> From slide 20 of the fosdem presentation you mentioned, the LLVM + OpenMP dialect representation is as follows:
>>
>> ------------------------------
>> Mlir.region(…) {
>>   omp.parallel {
>>     %ub3 = …
>>     omp.do %i = 0 to %ub3 : !integer {
>>       …
>>     }
>>   }
>> }
>> -------------------------------
>>
>> Currently, the LLVM Dialect doesn’t contain a high-level loop operation; it is all based on a CFG implementation.
>>
>> Will omp.do follow the same structure (SizedRegion<1>) as loop.for, or will there be a CFG-based loop operation for the LLVM Dialect?
>>
>> Can you please mention what the OpenMP + LLVM dialect will look like for the below parallel do construct?
>>
>>   integer :: i=1, k=10
>>   integer :: a(10), b(10), c(10)
>>   ...
>>   !$omp parallel do
>>   do i = 1, k
>>     if (i .ne. 1) cycle
>>     c(i) = a(i) + b(i)
>>   end do
>>   !$omp end parallel do
>>   print *,c
>>
>> Thanks,
>>
>> Vinay
>>
>> On Fri, Feb 14, 2020 at 6:52 AM Kiran Chandramohan via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello Vinay,
>>>
>>> Thanks for your mail about the OpenMP dialect in MLIR. Happy to know that you and several other groups are interested in the OpenMP dialect. At the outset, I must point out that the design is not set in stone and will change as we make progress. You are welcome to participate, provide feedback and criticism to change the design as well as to contribute to the implementation. I provide some clarifications and replies to your comments below. If it is OK we can have further discussions in discourse as River points out.
>>>
>>> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang frontends. Note that this proposal was before considering MLIR for FIR.
>>>
>>> A correction here. The proposal for the OpenMPIRBuilder was made when MLIR was being considered for FIR.
>>> (i) Gary Klimowicz's minutes for the Flang call in April 2019 mention considering MLIR for FIR.
>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
>>> (ii) My reply to Johannes's proposal in May 2019 mentions MLIR for FIR.
>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html
>>>
>>> b. Review of barrier construct is in progress: https://reviews.llvm.org/D72962
>>>
>>> Minor correction here. The addition of the barrier construct was accepted and has landed (https://reviews.llvm.org/D72400). It is the review for translation to LLVM IR that is in progress.
>>>
>>> It looks like the design has evolved over time and there is no one place which contains the latest design decisions that fit all the different pieces of the puzzle. I will try to deduce it from the above mentioned references. Please correct me if I am referring to anything which has changed.
>>>
>>> Yes, the design has mildly changed over time to incorporate feedback. But the latest is what is there in the RFC in discourse.
>>>
>>> For most OpenMP design discussions, FIR examples are used (as seen in (2) and (3)). The MLIR examples mentioned in the design only talk about the FIR dialect and the LLVM dialect.
>>>
>>> Our initial concern was how all these pieces (FIR, LLVM Dialect, OpenMPIRBuilder, LLVM IR) will fit together. Hence you see the prominence of FIR and the LLVM dialect, and more information about lowering/translation than transformations/optimisations.
>>> This completely ignores the likes of the standard, affine (where most loop transformations are supposed to happen) and loop dialects.
>>>
>>> Adding to the reply above: we would like to take advantage of the transformations in the cases where that is possible. FIR loops will be converted to the affine/loop dialect, so the loop inside an omp.do can be in these dialects, as clarified in the discussion in discourse and also shown in slide 20 of the fosdem presentation (links to both below).
>>>
>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>>>
>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>>>
>>> I must also point out that the question of where to do loop transformations is a topic we have not fully converged on. See the following thread for discussions. http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>>>
>>> Is it the same omp.do operation which now contains the bounds and induction variables of the loop after the LLVM conversion?
>>>
>>> The point here is that i) we need to keep the loops separate so as to take advantage of transformations that other dialects like affine/loop would provide, and ii) we will need the loop information while lowering the OpenMP do operation. For the implementation, if reusing the same operation (in different contexts) is difficult then we can add a new operation.
>>>
>>> It is also not mentioned how clauses like firstprivate, shared, private, reduce, map, etc. are lowered to the OpenMP dialect.
>>>
>>> Yes, it is not mentioned. We did a study of a few constructs and clauses which was shared as mails to flang-dev and the RFC. As we make progress and before implementation, we will share further details.
>>>
>>> it would be beneficial to have an omp.parallel_do operation which has semantics similar to other loop structures (may not be LoopLikeInterface) in MLIR.
>>>
>>> I am not against adding parallel_do if it can help with transformations or reduce the complexity of lowering. Please share the details in discourse as a reply to the RFC or a separate thread.
>>>
>>> it looks like having OpenMP operations based on standard MLIR types and operations (scalars and memrefs mainly) is the right way to go.
>>>
>>> This will definitely be the first version that we implement. But I do not understand why we should restrict ourselves to only the standard types and operations. To ease lowering and translation, and to avoid adding OpenMP operations to other dialects, I believe the OpenMP dialect should also be able to coexist with other dialects like FIR and LLVM.
>>>
>>> E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR ignoring all the advantages that MLIR provides.
>>>
>>> Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of a GPU kernel in MLIR than in LLVM IR.
>>>
>>> I might not have fully understood you here. But the dialect lives independently of the translation to LLVM IR. If there are optimisations (like the hoisting that you mention here) I believe they can be performed as transformation passes on the dialect. It is not ruled out.
>>> --Kiran
>>> ------------------------------
>>> From: flang-dev <flang-dev-bounces at lists.llvm.org> on behalf of Vinay Madhusudan via flang-dev <flang-dev at lists.llvm.org>
>>> Sent: 13 February 2020 16:33
>>> To: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; flang-dev at lists.llvm.org <flang-dev at lists.llvm.org>
>>> Subject: [flang-dev] About OpenMP dialect in MLIR
>>>
>>> Hi,
>>>
>>> I have a few questions / concerns regarding the design of the OpenMP dialect in MLIR that is currently being implemented, mainly for the f18 compiler. Below, I summarize the current state of the various efforts in clang / f18 / MLIR / LLVM regarding this. Feel free to add to the list in case I have missed something.
>>>
>>> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for the flang and clang frontends. Note that this proposal was before considering MLIR for FIR.
>>>
>>> a. llvm-dev proposal: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
>>>
>>> b. Patches in review: https://reviews.llvm.org/D70290. This also includes the clang codegen changes.
>>>
>>> 2. [July - September 2019] An OpenMP dialect for MLIR was discussed / proposed with respect to the f18 compilation stack (keeping FIR in mind).
>>>
>>> a. flang-dev discussion link: https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html
>>>
>>> b. Design decisions captured in PPT: https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view
>>>
>>> c. MLIR google groups discussion: https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw
>>>
>>> d. Target constructs design: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html
>>>
>>> e. SIMD constructs design: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html
>>>
>>> 3. [Jan 2020] OpenMP dialect RFC in llvm discourse: https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397
>>>
>>> 4. [Jan - Feb 2020] Implementation of the OpenMP dialect in MLIR:
>>>
>>> a. The first patch which introduces the OpenMP dialect was pushed.
>>>
>>> b. Review of the barrier construct is in progress: https://reviews.llvm.org/D72962 https://reviews.llvm.org/D72400
>>>
>>> I have tried to list below the different topics of interest (to different people) around this work. Most of these are in the design phase (or very new) and multiple parties are interested with different sets of goals in mind.
>>>
>>> I. Flang frontend and its integration
>>>
>>> II. Fortran representation in MLIR / FIR development
>>>
>>> III. OpenMP development for flang, OpenMP builder in LLVM.
>>>
>>> IV. Loop transformations in MLIR / LLVM with respect to OpenMP.
>>>
>>> It looks like the design has evolved over time and there is no one place which contains the latest design decisions that fit all the different pieces of the puzzle. I will try to deduce it from the above mentioned references. Please correct me if I am referring to anything which has changed.
>>>
>>> A. For most OpenMP design discussions, FIR examples are used (as seen in (2) and (3)). The MLIR examples mentioned in the design only talk about the FIR dialect and the LLVM dialect.
>>>
>>> This completely ignores the likes of the standard, affine (where most loop transformations are supposed to happen) and loop dialects.
I think it is critical to decouple the OpenMP dialect development in MLIR from the current flang / FIR effort. It would be useful if someone could show these examples using existing dialects in MLIR, and also how the different transformations / lowerings are planned.
>>>
>>> B. In the latest RFC (3), it is mentioned that the initial OpenMP dialect version will be as follows,
>>>
>>>   omp.parallel {
>>>     omp.do {
>>>       fir.do %i = 0 to %ub3 : !fir.integer {
>>>         ...
>>>       }
>>>     }
>>>   }
>>>
>>> and then after the "LLVM conversion" it is converted as follows:
>>>
>>>   omp.parallel {
>>>     %ub3
>>>     omp.do %i = 0 to %ub3 : !llvm.integer {
>>>       ...
>>>     }
>>>   }
>>>
>>> a. Is it the same omp.do operation which now contains the bounds and induction variables of the loop after the LLVM conversion? If so, will the same operation have two different semantics during a single compilation?
>>>
>>> b. Will there be different lowerings for the various loop operations from different dialects? loop.for and affine.for under omp operations would need different OpenMP / LLVM lowerings. Currently, both of them are lowered to CFG-based loops during the LLVM dialect conversion (which happens much before the proposed OpenMP dialect lowering).
>>>
>>> There would be no standard way to represent OpenMP operations (especially the ones which involve loops) in MLIR. This would drastically complicate lowering.
>>>
>>> C. It is also not mentioned how clauses like firstprivate, shared, private, reduce, map, etc. are lowered to the OpenMP dialect. The example in the RFC contains FIR and LLVM types and nothing about std dialect types. Consider the below example:
>>>
>>>   #pragma omp parallel for reduction(+:x)
>>>   for (int i = 0; i < N; ++i)
>>>     x += a[i];
>>>
>>> How would the above be represented in the OpenMP dialect? And what type would "x" be in MLIR? It is not mentioned in the design how the SSA values for the various OpenMP clauses are passed around in OpenMP operations.
>>>
>>> D. Because of (A), (B) and (C), it would be beneficial to have an omp.parallel_do operation which has semantics similar to other loop structures (maybe not LoopLikeInterface) in MLIR. To me, it looks like having OpenMP operations based on standard MLIR types and operations (scalars and memrefs mainly) is the right way to go.
>>>
>>> Why not have an omp.parallel_do operation with AffineMap-based bounds, so as to decouple it from Value/Type, similar to affine.for?
>>>
>>> 1. With the current design, the number of transformations / optimizations that one can write on OpenMP constructs would become limited, as there can be any custom loop structure with custom operations / types inside it.
>>>
>>> 2. It would also be easier to transform loop nests containing OpenMP constructs if the body of the OpenMP operations were well defined (i.e., did not accept arbitrary loop structures). Having nested redundant "parallel", "target" and "do" regions seems unnecessary.
>>>
>>> 3. There would also be new sets of loop structures in new dialects when C/C++ is compiled to MLIR. This would multiply the number of possible combinations inside the OpenMP region.
>>>
>>> E. Lowering of the target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR, ignoring all the advantages that MLIR provides.
Being able to compile code for heterogeneous hardware is one of the biggest advantages that MLIR brings to the table, and that is being completely missed here. This also requires solving the problem of handling target information in MLIR, but that is a problem which needs to be solved anyway. Using the GPU dialect also gives us an opportunity to represent offloading semantics in MLIR.
>>>
>>> Given the ability to represent multiple ModuleOps and the existence of the GPU dialect, couldn't higher-level optimizations on offloaded code be done at the MLIR level? The proposed design would lead us to the same problems that we are currently facing in LLVM IR.
>>>
>>> Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of a GPU kernel in MLIR than in LLVM IR.
>>>
>>> Thanks,
>>>
>>> Vinay
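For readers trying to visualize what point C above asks for, here is a purely hypothetical strawman using std types, in the spirit of the RFC's snippets. The omp.do bound and reduction(...) clause syntax is invented for illustration (no such clause modeling had been proposed at the time), and the function and value names are the editor's own:

    // Hypothetical strawman only: the C example '#pragma omp parallel for
    // reduction(+:x)' over std types. The 'reduction' clause syntax below
    // is invented to make the open question concrete: how clauses carry
    // SSA values, and what type 'x' gets (here, a 0-d memref).
    func @sum(%x: memref<f32>, %a: memref<?xf32>, %N: index) {
      %c0 = constant 0 : index
      %c1 = constant 1 : index
      omp.parallel {
        omp.do %i = %c0 to %N step %c1 reduction(addf -> %x : memref<f32>) {
          %ai = load %a[%i] : memref<?xf32>
          %xv = load %x[] : memref<f32>
          %s = addf %xv, %ai : f32
          store %s, %x[] : memref<f32>
        }
      }
      return
    }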
Johannes Doerfert via llvm-dev
2020-Feb-18 00:49 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
On 02/17, Vinay Madhusudan via flang-dev wrote:

> Please find the reply inline below
>
> [...]
>
> About Clang / MLIR / LLVM being loosely related and not being relevant to llvm-dev:
>
> With the introduction of the OpenMPIRBuilder in MLIR (from this review: https://reviews.llvm.org/D72962), Clang and MLIR would now share common code for building OpenMP constructs. I do not think they are so loosely related anymore. Note that the MLIR and Clang frontends for LLVM are very different: Clang emits LLVM IR with almost no optimizations, while MLIR already supports a considerable number of optimizations.
>
> The decision to use the OpenMPIRBuilder for MLIR was discussed in the following flang-dev threads (please correct me if I am missing some newer discussions on the topics below)

I still am unsure about a basic detail that seems really important: could you explain to me what the differences/benefits are between lowering OpenMP ops with something we call CGOpenMP [0] into the LLVM dialect (as far as I understand it) instead of using the OpenMPIRBuilder to generate LLVM IR from these ops?

I am unsure because the functions/functionality in those two files look pretty similar to me, except that the CGOpenMP solution requires the entire* runtime-call encoding of clang to be duplicated and maintained inside MLIR [1], while it is shared in a single place with the OpenMPIRBuilder.

* For comparison: OpenMPLowering in FC lowers to 9 runtime calls [1]. Clang, for the host runtime part alone, knows about 64 [2].

Once the above is sorted out we can discuss other things, but IMHO the arguments are all over the place right now, which makes it hard to justify anything. I mean, the OpenMPIRBuilder is about creating OpenMP runtime calls for OpenMP directives. It seems to me that a lot of the arguments below are about high-level transformations in MLIR, which I have a hard time relating to the OpenMPIRBuilder.

Cheers,
  Johannes

[0] https://github.com/compiler-tree-technologies/fc/blob/master/lib/codegen/CGOpenMP.cpp
[1] https://github.com/compiler-tree-technologies/fc/blob/master/lib/transforms/OpenMPLowering.cpp
[2] https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/CGOpenMPRuntime.cpp#L568

> [...]
> >>> Thanks,
> >>>
> >>> Vinay

--
Johannes Doerfert
Researcher
Argonne National Laboratory
Lemont, IL 60439, USA
jdoerfert at anl.gov
Mehdi AMINI via llvm-dev
2020-Feb-18 02:32 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
On Mon, Feb 17, 2020 at 10:29 AM Vinay Madhusudan <vinay at compilertree.com> wrote:
> The statement you quoted is from the RFC in discourse by Kiran. It is
> actually unclear to whom you are referring here. I am assuming that it
> is for him to answer.

No, I'm asking you. You quoted Kiran and you concluded from this quote
"wouldn't it be better to delay the submission [...]". I am questioning
this aspect in particular when I wrote "are you thinking that we need to
first decide this once and for all before making progress on building
this path?" This question and the following ones are important to
answer, and it isn't clear to me that you did in your answer below. In
particular, "Do you think starting like this will make it significantly
harder to transition away from the builders if this is what we want?" is
important: even if using the OpenMPIRBuilder were suboptimal in the long
term, how much of a problem would it be to replace it later? It seems to
me that it shouldn't limit anything, unless you plan to write
optimizations on the LLVM Dialect itself.

Best,

--
Mehdi
> Because of the above reasons, it seems to me that the design
> considerations of using the OpenMPIRBuilder for MLIR should also be
> mentioned (and discussed) before committing the LLVM IR lowering part
> for the OpenMP dialect in https://reviews.llvm.org/D72962

>>> Mehdi also seems to have the same suggestion: "I agree that having
>>> dialect lowering would be cleaner" in https://reviews.llvm.org/D72962

>> Since you're calling me out: yes, it would be cleaner from a pure MLIR
>> point of view, and I don't think there is much disagreement on this (I
>> think?). However, we already have the OpenMP builders available, and
>> they will continue to be maintained/evolved to support OpenMP in
>> clang. Duplicating them entirely in MLIR for the sake of purity seems
>> like a lack of pragmatism here, so I support the current approach with
>> the current tradeoffs.

>>>> Yes, the design has mildly changed over time to incorporate
>>>> feedback. But the latest is what is there in the RFC in discourse.

>>> The RFC fails to discuss the following (I have also mentioned some of
>>> them in my reply to Johannes):

>>>> The proposed plan involves a) lowering the F18 AST with OpenMP
>>>> directly to a mix of the OpenMP and FIR dialects, and b) converting
>>>> this finally to a mix of the OpenMP and LLVM dialects.

>>> It is unclear in the RFC what other dialects are considered as
>>> supported for the OpenMP dialect (std, affine, vector, loop, etc.)
>>> and how they would be transformed, used and lowered from FIR to LLVM.
>>>
>>> It becomes important to list the various dialects / operations /
>>> types supported for OpenMP (which is mainly defined for C, C++ and
>>> Fortran programs); MLIR has a much wider scope.
>>>
>>> It wouldn't add much value for the proposed OpenMP dialect to be in
>>> the MLIR tree if it cannot support at least the relevant standard
>>> dialect types / operations.

>> I agree, and I think this was something I called out as important in
>> the RFC: "It seems that the dialect can be orthogonal to FIR and its
>> type system, which is the most important thing to me to integrate MLIR
>> (favor reusability across other frontends / compiler frameworks)".
>> If you don't think that this is the case, then please raise this in
>> the RFC!
>> I think it is perfectly fair to ask for more examples from the author
>> and to dig a bit deeper if you're unconvinced that the proposed
>> modeling can be applicable outside of FIR.
>> This is exactly why we ask such proposals to go through an RFC, by the
>> way: to allow people like you to point at the blind spots and ask the
>> right questions.
>>
>> Best,
>>
>> --
>> Mehdi

>>>> We would like to take advantage of the transformations in cases
>>>> where that is possible. FIR loops will be converted to the
>>>> affine/loop dialect, so the loop inside an omp.do can be in these
>>>> dialects, as clarified in the discussion in discourse and also shown
>>>> in slide 20 of the fosdem presentation (links to both below).
>>>>
>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>>>>
>>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf

>>> Although it is mentioned that affine.for / loop.for is used, the
>>> following things are unclear.
>>>
>>> I am assuming that there will be lowering / conversion code in the
>>> f18 repo from fir.do to loop.for / affine.for. Is that the case? If
>>> so, I think it is worth mentioning it in the "sequential code flow
>>> representation" in the RFC.
>>>
>>> This raises the following questions.
>>>
>>> 1. Which types are supported? Standard dialect types and FIR types?
>>>
>>> For example, what types are used for Fortran arrays used inside
>>> OpenMP regions? Is it std.memref, OR the Fortran array representation
>>> in the FIR dialect (fir.array?), OR both? Note that Fortran has
>>> support for column-major arrays, while std.memref supports custom
>>> memory layouts. Which custom layouts are supported?
>>>
>>> How would the different non-scalar types in the standard dialect be
>>> lowered to LLVM IR and passed to OpenMP runtime calls? Can you please
>>> elaborate on this?
>>>
>>> The example provided in slide 20 of the fosdem presentation contains
>>>
>>>   loop.for %j = %lb2 to %ub2 : !integer {
>>>
>>> but loop.for accepts the "index" type, so it is not clear what type
>>> "!integer" represents here.
>>>
>>> 2. What are the different memory access operations which are
>>> supported inside the OpenMP region and lowered to the proper OpenMP
>>> runtime calls in LLVM IR?
>>>
>>> The possibilities are:
>>>
>>> 1. affine.load / affine.store
>>> 2. std.load / std.store
>>> 3. FIR dialect memory access operations.
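>>> To tie questions 1 and 2 together, this is what the slide-20 loop
>>> would have to look like with the loop dialect as it exists today (my
>>> reconstruction, not anything taken from the slides):
>>>
>>>   func @slide20(%a : memref<?xf32>, %lb2 : index, %ub2 : index) {
>>>     %c1 = constant 1 : index
>>>     // bounds, step and induction variable are all of "index" type,
>>>     // and the step is mandatory
>>>     loop.for %j = %lb2 to %ub2 step %c1 {
>>>       // std.load; affine.load would be the affine-dialect variant
>>>       %v = load %a[%j] : memref<?xf32>
>>>     }
>>>     return
>>>   }
>>>
>>> There is no "!integer" anywhere in this picture, which is exactly why
>>> I am asking what the slide intends.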
>>>> I must also point out that the question of where to do loop
>>>> transformations is a topic we have not fully converged on. See the
>>>> following thread for discussions.
>>>> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html

>>> It looks like the placement (MLIR / LLVM) of the various
>>> transformations related to OpenMP has not been finalized, from what I
>>> could infer from Johannes's reply and the below text in the latest
>>> RFC in discourse:
>>>
>>> "So there exist some questions regarding where the optimisations
>>> should be carried out. We will decide on which framework to choose
>>> only after some experimentation."

>>>> i) we need to keep the loops separately so as to take advantage of
>>>> transformations that other dialects like affine/loop would provide.

>>> 1) Keeping the loops separate from the OpenMP operations will expose
>>> them to the "regular" transformation passes in MLIR inside the OpenMP
>>> region. Most of these are invalid or inefficient for OpenMP
>>> operations.
>>>
>>> Examples:
>>>
>>> 1. The constant propagation example mentioned by Johannes in this
>>> thread (omp task shared(x)).
>>> 2. Loop (nest) transformations (permute / split / fuse / tile, etc.)
>>> will happen ignoring the surrounding OpenMP operations.
>>> 3. Hoisting and sinking of various memory / SSA values inside the
>>> OpenMP region. This goes against the likes of the "map",
>>> "firstprivate", "shared", etc. clauses, and more.
>>>
>>> 2) The various loop operations (loop.for, affine.for, fir.do) have
>>> (or will have) transformation / optimization passes which are
>>> different from one another.
>>>
>>> Examples:
>>>
>>> 1. AffineLoopInvariantCodeMotion.cpp is different from
>>> LoopInvariantCodeMotion.cpp.
>>> 2. The other loop transformation passes for affine.for.
>>>
>>> These loops also use different types and memory access operations for
>>> transformations in general. For example, most affine dialect
>>> transformations (if not all) work on affine.load and affine.store
>>> operations.
>>>
>>> Supporting different loop operations means that there would be
>>> OpenMP-specific transformations for each one of them, and it also
>>> requires a way to shield each of them from the existing
>>> transformations (when nested in OpenMP constructs).
>>>
>>> There would be different lowerings for the different loop operations
>>> as well. For example, affine.for and loop.for would have to be
>>> lowered to omp.do in different ways.
>>>
>>> From slide 20 of the fosdem presentation you mentioned, the LLVM +
>>> OpenMP dialect representation is as follows:
>>>
>>>   Mlir.region(…) {
>>>     omp.parallel {
>>>       %ub3 = …
>>>       omp.do %i = 0 to %ub3 : !integer {
>>>         …
>>>       }
>>>     }
>>>   }
>>>
>>> Currently, the LLVM dialect doesn't contain a high-level loop
>>> operation; it is all based on a CFG implementation.
>>>
>>> Will omp.do follow the same structure (SizedRegion<1>) as loop.for,
>>> or will there be a CFG-based loop operation for the LLVM dialect?
>>>
>>> Can you please show what the OpenMP + LLVM dialect will look like for
>>> the below parallel do construct?
>>>
>>>   integer :: i=1, k=10
>>>   integer :: a(10), b(10), c(10)
>>>   ...
>>>   !$omp parallel do
>>>   do i = 1, k
>>>     if (i .ne. 1) cycle
>>>     c(i) = a(i) + b(i)
>>>   end do
>>>   !$omp end parallel do
>>>   print *,c
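>>> As a strawman (the omp.do form below is invented, assuming it keeps
>>> SSA bounds and allows a multi-block region; nothing like this exists
>>> in the patches), the "cycle" would presumably have to become a branch
>>> to a continue block:
>>>
>>>   omp.parallel {
>>>     omp.do %i = %c1 to %k : !llvm.i32 {
>>>       %cond = llvm.icmp "ne" %i, %c1 : !llvm.i32
>>>       llvm.cond_br %cond, ^continue, ^body
>>>     ^body:
>>>       // c(i) = a(i) + b(i) via llvm.load / llvm.add / llvm.store
>>>       llvm.br ^continue
>>>     ^continue:
>>>       // fall through to the next iteration?
>>>     }
>>>   }
>>>
>>> which immediately conflicts with a SizedRegion<1> constraint; that is
>>> exactly the ambiguity I would like to see resolved.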
>>> Thanks,
>>>
>>> Vinay

>>> On Fri, Feb 14, 2020 at 6:52 AM Kiran Chandramohan via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:

>>>> Hello Vinay,
>>>>
>>>> Thanks for your mail about the OpenMP dialect in MLIR. Happy to know
>>>> that you and several other groups are interested in the OpenMP
>>>> dialect. At the outset, I must point out that the design is not set
>>>> in stone and will change as we make progress. You are welcome to
>>>> participate, provide feedback and criticism to change the design, as
>>>> well as to contribute to the implementation. I provide some
>>>> clarifications and replies to your comments below. If it is OK, we
>>>> can have further discussions in discourse, as River points out.
>>>>
>>>> "1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and
>>>> clang frontends. Note that this proposal was before considering MLIR
>>>> for FIR."
>>>>
>>>> A correction here. The proposal for the OpenMPIRBuilder was made
>>>> when MLIR was being considered for FIR.
>>>> (i) Gary Klimowicz's minutes for the Flang call in April 2019
>>>> mention considering MLIR for FIR:
>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
>>>> (ii) My reply to Johannes's proposal in May 2019 mentions MLIR for
>>>> FIR:
>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html
>>>>
>>>> "b. Review of the barrier construct is in progress:
>>>> https://reviews.llvm.org/D72962"
>>>>
>>>> A minor correction here. The addition of the barrier construct was
>>>> accepted and has landed (https://reviews.llvm.org/D72400). It is the
>>>> review for the translation to LLVM IR that is in progress.
>>>>
>>>> "It looks like the design has evolved over time and there is no one
>>>> place which contains the latest design decisions that fit all the
>>>> different pieces of the puzzle."
>>>>
>>>> Yes, the design has mildly changed over time to incorporate
>>>> feedback. But the latest is what is there in the RFC in discourse.
>>>>
>>>> "For most OpenMP design discussions, FIR examples are used (as seen
>>>> in (2) and (3)). The MLIR examples mentioned in the design only talk
>>>> about the FIR dialect and the LLVM dialect."
>>>>
>>>> Our initial concern was how all these pieces (FIR, LLVM Dialect,
>>>> OpenMPIRBuilder, LLVM IR) will fit together. Hence you see the
>>>> prominence of FIR and the LLVM dialect, and more information about
>>>> lowering/translation than about transformations/optimisations.
>>>>
>>>> "This completely ignores the likes of the standard, affine (where
>>>> most loop transformations are supposed to happen) and loop
>>>> dialects."
>>>>
>>>> Adding to the reply above: we would like to take advantage of the
>>>> transformations in cases where that is possible. FIR loops will be
>>>> converted to the affine/loop dialect, so the loop inside an omp.do
>>>> can be in these dialects, as clarified in the discussion in
>>>> discourse and also shown in slide 20 of the fosdem presentation
>>>> (links to both below).
>>>>
>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
>>>>
>>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf
>>>>
>>>> I must also point out that the question of where to do loop
>>>> transformations is a topic we have not fully converged on. See the
>>>> following thread for discussions.
>>>> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html
>>>>
>>>> "Is it the same omp.do operation which now contains the bounds and
>>>> induction variables of the loop after the LLVM conversion?"
>>>>
>>>> The point here is that i) we need to keep the loops separate so as
>>>> to take advantage of the transformations that other dialects like
>>>> affine/loop would provide, and ii) we will need the loop information
>>>> while lowering the OpenMP do operation. For the implementation, if
>>>> reusing the same operation (in different contexts) is difficult,
>>>> then we can add a new operation.
>>>>
>>>> "It is also not mentioned how clauses like firstprivate, shared,
>>>> private, reduce, map, etc. are lowered to the OpenMP dialect."
>>>>
>>>> Yes, it is not mentioned. We did a study of a few constructs and
>>>> clauses, which was shared in mails to flang-dev and in the RFC. As
>>>> we make progress, and before implementation, we will share further
>>>> details.
>>>> "It would be beneficial to have an omp.parallel_do operation which
>>>> has semantics similar to other loop structures (may not be
>>>> LoopLikeInterface) in MLIR."
>>>>
>>>> I am not against adding parallel_do if it can help with
>>>> transformations or reduce the complexity of lowering. Please share
>>>> the details in discourse, as a reply to the RFC or in a separate
>>>> thread.
>>>>
>>>> "It looks like having OpenMP operations based on standard MLIR types
>>>> and operations (scalars and memrefs mainly) is the right way to go."
>>>>
>>>> This will definitely be the first version that we implement. But I
>>>> do not understand why we should restrict ourselves to only the
>>>> standard types and operations. To ease lowering and translation, and
>>>> to avoid adding OpenMP operations to other dialects, I believe the
>>>> OpenMP dialect should also be able to coexist with other dialects
>>>> like FIR and LLVM.
>>>>
>>>> "E. Lowering of the target constructs mentioned in (2(d)) specifies
>>>> direct lowering to LLVM IR, ignoring all the advantages that MLIR
>>>> provides."
>>>>
>>>> "Also, OpenMP codegen will automatically benefit from GPU dialect
>>>> based optimizations. For example, it would be way easier to hoist a
>>>> memory reference out of a GPU kernel in MLIR than in LLVM IR."
>>>>
>>>> I might not have fully understood you here, but the dialect lives
>>>> independently of the translation to LLVM IR. If there are
>>>> optimisations (like the hoisting that you mention here), I believe
>>>> they can be performed as transformation passes on the dialect. It is
>>>> not ruled out.
>>>>
>>>> --Kiran
Vinay Madhusudan via llvm-dev
2020-Feb-18 16:20 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
Please find the reply inline below:

On Tue, Feb 18, 2020 at 8:02 AM Mehdi AMINI <joker.eph at gmail.com> wrote:
> No, I'm asking you. You quoted Kiran and you concluded from this quote
> "wouldn't it be better to delay the submission [...]". I am questioning
> this aspect in particular when I wrote "are you thinking that we need
> to first decide this once and for all before making progress on
> building this path?" This question and the following ones are important
> to answer, and it isn't clear to me that you did in your answer below.
> In particular, "Do you think starting like this will make it
> significantly harder to transition away from the builders if this is
> what we want?" is important: even if using the OpenMPIRBuilder were
> suboptimal in the long term, how much of a problem would it be to
> replace it later? It seems to me that it shouldn't limit anything,
> unless you plan to write optimizations on the LLVM Dialect itself.

Given that there are unresolved questions in this thread about the
OpenMP representation of basic constructs (including the target
constructs) in MLIR, and about the OpenMPIRBuilder becoming the
high-level *common* interface for the Clang AST and for optimized MLIR
IR, I believe it would be a good idea to wait until there is more
clarity on OpenMP in MLIR.
But currently, the late outlining has been implemented in LLVM >> (version 1) ( >> https://github.com/llvm/llvm-project/blob/master/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp >> ). >> >> Early outlining in MLIR would have the following benefits as suggested in >> the thread: >> >> >> >> 1. >> >> Enables more optimization in MLIR (intra-procedural because of >> regions). >> 2. >> >> Offloading in MLIR (which is designed for heterogenous hardware >> compilation support) >> 3. >> >> Direct LLVM Dialect lowering of OpenMP operations (no LLVM IR >> lowering) >> >> >> MLIR google groups discussion (*h* >> ttps://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw >> <https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw>) >> regarding the use of OpenMPIRBuilder doesn’t seem to discuss the above >> concerns and also about how the various design decisions in OpenMPIRBuilder >> affects MLIR in general. >> >> Also, >> >> > “The point here is that we do not want to use MLIR just as a >> pass-through layer because MLIR has a lot of strengths” >> >> .... >> >> > “The point here is that if we lower to LLVM dialect, we will not be >> able to reuse OpenMP codegen & optimisation code from Clang/LLVM.” >> >> --- by Kiran in >> https://lists.llvm.org/pipermail/llvm-dev/2020-February/139181.html >> >> In the latest reply from Kiran (quoted above) to this thread, Kiran seems >> to suggest that lowering to LLVM Dialect (instead of LLVM IR) would >> restrict the use of OpenMP Optimization code from LLVM and also MLIR will >> just be a pass-through to the OpenMPIRBuilder. >> > >> Because of the above reasons, it seems to me that design considerations >> of using OpenMPIRBuilder for MLIR should also be mentioned (and discussed) >> before commiting LLVM IR lowering part for OpenMP dialect in >> https://reviews.llvm.org/D72962 >> > Mehdi also seems to have the same suggestion: “I agree that having >>>> dialect lowering would be cleaner” in https://reviews.llvm.org/D72962 >>>> >>> >>> Since you're calling me out: yes it would be cleaner from a pure MLIR >>> point of view, I don't think there is much disagreement on this (I think?). >>> However we already have the OpenMP builders available and they will >>> continue to be maintained/evolved to support OpenMP in clang. >>> Duplicating them entirely in MLIR for the sake of purity seems like a >>> lack of pragmatism here, so I support the current approach with the current >>> tradeoffs. >>> >>> >>>> >>>> > Yes, the design has mildly changed over time to incorporate >>>> feedback. But the latest is what is there in the RFC in discourse. >>>> >>>> RFC fails to discuss the following (I have also mentioned some of them >>>> in my reply to Johannes): >>>> >>>> > The proposed plan involves a) lowering F18 AST with OpenMP directly >>>> to a mix of OpenMP and FIR dialects. b) converting this finally to a mix of >>>> OpenMP and LLVM dialects. >>>> >>>> It is unclear in the RFC what other dialects are considered as >>>> supported for OpenMP dialect (std, affine, vector, loop, etc) and how it >>>> would be transformed, used and lowered from FIR to LLVM. >>>> >>>> It becomes important to list down the various dialects / operations / >>>> types supported for OpenMP (which is mainly defined for C, C++ and Fortran >>>> programs. MLIR has a much wider scope. >>>> >>>> It wouldn’t add much value for the proposed OpenMP dialect to be in the >>>> MLIR tree if it cannot support at least the relevant standard dialect types >>>> / operations. 
>>>> >>> >>> I agree, and I think this was something I called out as important in the >>> RFC: "It seems that the dialect can be orthogonal to FIR and its type >>> system, which the most important thing to me to integrate MLIR (favor >>> reusability across other frontends / compiler frameworks)". >>> If you don't think that this is the case, then please raise this in the >>> RFC! >>> I think it is perfectly fair to ask for more examples from the author >>> and digging a bit deeper if you're unconvinced that the proposed modeling >>> can be applicable outside of FIR. This is exactly why we ask such proposal >>> to go through RFC by the way: to allow people like you to point at the >>> blindspot and ask the right questions. >>> >>> Best, >>> >>> -- >>> Mehdi >>> >>> >>> >>>> > We would like to take advantage of the transformations in cases that >>>> are possible. FIR loops will be converted to affine/loop dialect. So the >>>> loop inside an omp.do can be in these dialects as clarified in the >>>> discussion in discourse and also shown in slide 20 of the fosdem >>>> presentation (links to both below). >>>> >>>> >>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan >>>> >>>> >>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf >>>> >>>> Although it is mentioned that the affine/ loop.for is used, following >>>> things are unclear: >>>> >>>> I am assuming that there will be lowering / conversion code in f18 repo >>>> dialect from fir.do to loop.for / affine.for. Is it the case? If so, I >>>> think it is worth mentioning it in the “sequential code flow >>>> representation” in the RFC. >>>> >>>> This raises the following questions. >>>> >>>> >>>> >>>> 1. >>>> >>>> Which types are supported? Standard dialect types and FIR types? >>>> >>>> >>>> For example, what types are used for Fortran arrays used inside OpenMP >>>> regions? Is it std.memref OR Fortran array representation in FIR dialect >>>> (fir.array?) OR both? Note that Fortran has support for column major >>>> arrays. std.memref supports custom memory layouts. What custom layouts are >>>> supported? >>>> >>>> >>>> How would different non-scalar types in standard dialect be lowered to >>>> LLVM IR and passed to OpenMP runtime calls? Can you please elaborate on >>>> this? >>>> >>>> The example provided in slide 20 of the fosdem presentation contains >>>> >>>> “loop.for %j = %lb2 to %ub2 : !integer {“ >>>> >>>> But loop.for accepts “index” type. Not sure what type “!integer” >>>> represents here. >>>> >>>> >>>> 1. >>>> >>>> What are the different memory access operations which are supported >>>> inside the OpenMP region and lowered to proper OpenMP runtime calls in LLVM >>>> IR? >>>> >>>> >>>> The possibilities are: >>>> >>>> 1. >>>> >>>> affine.load / affine.store >>>> 2. >>>> >>>> std.load / std.store >>>> 3. >>>> >>>> FIR dialect memory access operations. >>>> >>>> >>>> > I must also point out that the question of where to do loop >>>> transformations is a topic we have not fully converged on. See the >>>> following thread for discussions. 
>>>> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html >>>> >>>> Looks like placement (MLIR / LLVM) of various transformations related >>>> to OpenMP has not been finalized, from what I could infer from Johannes’s >>>> reply and the below text in the latest RFC in discourse: >>>> >>>> “So there exist some questions regarding where the optimisations >>>> should be carried out. We will decide on which framework to choose only >>>> after some experimentation.” >>>> >>>> > i) we need to keep the loops separately so as to take advantage of >>>> transformations that other dialects like affine/loop would provide. >>>> >>>> 1) Keeping the loops separate from the OpenMP operations will expose >>>> them to the “regular” transformations passes in MLIR inside the OpenMP >>>> region. Most of them are invalid or in-efficient for OpenMP operations. >>>> >>>> Examples: >>>> >>>> 1. >>>> >>>> Constant propagation example mentioned by Johannes in this thread. >>>> (omp task shared(x)) >>>> 2. >>>> >>>> Loop (nest) transformations (permute / split / fuse / tile, etc) >>>> will happen ignoring the surrounding OpenMP operations. >>>> 3. >>>> >>>> Hoisting and sinking of various memory/ SSA values inside the >>>> OpenMP region. This goes against the likes of “map”, “firstprivate”, >>>> shared, etc clauses and more. >>>> >>>> >>>> 2) Various loop operations (loop.for, affine.for, fir.do) have (or will >>>> have) different transformations/ optimization passes which are different >>>> from one another. >>>> >>>> Example: >>>> >>>> 1. >>>> >>>> AffineLoopInvariantCodeMotion.cpp is different from >>>> LoopInvariantCodeMotion.cpp. >>>> 2. >>>> >>>> Other Loop transformation passes for affine.for >>>> >>>> >>>> These loops also use different Types and memory access operations in >>>> general for transformations. Example, most Affine dialect transformations >>>> (if not all) work on affine.load and affine.store operations. >>>> >>>> Supporting different loop operations means that there would be *OpenMP >>>> specific transformations* for each one of them and also requires a way to >>>> restrict each of them from existing transformations (when nested in OpenMP >>>> constructs). >>>> >>>> There would be different lowerings for different loop operations as >>>> well. Example, affine.for and loop.for would have to be lowered to omp.do >>>> in different ways. >>>> >>>> From slide 20 of fosdem presentation you mentioned, the LLVM + OpenMP >>>> dialect representation is as follows: >>>> >>>> ------------------------------ >>>> >>>> Mlir.region(…) { >>>> >>>> omp.parallel { >>>> >>>> %ub3 = … >>>> >>>> omp.do %i = 0 to %ub3 : !integer { >>>> >>>> … >>>> >>>> } >>>> >>>> } >>>> >>>> } >>>> >>>> ------------------------------- >>>> >>>> Currently, the LLVM Dialect doesn’t contain a high level loop >>>> operation. It is all based on CFG implementation. >>>> >>>> Will omp.do follow the same structure (SizedRegion<1>) as loop.for? OR >>>> there would be CFG for LLVM Dialect based loop operation? >>>> >>>> Can you please mention how the OpenMP + LLVM dialect will look like for >>>> the below parallel do construct? >>>> >>>> integer :: i=1, k=10 >>>> >>>> integer :: a(10), b(10), c(10) >>>> >>>> ... >>>> >>>> !$omp parallel do >>>> >>>> do i = 1, k >>>> >>>> if (i .ne. 
1) *cycle* >>>> >>>> c(i) = a(i) + b(i) >>>> >>>> end do >>>> >>>> !$omp end parallel do >>>> >>>> print *,c >>>> >>> >>>> Thanks, >>>> >>>> Vinay >>>> >>>> On Fri, Feb 14, 2020 at 6:52 AM Kiran Chandramohan via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>>> Hello Vinay, >>>>> >>>>> Thanks for your mail about the OpenMP dialect in MLIR. Happy to know >>>>> that you and several other groups are interested in the OpenMP dialect. At >>>>> the outset, I must point out that the design is not set in stone and will >>>>> change as we make progress. You are welcome to participate, provide >>>>> feedback and criticism to change the design as well as to contribute to the >>>>> implementation. I provide some clarifications and replies to your comments >>>>> below. If it is OK we can have further discussions in discourse as River >>>>> points out. >>>>> >>>>> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and >>>>> clang frontends. Note that this proposal was before considering MLIR for >>>>> FIR. >>>>> >>>>> A correction here. The proposal for OpenMPIRBuilder was made when MLIR >>>>> was being considered for FIR. >>>>> (i) Gary Klimowicz's minutes for Flang call in April 2019 mentions >>>>> considering MLIR for FIR. >>>>> >>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html >>>>> (ii) My reply to Johaness's proposal in May 2019 mentions MLIR for FIR. >>>>> >>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html >>>>> >>>>> b. Review of barrier construct is in progress: >>>>> https://reviews.llvm.org/D72962 >>>>> >>>>> Minor correction here. The addition of barrier construct was accepted >>>>> and has landed (https://reviews.llvm.org/D7240 >>>>> <https://reviews.llvm.org/D72400>). It is the review for translation >>>>> to LLVM IR that is in progress. >>>>> >>>>> It looks like the design has evolved over time and there is no one >>>>> place which contains the latest design decisions that fits all the >>>>> different pieces of the puzzle. I will try to deduce it from the above >>>>> mentioned references. Please correct me If I am referring to anything which >>>>> has changed. >>>>> >>>>> Yes, the design has mildly changed over time to incorporate feedback. >>>>> But the latest is what is there in the RFC in discourse. >>>>> >>>>> For most OpenMP design discussions, FIR examples are used (as seen in >>>>> (2) and (3)). The MLIR examples mentioned in the design only talks about >>>>> FIR dialect and LLVM dialect. >>>>> >>>>> Our initial concern was how will all these pieces (FIR, LLVM Dialect, >>>>> OpenMPIRBuilder, LLVM IR) fit together. Hence you see the prominence of FIR >>>>> and LLVM dialect and more information about lowering/translation than >>>>> transformations/optimisations. >>>>> >>>>> This completely ignores the likes of standard, affine (where most loop >>>>> transformations are supposed to happen) and loop dialects. >>>>> >>>>> Adding to the reply above. We would like to take advantage of the >>>>> transformations in cases that are possible. FIR loops will be converted to >>>>> affine/loop dialect. So the loop inside an omp.do can be in these dialects >>>>> as clarified in the discussion in discourse and also shown in slide 20 of >>>>> the fosdem presentation (links to both below). 
>>>>> >>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan >>>>> >>>>> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf >>>>> >>>>> I must also point out that the question of where to do loop >>>>> transformations is a topic we have not fully converged on. See the >>>>> following thread for discussions. >>>>> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html >>>>> >>>>> Is it the same omp.do operation which now contains the bounds and >>>>> induction variables of the loop after the LLVM conversion? >>>>> >>>>> The point here is that i) we need to keep the loops separately so as >>>>> to take advantage of transformations that other dialects like affine/loop >>>>> would provide. ii) We will need the loop information while lowering the >>>>> OpenMP do operation. For implementation, if reusing the same operation (in >>>>> different contexts) is difficult then we can add a new operation. >>>>> >>>>> It is also not mentioned how clauses like firstprivate, shared, >>>>> private, reduce, map, etc are lowered to OpenMP dialect. >>>>> >>>>> Yes, it is not mentioned. We did a study of a few constructs and >>>>> clauses which was shared as mails to flang-dev and the RFC. As we make >>>>> progress and before implementation, we will share further details. >>>>> >>>>> it would be beneficial to have an omp.parallel_do operation which has >>>>> semantics similar to other loop structures (may not be LoopLikeInterface) >>>>> in MLIR. >>>>> >>>>> I am not against adding parallel_do if it can help with >>>>> transformations or reduce the complexity of lowering. Please share the >>>>> details in discourse as a reply to the RFC or a separate thread. >>>>> >>>>> it looks like having OpenMP operations based on standard MLIR types >>>>> and operations (scalars and memrefs mainly) is the right way to go. >>>>> >>>>> This will definitely be the first version that we implement. But I do >>>>> not understand why we should restrict to only the standard types and >>>>> operations. To ease lowering and translation and to avoid adding OpenMP >>>>> operations to other dialects, I believe OpenMP dialect should also be able >>>>> to exist with other dialects like FIR and LLVM. >>>>> >>>>> E. Lowering of target constructs mentioned in ( 2(d) ) specifies >>>>> direct lowering to LLVM IR ignoring all the advantages that MLIR provides. >>>>> >>>>> Also, OpenMP codegen will automatically benefit from the GPU dialect >>>>> based optimizations. For example, it would be way easier to hoist a memory >>>>> reference out of GPU kernel in MLIR than in LLVM IR. >>>>> >>>>> I might not have fully understood you here. But the dialect lives >>>>> independently of the translation to LLVM IR. If there are optimisations >>>>> (like hoisting that you mention here) I believe they can be performed as >>>>> transformation passes on the dialect. It is not ruled out. 
>>>>> >>>>> --Kiran >>>>> ------------------------------ >>>>> *From:* flang-dev <flang-dev-bounces at lists.llvm.org> on behalf of >>>>> Vinay Madhusudan via flang-dev <flang-dev at lists.llvm.org> >>>>> *Sent:* 13 February 2020 16:33 >>>>> *To:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; >>>>> flang-dev at lists.llvm.org <flang-dev at lists.llvm.org> >>>>> *Subject:* [flang-dev] About OpenMP dialect in MLIR >>>>> >>>>> >>>>> Hi, >>>>> >>>>> I have few questions / concerns regarding the design of OpenMP dialect >>>>> in MLIR that is currently being implemented, mainly for the f18 compiler. >>>>> Below, I summarize the current state of various efforts in clang / f18 / >>>>> MLIR / LLVM regarding this. Feel free to add to the list in case I have >>>>> missed something. >>>>> >>>>> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and >>>>> clang frontends. Note that this proposal was before considering MLIR for >>>>> FIR. >>>>> >>>>> a. llvm-dev proposal : >>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html >>>>> >>>>> b. Patches in review: https://reviews.llvm.org/D70290. This also >>>>> includes the clang codegen changes. >>>>> >>>>> 2. [July - September 2019] OpenMP dialect for MLIR was discussed / >>>>> proposed with respect to the f18 compilation stack (keeping FIR in mind). >>>>> >>>>> a. flang-dev discussion link: >>>>> https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html >>>>> >>>>> b. Design decisions captured in PPT: >>>>> https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view >>>>> >>>>> c. MLIR google groups discussion: >>>>> https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw >>>>> >>>>> d. Target constructs design: >>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html >>>>> >>>>> e. SIMD constructs design: >>>>> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html >>>>> >>>>> 3. [Jan 2020] OpenMP dialect RFC in llvm discourse : >>>>> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397 >>>>> >>>>> 4. [Jan- Feb 2020] Implementation of OpenMP dialect in MLIR: >>>>> >>>>> a. The first patch which introduces the OpenMP dialect was pushed. >>>>> >>>>> b. Review of barrier construct is in progress: >>>>> https://reviews.llvm.org/D72962 >>>>> https://reviews.llvm.org/D72400 >>>>> >>>>> I have tried to list below different topics of interest (to different >>>>> people) around this work. Most of these are in the design phase (or very >>>>> new) and multiple parties are interested with different sets of goals in >>>>> mind. >>>>> >>>>> I. Flang frontend and its integration >>>>> >>>>> II. Fortran representation in MLIR / FIR development >>>>> >>>>> III. OpenMP development for flang, OpenMP builder in LLVM. >>>>> >>>>> IV. Loop Transformations in MLIR / LLVM with respect to OpenMP. >>>>> >>>>> It looks like the design has evolved over time and there is no one >>>>> place which contains the latest design decisions that fits all the >>>>> different pieces of the puzzle. I will try to deduce it from the above >>>>> mentioned references. Please correct me If I am referring to anything which >>>>> has changed. >>>>> >>>>> A. For most OpenMP design discussions, FIR examples are used (as seen >>>>> in (2) and (3)). The MLIR examples mentioned in the design only talks about >>>>> FIR dialect and LLVM dialect. 
>>>>> This completely ignores the likes of the standard, affine (where most
>>>>> loop transformations are supposed to happen) and loop dialects. I think
>>>>> it is critical to decouple the OpenMP dialect development in MLIR from
>>>>> the current flang / FIR effort. It would be useful if someone could
>>>>> recast these examples using existing dialects in MLIR and also show how
>>>>> the different transformations / lowerings are planned.
>>>>>
>>>>> B. In the latest RFC (3), it is mentioned that the initial OpenMP
>>>>> dialect version will be as follows:
>>>>>
>>>>> omp.parallel {
>>>>>   omp.do {
>>>>>     fir.do %i = 0 to %ub3 : !fir.integer {
>>>>>       ...
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> and then after the "LLVM conversion" it is converted as follows:
>>>>>
>>>>> omp.parallel {
>>>>>   %ub3
>>>>>   omp.do %i = 0 to %ub3 : !llvm.integer {
>>>>>     ...
>>>>>   }
>>>>> }
>>>>>
>>>>> a. Is it the same omp.do operation which now contains the bounds and
>>>>> induction variables of the loop after the LLVM conversion? If so, will
>>>>> the same operation have two different semantics during a single
>>>>> compilation?
>>>>>
>>>>> b. Will there be different lowerings for the various loop operations
>>>>> from different dialects? loop.for and affine.for under omp operations
>>>>> would need different OpenMP / LLVM lowerings. Currently, both of them
>>>>> are lowered to CFG-based loops during the LLVM dialect conversion
>>>>> (which happens much before the proposed OpenMP dialect lowering).
>>>>>
>>>>> There would be no standard way to represent OpenMP operations
>>>>> (especially the ones which involve loops) in MLIR. This would
>>>>> drastically complicate lowering.
>>>>>
>>>>> C. It is also not mentioned how clauses like firstprivate, shared,
>>>>> private, reduce, map, etc. are lowered to the OpenMP dialect. The
>>>>> example in the RFC contains FIR and LLVM types and nothing about std
>>>>> dialect types. Consider the example below:
>>>>>
>>>>> #pragma omp parallel for reduction(+:x)
>>>>> for (int i = 0; i < N; ++i)
>>>>>   x += a[i];
>>>>>
>>>>> How would the above be represented in the OpenMP dialect, and what type
>>>>> would "x" have in MLIR? The design does not mention how the various SSA
>>>>> values for the various OpenMP clauses are passed around in OpenMP
>>>>> operations.
>>>>>
>>>>> D. Because of (A), (B) and (C), it would be beneficial to have an
>>>>> omp.parallel_do operation which has semantics similar to other loop
>>>>> structures (may not be LoopLikeInterface) in MLIR. To me, it looks like
>>>>> having OpenMP operations based on standard MLIR types and operations
>>>>> (scalars and memrefs mainly) is the right way to go; one possible shape
>>>>> is sketched after the list below.
>>>>>
>>>>> Why not have an omp.parallel_do operation with AffineMap-based bounds,
>>>>> so as to decouple it from Value/Type, similar to affine.for?
>>>>>
>>>>> 1. With the current design, the number of transformations /
>>>>> optimizations that one can write on OpenMP constructs would become
>>>>> limited, as there can be any custom loop structure with custom
>>>>> operations / types inside it.
>>>>>
>>>>> 2. It would also be easier to transform loop nests containing OpenMP
>>>>> constructs if the body of the OpenMP operations is well defined (i.e.,
>>>>> does not accept arbitrary loop structures). Having nested redundant
>>>>> "parallel", "target" and "do" regions seems unnecessary.
>>>>>
>>>>> 3. There would also be new sets of loop structures in new dialects
>>>>> when C/C++ is compiled to MLIR. This would greatly increase the number
>>>>> of possible combinations inside the OpenMP region.
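>>>>>
>>>>> For concreteness, here is a minimal sketch of what such an operation
>>>>> could look like for the reduction example in (C), using only standard
>>>>> types. The omp.parallel_do syntax, the reduction(...) clause encoding
>>>>> and the omp.yield terminator are all hypothetical, intended purely as a
>>>>> strawman for discussion:
>>>>>
>>>>> func @sum(%a: memref<?xi32>, %x: memref<i32>, %N: index) {
>>>>>   %c0 = constant 0 : index
>>>>>   %c1 = constant 1 : index
>>>>>   // Bounds, step and the reduction variable are plain SSA values with
>>>>>   // standard types, carried directly on the operation.
>>>>>   omp.parallel_do (%i) = (%c0) to (%N) step (%c1)
>>>>>       reduction(addi -> %x : memref<i32>) {
>>>>>     %ai = load %a[%i] : memref<?xi32>
>>>>>     // With the reduction clause, this update would conceptually act on
>>>>>     // a privatized copy of x that is combined at the end of the loop.
>>>>>     %xv = load %x[] : memref<i32>
>>>>>     %sum = addi %xv, %ai : i32
>>>>>     store %sum, %x[] : memref<i32>
>>>>>     omp.yield
>>>>>   }
>>>>>   return
>>>>> }
>>>>>
>>>>> With a shape like this, the lowering (whether to the LLVM dialect or
>>>>> via the OpenMPIRBuilder) would not have to pattern-match arbitrary
>>>>> nested loop structures from other dialects.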
>>>>>
>>>>> E. Lowering of target constructs mentioned in (2(d)) specifies direct
>>>>> lowering to LLVM IR, ignoring all the advantages that MLIR provides.
>>>>> Being able to compile the code for heterogeneous hardware is one of the
>>>>> biggest advantages that MLIR brings to the table, and that is being
>>>>> completely missed here. This also requires solving the problem of
>>>>> handling target information in MLIR, but that is a problem which needs
>>>>> to be solved anyway. Using the GPU dialect also gives us an opportunity
>>>>> to represent offloading semantics in MLIR.
>>>>>
>>>>> Given the ability to represent multiple ModuleOps and the existence of
>>>>> the GPU dialect, couldn't higher-level optimizations on offloaded code
>>>>> be done at the MLIR level? The proposed design would lead us to the
>>>>> same problems that we are currently facing in LLVM IR.
>>>>>
>>>>> Also, OpenMP codegen will automatically benefit from the GPU dialect
>>>>> based optimizations. For example, it would be way easier to hoist a
>>>>> memory reference out of a GPU kernel in MLIR than in LLVM IR.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Vinay
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev