Vinay Madhusudan via llvm-dev
2020-Feb-13 16:33 UTC
[llvm-dev] About OpenMP dialect in MLIR
Hi,

I have a few questions / concerns regarding the design of the OpenMP dialect in MLIR that is currently being implemented, mainly for the f18 compiler. Below, I summarize the current state of the various efforts in clang / f18 / MLIR / LLVM regarding this. Feel free to add to the list in case I have missed something.

1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for the flang and clang frontends. Note that this proposal was before considering MLIR for FIR.

a. llvm-dev proposal: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

b. Patches in review: https://reviews.llvm.org/D70290. This also includes the clang codegen changes.

2. [July - September 2019] An OpenMP dialect for MLIR was discussed / proposed with respect to the f18 compilation stack (keeping FIR in mind).

a. flang-dev discussion link: https://lists.llvm.org/pipermail/flang-dev/2019-September/000020.html

b. Design decisions captured in PPT: https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view

c. MLIR google groups discussion: https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw

d. Target constructs design: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html

e. SIMD constructs design: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html

3. [Jan 2020] OpenMP dialect RFC in llvm discourse: https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397

4. [Jan - Feb 2020] Implementation of the OpenMP dialect in MLIR:

a. The first patch, which introduces the OpenMP dialect, was pushed.

b. Review of the barrier construct is in progress: https://reviews.llvm.org/D72962

I have tried to list below the different topics of interest (to different people) around this work. Most of these are in the design phase (or very new) and multiple parties are interested, with different sets of goals in mind.

I. Flang frontend and its integration

II. Fortran representation in MLIR / FIR development

III. OpenMP development for flang, OpenMP builder in LLVM.

IV. Loop transformations in MLIR / LLVM with respect to OpenMP.

It looks like the design has evolved over time and there is no one place which contains the latest design decisions that fit all the different pieces of the puzzle. I will try to deduce it from the above mentioned references. Please correct me if I am referring to anything which has changed.

A. For most OpenMP design discussions, FIR examples are used (as seen in (2) and (3)). The MLIR examples mentioned in the design only talk about the FIR dialect and the LLVM dialect.

This completely ignores the likes of the standard, affine (where most loop transformations are supposed to happen) and loop dialects. I think it is critical to decouple the OpenMP dialect development in MLIR from the current flang / FIR effort. It would be useful if someone could show these examples using existing dialects in MLIR and also how the different transformations / lowerings are planned.

B. In the latest RFC (3), it is mentioned that the initial OpenMP dialect version will be as follows,

omp.parallel {
  omp.do {
    fir.do %i = 0 to %ub3 : !fir.integer {
      ...
    }
  }
}

and then after the "LLVM conversion" it is converted as follows:

omp.parallel {
  %ub3
  omp.do %i = 0 to %ub3 : !llvm.integer {
    ...
  }
}

a. Is it the same omp.do operation which now contains the bounds and induction variables of the loop after the LLVM conversion? If so, will the same operation have two different semantics during a single compilation?

b. Will there be different lowerings for the various loop operations from different dialects? loop.for and affine.for under omp operations would need different OpenMP / LLVM lowerings. Currently, both of them are lowered to CFG-based loops during the LLVM dialect conversion (which happens much before the proposed OpenMP dialect lowering).

There would be no standard way to represent OpenMP operations (especially the ones which involve loops) in MLIR. This would drastically complicate lowering.

C. It is also not mentioned how clauses like firstprivate, shared, private, reduce, map, etc. are lowered to the OpenMP dialect. The example in the RFC contains FIR and LLVM types and nothing about std dialect types. Consider the example below:

#pragma omp parallel for reduction(+:x)
for (int i = 0; i < N; ++i)
  x += a[i];

How would the above be represented in the OpenMP dialect, and what type would "x" be in MLIR? It is not mentioned in the design how the SSA values for the various OpenMP clauses are passed around in OpenMP operations.
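For instance, one might imagine something along the following lines. To be clear, this is a purely hypothetical sketch: omp.parallel_do, its reduction syntax and omp.yield are invented here for illustration and are not taken from the RFC.

```mlir
// Hypothetical sketch only; the op and clause syntax are illustrative.
func @sum(%a: memref<?xf32>, %n: index, %x: memref<f32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  omp.parallel_do (%i) = (%c0) to (%n) step (%c1)
      reduction(addf -> %x : memref<f32>) {
    %ai = load %a[%i] : memref<?xf32>
    %xv = load %x[] : memref<f32>
    %s  = addf %xv, %ai : f32
    store %s, %x[] : memref<f32>
    omp.yield
  }
  return
}
```

Even in such a sketch it is not obvious whether "x" should be a memref, an SSA value threaded through the region, or something clause-specific, and that is exactly the kind of detail the design should spell out.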
D. Because of (A), (B) and (C), it would be beneficial to have an omp.parallel_do operation which has semantics similar to other loop structures (maybe not LoopLikeInterface) in MLIR. To me, it looks like having OpenMP operations based on standard MLIR types and operations (mainly scalars and memrefs) is the right way to go.

Why not have an omp.parallel_do operation with AffineMap-based bounds, so as to decouple it from Value/Type, similar to affine.for?

1. With the current design, the number of transformations / optimizations that one can write on OpenMP constructs would become limited, as there can be any custom loop structure with custom operations / types inside it.

2. It would also be easier to transform loop nests containing OpenMP constructs if the body of the OpenMP operations were well defined (i.e., did not accept arbitrary loop structures). Having nested redundant "parallel", "target" and "do" regions seems unnecessary.

3. There would also be new sets of loop structures in new dialects when C/C++ is compiled to MLIR. This would increase the number of possible combinations inside the OpenMP region.

E. The lowering of target constructs mentioned in (2(d)) specifies direct lowering to LLVM IR, ignoring all the advantages that MLIR provides. Being able to compile code for heterogeneous hardware is one of the biggest advantages that MLIR brings to the table, and that is being completely missed here. This also requires solving the problem of handling target information in MLIR, but that is a problem which needs to be solved anyway. Using the GPU dialect also gives us an opportunity to represent offloading semantics in MLIR.

Given the ability to represent multiple ModuleOps and the existence of the GPU dialect, couldn't higher-level optimizations on offloaded code be done at the MLIR level? The proposed design would lead us to the same problems that we are currently facing in LLVM IR.

Also, OpenMP codegen would automatically benefit from GPU dialect based optimizations. For example, it would be far easier to hoist a memory reference out of a GPU kernel in MLIR than in LLVM IR.

Thanks,
Vinay
Johannes Doerfert via llvm-dev
2020-Feb-13 18:18 UTC
[llvm-dev] About OpenMP dialect in MLIR
Hi Vinay,

Thanks for taking an interest and for the detailed discussion.

To start, I will pick a few paragraphs from your email to clarify a couple of things that led to the current design or that might otherwise need clarification. We can talk about other points later as well.

[
Side notes:
1) I'm not an MLIR person.
2) It seems unfortunate that we do not have an mlir-dev list.
]

> 1. With the current design, the number of transformations / optimizations that one can write on OpenMP constructs would become limited as there can be any custom loop structure with custom operations / types inside it.

OpenMP, as an input language, does not make many assumptions about the code inside of constructs*. So, inside a parallel there can be almost anything the base language has to offer, both lexically and dynamically. Assuming otherwise is not going to work. Analyzing a "generic" OpenMP representation in order to determine if it can be represented as a more restricted "op" seems at least plausible. You will run into various issues, some mentioned explicitly below. For starters, you still have to generate proper OpenMP runtime calls, e.g., from your GPU dialect, even if it is "just" to make sure the OMPD/OMPT interfaces expose useful information.

* I exclude the `omp loop` construct here as it is not even implemented anywhere, as far as I know.

> 2. It would also be easier to transform the loop nests containing OpenMP constructs if the body of the OpenMP operations is well defined (i.e., does not accept arbitrary loop structures). Having nested redundant "parallel", "target" and "do" regions seems unnecessary.

As mentioned above, you cannot start with the assumption that OpenMP input is structured this way. You have to analyze it first. This is the same reason we cannot simply transform C/C++ `for` loops into `affine.for` without proper analysis of the loop body.

Now, more concretely: nested parallel and target regions are not necessarily redundant, nor can/should we require the user not to have them. Nested parallelism can easily make sense, depending on the problem decomposition. Nested target will make a lot of sense with reverse offload, which is already in the standard, and it also should be allowed for the sake of a modular (user) code base.

> 3. There would also be new sets of loop structures in new dialects when C/C++ is compiled to MLIR. It would complicate the number of possible combinations inside the OpenMP region.

Is anyone working on this? If so, what is the timeline? I personally was not expecting Clang to switch over to MLIR any time soon, but I am happy if someone wants to correct me on this. I mention this only because it interacts with the arguments I will make below.

> E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR ignoring all the advantages that MLIR provides. Being able to compile the code for heterogeneous hardware is one of the biggest advantages that MLIR brings to the table. That is being completely missed here. This also requires solving the problem of handling target information in MLIR. But that is a problem which needs to be solved anyway.
> Using GPU dialect also gives us an opportunity to represent offloading semantics in MLIR.

I'm unsure what the problem with "handling target information in MLIR" is, but whatever design we end up with, we need to know about the target (triple) in all stages of the pipeline, even if it is just to pass it down.

> Given the ability to represent multiple ModuleOps and the existence of GPU dialect, couldn't higher level optimizations on offloaded code be done at MLIR level? The proposed design would lead us to the same problems that we are currently facing in LLVM IR.
>
> Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of GPU kernel in MLIR than in LLVM IR.

While I agree with the premise that you can potentially reuse MLIR transformations, it might not be as simple in practice.

As mentioned above, you cannot assume much about OpenMP codes, almost nothing for a lot of application codes I have seen. Some examples:

If you have a function call, or any synchronization event for that matter, located between two otherwise adjacent target regions (see below), you cannot assume the two target regions will be offloaded to the same device.
```
#omp target
{}
foo();
#omp target
{}
```
Similarly, you cannot assume an `omp parallel` is allowed to be executed with more than a single thread, or that an `omp [parallel] for` does not have loop-carried data dependences, ...

Data-sharing attributes are also something that has to be treated carefully:
```
x = 5;
#omp task
x = 3;
print(x);
```
Should print 5, not 3.
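Written out as a minimal, self-contained C version with the implicit data-sharing attribute made explicit (the taskwait is added here only to make the example deterministic):

```c
#include <stdio.h>

int main(void) {
  int x = 5;
  /* x defaults to firstprivate on the task here, so the task writes
     only its own private copy of x. */
  #pragma omp task firstprivate(x)
  { x = 3; }
  #pragma omp taskwait
  printf("%d\n", x);   /* prints 5 */
  return 0;
}
```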
I hope I convinced you that OpenMP is not trivially mappable to existing dialects without proper analysis. If not, please let me know why you expect it to be.

Now, when it comes to code analyses, LLVM-IR offers a variety of interesting features, ranging from a mature set of passes to cross-language LTO capabilities. We are working on the missing parts, e.g., heterogeneous llvm::Modules, as we speak. Simple OpenMP optimizations are already present in LLVM and interesting ones have been prototyped for a while now (let me know if you want to see more not-yet-merged patches/optimizations). I also have papers, results, and talks that might be interesting here. Let me know if you need pointers to them.

Cheers,
Johannes
River Riddle via llvm-dev
2020-Feb-13 19:49 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
On Thu, Feb 13, 2020 at 10:18 AM Johannes Doerfert via flang-dev <flang-dev at lists.llvm.org> wrote:

> [
> Side notes:
> 1) I'm not an MLIR person.
> 2) It seems unfortunate that we do not have an mlir-dev list.
> ]

MLIR uses discourse, llvm.discourse.group.
--
Thank you,
River Riddle
Kiran Chandramohan via llvm-dev
2020-Feb-14 01:22 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
Hello Vinay,

Thanks for your mail about the OpenMP dialect in MLIR. Happy to know that you and several other groups are interested in the OpenMP dialect. At the outset, I must point out that the design is not set in stone and will change as we make progress. You are welcome to participate, provide feedback and criticism to change the design, as well as to contribute to the implementation. I provide some clarifications and replies to your comments below. If it is OK, we can have further discussions in discourse, as River points out.

> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang frontends. Note that this proposal was before considering MLIR for FIR.

A correction here. The proposal for the OpenMPIRBuilder was made when MLIR was being considered for FIR.
(i) Gary Klimowicz's minutes for the Flang call in April 2019 mention considering MLIR for FIR.
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
(ii) My reply to Johannes's proposal in May 2019 mentions MLIR for FIR.
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html

> b. Review of barrier construct is in progress: https://reviews.llvm.org/D72962

A minor correction here. The addition of the barrier construct was accepted and has landed (https://reviews.llvm.org/D72400). It is the review of the translation to LLVM IR that is in progress.

> It looks like the design has evolved over time and there is no one place which contains the latest design decisions that fits all the different pieces of the puzzle. I will try to deduce it from the above mentioned references. Please correct me if I am referring to anything which has changed.

Yes, the design has mildly changed over time to incorporate feedback. But the latest is what is in the RFC in discourse.

> For most OpenMP design discussions, FIR examples are used (as seen in (2) and (3)). The MLIR examples mentioned in the design only talk about the FIR dialect and the LLVM dialect.

Our initial concern was how all these pieces (FIR, LLVM dialect, OpenMPIRBuilder, LLVM IR) will fit together. Hence you see the prominence of FIR and the LLVM dialect, and more information about lowering/translation than about transformations/optimisations.

> This completely ignores the likes of standard, affine (where most loop transformations are supposed to happen) and loop dialects.

Adding to the reply above: we would like to take advantage of the transformations in cases where that is possible. FIR loops will be converted to the affine/loop dialects, so the loop inside an omp.do can be in these dialects, as clarified in the discussion in discourse and also shown in slide 20 of the FOSDEM presentation (links to both below).
https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf

I must also point out that the question of where to do loop transformations is a topic we have not fully converged on. See the following thread for discussions.
http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html

> Is it the same omp.do operation which now contains the bounds and induction variables of the loop after the LLVM conversion?

The point here is that i) we need to keep the loops separate so as to take advantage of the transformations that other dialects like affine/loop would provide, and ii) we will need the loop information while lowering the OpenMP do operation. For implementation, if reusing the same operation (in different contexts) is difficult, then we can add a new operation. A rough sketch of the pre-lowering form is shown below.
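Purely as an illustration (this mirrors the RFC example but with a loop-dialect loop instead of fir.do; the omp.do syntax shown is not final):

```mlir
// Illustrative only: a worksharing loop whose body loop is kept in the
// loop dialect so that loop transformations can still be applied to it.
omp.parallel {
  omp.do {
    loop.for %i = %lb to %ub step %step {
      // loop body
    }
  }
}
```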
> It is also not mentioned how clauses like firstprivate, shared, private, reduce, map, etc are lowered to OpenMP dialect.

Yes, it is not mentioned. We did a study of a few constructs and clauses, which was shared as mails to flang-dev and in the RFC. As we make progress, and before implementation, we will share further details.

> it would be beneficial to have an omp.parallel_do operation which has semantics similar to other loop structures (may not be LoopLikeInterface) in MLIR.

I am not against adding parallel_do if it can help with transformations or reduce the complexity of lowering. Please share the details in discourse as a reply to the RFC or in a separate thread.

> it looks like having OpenMP operations based on standard MLIR types and operations (scalars and memrefs mainly) is the right way to go.

This will definitely be the first version that we implement. But I do not understand why we should restrict ourselves to only the standard types and operations. To ease lowering and translation, and to avoid adding OpenMP operations to other dialects, I believe the OpenMP dialect should also be able to coexist with other dialects like FIR and LLVM.

> E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR ignoring all the advantages that MLIR provides.
>
> Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of GPU kernel in MLIR than in LLVM IR.

I might not have fully understood you here. But the dialect lives independently of the translation to LLVM IR. If there are optimisations (like the hoisting you mention here), I believe they can be performed as transformation passes on the dialect. It is not ruled out.

--Kiran
Vinay Madhusudan via llvm-dev
2020-Feb-14 18:20 UTC
[llvm-dev] About OpenMP dialect in MLIR
Thanks for the reply!

It sounds like LLVM IR is being considered for optimizations on OpenMP constructs, and there seem to be plans regarding improvements to the LLVM IR framework to provide the things required for OpenMP / flang(?). Are there any design notes that weigh the pros and cons of using MLIR vs. LLVM IR for the various OpenMP related optimizations / transformations? The latest RFC [ (3) in my original post ] mentions that:

> So there exist some questions regarding where the optimisations should be carried out.

Could you please provide more details on this? I would like to quote Chris here: "if you ignore the engineering expense, it would clearly make sense to reimplement the mid-level LLVM optimizers on top of MLIR and replace include/llvm/IR with a dialect definition in MLIR instead." -- http://lists.llvm.org/pipermail/llvm-dev/2020-January/138341.html

*The rest of the comments are inlined.*

On Thu, Feb 13, 2020 at 11:48 PM Johannes Doerfert <jdoerfert at anl.gov> wrote:

> > 1. With the current design, the number of transformations / optimizations that one can write on OpenMP constructs would become limited as there can be any custom loop structure with custom operations / types inside it.
>
> OpenMP, as an input language, does not make many assumptions about the code inside of constructs*.

This isn't entirely correct, because the current OpenMP API specification (https://www.openmp.org/spec-html/5.0/openmpch1.html) assumes that the code inside the constructs belongs to C, C++ or Fortran programs.

> So, inside a parallel there can be almost anything the base language has to offer, both lexically and dynamically.

I am mostly concerned with the MLIR side of things for the OpenMP representation. MLIR can support not only operations for general-purpose languages like C, C++, Fortran, etc. but also various domain-specific language representations as dialects (for example, ML, etc.). Note that there is also the SPIR-V dialect, which is again meant for "parallel compute". It becomes important to define the scope of the dialects / operations / types supported inside OpenMP operations in MLIR.

> Assuming otherwise is not going to work. Analyzing a "generic" OpenMP representation in order to determine if it can be represented as a more restricted "op" seems at least plausible. You will run into various issues, some mentioned explicitly below.

Isn't it the other way around? For example, it doesn't make much sense to wrap OpenMP operations around SPIR-V operations / types. I think it is important to specify (in the design) which existing MLIR dialects are supported in this effort and the various lowerings / transformations / optimizations which are planned for them.

> For starters, you still have to generate proper OpenMP runtime calls, e.g., from your GPU dialect, even if it is "just" to make sure the OMPD/OMPT interfaces expose useful information.

You can have a well-defined call-like mlir::Operation which calls the GPU kernel, perform all cross-device transformations in an easier way, and then lower this operation to OpenMP runtime calls during the LLVM dialect conversion. I think this is much better than directly having calls to the OpenMP runtime library based on a kernel name mentioned in an llvm::GlobalVariable.
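As a purely hypothetical sketch of what such a call-like operation could look like (the op name, operands and outlined-kernel symbol below are made up for illustration; this is not a proposed design):

```mlir
// Hypothetical: "omp.target_launch" does not exist. It only illustrates a
// call-like op that references an outlined target region through a symbol,
// which a later pass could lower to the target offload runtime calls in the
// LLVM dialect.
omp.target_launch @device_module::@outlined_target_region(%a, %n)
    : (memref<?xf32>, index) -> ()
```

Because the device code is referenced through a symbol on an operation rather than through a string in a global variable, cross-device rewrites (fusing adjacent launches, hoisting map operations, etc.) become ordinary MLIR pattern rewrites.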
> * I exclude the `omp loop` construct here as it is not even implemented anywhere, as far as I know.
>
> > 2. It would also be easier to transform the loop nests containing OpenMP constructs if the body of the OpenMP operations is well defined (i.e., does not accept arbitrary loop structures). Having nested redundant "parallel", "target" and "do" regions seems unnecessary.
>
> As mentioned above, you cannot start with the assumption that OpenMP input is structured this way. You have to analyze it first. This is the same reason we cannot simply transform C/C++ `for` loops into `affine.for` without proper analysis of the loop body.
>
> Now, more concretely: nested parallel and target regions are not necessarily redundant, nor can/should we require the user not to have them. Nested parallelism can easily make sense, depending on the problem decomposition. Nested target will make a lot of sense with reverse offload, which is already in the standard, and it also should be allowed for the sake of a modular (user) code base.

Just to be clear, having all three of "target", "parallel" and "do" doesn't represent "nested parallelism" at all in the proposed design! ( 2(d) )

omp.target {
  omp.parallel {
    omp.do {
      ...
    }
  }
}

The above invokes a call to tgt_target() for the code inside omp.do, as mentioned in the proposal.

> > 3. There would also be new sets of loop structures in new dialects when C/C++ is compiled to MLIR. It would complicate the number of possible combinations inside the OpenMP region.
>
> Is anyone working on this? If so, what is the timeline? I personally was not expecting Clang to switch over to MLIR any time soon, but I am happy if someone wants to correct me on this. I mention this only because it interacts with the arguments I will make below.
>
> > E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct lowering to LLVM IR ignoring all the advantages that MLIR provides. Being able to compile the code for heterogeneous hardware is one of the biggest advantages that MLIR brings to the table. That is being completely missed here. This also requires solving the problem of handling target information in MLIR. But that is a problem which needs to be solved anyway. Using GPU dialect also gives us an opportunity to represent offloading semantics in MLIR.
>
> I'm unsure what the problem with "handling target information in MLIR" is, but whatever design we end up with, we need to know about the target (triple) in all stages of the pipeline, even if it is just to pass it down.
>
> > Given the ability to represent multiple ModuleOps and the existence of GPU dialect, couldn't higher level optimizations on offloaded code be done at MLIR level? The proposed design would lead us to the same problems that we are currently facing in LLVM IR.
> >
> > Also, OpenMP codegen will automatically benefit from the GPU dialect based optimizations. For example, it would be way easier to hoist a memory reference out of GPU kernel in MLIR than in LLVM IR.
>
> While I agree with the premise that you can potentially reuse MLIR transformations, it might not be as simple in practice.
>
> As mentioned above, you cannot assume much about OpenMP codes, almost nothing for a lot of application codes I have seen.
> Some examples:
>
> If you have a function call, or any synchronization event for that matter, located between two otherwise adjacent target regions (see below), you cannot assume the two target regions will be offloaded to the same device.
> ```
> #omp target
> {}
> foo();
> #omp target
> {}
> ```

These kinds of optimizations are much easier to write in MLIR. The LLVM IR for the above code would contain a series of OpenMP runtime setup instructions, then foo() in the middle, followed by another set of OpenMP runtime related instructions, and the bodies of the two target constructs would be in two different outlined functions (if not modules). It takes quite a bit of analysis / transformation code to write any optimization on the generated LLVM IR.

MLIR, in contrast, provides a way to represent the operations closer to the source. It is as simple as checking the next operation(s) in the mlir::Block: an OpenMP target operation contains an inlined region which can easily be fused, split, or subjected to any other valid transformation. Note that you can also perform various control-structure analyses / transformations much more easily in MLIR. For example, you can decide to execute foo() based on certain conditions, and merge the two target regions on the else path.

> Similarly, you cannot assume an `omp parallel` is allowed to be executed with more than a single thread, or that an `omp [parallel] for` does not have loop-carried data dependences, ...

With multi-dimensional index support for arrays, wouldn't it be better to do the data dependence analysis in MLIR? LLVM IR has linearized subscripts for multi-dimensional arrays, and llvm::DependenceAnalysis tries to "guess" the indices based on different patterns in SCEV. It takes an intrinsic <http://llvm.org/devmtg/2020-04/talks.html#LightningTalk_88>, metadata, or some other mechanism of communication from the front end (not the built-in set of instructions) to solve this problem.
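To illustrate the point (a minimal sketch, with indicative types): an access that keeps its subscripts explicit in MLIR, e.g.

```mlir
%v = affine.load %A[%i, %j] : memref<128x128xf32>
```

ends up in LLVM IR as a getelementptr on a single linearized index (roughly i * 128 + j), from which a dependence analysis must first recover the individual subscripts (delinearization) before the usual subscript-wise tests can be applied.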
> I hope I convinced you that OpenMP is not trivially mappable to existing
> dialects without proper analysis. If not, please let me know why you
> expect it to be.

I do not see much reason why the issues you mentioned can’t trivially be mapped to the MLIR infrastructure. There is an easy way to define custom operations / types / attributes in the OpenMP dialect and perform optimizations based on the *IR that is created especially for OpenMP*. The analysis / transformations required can be written on the custom operations defined, rather than on a lowered form in LLVM IR. The various dialects / transformations in MLIR are in development / early phases (for example, the GPU dialect), waiting to be improved with use cases such as this!

> Now when it comes to code analyses, LLVM-IR offers a variety of
> interesting features, ranging from a mature set of passes to the
> cross-language LTO capabilities. We are working on the missing parts,
> e.g., heterogeneous llvm::Modules as we speak. Simple OpenMP
> optimizations are already present in LLVM and interesting ones are
> prototyped for a while now (let me know if you want to see more not-yet
> merged patches/optimizations). I also have papers, results, and
> talks that might be interesting here. Let me know if you need pointers
> to them.
>
> Cheers,
> Johannes
Vinay Madhusudan via llvm-dev
2020-Feb-15 16:22 UTC
[llvm-dev] [flang-dev] About OpenMP dialect in MLIR
Reply to Kiran Chandramohan:

> You are welcome to participate, provide feedback and criticism to change
> the design as well as to contribute to the implementation.

Thank you, Kiran.

> But the latest is what is there in the RFC in discourse.

I have used this as the reference for my response.

> We did a study of a few constructs and clauses which was shared as mails
> to flang-dev and the RFC. As we make progress and before implementation,
> we will share further details.

> “Yes, parallel and flush would be the next two constructs that we will
> do.” -- from a comment in the latest RFC

For the above-mentioned reasons, I will try to restrict my reply to how the “parallel (do)” construct would be lowered.

> If it is OK we can have further discussions in discourse as River points
> out.

Given that multiple components of the LLVM project, namely clang, flang, MLIR and LLVM, are involved, llvm-dev is probably a better place, with a much wider audience, until it is clear how the different components must interact.

> It is the review for translation to LLVM IR that is in progress.

> “If we decide that the OpenMP construct (for e.g. collapse) can be
> handled fully in MLIR and that is the best place to do it (based on
> experiments) then we will not use the OpenMP IRBuilder for these
> constructs.” -- latest RFC in discourse

If it is not finalized that the OpenMPIRBuilder will be used for all the constructs, wouldn’t it be better to delay the submission of the “translation to LLVM IR” patch in MLIR? The lowering code will become inconsistent if the OpenMPIRBuilder is used only for a few constructs and not for others.

Also, the patch does OpenMP dialect lowering *alongside* the LLVM Dialect to LLVM IR translation. This is different from most dialects, which get lowered directly to the LLVM Dialect. I think lowering to the LLVM Dialect would be a cleaner way if the OpenMPIRBuilder is not being considered for all constructs. Mehdi also seems to have the same suggestion: “I agree that having dialect lowering would be cleaner” in https://reviews.llvm.org/D72962

> Yes, the design has mildly changed over time to incorporate feedback.
> But the latest is what is there in the RFC in discourse.

The RFC fails to discuss the following (I have also mentioned some of them in my reply to Johannes):

> The proposed plan involves a) lowering F18 AST with OpenMP directly to a
> mix of OpenMP and FIR dialects. b) converting this finally to a mix of
> OpenMP and LLVM dialects.

It is unclear in the RFC what other dialects are considered as supported for the OpenMP dialect (std, affine, vector, loop, etc) and how they would be transformed, used and lowered from FIR to LLVM. It becomes important to list the various dialects / operations / types supported for OpenMP (OpenMP is mainly defined for C, C++ and Fortran programs, while MLIR has a much wider scope). It wouldn’t add much value for the proposed OpenMP dialect to be in the MLIR tree if it cannot support at least the relevant standard dialect types / operations.

> We would like to take advantage of the transformations in cases that are
> possible. FIR loops will be converted to affine/loop dialect. So the loop
> inside an omp.do can be in these dialects as clarified in the discussion
> in discourse and also shown in slide 20 of the fosdem presentation (links
> to both below).
> https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan
> https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf

Although it is mentioned that affine / loop.for is used, the following things are unclear:

I am assuming that there will be lowering / conversion code in the f18 repo from fir.do to loop.for / affine.for. Is that the case? If so, I think it is worth mentioning it in the “sequential code flow representation” in the RFC. This raises the following questions.

1. Which types are supported? Standard dialect types and FIR types? For example, what types are used for Fortran arrays used inside OpenMP regions? Is it std.memref, OR the Fortran array representation in the FIR dialect (fir.array?), OR both? Note that Fortran has support for column-major arrays. std.memref supports custom memory layouts. What custom layouts are supported? How would different non-scalar types in the standard dialect be lowered to LLVM IR and passed to OpenMP runtime calls? Can you please elaborate on this?

The example provided in slide 20 of the fosdem presentation contains “loop.for %j = %lb2 to %ub2 : !integer {“. But loop.for accepts the “index” type. Not sure what type “!integer” represents here.

2. What are the different memory access operations which are supported inside the OpenMP region and lowered to proper OpenMP runtime calls in LLVM IR? The possibilities are:

   1. affine.load / affine.store
   2. std.load / std.store
   3. FIR dialect memory access operations.

(A sketch below illustrates one possibility for these type and memory-access questions.)

> I must also point out that the question of where to do loop
> transformations is a topic we have not fully converged on. See the
> following thread for discussions.
> http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html

It looks like the placement (MLIR / LLVM) of various transformations related to OpenMP has not been finalized, from what I could infer from Johannes’s reply and the below text in the latest RFC in discourse: “So there exist some questions regarding where the optimisations should be carried out. We will decide on which framework to choose only after some experimentation.”

> i) we need to keep the loops separately so as to take advantage of
> transformations that other dialects like affine/loop would provide.

1) Keeping the loops separate from the OpenMP operations will expose them to the “regular” transformation passes in MLIR inside the OpenMP region. Most of them are invalid or inefficient for OpenMP operations. Examples:

   1. The constant propagation example mentioned by Johannes in this thread (omp task shared(x)).
   2. Loop (nest) transformations (permute / split / fuse / tile, etc) will happen ignoring the surrounding OpenMP operations.
   3. Hoisting and sinking of various memory / SSA values inside the OpenMP region. This goes against the likes of “map”, “firstprivate”, shared, etc clauses and more.

2) Various loop operations (loop.for, affine.for, fir.do) have (or will have) different transformation / optimization passes which differ from one another. Examples:

   1. AffineLoopInvariantCodeMotion.cpp is different from LoopInvariantCodeMotion.cpp.
   2. Other loop transformation passes for affine.for.

These loops also use different types and memory access operations in general for transformations. For example, most Affine dialect transformations (if not all) work on affine.load and affine.store operations.
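As a concrete but purely hypothetical illustration of questions 1 and 2 above, an OpenMP worksharing loop over a column-major Fortran-style array might look roughly like the following if standard/affine constructs are used. The omp.parallel / omp.do syntax follows the RFC’s examples and is not finalized; the layout map, bounds, function name and element type are made up for illustration.

```
// Hypothetical sketch only; none of the omp.* syntax is final.
// A 10x20 column-major array is modelled as a memref with an explicit
// layout map (element (i, j) stored at offset i + j * 10) and accessed
// with affine.load / affine.store inside the worksharing loop.
#col_major = affine_map<(d0, d1) -> (d0 + d1 * 10)>

func @add_arrays(%a: memref<10x20xf32, #col_major>,
                 %b: memref<10x20xf32, #col_major>,
                 %c: memref<10x20xf32, #col_major>) {
  omp.parallel {
    omp.do {
      affine.for %j = 0 to 20 {
        affine.for %i = 0 to 10 {
          %0 = affine.load %a[%i, %j] : memref<10x20xf32, #col_major>
          %1 = affine.load %b[%i, %j] : memref<10x20xf32, #col_major>
          %2 = addf %0, %1 : f32
          affine.store %2, %c[%i, %j] : memref<10x20xf32, #col_major>
        }
      }
    }
  }
  return
}
```

Whether this is legal would depend on exactly which types (memref layouts in particular) and which memory access operations the OpenMP dialect decides to support, which is the point of the questions above.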
Supporting different loop operations means that there would be *OpenMP-specific transformations* for each one of them, and it also requires a way to restrict each of them from existing transformations (when nested in OpenMP constructs). There would be different lowerings for different loop operations as well. For example, affine.for and loop.for would have to be lowered to omp.do in different ways.

From slide 20 of the fosdem presentation you mentioned, the LLVM + OpenMP dialect representation is as follows:

------------------------------
Mlir.region(…) {
  omp.parallel {
    %ub3 = …
    omp.do %i = 0 to %ub3 : !integer {
      …
    }
  }
}
-------------------------------

Currently, the LLVM Dialect doesn’t contain a high-level loop operation. It is all based on a CFG implementation. Will omp.do follow the same structure (SizedRegion<1>) as loop.for? Or will there be a CFG-based loop operation for the LLVM Dialect?

Can you please mention what the OpenMP + LLVM dialect will look like for the below parallel do construct?

integer :: i=1, k=10
integer :: a(10), b(10), c(10)
...
!$omp parallel do
do i = 1, k
  if (i .ne. 1) *cycle*
  c(i) = a(i) + b(i)
end do
!$omp end parallel do
print *,c

Thanks,
Vinay

On Fri, Feb 14, 2020 at 6:52 AM Kiran Chandramohan via llvm-dev < llvm-dev at lists.llvm.org> wrote:

> Hello Vinay,
>
> Thanks for your mail about the OpenMP dialect in MLIR. Happy to know that
> you and several other groups are interested in the OpenMP dialect. At the
> outset, I must point out that the design is not set in stone and will
> change as we make progress. You are welcome to participate, provide
> feedback and criticism to change the design as well as to contribute to the
> implementation. I provide some clarifications and replies to your comments
> below. If it is OK we can have further discussions in discourse as River
> points out.
>
> 1. [May 2019] An OpenMPIRBuilder in LLVM was proposed for flang and clang
> frontends. Note that this proposal was before considering MLIR for FIR.
>
> A correction here. The proposal for OpenMPIRBuilder was made when MLIR was
> being considered for FIR.
> (i) Gary Klimowicz's minutes for the Flang call in April 2019 mention
> considering MLIR for FIR.
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-April/000194.html
> (ii) My reply to Johannes's proposal in May 2019 mentions MLIR for FIR.
> http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000220.html
>
> b. Review of barrier construct is in progress:
> https://reviews.llvm.org/D72962
>
> Minor correction here. The addition of the barrier construct was accepted
> and has landed (https://reviews.llvm.org/D72400). It is the review for
> translation to LLVM IR that is in progress.
>
> It looks like the design has evolved over time and there is no one place
> which contains the latest design decisions that fits all the different
> pieces of the puzzle. I will try to deduce it from the above mentioned
> references. Please correct me if I am referring to anything which has
> changed.
>
> Yes, the design has mildly changed over time to incorporate feedback. But
> the latest is what is there in the RFC in discourse.
>
> For most OpenMP design discussions, FIR examples are used (as seen in (2)
> and (3)). The MLIR examples mentioned in the design only talk about the
> FIR dialect and LLVM dialect.
>
> Our initial concern was how all these pieces (FIR, LLVM Dialect,
> OpenMPIRBuilder, LLVM IR) would fit together.
Hence you see the prominence of FIR > and LLVM dialect and more information about lowering/translation than > transformations/optimisations. > > This completely ignores the likes of standard, affine (where most loop > transformations are supposed to happen) and loop dialects. > > Adding to the reply above. We would like to take advantage of the > transformations in cases that are possible. FIR loops will be converted to > affine/loop dialect. So the loop inside an omp.do can be in these dialects > as clarified in the discussion in discourse and also shown in slide 20 of > the fosdem presentation (links to both below). > > https://llvm.discourse.group/t/rfc-openmp-dialect-in-mlir/397/7?u=kiranchandramohan > > https://fosdem.org/2020/schedule/event/llvm_flang/attachments/slides/3839/export/events/attachments/llvm_flang/slides/3839/flang_llvm_frontend.pdf > > I must also point out that the question of where to do loop > transformations is a topic we have not fully converged on. See the > following thread for discussions. > http://lists.llvm.org/pipermail/flang-dev/2019-September/000042.html > > Is it the same omp.do operation which now contains the bounds and > induction variables of the loop after the LLVM conversion? > > The point here is that i) we need to keep the loops separately so as to > take advantage of transformations that other dialects like affine/loop > would provide. ii) We will need the loop information while lowering the > OpenMP do operation. For implementation, if reusing the same operation (in > different contexts) is difficult then we can add a new operation. > > It is also not mentioned how clauses like firstprivate, shared, private, > reduce, map, etc are lowered to OpenMP dialect. > > Yes, it is not mentioned. We did a study of a few constructs and clauses > which was shared as mails to flang-dev and the RFC. As we make progress and > before implementation, we will share further details. > > it would be beneficial to have an omp.parallel_do operation which has > semantics similar to other loop structures (may not be LoopLikeInterface) > in MLIR. > > I am not against adding parallel_do if it can help with transformations or > reduce the complexity of lowering. Please share the details in discourse as > a reply to the RFC or a separate thread. > > it looks like having OpenMP operations based on standard MLIR types and > operations (scalars and memrefs mainly) is the right way to go. > > This will definitely be the first version that we implement. But I do not > understand why we should restrict to only the standard types and > operations. To ease lowering and translation and to avoid adding OpenMP > operations to other dialects, I believe OpenMP dialect should also be able > to exist with other dialects like FIR and LLVM. > > E. Lowering of target constructs mentioned in ( 2(d) ) specifies direct > lowering to LLVM IR ignoring all the advantages that MLIR provides. > > Also, OpenMP codegen will automatically benefit from the GPU dialect based > optimizations. For example, it would be way easier to hoist a memory > reference out of GPU kernel in MLIR than in LLVM IR. > > I might not have fully understood you here. But the dialect lives > independently of the translation to LLVM IR. If there are optimisations > (like hoisting that you mention here) I believe they can be performed as > transformation passes on the dialect. It is not ruled out. 
>
> --Kiran
On Thu, Feb 13, 2020 at 7:19 PM Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> > 3. There would also be new sets of loop structures in new dialects when
> > C/C++ is compiled to MLIR. It would complicate the number of possible
> > combinations inside the OpenMP region.
>
> Is anyone working on this? If so, what is the timeline? I personally was
> not expecting Clang to switch over to MLIR any time soon but I am happy
> if someone wants to correct me on this. I mention this only because it
> interacts with the arguments I will make below.

I'd point out that from an engineering perspective, instead of switching everybody from LLVM IR to MLIR, considering the relative maturity of the projects the smarter move would be to switch folks from MLIR to LLVM IR, by making LLVM IR itself extensible via dialects.

It's understandable that MLIR was developed as a completely separate thing, given that there was no proven example of an extensible IR. However, now that such an example exists, porting the extensibility features that have been proven to work into LLVM IR is a logical next step.

Cheers,
Nicolai

--
Learn how the world really is, but never forget how it ought to be.