Hal Finkel via llvm-dev
2017-Jan-11 22:02 UTC
[llvm-dev] [RFC] IR-level Region Annotations
A Proposal for adding an experimental IR-level region-annotation infrastructure
================================================================================

Hal Finkel (ANL) and Xinmin Tian (Intel)

This is a proposal for adding an experimental infrastructure to support
annotating regions in LLVM IR, making use of intrinsics and metadata, and a
generic analysis to allow transformations to easily make use of these annotated
regions. This infrastructure is flexible enough to support representation of
directives for parallelization, vectorization, and offloading of both loops and
more-general code regions. Under this scheme, the conceptual distance between
source-level directives and the region annotations need not be significant,
making the incremental cost of supporting new directives and modifiers often
small. It is not, however, specific to those use cases.

Problem Statement
=================
There have been a series of discussions on LLVM IR extensions for representing
region and loop annotations for parallelism, and other user-guided
transformations, among both industrial and academic members of the LLVM
community. Increasing the quality of our OpenMP implementation is an important
motivating use case, but certainly not the only one. For OpenMP in particular,
we've discussed having an IR representation for years. Presently, all OpenMP
pragmas are transformed directly into runtime-library calls in Clang, and
outlining (i.e. extracting parallel regions into their own functions to be
invoked by the runtime library) is done in Clang as well. Our implementation
does not further optimize OpenMP constructs, and a lot of thought has been put
into how we might improve this. For some optimizations, such as redundant
barrier removal, we could use a TargetLibraryInfo-like mechanism to recognize
frontend-generated runtime calls and proceed from there. Cases where we lose
pointer-aliasing information, information on loop bounds, etc., could be
addressed by improving our inter-procedural-analysis capabilities. We should do
that regardless. However, there are important cases where the underlying scheme
we want to use to lower the various parallelism constructs, especially when
targeting accelerators, changes depending on what is in the parallel region. In
important cases where we can see everything (i.e. there aren't arbitrary
external calls), code generation should proceed in a way that is very different
from the general case. To have a sensible implementation, this must be done
after inlining. When using LTO, this should be done during the link-time phase.
As a result, we must move away from our purely front-end-based lowering scheme.
The question is what to do instead, and how to do it in a way that is generally
useful to the entire community.

Designs previously discussed can be classified into four categories:

(a) Add a large number of new kinds of LLVM metadata, and use them to annotate
    each necessary instruction for parallelism, data attributes, etc.
(b) Add several new LLVM instructions such as, for parallelism, fork, spawn,
    join, barrier, etc.
(c) Add a large number of LLVM intrinsics for directives and clauses, each
    intrinsic representing a directive or a clause.
(d) Add a small number of LLVM intrinsics for region or loop annotations,
    represent the directive/clause names using metadata and the remaining
    information using arguments.
Here we're proposing (d), and below is a brief pros-and-cons analysis based on
these discussions and our own experiences supporting region/loop annotations in
LLVM-based compilers. The table below shows a short summary of our analysis.

Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC [1,2],
have IR-level representations for parallelism constructs. Based on experience
from these previous developments, we'd like a solution for LLVM that maximizes
optimization enablement while minimizing the maintenance costs and complexity
increase experienced by the community as a whole.

Representing the desired information in the LLVM IR is just the first step. The
challenge is to maintain the desired semantics without blocking useful
optimizations. With options (c) and (d), dependencies can be preserved mainly
based on the use/def chains of the arguments of each intrinsic, and a
manageable set of LLVM analyses and transformations can be made aware of
certain kinds of annotations in order to enable specific optimizations. In this
regard, options (c) and (d) are close with respect to maintenance effort.
However, based on our experiences, option (d) is preferable because it is
easier to extend to support new directives and clauses in the future without
the need to add new intrinsics, as required by option (c).

Table 1. Pros/cons summary of LLVM IR experimental extension options

--------+----------------------+-----------------------------------------------
Options | Pros                 | Cons
--------+----------------------+-----------------------------------------------
(a)     | No need to add new   | LLVM passes do not always maintain metadata.
        | instructions or      | Need to educate many passes (if not all) to
        | new intrinsics.      | understand and handle them.
--------+----------------------+-----------------------------------------------
(b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
        | a first-class        | code generation to support new instructions.
        | citizen.             | A large set of information still needs to be
        |                      | represented using other means.
--------+----------------------+-----------------------------------------------
(c)     | Less impact on       | A large number of intrinsics must be added.
        | existing LLVM passes.| Some of the optimizations need to be
        | Fewer requirements   | educated to understand them.
        | for passes to        |
        | maintain metadata.   |
--------+----------------------+-----------------------------------------------
(d)     | Minimal impact on    | Some of the optimizations need to be
        | existing LLVM        | educated to understand them.
        | optimization passes. | No requirement for all passes to maintain a
        | Directive and clause | large set of metadata with values.
        | names use metadata   |
        | strings.             |
--------+----------------------+-----------------------------------------------

Regarding (a), LLVM already uses metadata for certain loop information (e.g.
annotations directing loop transformations and assertions about loop-carried
dependencies), but there is no natural or consistent way to extend this scheme
to represent necessary data-movement or region information.


New Intrinsics for Region and Value Annotations
===============================================
The following new (experimental) intrinsics are proposed, which allow:

a) Annotating a code region marked with directives / pragmas,
b) Annotating values associated with the region (or loops), that is, those
   values associated with directives / pragmas,
c) Providing information on LLVM IR transformations needed for the annotated
   code regions (or loops).
These can be used both by frontends and also by transformation passes (e.g.
automated parallelization). The names used here are similar to those used by
our internal prototype, but obviously we expect a community bikeshed
discussion.

  def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
                                    [IntrArgMemOnly],
                                    "llvm.experimental.directive">;

  def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
                                    [IntrArgMemOnly],
                                    "llvm.experimental.dir.qual">;

  def int_experimental_dir_qual_opnd : Intrinsic<[],
                                    [llvm_metadata_ty, llvm_any_ty],
                                    [IntrArgMemOnly],
                                    "llvm.experimental.dir.qual.opnd">;

  def int_experimental_dir_qual_opndlist : Intrinsic<
                                    [],
                                    [llvm_metadata_ty, llvm_vararg_ty],
                                    [IntrArgMemOnly],
                                    "llvm.experimental.dir.qual.opndlist">;

Note that calls to these intrinsics might need to be annotated with the
convergent attribute when they represent fork/join operations, barriers, and
similar.

Usage Examples
==============

This section shows a few examples using these experimental intrinsics. LLVM
developers who use these intrinsics can define their own MDStrings. All details
of using these intrinsics to represent OpenMP 4.5 constructs are described in
[1][3].

Example I: An OpenMP combined construct

  #pragma omp target teams distribute parallel for simd
  loop

LLVM IR
-------
  call void @llvm.experimental.directive(metadata !0)
  call void @llvm.experimental.directive(metadata !1)
  call void @llvm.experimental.directive(metadata !2)
  call void @llvm.experimental.directive(metadata !3)
  loop
  call void @llvm.experimental.directive(metadata !6)
  call void @llvm.experimental.directive(metadata !5)
  call void @llvm.experimental.directive(metadata !4)

  !0 = metadata !{metadata !DIR.OMP.TARGET}
  !1 = metadata !{metadata !DIR.OMP.TEAMS}
  !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}

  !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
  !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
  !4 = metadata !{metadata !DIR.OMP.END.TARGET}

Example II: Assume x, y, z are int variables, and s is a non-POD variable.
Then, lastprivate(x,y,s,z) is represented as:

LLVM IR
-------
  call void @llvm.experimental.dir.qual.opndlist(
              metadata !1, %x, %y, metadata !2, %s, %ctor, %dtor, %z)

  !1 = metadata !{metadata !QUAL.OMP.LASTPRIVATE}
  !2 = metadata !{metadata !QUAL.OPND.NONPOD}

Example III: A prefetch pragma example

  // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
  // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
  #pragma prefetch x:1:20 y:0:10
  for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }

LLVM IR
-------
  call void @llvm.experimental.directive(metadata !0)
  call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
                                                 metadata !1, %yp, 0, 10)
  loop
  call void @llvm.experimental.directive(metadata !3)
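As a purely illustrative sketch (not part of the proposed patches), a mid-end
pass might recognize these annotation calls and read the directive name roughly
as follows. It assumes the directive tag is an MDString stored as the first
operand of the metadata argument, as in the examples above; the helper name
getDirectiveTag is hypothetical.

  #include "llvm/ADT/Optional.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/Metadata.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;

  // If I is a call to llvm.experimental.directive, return its directive tag
  // (e.g. "DIR.OMP.TARGET"); otherwise return None.
  static Optional<StringRef> getDirectiveTag(const Instruction &I) {
    const auto *CI = dyn_cast<CallInst>(&I);
    if (!CI || !CI->getCalledFunction() || CI->getNumArgOperands() < 1 ||
        !CI->getCalledFunction()->getName().startswith(
            "llvm.experimental.directive"))
      return None;
    // Metadata arguments reach intrinsics wrapped in MetadataAsValue.
    const auto *MAV = dyn_cast<MetadataAsValue>(CI->getArgOperand(0));
    if (!MAV)
      return None;
    if (const auto *MD = dyn_cast<MDNode>(MAV->getMetadata()))
      if (MD->getNumOperands() > 0)
        if (const auto *Tag = dyn_cast<MDString>(MD->getOperand(0)))
          return Tag->getString();
    return None;
  }

  // Example use: print the region-directive tags in a block, in program order.
  static void dumpDirectiveTags(const BasicBlock &BB) {
    for (const Instruction &I : BB)
      if (Optional<StringRef> Tag = getDirectiveTag(I))
        errs() << "directive: " << *Tag << "\n";
  }

A WRegionInfo-style analysis could build its region nesting from exactly this
kind of scan, pairing each DIR.OMP.* tag with its DIR.OMP.END.* counterpart.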
References
==========

[1] LLVM Framework and IR Extensions for Parallelization, SIMD Vectorization
    and Offloading Support. SC'2016 LLVM-HPC3 Workshop. (Xinmin Tian et al.)
    Salt Lake City, Utah.

[2] Extending LoopVectorizer towards supporting OpenMP 4.5 SIMD and outer loop
    auto-vectorization. (Hideki Saito et al.) LLVM Developers' Meeting 2016,
    San Jose.

[3] Intrinsics, Metadata, and Attributes: The Story Continues! (Hal Finkel)
    LLVM Developers' Meeting 2016, San Jose.

[4] LLVM Intrinsic Function and Metadata String Interface for Directive (or
    Pragma) Representation. Specification Draft v0.9, Intel Corporation, 2016.

Acknowledgements
================
We would like to thank Chandler Carruth (Google), Johannes Doerfert (Saarland
Univ.), Yaoqing Gao (Huawei), Michael Wong (Codeplay), Ettore Tiotto, Carlo
Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR Extensions WG members
for their constructive feedback on the LLVM framework and IR extension
proposal.

Proposed Implementation
=======================

Two sets of patches supporting these experimental intrinsics and demonstrating
their usage are ready for community review:

a) Clang patches that support core OpenMP pragmas using this approach.
b) W-Region framework patches: CFG restructuring to form single-entry-
   single-exit work regions (W-Regions) based on annotations, demand-driven
   intrinsic parsing, WRegionInfo collection and analysis passes, and dump
   functions for WRegionInfo.

On top of this functionality, we will provide the transformation patches for
core OpenMP constructs (e.g. starting with the "#pragma omp parallel for" loop
for lowering and outlining, and "#pragma omp simd" to hook it up with
LoopVectorize.cpp). We have internal implementations for many constructs now.
We will break this functionality up to create a series of patches for community
review.

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
David Majnemer via llvm-dev
2017-Jan-11 22:17 UTC
[llvm-dev] [RFC] IR-level Region Annotations
FWIW, we needed to maintain single-entry, multiple-exit regions for WinEH and
we accomplished it via a different mechanism.

We had an instruction which produces a value of type Token
(http://llvm.org/docs/LangRef.html#token-type) which let us establish the
region and another instruction to exit the region by consuming it. The
dominance rules allowed us to avoid situations where the compiler might trash
the regions in weird ways and made sure that regions would be left unharmed.

AFAIK, a similar approach using Token could work here. I think it would reduce
the amount of stuff you'd need LLVM to maintain.
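A minimal, purely illustrative sketch (the helper is hypothetical) of the
invariant that the token's def-use chain enforces; for verified IR it can never
return false, which is exactly why passes cannot tear such regions apart:

  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // The region entry produces the token; every region exit consumes it. Since
  // an SSA definition must dominate all of its uses, the entry is guaranteed
  // to dominate every exit, and a transformation that separated them would
  // produce IR the verifier rejects.
  static bool regionEntryDominatesExits(const DominatorTree &DT,
                                        const Instruction *Entry) {
    for (const User *U : Entry->users())
      if (const auto *Exit = dyn_cast<Instruction>(U))
        if (!DT.dominates(Entry, Exit))
          return false;
    return true;
  }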
Tian, Xinmin via llvm-dev
2017-Jan-11 22:53 UTC
[llvm-dev] [RFC] IR-level Region Annotations
David, one quick question: is there a way to preserve and associate a set of
"properties, value info/attrs" with a given region using Token?

Thanks,
Xinmin
Reid Kleckner via llvm-dev
2017-Jan-11 23:51 UTC
[llvm-dev] [RFC] IR-level Region Annotations
+1, tokens are the current True Way to create single-entry multi-exit regions.

Your example for an annotated loop would look like:

  %region = call token @llvm.openmp.regionstart(metadata ...) ; whatever parameters you need here
  loop
  call void @llvm.openmp.regionend(token %region)

If you use tokens, I would recommend proposal (c), where you introduce new
intrinsics for every new kind of region, instead of adding one overly generic
set of region intrinsics. We already have a way to form regions with real
barriers, and it's tokens.
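As a purely illustrative sketch on top of the (hypothetical)
llvm.openmp.regionstart/regionend intrinsics above, a pass could pair a region
entry with its exit(s) simply by walking the token's uses:

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // RegionStart is the call returning the token. Every region-end call takes
  // that token as an operand, so the start's use list enumerates the exits.
  static SmallVector<CallInst *, 2> findRegionExits(CallInst *RegionStart) {
    SmallVector<CallInst *, 2> Exits;
    for (User *U : RegionStart->users())
      if (auto *Exit = dyn_cast<CallInst>(U))
        Exits.push_back(Exit);
    return Exits;
  }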
Daniel Berlin via llvm-dev
2017-Jan-12 01:15 UTC
[llvm-dev] [RFC] IR-level Region Annotations
> def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
>                                   [IntrArgMemOnly],
>                                   "llvm.experimental.directive">;
>
> def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
>                                   [IntrArgMemOnly],
>                                   "llvm.experimental.dir.qual">;
>
> def int_experimental_dir_qual_opnd : Intrinsic<[],
>                                   [llvm_metadata_ty, llvm_any_ty],
>                                   [IntrArgMemOnly],
>                                   "llvm.experimental.dir.qual.opnd">;
>
> def int_experimental_dir_qual_opndlist : Intrinsic<
>                                   [],
>                                   [llvm_metadata_ty, llvm_vararg_ty],
>                                   [IntrArgMemOnly],
>                                   "llvm.experimental.dir.qual.opndlist">;

I'll bite. What does argmemonly mean when the operands are metadata? :)

If the rest is an attempt to keep the intrinsic from being floated or removed,
I'm strongly against extending a way we already know to have a significant
effect on optimization (fake memory dependence) to do this. Particularly for
something so major.
Tian, Xinmin via llvm-dev
2017-Jan-13 06:59 UTC
[llvm-dev] [RFC] IR-level Region Annotations
Thank you all, David, Hongbin, Reid, Mehdi, Daniel, and Vikram, for your review
and constructive feedback on this RFC.

We will update our Clang FE patch to use the Token and Tags approach suggested
by David, Hongbin, et al., instead of using metadata and function arguments for
the IR-annotation intrinsic calls, to see how well it preserves all the
information necessary for our LLVM middle-end / back-end transformations. Going
with the Token and Tag approach, the changes needed in our W-Region framework
are relatively small as well.

Vikram, many of the points you made are well taken. Hal and I had a long
discussion at SC'16 on how to build a practical infrastructure for people to
experiment with and to study all the pros and cons of IR extensions for
expressing parallelism, optimizing parallel code, and many other uses of
directive/pragma information. Personally, I would agree that, eventually, the
solution could well be a combination of (b) and (d) when we go with a parallel
IR, after the community finally agrees on which constructs are common enough to
be represented as LLVM instructions. Having said that, this RFC serves as the
first step: the intrinsics we proposed are language-neutral, but the "tag" or
"metadata" is specific to language constructs/directives/pragmas. We are
expecting more feedback and discussion on this work.

Thank you all again.
On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

[...]

These can be used both by frontends and also by transformation passes (e.g.
automated parallelization). The names used here are similar to those used by
our internal prototype, but obviously we expect a community bikeshed
discussion.

def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
                                           [IntrArgMemOnly],
                                           "llvm.experimental.directive">;

def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
                                          [IntrArgMemOnly],
                                          "llvm.experimental.dir.qual">;

def int_experimental_dir_qual_opnd : Intrinsic<[],
                                          [llvm_metadata_ty, llvm_any_ty],
                                          [IntrArgMemOnly],
                                          "llvm.experimental.dir.qual.opnd">;

def int_experimental_dir_qual_opndlist : Intrinsic<
                                          [],
                                          [llvm_metadata_ty, llvm_vararg_ty],
                                          [IntrArgMemOnly],
                                          "llvm.experimental.dir.qual.opndlist">;

Note that calls to these intrinsics might need to be annotated with the
convergent attribute when they represent fork/join operations, barriers, and
similar.

Usage Examples
==============

This section shows a few examples using these experimental intrinsics.
LLVM developers who use these intrinsics can define their own MDString.
All details of using these intrinsics to represent OpenMP 4.5 constructs are
described in [1][3].

Example I: An OpenMP combined construct

  #pragma omp target teams distribute parallel for simd
  loop

LLVM IR
-------
  call void @llvm.experimental.directive(metadata !0)
  call void @llvm.experimental.directive(metadata !1)
  call void @llvm.experimental.directive(metadata !2)
  call void @llvm.experimental.directive(metadata !3)
  loop
  call void @llvm.experimental.directive(metadata !6)
  call void @llvm.experimental.directive(metadata !5)
  call void @llvm.experimental.directive(metadata !4)

  !0 = metadata !{metadata !DIR.OMP.TARGET}
  !1 = metadata !{metadata !DIR.OMP.TEAMS}
  !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}

  !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
  !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
  !4 = metadata !{metadata !DIR.OMP.END.TARGET}

Example II: Assume x,y,z are int variables, and s is a non-POD variable.
Then, lastprivate(x,y,s,z) is represented as:

LLVM IR
-------
  call void @llvm.experimental.dir.qual.opndlist(
                metadata !1, %x, %y, metadata !2, %s, %ctor, %dtor, %z)

  !1 = metadata !{metadata !QUAL.OMP.LASTPRIVATE}
  !2 = metadata !{metadata !QUAL.OPND.NONPOD}

Example III: A prefetch pragma example

  // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
  // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
  #pragma prefetch x:1:20 y:0:10
  for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }

LLVM IR
-------
  call void @llvm.experimental.directive(metadata !0)
  call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
                                                 metadata !1, %yp, 0, 10)
  loop
  call void @llvm.experimental.directive(metadata !3)

References
==========

[1] LLVM Framework and IR extensions for Parallelization, SIMD Vectorization
    and Offloading Support. SC'2016 LLVM-HPC3 Workshop. (Xinmin Tian et al.)
    Salt Lake City, Utah.

[2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop
    auto-vectorization. (Hideki Saito et al.) LLVM Developers' Meeting 2016,
    San Jose.

[3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal Finkel)
    LLVM Developers' Meeting, 2016. San Jose.

[4] LLVM Intrinsic Function and Metadata String Interface for Directive (or
    Pragmas) Representation. Specification Draft v0.9, Intel Corporation, 2016.

Acknowledgements
================

We would like to thank Chandler Carruth (Google), Johannes Doerfert (Saarland
Univ.), Yaoqing Gao (Huawei), Michael Wong (Codeplay), Ettore Tiotto,
Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR Extensions WG
members for their constructive feedback on the LLVM framework and IR extension
proposal.

Proposed Implementation
=======================

Two sets of patches supporting these experimental intrinsics and demonstrating
their usage are ready for community review.

a) Clang patches that support core OpenMP pragmas using this approach.
b) W-Region framework patches: CFG restructuring to form single-entry,
   single-exit work regions (W-Regions) based on annotations, demand-driven
   intrinsic parsing, WRegionInfo collection and analysis passes, and dump
   functions for WRegionInfo.

On top of this functionality, we will provide the transformation patches for
core OpenMP constructs (e.g. starting with the "#pragma omp parallel for" loop
for lowering and outlining, and "#pragma omp simd" to hook it up with
LoopVectorize.cpp). We have internal implementations for many constructs now.
We will break this functionality up to create a series of patches for
community review.

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Mehdi Amini via llvm-dev
2017-Jan-13 07:06 UTC
[llvm-dev] [RFC] IR-level Region Annotations
> On Jan 11, 2017, at 2:02 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
Something isn’t clear to me about how you preserve the validity of the region
annotations, since regular passes don’t know about the attached semantics. For
example, what if a region marks a loop as parallel based on an OpenMP pragma,
but a strength-reduction transformation introduces a loop-carried dependency
and thus invalidates the “parallel” semantics?

Another issue is how much these intrinsics act as a “barrier” for regular
optimizations. For example, what prevents reordering a loop such that it is
executed *before* the intrinsic that marks the beginning of the region?

I feel I missed a piece (but maybe I should start with the provided
references?) :)

— Mehdi
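To make the strength-reduction concern concrete, here is a hypothetical
before/after sketch (not taken from the thread; all names are illustrative).
Before the transformation, each iteration's offset is computed from the
induction variable alone, so iterations are independent and a surrounding
"parallel loop" annotation is honored:

  loop:
    %i   = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %off = mul i64 %i, %stride                ; depends only on %i
    %p   = getelementptr inbounds i32, i32* %a, i64 %off
    store i32 0, i32* %p
    %i.next = add nuw nsw i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %loop, label %exit

After classic strength reduction, the multiply becomes a recurrence:

  loop:
    %i   = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %off = phi i64 [ 0, %entry ], [ %off.next, %loop ]  ; replaces the mul
    %p   = getelementptr inbounds i32, i32* %a, i64 %off
    store i32 0, i32* %p
    %off.next = add i64 %off, %stride         ; each iteration now feeds the next
    %i.next = add nuw nsw i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %loop, label %exit

For sequential execution the two forms are equivalent, but a later pass that
outlines the annotated loop for parallel execution must now recognize and
rematerialize the %off recurrence per thread; a pass that is unaware of the
annotation's semantics has no reason to avoid or account for this rewrite,
which is exactly the interaction the question is about.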
Tian, Xinmin via llvm-dev
2017-Jan-13 17:00 UTC
[llvm-dev] [RFC] IR-level Region Annotations
Mehdi, thanks for the good questions.

>>>>> Something isn’t clear to me about how you preserve the validity of the region annotations since regular passes don’t know about the attached semantics?

There are some small changes we have to make in a few optimizations so that
they do not invalidate the attached annotation semantics: 1) provide
hand-shaking / query utilities so that an optimization can ask whether a
region is, for example, a parallel loop, and 2) set up a proper optimization
phase ordering. In our product compiler, ICC, we used both approaches.

>>>>> For example, what if a region marks a loop as parallel based on an OpenMP pragma, but a strength-reduction transformation introduces a loop-carried dependency and thus invalidates the “parallel” semantics?

Yes, there is a list of such cases, e.g. forward substitution, strength
reduction, and global constant propagation. Here is another example: under
serial semantics you can do constant propagation, but under parallel semantics
you can't. All of these issues are considered (see the IR sketch at the end of
this message):

  int x = 100;
  parallel num_threads(4)
  {
    ....
    atomic { x = x + 600 }
  }

These issues already exist when you do IPO optimization across OpenCL or CUDA
kernel functions, or across functions outlined by the Clang FE.

>>>>> Another issue is how much these intrinsics act as a “barrier” for regular optimizations? For example, what prevents reordering a loop such that it is executed *before* the intrinsic that marks the beginning of the region?

The Clang FE will need to set the "convergent" attribute on these intrinsic
calls (at the call sites) based on the language construct semantics.

Thanks, Xinmin
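A rough LLVM IR rendering of the constant-propagation point above (hypothetical
and simplified; the directive metadata strings are placeholders, and the
per-thread execution of the region body is only indicated in comments, since
before outlining the IR does not yet make the multi-threaded execution
explicit):

  store i32 100, i32* @x
  call void @llvm.experimental.directive(metadata !0)  ; begin parallel region, num_threads(4)
  %old = atomicrmw add i32* @x, i32 600 seq_cst        ; executed once per thread
  call void @llvm.experimental.directive(metadata !1)  ; end parallel region
  %v = load i32, i32* @x        ; 100 + 4*600 = 2500 at run time

Under serial semantics a pass could fold %v to a constant; under the parallel
semantics attached to the region that folding is wrong, which is why such
passes either have to query the region information or run at a point in the
pipeline where the region has already been lowered.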
Tian, Xinmin via llvm-dev
2017-Jan-13 18:20 UTC
[llvm-dev] [RFC] IR-level Region Annotations
Yonghong, Hal and I have been very careful about this RFC; given Hal's long
experience with the community, as you can see, we positioned it as
"experimental" so that we can add an infrastructure for people to use, improve,
and extend over time. As for your API calls and SPMD divergence, our current
implementation does not cover them; as a first step, we focused on language
constructs such as parallel for, simd, and cilk_for. We are open to any
suggestions and proposals to cover your use cases.

Thanks, Xinmin

From: Yonghong Yan [mailto:yan at oakland.edu]
Sent: Friday, January 13, 2017 9:55 AM
To: Tian, Xinmin <xinmin.tian at intel.com>
Cc: David Majnemer <david.majnemer at gmail.com>; Hal Finkel <hfinkel at anl.gov>; llvm-dev at lists.llvm.org; llvm-dev-request at lists.llvm.org
Subject: Re: [llvm-dev] [RFC] IR-level Region Annotations

I have been following the discussion, and this is something we have been
looking for for years. I am glad that you have a patch that we can at least
use. I do, however, have several comments/requests based on our experience:

1. The idea of creating a set of representations for language-neutral parallel
constructs, and then allowing extensions for language-specific representations
and passes, sounds very good, and I think it is definitely worth a try as a
next step. We did a survey of multiple threading programming interfaces
(https://www.hpcwire.com/2015/03/02/a-comparison-of-heterogeneous-and-manycore-programming-models)
and it is obvious that parallel interfaces (even including inter-node models
such as PGAS and APGAS) share some common mechanisms for representing
parallelism, data/affinity, synchronization, and mutual exclusion.

2. There are API calls or typical statements that are meant for parallelism,
but that the language-based IR extensions are not able to represent. E.g.
pthread_create/join (or other runtime calls such as C++ thread/async, etc.)
are fork/join parallelism. The frontend can be enhanced to recognize those
calls and create/append PIR info to them. It would, however, be nicer to have
a meta approach, e.g. providing a file that tells the frontend that a parallel
IR should be used for specific API calls.

3. SPMD divergence such as the following, or an if statement inside a vector
loop body:

  if (omp_get_thread_num() == 4) {
    /* same for UPC or MPI inter-node/PGAS, as we use this to do different
       computation for each thread/proc */
  } else {
  }

Does your patch have support for that? Basically, I can imagine we need to add
some metadata/token/tags to branching IR.

Thank you!

Yonghong Yan
Assistant Professor
Department of Computer Science and Engineering
School of Engineering and Computer Science
Oakland University
Office: EC 534
Phone: 248-370-4087
Email: yan at oakland.edu
www.secs.oakland.edu/~yan

On Fri, Jan 13, 2017 at 1:59 AM, Tian, Xinmin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

[...]
Vikram, many points you made below are well-taken. Hal and I had a long discussion at SC'16 on how to build an practical infrastructure for people to experiment with and study all pros and cons for IR extensions for expressing parallelism. optimization parallel code, and many other usage for directive/pragma information. Personally, I would agree, eventually, the solution likely could be a combination of b and d when we go with parallel IR, after the community finally agreed on what are the most common to be represented as LLVM instructions. Having said that, this RFC serves as the first step, the intrinsics we proposed are language neutral, but, "tag" or "metadata" are specific to language constructs/directive/pragma...we are expecting more and more feedback and discussion on this work. Thank you all again. From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>] On Behalf Of David Majnemer via llvm-dev Sent: Wednesday, January 11, 2017 2:18 PM To: Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at anl.gov>> Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> Subject: Re: [llvm-dev] [RFC] IR-level Region Annotations FWIW, we needed to maintain single entry-multiple exit regions for WinEH and we accomplished it via a different mechanism. We had an instruction which produces a value of type Token (http://llvm.org/docs/LangRef.html#token-type) which let us establish the region and another instruction to exit the region by consuming it. The dominance rules allowed us to avoid situations where the compiler might trash the regions in weird ways and made sure that regions would be left unharmed. AFAIK, a similar approach using Token could work here. I think it would reduce the amount of stuff you'd need LLVM to maintain. On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: A Proposal for adding an experimental IR-level region-annotation infrastructure ============================================================================Hal Finkel (ANL) and Xinmin Tian (Intel) This is a proposal for adding an experimental infrastructure to support annotating regions in LLVM IR, making use of intrinsics and metadata, and a generic analysis to allow transformations to easily make use of these annotated regions. This infrastructure is flexible enough to support representation of directives for parallelization, vectorization, and offloading of both loops and more-general code regions. Under this scheme, the conceptual distance between source-level directives and the region annotations need not be significant, making the incremental cost of supporting new directives and modifiers often small. It is not, however, specific to those use cases. Problem Statement ================There are a series of discussions on LLVM IR extensions for representing region and loop annotations for parallelism, and other user-guided transformations, among both industrial and academic members of the LLVM community. Increasing the quality of our OpenMP implementation is an important motivating use case, but certainly not the only one. For OpenMP in particular, we've discussed having an IR representation for years. Presently, all OpenMP pragmas are transformed directly into runtime-library calls in Clang, and outlining (i.e. extracting parallel regions into their own functions to be invoked by the runtime library) is done in Clang as well. 
Our implementation does not further optimize OpenMP constructs, and a lot of thought has been put into how we might improve this. For some optimizations, such as redundant barrier removal, we could use a TargetLibraryInfo-like mechanism to recognize frontend-generated runtime calls and proceed from there. Dealing with cases where we lose pointer-aliasing information, information on loop bounds, etc. we could improve by improving our inter-procedural-analysis capabilities. We should do that regardless. However, there are important cases where the underlying scheme we want to use to lower the various parallelism constructs, especially when targeting accelerators, changes depending on what is in the parallel region. In important cases where we can see everything (i.e. there aren't arbitrary external calls), code generation should proceed in a way that is very different from the general case. To have a sensible implementation, this must be done after inlining. When using LTO, this should be done during the link-time phase. As a result, we must move away from our purely-front-end based lowering scheme. The question is what to do instead, and how to do it in a way that is generally useful to the entire community. Designs previously discussed can be classified into four categories: (a) Add a large number of new kinds of LLVM metadata, and use them to annotate each necessary instruction for parallelism, data attributes, etc. (b) Add several new LLVM instructions such as, for parallelism, fork, spawn, join, barrier, etc. (c) Add a large number of LLVM intrinsics for directives and clauses, each intrinsic representing a directive or a clause. (d) Add a small number of LLVM intrinsics for region or loop annotations, represent the directive/clause names using metadata and the remaining information using arguments. Here we're proposing (d), and below is a brief pros and cons analysis based on these discussions and our own experiences of supporting region/loop annotations in LLVM-based compilers. The table below shows a short summary of our analysis. Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC [1,2], have IR-level representations for parallelism constructs. Based on experience from these previous developments, we'd like a solution for LLVM that maximizes optimization enablement while minimizing the maintenance costs and complexity increase experienced by the community as a whole. Representing the desired information in the LLVM IR is just the first step. The challenge is to maintain the desired semantics without blocking useful optimizations. With options (c) and (d), dependencies can be preserved mainly based on the use/def chain of the arguments of each intrinsic, and a manageable set LLVM analysis and transformations can be made aware of certain kinds of annotations in order to enable specific optimizations. In this regard, options (c) and (d) are close with respect to maintenance efforts. However, based on our experiences, option (d) is preferable because it is easier to extend to support new directives and clauses in the future without the need to add new intrinsics as required by option (c). Table 1. Pros/cons summary of LLVM IR experimental extension options --------+----------------------+----------------------------------------------- Options | Pros | Cons --------+----------------------+----------------------------------------------- (a) | No need to add new | LLVM passes do not always maintain metadata. 
| instructions or | Need to educate many passes (if not all) to | new intrinsics | understand and handle them. --------+----------------------+----------------------------------------------- (b) | Parallelism becomes | Huge effort for extending all LLVM passes and | first class citizen | code generation to support new instructions. | | A large set of information still needs to be | | represented using other means. --------+----------------------+----------------------------------------------- (c) | Less impact on the | A large number of intrinsics must be added. | exist LLVM passes. | Some of the optimizations need to be | Fewer requirements | educated to understand them. | for passes to | | maintain metadata. | --------+----------------------+----------------------------------------------- (d) | Minimal impact on | Some of the optimizations need to be | existing LLVM | educated to understand them. | optimizations passes.| No requirements for all passes to maintain | directive and clause | large set of metadata with values. | names use metadata | | strings. | --------+----------------------+----------------------------------------------- Regarding (a), LLVM already uses metadata for certain loop information (e.g. annotations directing loop transformations and assertions about loop-carried dependencies), but there is no natural or consistent way to extend this scheme to represent necessary data-movement or region information. New Intrinsics for Region and Value Annotations =============================================The following new (experimental) intrinsics are proposed which allow: a) Annotating a code region marked with directives / pragmas, b) Annotating values associated with the region (or loops), that is, those values associated with directives / pragmas. c) Providing information on LLVM IR transformations needed for the annotated code regions (or loops). These can be used both by frontends and also by transformation passes (e.g. automated parallelization). The names used here are similar to those used by our internal prototype, but obviously we expect a community bikeshed discussion. def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty], [IntrArgMemOnly], "llvm.experimental.directive">; def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty], [IntrArgMemOnly], "llvm.experimental.dir.qual">; def int_experimental_dir_qual_opnd : Intrinsic<[], [llvm_metadata_ty, llvm_any_ty], [IntrArgMemOnly], "llvm.experimental.dir.qual.opnd">; def int_experimental_dir_qual_opndlist : Intrinsic< [], [llvm_metadata_ty, llvm_vararg_ty], [IntrArgMemOnly], "llvm.experimental.dir.qual.opndlist">; Note that calls to these intrinsics might need to be annotated with the convergent attribute when they represent fork/join operations, barriers, and similar. Usage Examples ============= This section shows a few examples using these experimental intrinsics. LLVM developers who will use these intrinsics can defined their own MDstring. All details of using these intrinsics on representing OpenMP 4.5 constructs are described in [1][3]. 
Example I: An OpenMP combined construct #pragma omp target teams distribute parallel for simd loop LLVM IR ------- call void @llvm.experimental.directive(metadata !0) call void @llvm.experimental.directive(metadata !1) call void @llvm.experimental.directive(metadata !2) call void @llvm.experimental.directive(metadata !3) loop call void @llvm.experimental.directive(metadata !6) call void @llvm.experimental.directive(metadata !5) call void @llvm.experimental.directive(metadata !4) !0 = metadata !{metadata !DIR.OMP.TARGET} !1 = metadata !{metadata !DIR.OMP.TEAMS} !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SI<http://DIR.OMP.DISTRIBUTE.PARLOOP.SI>MD} !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD} !5 = metadata !{metadata !DIR.OMP.END.TEAMS} !4 = metadata !{metadata !DIR.OMP.END.TARGET} Example II: Assume x,y,z are int variables, and s is a non-POD variable. Then, lastprivate(x,y,s,z) is represented as: LLVM IR ------- call void @llvm.experimental.dir.qual.opndlist( metadata !1, %x, %y, metadata !2, %a, %ctor, %dtor, %z) !1 = metadata !{metadata !QUAL.OMP.PRIVATE} !2 = metadata !{metadata !QUAL.OPND.NONPOD} Example III: A prefetch pragma example // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead #pragma prefetch x:1:20 y:0:10 for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; } LLVM IR ------- call void @llvm.experimental.directive(metadata !0) call void @llvm.experimental.dir.qual.opnslist(metadata !1, %xp, 1, 20, metadata !1, %yp, 0, 10) loop call void @llvm.experimental.directive(metadata !3) References ========= [1] LLVM Framework and IR extensions for Parallelization, SIMD Vectorization and Offloading Support. SC'2016 LLVM-HPC3 Workshop. (Xinmin Tian et.al<http://et.al>.) Saltlake City, Utah. [2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop auto-vectorization. (Hideki Saito, et.al<http://et.al>.) LLVM Developers' Meeting 2016, San Jose. [3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal Finkel) LLVM Developers' Meeting, 2016. San Jose [4] LLVM Intrinsic Function and Metadata String Interface for Directive (or Pragmas) Representation. Specification Draft v0.9, Intel Corporation, 2016. Acknowledgements ===============We would like to thank Chandler Carruth (Google), Johannes Doerfert (Saarland Univ.), Yaoqing Gao (HuaWei), Michael Wong (Codeplay), Ettore Tiotto, Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR Extensions WG members for their constructive feedback on the LLVM framework and IR extension proposal. Proposed Implementation ====================== Two sets of patches of supporting these experimental intrinsics and demonstrate the usage are ready for community review. a) Clang patches that support core OpenMP pragmas using this approach. b) W-Region framework patches: CFG restructuring to form single-entry- single-exit work region (W-Region) based on annotations, Demand-driven intrinsic parsing, and WRegionInfo collection and analysis passes, Dump functions of WRegionInfo. On top of this functionality, we will provide the transformation patches for core OpenMP constructs (e.g. start with "#pragma omp parallel for" loop for lowering and outlining, and "#pragma omp simd" to hook it up with LoopVectorize.cpp). We have internal implementations for many constructs now. We will break this functionality up to create a series of patches for community review. 
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Johannes Doerfert via llvm-dev
2017-Jan-19 15:30 UTC
[llvm-dev] [RFC] IR-level Region Annotations
Hi Hal, Hi Xinmin,

First let me thank you for pushing in this direction; it means more people
are interested in some kind of change here.

While "our" RFC will be sent out next week, I want to comment on a specific
point of this one right now:

> [...]
> (b) Add several new LLVM instructions such as, for parallelism, fork, spawn,
>     join, barrier, etc.
> [...]

For me, fork and spawn serve the same purpose; most of the newly suggested
schemes propose three new instructions in total.

> Options | Pros                 | Cons
> [...]
> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
>         | first class citizen  | code generation to support new instructions.
>         |                      | A large set of information still needs to be
>         |                      | represented using other means.
> [...]

I am especially curious where you get your data from. Tapir [0] (and to some
degree PIR [1]) have shown that, counterintuitively, only a few changes to
LLVM passes are needed. Tapir was recently used in an MIT class with a lot of
students and it seemed to work well with only minimal changes to analysis and
especially transformation passes.

Also, the "code generation" issues you mention do, in my opinion, apply to
_all_ proposed schemes, as all of them eventually have to be lowered to either
sequential code or parallel runtime-library calls. The sequentialization for
our new Tapir/PIR hybrid takes less than 50 lines, and generating parallel
runtime calls will probably be similar in all schemes anyway.

Regarding the last point, "A large set of information still needs to be
represented using other means", I am curious why this is a bad thing. I think
IR should be simple; each instruction/intrinsic etc. should have clear and
minimal semantics. Also, I think we already have a lot of simple constructs in
the IR to express high-level information properly, e.g. 'atomicrmw'
instructions for high-level reductions. While we currently lack analysis
passes to extract information from such low-level representations, they are
certainly possible [2,3]. I would argue that such analyses are a better way to
do things than placing "high-level intrinsics" in the IR to mark things like
reductions.

Cheers,
  Johannes, on behalf of the Tapir and PIR team

[0] https://cpc2016.infor.uva.es/wp-content/uploads/2016/06/CPC2016_paper_12.pdf
[1] http://compilers.cs.uni-saarland.de/people/doerfert/parallelcfg.pdf
[2] Section 3 in https://arxiv.org/pdf/1505.07716
[3] Sections 3.2 and 3.3 in
    https://www.st.cs.uni-saarland.de/publications/files/streit-taco-2015.pdf
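To illustrate the 'atomicrmw' point with a concrete, hypothetical snippet: a
sum reduction in a parallel loop can be written against existing IR
constructs, and a later analysis can recognize the pattern instead of relying
on a high-level reduction intrinsic. The names @sum and @body below are
invented for this sketch:

@sum = global i32 0

; one invocation per parallel iteration; each adds its partial result into
; the shared accumulator
define void @body(i32 %partial) {
entry:
  %old = atomicrmw add i32* @sum, i32 %partial seq_cst
  ret void
}

An analysis that proves every update to @sum inside the region has this form
could then privatize the accumulator and emit a faster tree reduction.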
-- 

Johannes Doerfert
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland Informatics Campus, Germany
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
Johannes Doerfert via llvm-dev
2017-Jan-20 13:32 UTC
[llvm-dev] [RFC] IR-level Region Annotations
On 01/11, Daniel Berlin via llvm-dev wrote:
> > def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
> >                                   [IntrArgMemOnly],
> >                                   "llvm.experimental.directive">;
> >
> > def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
> >                                  [IntrArgMemOnly],
> >                                  "llvm.experimental.dir.qual">;
> >
> > def int_experimental_dir_qual_opnd : Intrinsic<[],
> >                                       [llvm_metadata_ty, llvm_any_ty],
> >                                       [IntrArgMemOnly],
> >                                       "llvm.experimental.dir.qual.opnd">;
> >
> > def int_experimental_dir_qual_opndlist : Intrinsic<
> >                                           [],
> >                                           [llvm_metadata_ty, llvm_vararg_ty],
> >                                           [IntrArgMemOnly],
> >                                           "llvm.experimental.dir.qual.opndlist">;
>
> I'll bite.
>
> What does argmemonly mean when the operands are metadata?
> :)
>
> If the rest is an attempt to keep the intrinsic from being floated or
> removed, I'm strongly against extending a way we already know to have
> significant effect on optimization (fake memory dependence) to do this.
> Particularly for something so major.

I guess that any kind of extension that pretends to have some sort of side
effect will have a significant effect on optimizations. The tricky part is to
find the right representation (and side effects) that implicitly keep the
semantics/invariants of parallel code preserved while allowing as many
transformations as possible.

[The following paragraph might be a bit abstract. If it is unclear please
tell me and I will add a code example.]

In the example by Sanjoy [0] we saw that "parallel region markers" need to be
a barrier for alloca movement, though we might nevertheless want some
transformations to "move" them, e.g., to aggregate allocas that are executed
in parallel outside the parallel region as a means of communication. To make a
transformation like this possible, while at the same time preventing the
movement Sanjoy described, we probably have to educate some passes on the
semantics of "parallel region markers". Alternatively, (my hope is that) if we
use known concepts (mainly dominance) to encode parts of the parallel
invariants, such optimizations should come at a much lower cost.

Cheers,
  Johannes

[0] http://lists.llvm.org/pipermail/llvm-dev/2017-January/109302.html

-- 

Johannes Doerfert
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland Informatics Campus, Germany
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
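A small, entirely hypothetical sketch of the alloca issue discussed above;
the region-marker metadata strings and the buffer are invented for
illustration:

call void @llvm.experimental.directive(metadata !0)  ; parallel region entry
%buf = alloca [64 x i8]           ; each worker needs its own private buffer
; ... parallel work using %buf ...
call void @llvm.experimental.directive(metadata !1)  ; parallel region exit

!0 = metadata !{metadata !DIR.PARALLEL.REGION}
!1 = metadata !{metadata !DIR.END.PARALLEL.REGION}

Hoisting %buf above the entry marker would silently turn a per-worker buffer
into one shared by all workers, so the markers must act as a barrier for such
movement. At the same time, a pass that understands the markers could still
deliberately aggregate the private buffers into a single allocation outside
the region to implement communication between workers.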