thr3ads.net - llvm dev - [llvm-dev] [RFC] IR-level Region Annotations [Jan 2017]


Reid Kleckner via llvm-dev

2017-Jan-11 23:51 UTC

[llvm-dev] [RFC] IR-level Region Annotations

+1, tokens are the current True Way to create single-entry multi-exit
regions. Your example for an annotated loop would look like:

%region = call token @llvm.openmp.regionstart(metadata ...) ; whatever parameters you need here
  loop
call void @llvm.openmp.regionend(token %region)

If you use tokens, I would recommend proposal (c), where you introduce new
intrinsics for every new kind of region, instead of adding one overly
generic set of region intrinsics.

We already have a way to form regions with real barriers, and it's tokens.
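
To make the contrast concrete, here is a rough toy model in Python (not LLVM code, and not anything proposed in this thread). It assumes a straight-line list of calls; the `Call` class and both function names are made up for illustration. `pair_by_token` follows the use/def link from a region end back to the start whose token it consumes, while `pair_by_position` has to recover the pairing from nesting order alone, as unlinked begin/end marker calls would require.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Call:
    """Toy stand-in for one call in a straight-line instruction stream."""
    name: str                      # "regionstart", "regionend", "begin", "end", or other
    result: Optional[str] = None   # token defined by a regionstart, e.g. "%outer"
    operand: Optional[str] = None  # token consumed by a regionend


def pair_by_token(stream: List[Call]) -> List[Tuple[int, int]]:
    """Pair each regionend with the regionstart whose token it consumes."""
    starts: Dict[Optional[str], int] = {}
    regions: List[Tuple[int, int]] = []
    for idx, call in enumerate(stream):
        if call.name == "regionstart":
            starts[call.result] = idx
        elif call.name == "regionend":
            if call.operand not in starts:
                # Loosely mirrors dominance: a use needs an earlier definition.
                raise ValueError(f"{call.operand} used before it is defined")
            regions.append((starts[call.operand], idx))
    return regions


def pair_by_position(stream: List[Call]) -> List[Tuple[int, int]]:
    """Pair unlinked begin/end markers purely by nesting order, using a stack."""
    stack: List[int] = []
    regions: List[Tuple[int, int]] = []
    for idx, call in enumerate(stream):
        if call.name == "begin":
            stack.append(idx)
        elif call.name == "end":
            if not stack:
                raise ValueError("end marker without a matching begin")
            regions.append((stack.pop(), idx))
    if stack:
        raise ValueError("begin marker without a matching end")
    return regions


nested = [
    Call("regionstart", result="%outer"),
    Call("regionstart", result="%inner"),
    Call("loop"),
    Call("regionend", operand="%inner"),
    Call("regionend", operand="%outer"),
]
print(pair_by_token(nested))  # [(1, 3), (0, 4)]
```

In this model the token version keeps pairing correctly even if the two end calls are listed in the other order, because the link is carried by the operand rather than by position. The stack version has no such link to fall back on.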

On Wed, Jan 11, 2017 at 2:17 PM, David Majnemer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> FWIW, we needed to maintain single entry-multiple exit regions for WinEH
> and we accomplished it via a different mechanism.
>
> We had an instruction which produces a value of type Token (
> http://llvm.org/docs/LangRef.html#token-type) which let us establish the
> region and another instruction to exit the region by consuming it. The
> dominance rules allowed us to avoid situations where the compiler might
> trash the regions in weird ways and made sure that regions would be left
> unharmed.
>
> AFAIK, a similar approach using Token could work here. I think it would
> reduce the amount of stuff you'd need LLVM to maintain.
>
>
> On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> A Proposal for adding an experimental IR-level region-annotation
>> infrastructure
>> ============================================================================
>>
>> Hal Finkel (ANL) and Xinmin Tian (Intel)
>>
>> This is a proposal for adding an experimental infrastructure to support
>> annotating regions in LLVM IR, making use of intrinsics and metadata, and
>> a generic analysis to allow transformations to easily make use of these
>> annotated regions. This infrastructure is flexible enough to support
>> representation of directives for parallelization, vectorization, and
>> offloading of both loops and more-general code regions. Under this scheme,
>> the conceptual distance between source-level directives and the region
>> annotations need not be significant, making the incremental cost of
>> supporting new directives and modifiers often small. It is not, however,
>> specific to those use cases.
>>
>> Problem Statement
>> =================
>> There has been a series of discussions on LLVM IR extensions for
>> representing region and loop annotations for parallelism, and other
>> user-guided transformations, among both industrial and academic members of
>> the LLVM community. Increasing the quality of our OpenMP implementation is
>> an important motivating use case, but certainly not the only one. For
>> OpenMP in particular, we've discussed having an IR representation for
>> years. Presently, all OpenMP pragmas are transformed directly into
>> runtime-library calls in Clang, and outlining (i.e. extracting parallel
>> regions into their own functions to be invoked by the runtime library) is
>> done in Clang as well. Our implementation does not further optimize OpenMP
>> constructs, and a lot of thought has been put into how we might improve
>> this. For some optimizations, such as redundant barrier removal, we could
>> use a TargetLibraryInfo-like mechanism to recognize frontend-generated
>> runtime calls and proceed from there. We could better handle cases where
>> we lose pointer-aliasing information, information on loop bounds, etc. by
>> improving our inter-procedural-analysis capabilities. We should do that
>> regardless. However, there are important cases where the underlying scheme
>> we want to use to lower the various parallelism constructs, especially
>> when targeting accelerators, changes depending on what is in the parallel
>> region. In important cases where we can see everything (i.e. there aren't
>> arbitrary external calls), code generation should proceed in a way that is
>> very different from the general case. To have a sensible implementation,
>> this must be done after inlining. When using LTO, this should be done
>> during the link-time phase. As a result, we must move away from our
>> purely-front-end based lowering scheme. The question is what to do
>> instead, and how to do it in a way that is generally useful to the entire
>> community.
>>
>> Designs previously discussed can be classified into four categories:
>>
>> (a) Add a large number of new kinds of LLVM metadata, and use them to
>>     annotate each necessary instruction for parallelism, data attributes,
>>     etc.
>> (b) Add several new LLVM instructions such as, for parallelism, fork,
>>     spawn, join, barrier, etc.
>> (c) Add a large number of LLVM intrinsics for directives and clauses, each
>>     intrinsic representing a directive or a clause.
>> (d) Add a small number of LLVM intrinsics for region or loop annotations,
>>     represent the directive/clause names using metadata and the remaining
>>     information using arguments.
>>
>> Here we're proposing (d), and below is a brief pros and cons analysis
>> based on these discussions and our own experiences of supporting
>> region/loop annotations in LLVM-based compilers. The table below shows a
>> short summary of our analysis.
>>
>> Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC
>> [1,2], have IR-level representations for parallelism constructs. Based on
>> experience from these previous developments, we'd like a solution for LLVM
>> that maximizes optimization enablement while minimizing the maintenance
>> costs and complexity increase experienced by the community as a whole.
>>
>> Representing the desired information in the LLVM IR is just the first
>> step. The challenge is to maintain the desired semantics without blocking
>> useful optimizations. With options (c) and (d), dependencies can be
>> preserved mainly based on the use/def chain of the arguments of each
>> intrinsic, and a manageable set of LLVM analyses and transformations can
>> be made aware of certain kinds of annotations in order to enable specific
>> optimizations. In this regard, options (c) and (d) are close with respect
>> to maintenance efforts. However, based on our experiences, option (d) is
>> preferable because it is easier to extend to support new directives and
>> clauses in the future without the need to add new intrinsics as required
>> by option (c).
>>
>> Table 1. Pros/cons summary of LLVM IR experimental extension options
>>
>> --------+----------------------+-----------------------------------------------
>> Options |         Pros         | Cons
>> --------+----------------------+-----------------------------------------------
>> (a)     | No need to add new   | LLVM passes do not always maintain metadata.
>>         | instructions or      | Need to educate many passes (if not all) to
>>         | new intrinsics       | understand and handle them.
>> --------+----------------------+-----------------------------------------------
>> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
>>         | first class citizen  | code generation to support new instructions.
>>         |                      | A large set of information still needs to be
>>         |                      | represented using other means.
>> --------+----------------------+-----------------------------------------------
>> (c)     | Less impact on the   | A large number of intrinsics must be added.
>>         | existing LLVM passes.| Some of the optimizations need to be
>>         | Fewer requirements   | educated to understand them.
>>         | for passes to        |
>>         | maintain metadata.   |
>> --------+----------------------+-----------------------------------------------
>> (d)     | Minimal impact on    | Some of the optimizations need to be
>>         | existing LLVM        | educated to understand them.
>>         | optimization passes. | No requirements for all passes to maintain
>>         | Directive and clause | large set of metadata with values.
>>         | names use metadata   |
>>         | strings.             |
>> --------+----------------------+-----------------------------------------------
>>
>>
>> Regarding (a), LLVM already uses metadata for certain loop information
>> (e.g. annotations directing loop transformations and assertions about
>> loop-carried dependencies), but there is no natural or consistent way to
>> extend this scheme to represent necessary data-movement or region
>> information.
>>
>>
>> New Intrinsics for Region and Value Annotations
>> ===============================================
>> The following new (experimental) intrinsics are proposed which allow:
>>
>> a) Annotating a code region marked with directives / pragmas,
>> b) Annotating values associated with the region (or loops), that is, those
>>    values associated with directives / pragmas.
>> c) Providing information on LLVM IR transformations needed for the
>>    annotated code regions (or loops).
>>
>> These can be used both by frontends and also by transformation passes
>> (e.g. automated parallelization). The names used here are similar to those
>> used by our internal prototype, but obviously we expect a community
>> bikeshed discussion.
>>
>> def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
>>                                    [IntrArgMemOnly],
>>                                    "llvm.experimental.directive">;
>>
>> def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
>>                                    [IntrArgMemOnly],
>>                                    "llvm.experimental.dir.qual">;
>>
>> def int_experimental_dir_qual_opnd : Intrinsic<[],
>>                                    [llvm_metadata_ty, llvm_any_ty],
>>                                    [IntrArgMemOnly],
>>                                    "llvm.experimental.dir.qual.opnd">;
>>
>> def int_experimental_dir_qual_opndlist : Intrinsic<[],
>>                                    [llvm_metadata_ty, llvm_vararg_ty],
>>                                    [IntrArgMemOnly],
>>                                    "llvm.experimental.dir.qual.opndlist">;
>>
>> Note that calls to these intrinsics might need to be annotated with the
>> convergent attribute when they represent fork/join operations, barriers,
>> and similar.
>>
>> Usage Examples
>> ==============
>>
>> This section shows a few examples using these experimental intrinsics.
>> LLVM developers who will use these intrinsics can define their own
>> MDString. All details of using these intrinsics on representing OpenMP 4.5
>> constructs are described in [1][3].
>>
>>
>> Example I: An OpenMP combined construct
>>
>> #pragma omp target teams distribute parallel for simd
>>   loop
>>
>> LLVM IR
>> -------
>> call void @llvm.experimental.directive(metadata !0)
>> call void @llvm.experimental.directive(metadata !1)
>> call void @llvm.experimental.directive(metadata !2)
>> call void @llvm.experimental.directive(metadata !3)
>>   loop
>> call void @llvm.experimental.directive(metadata !6)
>> call void @llvm.experimental.directive(metadata !5)
>> call void @llvm.experimental.directive(metadata !4)
>>
>> !0 = metadata !{metadata !DIR.OMP.TARGET}
>> !1 = metadata !{metadata !DIR.OMP.TEAMS}
>> !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}
>>
>> !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
>> !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
>> !4 = metadata !{metadata !DIR.OMP.END.TARGET}
>>
>> Example II: Assume x,y,z are int variables, and s is a non-POD variable.
>>             Then, lastprivate(x,y,s,z) is represented as:
>>
>> LLVM IR
>> -------
>> call void @llvm.experimental.dir.qual.opndlist(
>>                 metadata !1, %x, %y, metadata !2, %a, %ctor, %dtor, %z)
>>
>> !1 = metadata !{metadata !QUAL.OMP.PRIVATE}
>> !2 = metadata !{metadata !QUAL.OPND.NONPOD}
>>
>> Example III: A prefetch pragma example
>>
>> // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
>> // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
>> #pragma prefetch x:1:20 y:0:10
>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>
>> LLVM IR
>> -------
>> call void @llvm.experimental.directive(metadata !0)
>> call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
>>                                                metadata !1, %yp, 0, 10)
>>   loop
>> call void @llvm.experimental.directive(metadata !3)
>>
>> References
>> ==========
>>
>> [1] LLVM Framework and IR extensions for Parallelization, SIMD
>>     Vectorization and Offloading Support. SC'2016 LLVM-HPC3 Workshop.
>>     (Xinmin Tian et al.) Salt Lake City, Utah.
>>
>> [2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer
>>     loop auto-vectorization. (Hideki Saito, et al.) LLVM Developers'
>>     Meeting 2016, San Jose.
>>
>> [3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal
>>     Finkel) LLVM Developers' Meeting, 2016. San Jose
>>
>> [4] LLVM Intrinsic Function and Metadata String Interface for Directive
>>     (or Pragmas) Representation. Specification Draft v0.9, Intel
>>     Corporation, 2016.
>>
>>
>> Acknowledgements
>> ================
>> We would like to thank Chandler Carruth (Google), Johannes Doerfert
>> (Saarland Univ.), Yaoqing Gao (HuaWei), Michael Wong (Codeplay), Ettore
>> Tiotto, Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR
>> Extensions WG members for their constructive feedback on the LLVM
>> framework and IR extension proposal.
>>
>> Proposed Implementation
>> =======================
>>
>> Two sets of patches supporting these experimental intrinsics and
>> demonstrating their usage are ready for community review.
>>
>> a) Clang patches that support core OpenMP pragmas using this approach.
>> b) W-Region framework patches: CFG restructuring to form
>>    single-entry-single-exit work regions (W-Regions) based on annotations,
>>    demand-driven intrinsic parsing, WRegionInfo collection and analysis
>>    passes, and dump functions for WRegionInfo.
>>
>> On top of this functionality, we will provide the transformation patches
>> for core OpenMP constructs (e.g. start with "#pragma omp parallel for"
>> loop for lowering and outlining, and "#pragma omp simd" to hook it up with
>> LoopVectorize.cpp). We have internal implementations for many constructs
>> now. We will break this functionality up to create a series of patches for
>> community review.
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>

Hongbin Zheng via llvm-dev

2017-Jan-12 00:09 UTC


[llvm-dev] [RFC] IR-level Region Annotations

On Wed, Jan 11, 2017 at 3:51 PM, Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> +1, tokens are the current True Way to create single-entry multi-exit
> regions. Your example for an annotated loop would look like:
>
> %region = call token @llvm.openmp.regionstart(metadata ...) ; whatever parameters you need here
>   loop
> call void @llvm.openmp.regionend(token %region)
>
> If you use tokens, I would recommend proposal (c), where you introduce new
> intrinsics for every new kind of region, instead of adding one overly
> generic set of region intrinsics.
>
Maybe we can come up with several categories of regions, and create a new
intrinsic for each category, instead of creating a new intrinsic for every
*kind*.

Thanks
Hongbin



Mehdi Amini via llvm-dev

2017-Jan-12 04:13 UTC


[llvm-dev] [RFC] IR-level Region Annotations

> On Jan 11, 2017, at 3:51 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> +1, tokens are the current True Way to create single-entry multi-exit
> regions. Your example for an annotated loop would look like:
>
> %region = call token @llvm.openmp.regionstart(metadata ...) ; whatever parameters you need here
>   loop
> call void @llvm.openmp.regionend(token %region)
>
> If you use tokens, I would recommend proposal (c), where you introduce new
> intrinsics for every new kind of region, instead of adding one overly
> generic set of region intrinsics.

Can you elaborate why? I’m curious.

Thanks,

— 
Mehdi

> 
> We already have a way to form regions with real barriers, and it's tokens.
> 
> On Wed, Jan 11, 2017 at 2:17 PM, David Majnemer via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> FWIW, we needed to maintain single entry-multiple exit regions for WinEH
> and we accomplished it via a different mechanism.
> 
> We had an instruction which produces a value of type Token
> (http://llvm.org/docs/LangRef.html#token-type) which let us establish the
> region and another instruction to exit the region by consuming it. The
> dominance rules allowed us to avoid situations where the compiler might
> trash the regions in weird ways and made sure that regions would be left
> unharmed.
> 
> AFAIK, a similar approach using Token could work here. I think it would
> reduce the amount of stuff you'd need LLVM to maintain.
> 
> 
> On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> A Proposal for adding an experimental IR-level region-annotation infrastructure
> ===============================================================================
> Hal Finkel (ANL) and Xinmin Tian (Intel)
> 
> This is a proposal for adding an experimental infrastructure to support
> annotating regions in LLVM IR, making use of intrinsics and metadata, and
> a generic analysis to allow transformations to easily make use of these
> annotated regions. This infrastructure is flexible enough to support
> representation of directives for parallelization, vectorization, and
> offloading of both loops and more-general code regions. Under this scheme,
> the conceptual distance between source-level directives and the region
> annotations need not be significant, making the incremental cost of
> supporting new directives and modifiers often small. It is not, however,
> specific to those use cases.
> 
> Problem Statement
> =================
> There are a series of discussions on LLVM IR extensions for representing
> region and loop annotations for parallelism, and other user-guided
> transformations, among both industrial and academic members of the LLVM
> community. Increasing the quality of our OpenMP implementation is an
> important motivating use case, but certainly not the only one. For OpenMP
> in particular, we've discussed having an IR representation for years.
> Presently, all OpenMP pragmas are transformed directly into runtime-library
> calls in Clang, and outlining (i.e. extracting parallel regions into their
> own functions to be invoked by the runtime library) is done in Clang as
> well. Our implementation does not further optimize OpenMP constructs, and a
> lot of thought has been put into how we might improve this. For some
> optimizations, such as redundant barrier removal, we could use a
> TargetLibraryInfo-like mechanism to recognize frontend-generated runtime
> calls and proceed from there. Dealing with cases where we lose
> pointer-aliasing information, information on loop bounds, etc. we could
> improve by improving our inter-procedural-analysis capabilities. We should
> do that regardless. However, there are important cases where the underlying
> scheme we want to use to lower the various parallelism constructs,
> especially when targeting accelerators, changes depending on what is in the
> parallel region. In important cases where we can see everything (i.e. there
> aren't arbitrary external calls), code generation should proceed in a way
> that is very different from the general case. To have a sensible
> implementation, this must be done after inlining. When using LTO, this
> should be done during the link-time phase. As a result, we must move away
> from our purely-front-end based lowering scheme. The question is what to do
> instead, and how to do it in a way that is generally useful to the entire
> community.
> 
> Designs previously discussed can be classified into four categories:
> 
> (a) Add a large number of new kinds of LLVM metadata, and use them to
>     annotate each necessary instruction for parallelism, data attributes,
>     etc.
> (b) Add several new LLVM instructions such as, for parallelism, fork,
>     spawn, join, barrier, etc.
> (c) Add a large number of LLVM intrinsics for directives and clauses, each
>     intrinsic representing a directive or a clause.
> (d) Add a small number of LLVM intrinsics for region or loop annotations,
>     represent the directive/clause names using metadata and the remaining
>     information using arguments.
> 
> Here we're proposing (d), and below is a brief pros and cons analysis based
> on these discussions and our own experiences of supporting region/loop
> annotations in LLVM-based compilers. The table below shows a short summary
> of our analysis.
> 
> Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC
> [1,2], have IR-level representations for parallelism constructs. Based on
> experience from these previous developments, we'd like a solution for LLVM
> that maximizes optimization enablement while minimizing the maintenance
> costs and complexity increase experienced by the community as a whole.
> 
> Representing the desired information in the LLVM IR is just the first step.
> The challenge is to maintain the desired semantics without blocking useful
> optimizations. With options (c) and (d), dependencies can be preserved
> mainly based on the use/def chain of the arguments of each intrinsic, and a
> manageable set of LLVM analyses and transformations can be made aware of
> certain kinds of annotations in order to enable specific optimizations. In
> this regard, options (c) and (d) are close with respect to maintenance
> efforts. However, based on our experiences, option (d) is preferable
> because it is easier to extend to support new directives and clauses in the
> future without the need to add new intrinsics as required by option (c).
> 
> Table 1. Pros/cons summary of LLVM IR experimental extension options
> 
> --------+----------------------+-----------------------------------------------
> Options |         Pros         | Cons
> --------+----------------------+-----------------------------------------------
> (a)     | No need to add new   | LLVM passes do not always maintain metadata.
>         | instructions or      | Need to educate many passes (if not all) to
>         | new intrinsics       | understand and handle them.
> --------+----------------------+-----------------------------------------------
> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
>         | first class citizen  | code generation to support new instructions.
>         |                      | A large set of information still needs to be
>         |                      | represented using other means.
> --------+----------------------+-----------------------------------------------
> (c)     | Less impact on the   | A large number of intrinsics must be added.
>         | existing LLVM passes.| Some of the optimizations need to be
>         | Fewer requirements   | educated to understand them.
>         | for passes to        |
>         | maintain metadata.   |
> --------+----------------------+-----------------------------------------------
> (d)     | Minimal impact on    | Some of the optimizations need to be
>         | existing LLVM        | educated to understand them.
>         | optimization passes. | No requirements for all passes to maintain
>         | Directive and clause | large set of metadata with values.
>         | names use metadata   |
>         | strings.             |
> --------+----------------------+-----------------------------------------------
> 
> Regarding (a), LLVM already uses metadata for certain loop information
> (e.g. annotations directing loop transformations and assertions about
> loop-carried dependencies), but there is no natural or consistent way to
> extend this scheme to represent necessary data-movement or region
> information.
> 
> 
> New Intrinsics for Region and Value Annotations
> ===============================================
> The following new (experimental) intrinsics are proposed which allow:
> 
> a) Annotating a code region marked with directives / pragmas,
> b) Annotating values associated with the region (or loops), that is, those
>    values associated with directives / pragmas.
> c) Providing information on LLVM IR transformations needed for the
>    annotated code regions (or loops).
> 
> These can be used both by frontends and also by transformation passes (e.g.
> automated parallelization). The names used here are similar to those used
> by our internal prototype, but obviously we expect a community bikeshed
> discussion.
> 
> def int_experimental_directive : Intrinsic<
>     [], [llvm_metadata_ty],
>     [IntrArgMemOnly],
>     "llvm.experimental.directive">;
> 
> def int_experimental_dir_qual : Intrinsic<
>     [], [llvm_metadata_ty],
>     [IntrArgMemOnly],
>     "llvm.experimental.dir.qual">;
> 
> def int_experimental_dir_qual_opnd : Intrinsic<
>     [], [llvm_metadata_ty, llvm_any_ty],
>     [IntrArgMemOnly],
>     "llvm.experimental.dir.qual.opnd">;
> 
> def int_experimental_dir_qual_opndlist : Intrinsic<
>     [], [llvm_metadata_ty, llvm_vararg_ty],
>     [IntrArgMemOnly],
>     "llvm.experimental.dir.qual.opndlist">;
> 
> Note that calls to these intrinsics might need to be annotated with the
> convergent attribute when they represent fork/join operations, barriers,
> and similar.
> 
> Usage Examples
> ==============
> 
> This section shows a few examples using these experimental intrinsics.
> LLVM developers who will use these intrinsics can define their own MDString.
> All details of using these intrinsics on representing OpenMP 4.5 constructs
> are described in [1][3].
> 
> 
> Example I: An OpenMP combined construct
> 
> #pragma omp target teams distribute parallel for simd
>   loop
> 
> LLVM IR
> -------
> call void @llvm.experimental.directive(metadata !0)
> call void @llvm.experimental.directive(metadata !1)
> call void @llvm.experimental.directive(metadata !2)
> call void @llvm.experimental.directive(metadata !3)
>   loop
> call void @llvm.experimental.directive(metadata !6)
> call void @llvm.experimental.directive(metadata !5)
> call void @llvm.experimental.directive(metadata !4)
> 
> !0 = metadata !{metadata !DIR.OMP.TARGET}
> !1 = metadata !{metadata !DIR.OMP.TEAMS}
> !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}
> 
> !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
> !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
> !4 = metadata !{metadata !DIR.OMP.END.TARGET}
> 
> Example II: Assume x,y,z are int variables, and s is a non-POD variable.
>             Then, lastprivate(x,y,s,z) is represented as:
> 
> LLVM IR
> -------
> call void @llvm.experimental.dir.qual.opndlist(
>                 metadata !1, %x, %y, metadata !2, %s, %ctor, %dtor, %z)
> 
> !1 = metadata !{metadata !QUAL.OMP.LASTPRIVATE}
> !2 = metadata !{metadata !QUAL.OPND.NONPOD}
> 
> Example III: A prefetch pragma example
> 
> // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
> // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
> #pragma prefetch x:1:20 y:0:10
> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
> 
> LLVM IR
> -------
> call void @llvm.experimental.directive(metadata !0)
> call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
>                                                metadata !1, %yp, 0, 10)
>   loop
> call void @llvm.experimental.directive(metadata !3)
> 
> References
> ==========
> 
> [1] LLVM Framework and IR extensions for Parallelization, SIMD Vectorization
>     and Offloading Support. SC'2016 LLVM-HPC3 Workshop. (Xinmin Tian et al.)
>     Salt Lake City, Utah.
> 
> [2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop
>     auto-vectorization. (Hideki Saito, et al.) LLVM Developers' Meeting 2016,
>     San Jose.
> 
> [3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal Finkel)
>     LLVM Developers' Meeting, 2016. San Jose
> 
> [4] LLVM Intrinsic Function and Metadata String Interface for Directive (or
>     Pragmas) Representation. Specification Draft v0.9, Intel Corporation,
>     2016.
> 
> 
> Acknowledgements
> ================
> We would like to thank Chandler Carruth (Google), Johannes Doerfert
> (Saarland Univ.), Yaoqing Gao (HuaWei), Michael Wong (Codeplay), Ettore
> Tiotto, Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR
> Extensions WG members for their constructive feedback on the LLVM framework
> and IR extension proposal.
> 
> Proposed Implementation
> =======================
> 
> Two sets of patches that support these experimental intrinsics and
> demonstrate their usage are ready for community review.
> 
> a) Clang patches that support core OpenMP pragmas using this approach.
> b) W-Region framework patches: CFG restructuring to form single-entry-
>    single-exit work region (W-Region) based on annotations, Demand-driven
>    intrinsic parsing, and WRegionInfo collection and analysis passes,
>    Dump functions of WRegionInfo.
> 
> On top of this functionality, we will provide the transformation patches
> for core OpenMP constructs (e.g. start with "#pragma omp parallel for" loop
> for lowering and outlining, and "#pragma omp simd" to hook it up with
> LoopVectorize.cpp). We have internal implementations for many constructs
> now. We will break this functionality up to create a series of patches for
> community review.
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
> 

Reid Kleckner via llvm-dev

2017-Jan-13 00:20 UTC

[llvm-dev] [RFC] IR-level Region Annotations

On Wed, Jan 11, 2017 at 8:13 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> Can you elaborate why? I’m curious.
>
The con of proposal c was that many passes would need to learn about many
region intrinsics. With tokens, you only need to teach all passes about
tokens, which they should already know about because WinEH and other things
use them.

With tokens, we can add as many region-introducing intrinsics as makes
sense without any additional cost to the middle end. We don't need to make
one omnibus region intrinsic set that describes every parallel loop
annotation scheme supported by LLVM. Instead we would factor things
according to other software design considerations.
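
To make the point above concrete, here is a rough text-level sketch (not
LLVM code, and not something anyone in this thread posted). It pairs
region entries and exits purely by the token value, so the walker never
needs a list of region intrinsics. The name @llvm.openmp.regionstart comes
from the example earlier in the thread; @llvm.hypothetical.simd.* and the
helper pair_regions are made up for illustration, and real passes would
work on the in-memory IR rather than on text.

```python
import re

# "%name = call token @intrinsic(...)": a region entry that produces a token.
START = re.compile(r"^\s*(%[\w.]+)\s*=\s*call\s+token\s+@([\w.]+)\(")
# "call void @intrinsic(token %name)": a region exit that consumes a token.
END = re.compile(r"^\s*call\s+void\s+@([\w.]+)\(token\s+(%[\w.]+)\)")


def pair_regions(ir_text):
    """Return (entry_intrinsic, entry_line, [exit_lines]) for each token.

    Hypothetical helper: it only looks at the token def/use relation and
    never inspects which intrinsic introduced the region.
    """
    regions = {}
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        m = START.match(line)
        if m:
            regions[m.group(1)] = (m.group(2), lineno, [])
            continue
        m = END.match(line)
        if m:
            token = m.group(2)
            if token not in regions:
                raise ValueError(f"line {lineno}: {token} used before definition")
            # A region may have several exits, so collect all of them.
            regions[token][2].append(lineno)
    return list(regions.values())


# Two different (partly hypothetical) kinds of region, nested.
SAMPLE = """\
%outer = call token @llvm.openmp.regionstart(metadata !0)
%inner = call token @llvm.hypothetical.simd.regionstart(metadata !1)
call void @llvm.hypothetical.simd.regionend(token %inner)
call void @llvm.openmp.regionend(token %outer)
"""

for intrinsic, entry, exits in pair_regions(SAMPLE):
    print(intrinsic, "entered at line", entry, "exits at lines", exits)
```

Adding a third kind of region to SAMPLE needs no change to pair_regions,
which is the property being claimed for tokens: generic code follows the
token, and only the pass that cares about a given region kind looks at
the intrinsic name.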