Chris Lattner
2012-Oct-03 05:17 UTC
[LLVMdev] [cfe-dev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
On Oct 2, 2012, at 11:42 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> As I've stated, whether the metadata is preserved is not really the
> relevant metric. It is fine for a pass that does not understand
> parallelization metadata to drop it. The important part is that dropping
> the metadata, and moving instructions to which that metadata is
> attached, must not cause miscompiles. For example:
>
> - Instructions with unknown side effects or dependencies must not be
>   moved from outside a parallel region to inside a parallel region.
> - Serialized subregions inside of parallel regions cannot be deleted
>   without deleting the enclosing parallel region.
>
> The outstanding proposals have ways of dealing with these things. In
> the case of my proposal, it is through cross-referencing the metadata
> sufficiently and using function boundaries to prevent unwanted code
> motion.

I haven't looked at your proposal, but I completely agree in principle that using procedure boundaries is a good way to handle this.

> In Intel's case, it is by using the barriers implied by the
> intrinsics calls.

That's just it - intrinsics using metadata don't imply barriers that would restrict code motion.

-Chris
James Courtier-Dutton
2012-Oct-03 16:30 UTC
[LLVMdev] [cfe-dev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
On 3 October 2012 06:17, Chris Lattner <clattner at apple.com> wrote:

> On Oct 2, 2012, at 11:42 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> As I've stated, whether the metadata is preserved is not really the
>> relevant metric. It is fine for a pass that does not understand
>> parallelization metadata to drop it. The important part is that dropping
>> the metadata, and moving instructions to which that metadata is
>> attached, must not cause miscompiles. For example:
>>
>> - Instructions with unknown side effects or dependencies must not be
>>   moved from outside a parallel region to inside a parallel region.
>> - Serialized subregions inside of parallel regions cannot be deleted
>>   without deleting the enclosing parallel region.
>>
>> The outstanding proposals have ways of dealing with these things. In
>> the case of my proposal, it is through cross-referencing the metadata
>> sufficiently and using function boundaries to prevent unwanted code
>> motion.
>
> I haven't looked at your proposal, but I completely agree in principle that using procedure boundaries is a good way to handle this.
>
>> In Intel's case, it is by using the barriers implied by the
>> intrinsics calls.
>
> That's just it - intrinsics using metadata don't imply barriers that would restrict code motion.

Would another approach be to work from the bottom up?
Determine the details of what optimizations you wish to be able to do
on a parallel program and then implement functionality in the LLVM IR
to achieve it, i.e. a new type of barrier to restrict code motion:
new barrier types, or special zones where only a subset of machine
instructions can be used when lowering.

There are already items in the LLVM IR for atomics; adding a new type
of barrier might be all that is needed to achieve the optimizations
wished for.

Kind Regards

James
Hal Finkel
2012-Oct-03 18:16 UTC
[LLVMdev] [cfe-dev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
On Wed, 3 Oct 2012 17:30:54 +0100
James Courtier-Dutton <james.dutton at gmail.com> wrote:

> On 3 October 2012 06:17, Chris Lattner <clattner at apple.com> wrote:
> > On Oct 2, 2012, at 11:42 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> As I've stated, whether the metadata is preserved is not really the
> >> relevant metric. It is fine for a pass that does not understand
> >> parallelization metadata to drop it. The important part is that
> >> dropping the metadata, and moving instructions to which that
> >> metadata is attached, must not cause miscompiles. For example:
> >>
> >> - Instructions with unknown side effects or dependencies must not
> >>   be moved from outside a parallel region to inside a parallel
> >>   region.
> >> - Serialized subregions inside of parallel regions cannot be
> >>   deleted without deleting the enclosing parallel region.
> >>
> >> The outstanding proposals have ways of dealing with these things.
> >> In the case of my proposal, it is through cross-referencing the
> >> metadata sufficiently and using function boundaries to prevent
> >> unwanted code motion.
> >
> > I haven't looked at your proposal, but I completely agree in
> > principle that using procedure boundaries is a good way to handle
> > this.
> >
> >> In Intel's case, it is by using the barriers implied by the
> >> intrinsics calls.
> >
> > That's just it - intrinsics using metadata don't imply barriers
> > that would restrict code motion.
>
> Would another approach be to work from the bottom up?
> Determine the details of what optimizations you wish to be able to do
> on a parallel program and then implement functionality in the LLVM IR
> to achieve it, i.e. a new type of barrier to restrict code motion:
> new barrier types, or special zones where only a subset of machine
> instructions can be used when lowering.
>
> There are already items in the LLVM IR for atomics; adding a new type
> of barrier might be all that is needed to achieve the optimizations
> wished for.

Agreed. Generally speaking, I have two primary requirements:

1. Enabling loop optimizations. To understand the problem, consider
the following:

  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    if (i > n)
      do_a(i, n);
    else
      do_b(i, n);
  }

Under normal circumstances, the compiler would be able to eliminate
the comparison and simplify the loop. This seems like a silly example,
but in the context of expanding C++ templated code, it happens often.
If we lower the OpenMP constructs too early, then we lose this
ability. This is because such lowering would transform the loop into
something like this:

  void __loop1(__cxt *c) {
    int start = __loop_get_start(); // uses TLS
    int end = __loop_get_end();     // uses TLS
    int n = c->n;
    for (int i = start; i < end; ++i) {
      if (i > n)
        do_a(i, n);
      else
        do_b(i, n);
    }
  }

  __loop_in_parallel(__loop1, 0, n);

When optimizing the loop inside the __loop1 function, the relationship
between 'end' and 'n' has been lost, and there is no way for the
compiler to eliminate the comparison in the loop (without a
combination of both IPO and a specific understanding of the runtime
calls). Implementation of other loop optimizations, like fusion and
splitting, also seems to be much more difficult in the presence of
early lowering. Proving non-aliasing of pointers in the context
structure might also be tricky.

2. Enabling target-specific implementations of underlying concepts,
specifically atomics and synchronization, but also thread startup and
handling. The former is important on almost all systems; the latter
is, for the moment, important on embedded and heterogeneous systems.
Doing this without cluttering the frontend with target-specific code
would be nice.

 -Hal

> Kind Regards
>
> James

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory