Julia and OpenMP 4.0 have features where the user can bless a loop as
having no memory dependences that prevent vectorization, thus enabling
vectorization of loops not amenable to compile-time or run-time
dependence analysis. LLVM currently has no metadata to express this,
as explained further below.

I'd like to propose new metadata that enables front-ends to tell the
vectorizer that "memory dependences are not a problem for vectorization
widths up to n". I'd appreciate any comments before I spend time
prototyping it.

BACKGROUND
----------

OpenMP 4.0 allows the programmer to write something such as:

    void foo( float* a, int n, int k ) {
    #pragma omp simd safelen(8)
        for( int i=0; i<n; ++i ) {
            a[i] = 2*a[i+k];
        }
    }

The pragma tells the compiler that the loop is safe to vectorize with
a width of 8 or less. (I.e. it's up to the programmer to ensure that
either k<=-8 or k>=0.) If safelen is missing, the safe width is
unbounded. The example is a trivial one for exposition. Real examples
may involve complex subscripting patterns or scatter/gather patterns
well beyond what is practical to check with a run-time check.

Julia has an @simd annotation with a similar purpose, but without a
safelen parameter. It's documented as not allowing any dependences,
but that's because of the LLVM shortcoming addressed by this proposal.

WHY CURRENT METADATA DOES NOT SUFFICE
-------------------------------------

There are currently two pieces of metadata that come close, but miss the
desired semantics.

* llvm.loop.vectorize.width - hints at what vectorization width to use
  *if* the loop is safe to vectorize. It does not specify that the
  loop is safe to vectorize.

* llvm.mem.parallel_loop_access - indicates that accesses do not
  have a loop-carried dependence between them. That's too broad a
  brush, as it precludes loops that do have dependences (e.g. "forward
  lexical dependences") that are nonetheless vectorizable.
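The k<=-8 or k>=0 condition from the BACKGROUND example can be checked empirically. The following is only a sketch: it models "vectorized" execution as chunks of `width` iterations whose loads all happen before any of their stores, and compares the result against the loop run one iteration at a time. The names N, PAD, MAX_WIDTH, run_chunked and chunked_matches_scalar are made up for this illustration and are not part of LLVM or OpenMP.

```c
#include <string.h>

enum { N = 32, PAD = 16, MAX_WIDTH = 16 };

/* Fill the padded buffer with distinct values so stale reads are visible. */
static void fill(float *buf) {
    for (int j = 0; j < N + 2 * PAD; ++j) buf[j] = (float)(j + 1);
}

/* Reference: the loop exactly as written, one iteration at a time. */
static void run_scalar(float *a, int n, int k) {
    for (int i = 0; i < n; ++i) a[i] = 2 * a[i + k];
}

/* Model of vector execution (an assumption of this sketch): each chunk of
   `width` iterations performs all of its loads before any of its stores. */
static void run_chunked(float *a, int n, int k, int width) {
    float tmp[MAX_WIDTH];
    for (int i = 0; i < n; i += width) {
        int len = (n - i < width) ? n - i : width;
        for (int j = 0; j < len; ++j) tmp[j] = a[i + j + k];
        for (int j = 0; j < len; ++j) a[i + j] = 2 * tmp[j];
    }
}

/* Returns 1 when chunked execution reproduces the scalar result.
   Requires |k| <= PAD and 1 <= width <= MAX_WIDTH. */
int chunked_matches_scalar(int k, int width) {
    float x[N + 2 * PAD], y[N + 2 * PAD];
    fill(x);
    fill(y);
    run_scalar(x + PAD, N, k);
    run_chunked(y + PAD, N, k, width);
    return memcmp(x, y, sizeof x) == 0;
}
```

Under this model, chunked_matches_scalar(3, 8) and chunked_matches_scalar(-8, 8) hold, while chunked_matches_scalar(-7, 8) does not: with k=-7 the last lane of a chunk reads an element that the first lane of the same chunk has not stored yet. The k=3 case is the kind of forward dependence that is still vectorizable, which is why "no loop-carried dependence at all" is a stronger statement than the loop needs.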
PROPOSAL
--------

Add new metadata "llvm.loop.vectorize.safelen" that has a parameter n
of type i32. The metadata tells the vectorizer that the loop is safe
to vectorize with any vectorization width less than or equal to n. The
loop vectorizer is free to choose any vectorization width within that
constraint.

- Arch D. Robison
  Intel Corporation
On 12 August 2014 18:24, Robison, Arch <arch.robison at intel.com> wrote:
> Add new metadata "llvm.loop.vectorize.safelen" that has a parameter n
> of type i32. The metadata tells the vectorizer that the loop is safe
> to vectorize with any vectorization width less than or equal to n. The
> loop vectorizer is free to choose any vectorization width within that
> constraint.

Hi Arch,

That was the intention, yes, and I believe those were the exact semantics we had in mind. This metadata should be applied and kept in the same way as other loop metadata.

Arnold or Nadav should know better, though, since they are up to date with the current developments.

cheers,
--renato
Johannes Doerfert
2014-Aug-12 17:49 UTC
[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"
Hello Arch,

I very much like the idea of such an annotation, especially since I was
looking for the same thing in the recent past. My use case is different
from yours, so it might provide a second reason to support this.

I recently submitted a patch to the list [1] which would allow Polly to
extract the dependency distance for each analyzable loop. Since the
distance is often not constant but parametric, we would also need to
version the vectorized loop based on the actual run-time values. However,
the versioning doesn't need to be done by the vectorizer; it could also
be part of Polly (depending on whether or not the vectorizer will have
that capability). I admit that we would need a good heuristic in order
to turn this feature on as a default optimization, but I will work on
that in the near future too.

Best regards,
  Johannes

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/230137.html

--
Johannes Doerfert
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.26
Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
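To make the versioning idea concrete, here is a hand-written sketch of what a versioned loop could look like for the a[i] = 2*a[i+k] example from the proposal, where the dependence distance k is only known at run time. It is an illustration, not output from Polly or the vectorizer; the names SAFE_WIDTH, distance_permits_width and scale_shifted_versioned are hypothetical. The OpenMP pragma is guarded so the code also builds without OpenMP support.

```c
enum { SAFE_WIDTH = 8 };  /* hypothetical vectorization width to guard for */

/* Run-time check for a[i] = 2*a[i+k]: a non-negative k only reads
   elements that have not been overwritten yet, and a negative k is
   harmless once the distance is at least the vector width. */
int distance_permits_width(int k, int width) {
    return k >= 0 || k <= -width;
}

void scale_shifted_versioned(float *a, int n, int k) {
    if (distance_permits_width(k, SAFE_WIDTH)) {
        /* Version 1: the guard established that a width of up to 8 is
           safe, so this copy may carry the annotation. */
#ifdef _OPENMP
#pragma omp simd safelen(8)
#endif
        for (int i = 0; i < n; ++i) a[i] = 2 * a[i + k];
    } else {
        /* Version 2: unannotated fallback, executed in order. */
        for (int i = 0; i < n; ++i) a[i] = 2 * a[i + k];
    }
}
```

Whether such a guard is emitted by Polly or by the vectorizer is the open question above; the sketch only shows that a constant safelen on one version combined with a run-time test on the parametric distance is enough for this simple case.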
On 12 August 2014 18:49, Johannes Doerfert <doerfert at cs.uni-saarland.de> wrote:
> I recently submitted a patch to the list [1] which would allow Polly to
> extract the dependency distance for each analyzable loop. Since the
> distance is often not constant but parametric, we would also need to
> version the vectorized loop based on the actual run-time values.

Since this is a metadata node, we can attach whatever we want to it, from an integer constant to something more elaborate. But of course, complexity is an issue. I believe a constant is a good starting point.

Remember that this annotation is saying that "the loop *as it is* is safe at a vector width of N", but things can change between the annotation (generally source code pragmas, but it could be a Polly thing) and the actual vectorization. This is a game that the user must be ready to play in order to use these advanced features. Polly has to be extremely conservative (since the user is *not* directly recommending safety boundaries), but it can also allow some extra room (similar to -ffast-math, we could do -ffast-polly if needed).

> However,
> the versioning doesn't need to be done by the vectorizer but could also
> be part of Polly (depending on whether or not the vectorizer will have
> that capability). I admit that we would need a good heuristic in order
> to turn this feature on as a default optimization, but I will work on
> that in the near future too.

The original idea for vectorization annotations was to help multiple passes communicate. Initially, they were used between two runs of the vectorizer (so it wouldn't vectorize the same loop again), but later they became a vehicle for source code pragmas. The idea of Polly sharing the annotation space was there from the beginning, though, and I think we'll come up with a lot more metadata than just width/safelen. Arnold had some slides about it.

cheers,
--renato
Arnold Schwaighofer
2014-Aug-12 21:44 UTC
[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"
> On Aug 12, 2014, at 10:24 AM, Robison, Arch <arch.robison at intel.com> wrote:
>
> Julia and OpenMP 4.0 have features where the user can bless a loop as
> having no memory dependences that prevent vectorization, thus enabling
> vectorization of loops not amenable to compile-time or run-time
> dependence analysis. LLVM currently has no metadata to express such,
> as explained further below.
>
> I'd like to propose new metadata that enables front-ends to tell the
> vectorizer that "memory dependences are not a problem for vectorization
> widths up to n". I'd appreciate any comments before I spend time
> prototyping it.
>

In general, I think this is a useful addition. We just have to get the semantics nailed down.

Is this annotation - if present - meant as a restriction on accesses marked with “llvm.mem.parallel_loop_access”? That is, does it say that there is no loop-carried dependence at a |distance| < k, but that there might be one at a |distance| >= k, between marked accesses?

Thanks,
Arnold
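One way to read the semantics in question is as a statement about the smallest loop-carried dependence distance. The sketch below assumes the distances between the marked accesses are available as a plain array of nonzero integers; that representation, and the names min_dependence_distance and width_is_safe, are assumptions made purely for illustration and do not correspond to an existing LLVM interface.

```c
#include <stddef.h>

/* Smallest |distance| among the given loop-carried dependence distances.
   Returns 0 for an empty list, meaning "no known dependence, no bound".
   Precondition: every entry is nonzero (distance 0 is not loop-carried). */
unsigned min_dependence_distance(const int *distances, size_t count) {
    unsigned best = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Unsigned negation avoids overflow for the most negative int. */
        unsigned d = distances[i] < 0 ? 0u - (unsigned)distances[i]
                                      : (unsigned)distances[i];
        if (best == 0 || d < best) best = d;
    }
    return best;
}

/* Under the reading above, a width is acceptable when no dependence
   has a |distance| smaller than that width. */
int width_is_safe(unsigned width, const int *distances, size_t count) {
    unsigned m = min_dependence_distance(distances, count);
    return m == 0 || width <= m;
}
```

With distances {-8, 12, 9}, the minimum is 8, so a width of 8 passes and a width of 16 does not. This deliberately ignores direction: treating every distance by its absolute value is conservative, since it also rejects forward dependences that would remain vectorizable.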
On 12 August 2014 22:44, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Is this annotation - if present - meant as a restriction to accesses marked with “llvm.mem.parallel_loop_access”? - That there is no loop carried dependence at a |distance| < k but there might be one at >= k between marked accesses.

Hum, indeed, Polly and the vectorizer should be careful when adding/changing annotations on loops that the user has already annotated, or we run the risk of trying to be smarter than the user and getting it wrong.

If I remember correctly, the safelen semantics were just a hint to the validation that, despite its lack of knowledge, the loop is valid at length N, so that we could skip directly to the cost model. But it wasn't intended to force any particular width.

cheers,
--renato
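The distinction between "safe up to N" and "use width N" can be sketched as a clamp applied to whatever the cost model prefers. This is not how LLVM's loop vectorizer is implemented; floor_pow2 and clamp_width_to_safelen are hypothetical helpers, and rounding down to a power of two is an assumption made here because vector widths usually are powers of two.

```c
/* Largest power of two that is <= x, for x >= 1. */
static unsigned floor_pow2(unsigned x) {
    unsigned p = 1;
    while (p <= x / 2) p *= 2;
    return p;
}

/* The cost model proposes a width; safelen only caps it.
   Both arguments must be >= 1. A result of 1 means "stay scalar". */
unsigned clamp_width_to_safelen(unsigned preferred, unsigned safelen) {
    unsigned w = preferred < safelen ? preferred : safelen;
    return floor_pow2(w);
}
```

For example, a preferred width of 4 with a safelen of 8 stays at 4, a preferred width of 16 with a safelen of 8 is reduced to 8, and a safelen of 6 caps a preferred width of 8 at 4. The annotation never raises the width above what the cost model asked for.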
> WHY CURRENT METADATA DOES NOT SUFFICE
> -------------------------------------
>
> There are currently two pieces of metadata that come close, but miss the
> desired semantics.
>
> * llvm.loop.vectorize.width - hints at what vectorization width to use
>   *if* the loop is safe to vectorize. It does not specify that the
>   loop is safe to vectorize.
>
> * llvm.mem.parallel_loop_access - indicates that accesses do not
>   have a loop-carried dependence between them. That's too broad a
>   brush, as it precludes loops that do have dependences (e.g. "forward
>   lexical dependences") that are nonetheless vectorizable.
>

How does this relate to the recent additions by Hal on invariants using llvm.assume? [0]

Can we translate llvm.mem.vectorize.safelen into an invariant on k, similar to the condition you're proposing that the programmer should ensure?

Cheers,
 Roel

[0] http://comments.gmane.org/gmane.comp.compilers.llvm.devel/74941
----- Original Message -----> From: "Roel Jordans" <r.jordans at tue.nl> > To: llvmdev at cs.uiuc.edu > Sent: Wednesday, August 13, 2014 5:57:15 AM > Subject: Re: [LLVMdev] Proposal for ""llvm.mem.vectorize.safelen" > > > > > WHY CURRENT METADATA DOES NOT SUFFICE > > ------------------------------------- > > > > There are currently two pieces of metadata that come close, but > > miss the > > desired semantics. > > > > * llvm.loop.vectorize.width - hints at what vectorization width to > > use > > *if* the loop is safe to vectorize. It does not specify that > > the > > loop is safe to vectorize. > > > > * llvm.mem.parallel_loop_access - indicates that accesses do not > > have a loop-carried dependence between them. That's too broad a > > brush, as it precludes loops that do have dependences (e.g. > > "forward > > lexical dependences") that are nonetheless vectorizable. > > > > How does this relate to the recent additions by Hal on invariants > using > llvm.assume? [0]I don't think this related because the assumptions don't provide any direct way of asserting things about memory aliasing. That might be an interesting thing to do, but we've not really thought about it yet. Regarding the proposal, I'm in favor. I don't like using the name 'savelen' however. I can forgive OpenMP for choosing such a short name because people need to type it, but I'd prefer that the metadata have a more-descriptive name. minimum_dependency_distance is perhaps better. -Hal> > Can we translate llvm.mem.vectorize.safelen into an invariant on k > similarly as to what you're proposing that the programmer should > ensure? > > Cheers, > Roel > > [0] http://comments.gmane.org/gmane.comp.compilers.llvm.devel/74941 > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory