----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Arch Robison" <arch.robison at intel.com>, "LLVM Dev"
> <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, August 20, 2014 2:21:08 PM
> Subject: Re: [LLVMdev] Proposal for "llvm.mem.vectorize.safelen"
>
> On 20 August 2014 20:18, Hal Finkel <hfinkel at anl.gov> wrote:
> > I don't understand. I think that the numbering would need to be
> > specific to the loop id.
>
> I thought the idea was to add it to every load/store in the loop. If
> this loop then gets fused with another, or inlined/unrolled becoming
> part of another loop, etc., wouldn't those ids conflict?

Concatenation unrolling is an interesting case; we'll need to think about that. Pure inlining should not be a problem. We don't currently have a loop fusion transformation, but any such transformation would need to update the metadata or drop it (though that is true for loop metadata generally).

 -Hal

> cheers,
> --renato

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
> I thought the idea was to add it to every load/store in the loop. If
> this loop then gets fused with another, or inlined/unrolled becoming
> part of another loop, etc., wouldn't those ids conflict?

Handling inlining correctly would seem to require that front-ends mark the lexical position of call sites as well as memory accesses, so that inlined accesses could be given their proper context information. The rule applies recursively down the call chain from a simd loop. If an inliner encounters an unmarked access, it inlines it as unmarked, causing vectorization to fail safe if dependence analysis can't figure it out.

If the lexical position markers were conceptually strings of integers, the inlining bookkeeping might be simpler. E.g., given a call site with position string A, and a callee instruction with position string B, the inlined instruction would have string AB. A list representation of the strings (with shared tails) could keep the memory requirements down.

The conceptual string representation might simplify bookkeeping for concatenation unrolling too, but I haven't worked out the details.

- Arch D. Robison
  Intel Corporation
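As a rough illustration of the position-string idea in the message above, here is a minimal, self-contained C++ sketch of lexical-position strings stored as lists with shared tails. The names (PosNode, extend, inlinePosition) and the representation are invented purely for illustration; they are not an existing or proposed LLVM API.

    #include <cstdio>
    #include <memory>

    // One level of a lexical-position string; Parent is the shared tail
    // describing the enclosing context (e.g. the call site's positions).
    struct PosNode {
      int Position;
      std::shared_ptr<PosNode> Parent;
    };

    using PosString = std::shared_ptr<PosNode>;

    // Append one position onto an existing (possibly empty) string.
    static PosString extend(PosString Tail, int Position) {
      return std::make_shared<PosNode>(PosNode{Position, std::move(Tail)});
    }

    // Inlining bookkeeping: a callee instruction with position string B,
    // inlined at a call site with position string A, gets the string AB.
    // The callee's positions are rebuilt on top of the call-site string,
    // so all instructions inlined from the same call site share A's nodes.
    static PosString inlinePosition(PosString CallSite, const PosNode *Callee) {
      if (!Callee)
        return CallSite;
      return extend(inlinePosition(std::move(CallSite), Callee->Parent.get()),
                    Callee->Position);
    }

    // Print a position string from outermost to innermost level.
    static void print(const PosNode *P) {
      if (!P)
        return;
      print(P->Parent.get());
      std::printf("%d ", P->Position);
    }

    int main() {
      PosString A = extend(extend(nullptr, 10), 15); // call site at "10 15"
      PosString B = extend(nullptr, 3);              // callee access at "3"
      PosString AB = inlinePosition(A, B.get());     // inlined access: "10 15 3"
      print(AB.get());
      std::printf("\n");
    }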
> Concatenation unrolling is an interesting case, we'll need to think about that.

I played with unrolling on paper. With per-access order information, concatenation unrolling works fine as long as it copies the order information, in the sense that the resulting order information correctly indicates that the loop is no longer trivially vectorizable.

For example, consider a loop body with two accesses per iteration:

    #pragma omp simd
    for (int i=0; i<4; ++i) {
        ...access Xi...
        ...access Yi...
    }

Preservation of forward lexical dependences requires that for two iterations j and k with j<k, we have to guarantee that Xj happens before Yk. Before unrolling, the accesses will be marked with positions A and B, with A<B. Now partially unroll by a factor of 2. We have (please pardon the informal notation):

    for (int i=0; i<4; i+=2) {
        ...access Xi...      marked with A
        ...access Yi...      marked with B
        ...access X(i+1)...  marked with A
        ...access Y(i+1)...  marked with B
    }

Now there is an out-of-order sequence in the middle of ABAB, so the vectorizer has to punt. But that's good, because the sequence really is no longer trivially safe to vectorize, since vectorizing it would not preserve the constraint "X1 must happen before Y2" that was present in the original code.

- Arch D. Robison
  Intel Corporation
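To make the "X1 before Y2" point concrete, the following small C++ program is an editorial sketch (not vectorizer output): it prints the access order of the original scalar loop and of a hypothetical width-2 vectorization of the unrolled body that ignores the duplicated position marks, showing that the latter places Y2 before X1. Iteration numbering follows the example above.

    #include <cstdio>

    static void doAccess(char Kind, int Iter) { std::printf("%c%d ", Kind, Iter); }

    int main() {
      std::printf("scalar order:                 ");
      for (int i = 0; i < 4; ++i) {
        doAccess('X', i);
        doAccess('Y', i);
      }
      std::printf("\n");

      std::printf("naive vector(2) of unrolled:  ");
      // The unrolled loop runs the remaining iterations i = 0 and i = 2.
      // A width-2 vectorization would execute each lexical slot of the
      // unrolled body for both of them before moving to the next slot.
      int Iters[2] = {0, 2};
      for (int Slot = 0; Slot < 4; ++Slot)
        for (int V = 0; V < 2; ++V) {
          int i = Iters[V];
          switch (Slot) {
          case 0: doAccess('X', i);     break; // ...access Xi...     marked A
          case 1: doAccess('Y', i);     break; // ...access Yi...     marked B
          case 2: doAccess('X', i + 1); break; // ...access X(i+1)... marked A
          case 3: doAccess('Y', i + 1); break; // ...access Y(i+1)... marked B
          }
        }
      std::printf("\n");
      // Second line reads: X0 X2 Y0 Y2 X1 X3 Y1 Y3 -- here Y2 precedes X1,
      // breaking the "X1 must happen before Y2" requirement, which is why
      // the out-of-order ABAB marks must make the vectorizer punt.
      return 0;
    }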
Here's an attempt to nail down the annotation semantics with support for respecting forward lexical dependences.

Each load, store, call, or invoke instruction can be labeled with !llvm.mem.vector_loop_access, which has two operands:

* The first operand is an integer denoting lexical position. The positions need not be consecutive, and may contain duplicates.
* The second operand is the same as the first operand to llvm.mem.parallel_loop_access. It's second so that it can be omitted - see mention of inlining further below.

The LoopID can have "llvm.loop.safelen" metadata. Here is an example with three accesses with positions {10, 15, 17} and a safelen of 42.

    define void @foo(float* %a, float* %b) {
    entry:
      br label %for.body

    for.body:                                     ; preds = %for.body, %entry
      ...
      %0 = load float* %arrayidx, !llvm.mem.vector_loop_access !{metadata i32 10, !0}
      ...
      %1 = load float* %arrayidx2, !llvm.mem.vector_loop_access !{metadata i32 15, !0}
      ...
      store float %add3, float* %arrayidx5, !llvm.mem.vector_loop_access !{metadata i32 17, !0}
      ...
      br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

    for.end:                                      ; preds = %for.body
      ret void
    }

    !0 = metadata !{metadata !0, metadata !1}
    !1 = metadata !{metadata !"llvm.loop.safelen", i32 42}

Let lex(x) denote the lexical position metadata for access x. If two accesses A and B:

* are marked with llvm.mem.vector_loop_access that references a loop L, AND
* lex(A) < lex(B), AND
* L has llvm.loop.safelen with value K,

THEN for loop L, the dependence distance from B to A is at least K iterations.

When llvm.mem.vector_loop_access is used on a call/invoke instruction, any accesses therein inherit that lexical position.

Open issue: when inlining a callee with more than one memory access, the accesses will end up with the same lexical position, and thus lose the dependence distance clue. I don't know how often this drawback would show up in practice. A possibility is to allow the callee instructions to be annotated with a !llvm.mem.vector_loop_access that omits the LoopID operand, i.e. one that has just the lexical position information. Then inlining could be more clever.

- Arch Robison
  Intel Corporation
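For reference, a minimal C/C++ loop that a front end might lower into IR of roughly the shape shown above: two loads and one store per iteration, with a programmer-asserted safe vector length of 42. The array subscripts and the OpenMP safelen clause are illustrative assumptions; the original proposal does not show source code, and the lexical positions (e.g. 10, 15, 17) would be assigned by the front end.

    // Hypothetical source: a front end lowering this loop would attach
    // llvm.loop.safelen 42 to the loop metadata and could mark the three
    // memory accesses with increasing lexical positions.
    void foo(float *a, float *b, int n) {
      #pragma omp simd safelen(42)
      for (int i = 0; i < n; ++i)
        a[i + 42] = a[i] + b[i];   // load a[i], load b[i], store a[i+42]
    }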