----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Arch Robison" <arch.robison at intel.com>, "LLVM Dev"
> <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, August 20, 2014 2:21:08 PM
> Subject: Re: [LLVMdev] Proposal for "llvm.mem.vectorize.safelen"
>
> On 20 August 2014 20:18, Hal Finkel <hfinkel at anl.gov> wrote:
> > I don't understand. I think that the numbering would need to be
> > specific to the loop id.
>
> I thought the idea was to add it to every load/store in the loop. If
> this loop then gets fused with another, or inlined/unrolled becoming
> part of another loop, etc., wouldn't those ids conflict?

Concatenation unrolling is an interesting case; we'll need to think about that. Pure inlining should not be a problem. We don't currently have a loop fusion transformation, but any such transformation would need to update the metadata or drop it (though that is true for loop metadata generally).

 -Hal

> cheers,
> --renato

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
> I thought the idea was to add it to every load/store in the loop. If
> this loop then gets fused with another, or inlined/unrolled becoming
> part of another loop, etc., wouldn't those ids conflict?

Handling inlining correctly would seem to require that front-ends mark the lexical position of call sites as well as memory accesses, so that inlined accesses could be given their proper context information. The rule applies recursively down the call chain from a simd loop. If an inliner encounters an unmarked access, it inlines it as unmarked, causing vectorization to fail safe if dependence analysis can't figure it out.

If the lexical position markers were conceptually strings of integers, the inlining bookkeeping might be simpler. E.g., given a call site with position string A, and a callee instruction with position string B, the inlined instruction would have string AB. A list representation of the strings (with shared tails) could keep the memory requirements down.

The conceptual string representation might simplify bookkeeping for concatenation unrolling too, but I haven't worked out the details.

- Arch D. Robison
  Intel Corporation
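As a rough illustration of the position-string idea in the message above, here is a minimal, self-contained C++ sketch of lexical-position strings stored as lists with shared tails. The names (PosNode, extend, inlinePosition) and the representation are invented purely for illustration; they are not an existing or proposed LLVM API.

    #include <cstdio>
    #include <memory>

    // One level of a lexical-position string; Parent is the shared tail
    // describing the enclosing context (e.g. the call site's positions).
    struct PosNode {
      int Position;
      std::shared_ptr<PosNode> Parent;
    };

    using PosString = std::shared_ptr<PosNode>;

    // Append one position onto an existing (possibly empty) string.
    static PosString extend(PosString Tail, int Position) {
      return std::make_shared<PosNode>(PosNode{Position, std::move(Tail)});
    }

    // Inlining bookkeeping: a callee instruction with position string B,
    // inlined at a call site with position string A, gets the string AB.
    // The callee's positions are rebuilt on top of the call-site string,
    // so all instructions inlined from the same call site share A's nodes.
    static PosString inlinePosition(PosString CallSite, const PosNode *Callee) {
      if (!Callee)
        return CallSite;
      return extend(inlinePosition(std::move(CallSite), Callee->Parent.get()),
                    Callee->Position);
    }

    // Print a position string from outermost to innermost level.
    static void print(const PosNode *P) {
      if (!P)
        return;
      print(P->Parent.get());
      std::printf("%d ", P->Position);
    }

    int main() {
      PosString A = extend(extend(nullptr, 10), 15); // call site at "10 15"
      PosString B = extend(nullptr, 3);              // callee access at "3"
      PosString AB = inlinePosition(A, B.get());     // inlined access: "10 15 3"
      print(AB.get());
      std::printf("\n");
    }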
> Concatenation unrolling is an interesting case, we'll need to think about that.

I played with unrolling on paper. With per-access order information, concatenation unrolling works fine as long as it copies the order information, in the sense that the resulting order information correctly indicates that the loop is no longer trivially vectorizable.

For example, consider a loop body with two accesses per iteration:

    #pragma omp simd
    for (int i=0; i<4; ++i) {
        ...access Xi...
        ...access Yi...
    }

Preservation of forward lexical dependences requires that for two iterations j and k with j<k, we have to guarantee that Xj happens before Yk. Before unrolling, the accesses will be marked with positions A and B, with A<B. Now partially unroll by a factor of 2. We have (please pardon the informal notation):

    for (int i=0; i<4; i+=2) {
        ...access Xi...      marked with A
        ...access Yi...      marked with B
        ...access X(i+1)...  marked with A
        ...access Y(i+1)...  marked with B
    }

Now there is an out-of-order sequence in the middle of ABAB, so the vectorizer has to punt. But that's good, because the sequence really is no longer trivially safe to vectorize, since vectorizing it would not preserve the constraint "X1 must happen before Y2" that was present in the original code.

- Arch D. Robison
  Intel Corporation
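To make the "X1 before Y2" point concrete, the following small C++ program is an editorial sketch (not vectorizer output): it prints the access order of the original scalar loop and of a hypothetical width-2 vectorization of the unrolled body that ignores the duplicated position marks, showing that the latter places Y2 before X1. Iteration numbering follows the example above.

    #include <cstdio>

    static void doAccess(char Kind, int Iter) { std::printf("%c%d ", Kind, Iter); }

    int main() {
      std::printf("scalar order:                 ");
      for (int i = 0; i < 4; ++i) {
        doAccess('X', i);
        doAccess('Y', i);
      }
      std::printf("\n");

      std::printf("naive vector(2) of unrolled:  ");
      // The unrolled loop runs the remaining iterations i = 0 and i = 2.
      // A width-2 vectorization would execute each lexical slot of the
      // unrolled body for both of them before moving to the next slot.
      int Iters[2] = {0, 2};
      for (int Slot = 0; Slot < 4; ++Slot)
        for (int V = 0; V < 2; ++V) {
          int i = Iters[V];
          switch (Slot) {
          case 0: doAccess('X', i);     break; // ...access Xi...     marked A
          case 1: doAccess('Y', i);     break; // ...access Yi...     marked B
          case 2: doAccess('X', i + 1); break; // ...access X(i+1)... marked A
          case 3: doAccess('Y', i + 1); break; // ...access Y(i+1)... marked B
          }
        }
      std::printf("\n");
      // Second line reads: X0 X2 Y0 Y2 X1 X3 Y1 Y3 -- here Y2 precedes X1,
      // breaking the "X1 must happen before Y2" requirement, which is why
      // the out-of-order ABAB marks must make the vectorizer punt.
      return 0;
    }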
Here's an attempt to nail down the annotation semantics with support for respecting forward lexical dependences.

Each load, store, call, or invoke instruction can be labeled with !llvm.mem.vector_loop_access, which has two operands:

* The first operand is an integer denoting lexical position. The positions need not be consecutive, and may contain duplicates.
* The second operand is the same as the first operand to llvm.mem.parallel_loop_access. It's second so that it can be omitted - see mention of inlining further below.

The LoopID can have "llvm.loop.safelen" metadata. Here is an example with three accesses with positions {10, 15, 17} and a safelen of 42.

    define void @foo(float* %a, float* %b) {
    entry:
      br label %for.body

    for.body:                                     ; preds = %for.body, %entry
      ...
      %0 = load float* %arrayidx, !llvm.mem.vector_loop_access !{metadata i32 10, !0}
      ...
      %1 = load float* %arrayidx2, !llvm.mem.vector_loop_access !{metadata i32 15, !0}
      ...
      store float %add3, float* %arrayidx5, !llvm.mem.vector_loop_access !{metadata i32 17, !0}
      ...
      br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

    for.end:                                      ; preds = %for.body
      ret void
    }

    !0 = metadata !{metadata !0, metadata !1}
    !1 = metadata !{metadata !"llvm.loop.safelen", i32 42}

Let lex(x) denote the lexical position metadata for access x. If two accesses A and B:

* are marked with llvm.mem.vector_loop_access that references a loop L, AND
* lex(A) < lex(B), AND
* L has llvm.loop.safelen with value K,

THEN for loop L, the dependence distance from B to A is at least K iterations.

When llvm.mem.vector_loop_access is used on a call/invoke instruction, any accesses therein inherit that lexical position.

Open issue: when inlining a callee with more than one memory access, the accesses will end up with the same lexical position, and thus lose the dependence distance clue. I don't know how often this drawback would show up in practice. A possibility is to allow the callee instructions to be annotated with a !llvm.mem.vector_loop_access that omits the LoopID operand, i.e. one that has just the lexical position information. Then inlining could be more clever.

- Arch Robison
  Intel Corporation
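For reference, a minimal C/C++ loop that a front end might lower into IR of roughly the shape shown above: two loads and one store per iteration, with a programmer-asserted safe vector length of 42. The array subscripts and the OpenMP safelen clause are illustrative assumptions; the original proposal does not show source code, and the lexical positions (e.g. 10, 15, 17) would be assigned by the front end.

    // Hypothetical source: a front end lowering this loop would attach
    // llvm.loop.safelen 42 to the loop metadata and could mark the three
    // memory accesses with increasing lexical positions.
    void foo(float *a, float *b, int n) {
      #pragma omp simd safelen(42)
      for (int i = 0; i < n; ++i)
        a[i + 42] = a[i] + b[i];   // load a[i], load b[i], store a[i+42]
    }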