----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Tobias Grosser" <tobias at grosser.es>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Sunday, March 3, 2013 11:49:03 AM
> Subject: Re: [LLVMdev] parallel loop metadata simplification
>
> On 03/03/2013 06:43 PM, Tobias Grosser wrote:
> > Very good example, indeed. Is there a formal definition of what
> > #pragma ivdep means? I see two options here:
>
> In the previous discussion we could not find a proper
> definition for #pragma ivdep, so we concluded we can treat it
> as a statement of "treat the loop as parallel, I do not expect
> any dependency checking by the compiler", which is what
> llvm.loop.parallel is now defined to denote.
>
> > 1) No memory based dependences at all
> >
> > We assume all t[*] allocations point to the same memory
> > location. By defining #pragma ivdep the user states that there
> > are no memory dependences caused by the t[*] array. In this case
> > the example above would be an invalid use of '#pragma ivdep'.
> >
> > The right thing here would be to annotate all loads/stores with the
> > llvm.mem.* intrinsics.
>
> Yes, it's a matter of definition, but I do not think it's an
> intuitive interpretation to assume t is shared across iterations.
> Otherwise, the programmer would not have marked the loop as parallel
> or would have reused the same array across the whole loop.
>
> > 2) Memory based dependences for private arrays allowed
> >
> > We assume there will be different instances of t[*], hence there
> > cannot be memory dependences along the t dimension. This is a valid
> > use of '#pragma ivdep' and the compiler can only vectorize the loop
> > if it really creates different instances of t.
> >
> > In this case, we can only annotate loads/stores to t[*] if we
> > ensure each iteration of i will access a different array t[*]. If
> > there is just a single memory location for t[*] which is shared by
> > all loop iterations, we must not annotate loads/stores to t[*].
>
> I think this leads to the need to replicate the private data for
> each iteration, i.e. handling similar to what is done in pocl's
> work-group generation.
>
> >> In your example where you moved t outside the loop
> >> it's a programmer's mistake (icc might vectorize it but the
> >> results are undefined due to the dependency).
> >
> > Are you sure about this? How do you come to the conclusion? Is
> > there some icc documentation? I am very unsure about the semantics
> > of #pragma ivdep. Your interpretation makes sense, but I could also
> > imagine that a compiler is expected to always resolve / understand
> > dependences on scalar variables. Do we have any example where a
> > compiler miscompiles code due to scalar dependences that it ignored
> > after #pragma ivdep was added?
>
> What was unclear in the previous discussion about #pragma ivdep is
> the expected set of dependency analyses done by the compiler (ivdep:
> ignore *assumed* vector dependencies). So either icc can be assumed
> to perform privatization of variables (here t), or the result is
> undefined (it leaves the scalar dependency and vectorizes
> regardless). You are probably right that it performs (and the
> programmer can assume) the former.
>
> >> but here I don't think it is. The t array is supposed to be a
> >> loop-private variable, and each parallel iteration refers to its
> >> own isolated instance.
> >
> > Again, I can follow this intuition. However, it would be good to
> > formally document the behavior (and to understand and choose a
> > behavior according to how other compilers interpret #pragma ivdep).
> > Also, if we follow your interpretation and if clang currently does
> > not make the t array loop private, it would be incorrect to attach
> > meta-data to loads and stores that reference the t array.
> > This last point makes me actually think your interpretation may be
> > difficult to implement. Is it in all cases possible to figure out
> > if a memory access accesses a loop private array?
>
> Essentially it would require treating private variables (arrays or
> not) in parallel loops differently, by "privatizing" them (creating
> the arrays) already during Clang code gen. I'm not familiar enough
> with Clang to know how much work this would involve. For now, it
> probably makes sense to simply not annotate alloca accesses at all
> and rely on later optimizations to privatize the added dependencies
> in one way or another.
I agree; we should only annotate accesses that come from language-level
array/pointer(/reference) accesses. We should then rely on other passes to clean
up (or not) the remainder.
I think that the best way to handle local arrays is to issue a warning when one
occurs inside an annotated loop, saying that the local array will probably
inhibit vectorization (I say probably because SROA, etc. may make it go away).
We can think about implementing privatization transformations in the future.
Even this warning can be a follow-up enhancement.
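
To make the cases discussed above concrete, here is a minimal C sketch. It is
not the example from earlier in the thread; the function names, the array size
T_LEN and the arithmetic are all hypothetical, chosen only for illustration.
The first function has a loop-local array t (the case where accesses to t would
be left unannotated for now), the second hoists t out of the loop so that all
iterations share one instance, and the third shows what a manual privatization
(one instance of t per iteration) might look like.

```c
#include <assert.h>
#include <stddef.h>

enum { T_LEN = 2 }; /* hypothetical size of the temporary array */

/* Case 1: t is declared inside the loop body, so each iteration conceptually
 * has its own instance. The pragma is guarded because it is specific to icc;
 * other compilers may warn about an unknown pragma. */
void scale_private(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
#if defined(__INTEL_COMPILER)
#pragma ivdep
#endif
    for (size_t i = 0; i < n; ++i) {
        int t[T_LEN];      /* loop-private by declaration */
        t[0] = in[i];
        t[1] = 2 * t[0];
        out[i] = t[1] + 1;
    }
}

/* Case 2: t is hoisted out of the loop, so every iteration reads and writes
 * the same storage. Running iterations in parallel would race on t, which is
 * why no parallel annotation is placed on this loop. */
void scale_shared(const int *in, int *out, size_t n)
{
    int t[T_LEN];          /* one instance shared by all iterations */
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        t[0] = in[i];
        t[1] = 2 * t[0];
        out[i] = t[1] + 1;
    }
}

/* Case 3: manual privatization. The caller supplies scratch space holding
 * n * T_LEN ints, and iteration i uses only its own slice, so no two
 * iterations touch the same element of the temporary storage. */
void scale_replicated(const int *in, int *out, int *scratch, size_t n)
{
    assert(in != NULL && out != NULL && scratch != NULL);
    for (size_t i = 0; i < n; ++i) {
        int *t = &scratch[i * T_LEN];
        t[0] = in[i];
        t[1] = 2 * t[0];
        out[i] = t[1] + 1;
    }
}
```

Executed sequentially, all three functions compute the same result; they differ
only in whether it would be safe to run the iterations in parallel without
first privatizing t.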
-Hal
>
> --
> --Pekka