On 03/03/2013 06:43 PM, Tobias Grosser wrote:
> Very good example, indeed. Is there a formal definition of what
> #pragma ivdep means? I see two options here:
In the previous discussion we could not find a proper definition for
#pragma ivdep, so we concluded that we can treat it as the statement
"treat the loop as parallel; I do not expect any dependency checking
by the compiler", which is what llvm.loop.parallel is now defined to
denote.
> 1) No memory based dependences at all
>
> We assume all t[*] allocations point to the same memory
> location. By defining #pragma ivdep the user states that there
> are no memory dependences caused by the t[*] array. In this case
> the example above would be an invalid use of '#pragma ivdep'.
>
> The right thing here would be to annotate all loads/stores with the
> llvm.mem.* intrinsics.
Yes, it's a matter of definition, but I do not think it is an intuitive
interpretation to assume t is shared across iterations. Otherwise, the
programmer would not have marked the loop as parallel, or would have
explicitly reused the same array across the whole loop.
> 2) Memory based dependences for private arrays allowed
>
> We assume there will be different instances of t[*], hence there can
> not be memory dependences along the t dimension. This is a valid use
> of '#pragma ivdep' and the compiler can only vectorize the loop if it
> really creates different instances of t.
>
> In this case, we can only annotate loads/stores to t[*] if we ensure each
> iteration of i will access a different array t[*]. If there is just a single
> memory location for t[*] which is shared by all loop iterations, we must not
> annotate loads/stores to t[*].
I think this leads to the need to replicate the private data for each
iteration, i.e., handling similar to what is done in pocl's work-group
generation.
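To make the replication idea concrete, here is a minimal C sketch (not pocl's actual code), assuming a hypothetical loop where each iteration of i fills a small scratch array t and then reduces it. The sizes N and M and the functions original() and privatized() are illustrative assumptions. In the privatized form each iteration gets its own row of a replicated array, so no two iterations of i touch the same memory.

```c
#define N 8  /* hypothetical trip count of the parallel i loop */
#define M 4  /* hypothetical size of the loop-private array t */

/* Original form: t is declared inside the loop body, so each
   iteration conceptually owns its own instance of t. */
void original(const int *a, int *b) {
    for (int i = 0; i < N; ++i) {
        int t[M];
        for (int k = 0; k < M; ++k)
            t[k] = a[i] * (k + 1);
        int sum = 0;
        for (int k = 0; k < M; ++k)
            sum += t[k];
        b[i] = sum;
    }
}

/* Privatized form: the private data is replicated once per iteration,
   so accesses from different iterations of i never alias and their
   loads/stores could safely be annotated as parallel. */
void privatized(const int *a, int *b) {
    int t[N][M]; /* one copy of t per iteration of i */
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < M; ++k)
            t[i][k] = a[i] * (k + 1);
        int sum = 0;
        for (int k = 0; k < M; ++k)
            sum += t[i][k];
        b[i] = sum;
    }
}
```

Both functions compute the same b for any input; the difference is only in memory layout. The replicated version trades extra memory (N copies of t) for the guarantee that iterations are independent.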
>> In your example where you moved t outside the loop
>> it's a programmer's mistake (icc might vectorize it but the
>> results are undefined due to the dependency).
>
> Are you sure about this? How do you come to the conclusion? Is there some icc
> documentation? I am very unsure about the semantics of #pragma ivdep. Your
> interpretation makes sense, but I could also imagine that a compiler is expected
> to always resolve / understand dependences on scalar variables. Do we have any
> example where a compiler miscompiles code due to scalar dependences that it
> ignored after #pragma ivdep was added?
What was unclear in the previous discussion about #pragma ivdep is the
expected set of dependency analyses done by the compiler (ivdep ignores
*assumed* vector dependencies). So either icc can be assumed to perform
privatization of variables (here t), or the result is undefined (it leaves
the scalar dependency in place and vectorizes regardless). You are probably
right that it performs (and the programmer can assume) the former.
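To illustrate why "leave the dependency and vectorize regardless" gives wrong results when t is not privatized, the following C sketch emulates lockstep (vector-like) execution of VF iterations: each statement of the loop body runs for all lanes before the next statement starts. VF, M, the loop body and the lockstep() helper are hypothetical; this is a model of the interleaving, not how icc or LLVM actually vectorize.

```c
#define VF 4 /* hypothetical vector width (number of lanes) */
#define M 4  /* hypothetical size of the private array t */

/* Emulates executing VF iterations of
       for (i...) { int t[M]; for (k...) t[k] = a[i] * (k + 1);
                    b[i] = t[0] + t[M - 1]; }
   in lockstep. If 'shared' is nonzero, all lanes use a single t
   (the un-privatized case); otherwise each lane gets its own copy. */
void lockstep(const int *a, int *b, int shared) {
    int t[VF][M];
    /* Statement 1, executed for all lanes: fill t. */
    for (int k = 0; k < M; ++k)
        for (int lane = 0; lane < VF; ++lane) {
            int *tl = shared ? t[0] : t[lane];
            tl[k] = a[lane] * (k + 1);
        }
    /* Statement 2, executed for all lanes: read t back. */
    for (int lane = 0; lane < VF; ++lane) {
        const int *tl = shared ? t[0] : t[lane];
        b[lane] = tl[0] + tl[M - 1];
    }
}
```

With a = {1, 2, 3, 4}, the private version yields {5, 10, 15, 20}, matching sequential execution, while the shared version yields 20 in every lane, because the last lane's writes to t overwrite the others before any lane reads them back.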
>> but here I don't
>> think it is. The t array is supposed to be a loop-private variable,
>> and each parallel iteration refers to its own isolated instance.
>
> Again, I can follow this intuition. However, it would be good to formally
> document the behavior (and to understand and choose a behavior according to how
> other compilers interpret #pragma ivdep). Also, if we follow your interpretation
> and if clang currently does not make the t array loop private, it would be
> incorrect to attach meta-data to loads and stores that reference the t array.
> This last point makes me actually think your interpretation may be difficult
> to implement. Is it in all cases possible to figure out if a memory access
> accesses a loop private array?
Essentially it would require treating private variables (arrays or not)
in parallel loops differently, by "privatizing" them (creating the
per-iteration arrays) already during Clang code generation. I'm not
familiar enough with Clang to know how much work this would involve. For
now, it probably makes sense to simply not annotate alloca accesses at all
and to rely on later optimizations to remove the added dependencies in one
way or another, e.g. by privatizing the arrays.
--
--Pekka