thr3ads.net - llvm dev - [LLVMdev] parallel loop metadata simplification [Mar 2013]

Pekka Jääskeläinen

2013-Mar-03 17:49 UTC

[LLVMdev] parallel loop metadata simplification

On 03/03/2013 06:43 PM, Tobias Grosser wrote:
> Very good example, indeed. Is there a formal definition of what
> #pragma ivdeps means? I see two options here:

In the previous discussion we could not find a proper
definition for #pragma ivdep so we concluded we can treat it
as a statement of "treat the loop as parallel, I do not expect
any dependency checking by the compiler", thus what the
llvm.loop.parallel is now defined to denote.

> 1) No memory based dependences at all
>
> We assume all t[*] allocations point to the same memory
> location. By defining #pragma ivdep the user states that there
> are no memory dependences caused by the t[*] array. In this case
> the example above would be an invalid use of '#pragma ivdep'.
>
> The right thing here would be to annotate all loads/stores with the
> llvm.mem.* intrinsics.

Yes, it's a matter of definition, but I do not think it's an intuitive
interpretation to assume t is shared across iterations. Otherwise,
the programmer would not have marked the loop as parallel or would
have reused the same array across the whole loop.
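
The two readings can be made concrete with a minimal C sketch. The example the thread refers to is not reproduced in this message, so the function names, the size K, and the loop body below are hypothetical; the ivdep pragma is left as a comment so that any C compiler accepts the file.

```c
#include <assert.h>
#include <stddef.h>

#define K 4  /* hypothetical size of the temporary array t */

/* Variant A: t is declared inside the loop body, so each iteration
 * conceptually gets its own instance (the "loop-private" reading). */
void sum_private(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    /* #pragma ivdep  (ICC-style hint, kept as a comment for portability) */
    for (size_t i = 0; i < n; ++i) {
        float t[K];
        float s = 0.0f;
        for (size_t k = 0; k < K; ++k)
            t[k] = in[i] * (float)(k + 1);
        for (size_t k = 0; k < K; ++k)
            s += t[k];
        out[i] = s;
    }
}

/* Variant B: t is hoisted out of the loop, so every iteration writes and
 * reads the same storage. Run sequentially it computes the same values as
 * variant A, but the reuse of t is a cross-iteration memory dependence. */
void sum_shared(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    float t[K];
    /* #pragma ivdep  (same hint; here it would promise something untrue) */
    for (size_t i = 0; i < n; ++i) {
        float s = 0.0f;
        for (size_t k = 0; k < K; ++k)
            t[k] = in[i] * (float)(k + 1);
        for (size_t k = 0; k < K; ++k)
            s += t[k];
        out[i] = s;
    }
}
```

Both variants agree when executed one iteration at a time. They differ only in whether a compiler that trusts the annotation may run iterations concurrently without first giving each one its own copy of t.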

> 2) Memory based dependences for private arrays allowed
>
> We assume there will be different instances of t[*], hence there can
> not be memory dependences along the t dimension. This is a valid use
> of '#pragma ivdep' and the compiler can only vectorize the loop if it
> really creates different instances of t.
>
> In this case, we can only annotate loads/stores to t[*] if we ensure each
> iteration of i will access a different array t[*]. If there is just a single
> memory location for t[*] which is shared by all loop iterations, we must not
> annotate loads/stores to t[*].

I think this leads to the need to replicate the private data for
each iteration, similar to the handling done in pocl's work-group
generation.
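
Replicating the private data can be pictured as expanding t with an extra iteration dimension. The following is a hand-written C sketch of that idea only; it is not how pocl or Clang implement it, and the function name, K, and the loop body are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define K 4  /* hypothetical size of the per-iteration temporary */

/* Explicit privatization: t gets one row per iteration, so no two
 * iterations ever touch the same element of t.
 * Returns 0 on success, -1 if the allocation fails. */
int sum_expanded(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    if (n == 0)
        return 0;

    float (*t)[K] = malloc(n * sizeof *t);  /* n rows of K floats */
    if (t == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i) {
        float s = 0.0f;
        for (size_t k = 0; k < K; ++k)
            t[i][k] = in[i] * (float)(k + 1);   /* row i belongs to iteration i */
        for (size_t k = 0; k < K; ++k)
            s += t[i][k];
        out[i] = s;
    }

    free(t);
    return 0;
}
```

With the extra dimension, accesses to t in different iterations are provably disjoint, at the cost of n times the storage. A vectorizer would typically need only one copy per vector lane rather than one per iteration.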

>> In your example where you moved t outside the loop
>> it's a programmer's mistake (icc might vectorize it but the
>> results are undefined due to the dependency).
>
> Are you sure about this? How do you come to the conclusion? Is there some icc
> documentation? I am very unsure about the semantics of #pragma ivdeps. Your
> interpretation makes sense, but I could also imagine that a compiler is expected
> to always resolve / understand dependences on scalar variables. Do we have any
> example where a compiler miscompiles code due to scalar dependences that it
> ignored after #pragma ivdep was added?

What was unclear in the previous discussion about #pragma ivdep is the
expected set of dependency analysis done by the compiler (ivdep: ignore *assumed*
vector dependencies). So either the icc can be assumed
to perform privatization of variables (here t), or the result is undefined
(it leaves the scalar dependency and vectorizes regardless). You are
probably right it performs (and the programmer can assume) the former.

>> but here I don't
>> think it is. The t array is supposed to be a loop-private variable,
>> and each parallel iteration refer to their own isolated instance.
>
> Again, I can follow this intuition. However, it would be good to formally
> document the behavior (and to understand and choose a behavior according to how
> other compilers interpret #pragma ivdep). Also, if we follow your interpretation
> and if clang currently does not make the t array loop private, it would be
> incorrect to attach meta-data to loads and stores that reference the t array.
> This last point makes me actually
> think your interpretation may be difficult to implement. Is it in all cases
> possible to figure out if a memory access accesses a loop private array?

Essentially it would require treating private variables (arrays or not)
in parallel loops differently, by "privatizing them" (create the arrays)
already during Clang code gen. I'm not familiar enough with Clang to know how
much work this would involve. For now, it probably makes sense to simply not
annotate alloca accesses at all and rely on later optimizations to privatize
the added dependencies in one way or another.

-- 
--Pekka

Hal Finkel

2013-Mar-03 19:58 UTC


----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Tobias Grosser" <tobias at grosser.es>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Sunday, March 3, 2013 11:49:03 AM
> Subject: Re: [LLVMdev] parallel loop metadata simplification
> 
> On 03/03/2013 06:43 PM, Tobias Grosser wrote:
> > Very good example, indeed. Is there a formal definition of what
> > #pragma ivdeps means? I see two options here:
> 
> In the previous discussion we could not find a proper
> definition for #pragma ivdep so we concluded we can treat it
> as a statement of "treat the loop as parallel, I do not expect
> any dependency checking by the compiler", thus what the
> llvm.loop.parallel is now defined to denote.
> 
> > 1) No memory based dependences at all
> >
> > We assume all t[*] allocations point to the same memory
> > location. By defining #pragma ivdep the user states that there
> > are no memory dependences caused by the t[*] array. In this case
> > the example above would be an invalid use of '#pragma ivdep'.
> >
> > The right thing here would be to annotate all loads/stores with the
> > llvm.mem.*
> > intrinsics.
> 
> Yes, it's a matter of definition, but I do not think it's an
> intuitive
> interpretation to assume t is shared across iterations. Otherwise,
> the programmer would not have marked the loop as parallel or would
> have reused the same array across the whole loop.
> 
> > 2) Memory based dependences for private arrays allowed
> >
> > We assume there will be different instances of t[*], hence there
> > can
> > not be memory dependences along the t dimension. This is a valid
> > use
> > of '#pragma ivdep' and the compiler can only vectorize the loop if
> > it really creates different instances of t.
> >
> > In this case, we can only annotate loads/stores to t[*] if we
> > ensure each
> > iteration of i will access a different array t[*]. If there is just
> > a single
> > memory locations for t[*] which is shared by all loop iterations,
> > we must not
> > annotate loads/stores to t[*].
> 
> I think this leads to the need to replicate the private data for
> each iteration, thus similar handling that is done in pocl's
> work-group
> generation.
> 
> >> In your example where you moved t outside the loop
> >> it's a programmer's mistake (icc might vectorize it but the
> >> results are undefined due to the dependency).
> >
> > Are you sure about this? How do you come to the conclusion? Is
> > there some icc
> > documentation? I am very unsure about the semantics of #pragma
> > ivdeps. Your
> > interpretation makes sense, but I could also imagine that a
> > compiler is expected
> > to always resolve / understand dependences on scalar variables. Do
> > we have any
> > example where a compiler miscompiles code due to scalar dependences
> > that it
> > ignored after #pragma ivdep was added?
> 
> What was unclear in the previous discussion about #pragma ivdep is
> the
> expected set of dependency analysis done by the compiler (ivdep: ignore
> *assumed* vector dependencies). So either the icc can be
> assumed
> to perform privatization of variables (here t), or the result is
> undefined
> (it leaves the scalar dependency and vectorizes regardless). You are
> probably right it performs (and the programmer can assume) the
> former.
> 
> >> but here I don't
> >> think it is. The t array is supposed to be a loop-private
> >> variable,
> >> and each parallel iteration refer to their own isolated instance.
> >
> > Again, I can follow this intuition. However, it would be good to
> > formally
> > document the behavior (and to understand and choose a behavior
> > according to how
> > other compilers interpret #pragma ivdep). Also, if we follow your
> > interpretation
> > and if clang currently does not make the t array loop private, it
> > would be
> > incorrect to attach meta-data to loads and stores that reference
> > the t array.
> > This last point makes me actually
> > think your interpretation may be difficult to implement. Is it in
> > all cases
> > possible to figure out if a memory access accesses a loop private
> > array?
> 
> Essentially it would require treating private variables (arrays or
> not)
> in parallel loops differently, by "privatizing them" (create the
> arrays)
> already during Clang code gen. I'm not familiar with Clang enough to
> know how
> much work this would involve. For now, it probably makes sense to
> simply not
> annotate alloca accesses at all and rely on later optimizations to
> privatize
> the added dependencies in one way or another.

I agree; we should only annotate accesses that come from language-level
array/pointer(/reference) accesses. We should then rely on other passes to clean
up (or not) the remainder.

I think that the best way to handle local arrays is to issue a warning when they
occur inside an annotated loop that the local array will probably inhibit
vectorization (I say probably because SROA, etc. may make it go away). We can
think about implementing privatization transformations in the future. Even this
warning can be a follow-up enhancement.
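
As an illustration of the distinction drawn above, here is a small C sketch with comments marking which accesses would and would not be annotated under this proposal. The function is hypothetical, and whether SROA actually removes the local array depends on the compiler version and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Swaps adjacent pairs: out = {in[1], in[0], in[3], in[2], ...}.
 * Assumes in and out do not overlap. */
void swap_pairs(const float *in, float *out, size_t n_pairs)
{
    assert(in != NULL && out != NULL);
    /* #pragma ivdep  (hint kept as a comment for portability) */
    for (size_t i = 0; i < n_pairs; ++i) {
        float t[2];              /* local array: an alloca in unoptimized IR,
                                    its accesses would be left unannotated */
        t[0] = in[2 * i];        /* in[...]: language-level pointer access,
                                    a candidate for annotation */
        t[1] = in[2 * i + 1];
        out[2 * i]     = t[1];   /* out[...]: language-level pointer access,
                                    a candidate for annotation */
        out[2 * i + 1] = t[0];
    }
}
```

Because t is only ever indexed with constants, a scalar-replacement pass can in principle turn it into two scalar values, after which no unannotated memory access remains in the loop. A local array indexed by a runtime value would not be removed this way and is the case the suggested warning targets.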

 -Hal
> 
> --
> --Pekka

Redmond, Paul

2013-Mar-05 17:12 UTC


Hi,

On 2013-03-03, at 2:58 PM, Hal Finkel wrote:
>
> I agree; we should only annotate accesses that come from language-level
> array/pointer(/reference) accesses. We should then rely on other passes to clean
> up (or not) the remainder.
>
> I think that the best way to handle local arrays is to issue a warning when
> they occur inside an annotated loop that the local array will probably inhibit
> vectorization (I say probably because SROA, etc. may make it go away). We can
> think about implementing privatization transformations in the future. Even this
> warning can be a follow-up enhancement.
>
> -Hal
>

Attached is my most recent patch for clang. Maybe someone wants to play with it
or has ideas on how to refine the llvm.mem metadata generation.

paul



-------------- next part --------------
A non-text attachment was scrubbed...
Name: ivdep.diff
Type: application/octet-stream
Size: 23634 bytes
Desc: ivdep.diff
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130305/c30cc2fc/attachment.obj>
