thr3ads.net - llvm dev - [LLVMdev] Parallel Loop Metadata [Feb 2013]

If this information is useful, please help other people find it:
Share via:

Pekka Jääskeläinen

2013-Feb-08 15:10 UTC

[LLVMdev] Parallel Loop Metadata

On 02/08/2013 04:26 PM, Daniel Berlin wrote:> I'd love to see example cases where the pair analysis is the
> difficulty, rather than the access analysis of any single memory piece
> being the difficulty.
I'm not completely sure what you mean, but is there really a difference
between doing "pair analysis" across multiple iterations of the same
instruction than doing it with any other instruction?

Say, a loop with memory operations Aw, Br, Cw (w=write, r=read).
This is unrolled just for the sake of the example. The same applies to
widening the instructions in a loop vectorizer, similar dependency
analysis is needed.

i1: Aw Br Cw
i2: Aw' Br' Cw'

To do a legal code motion across the two iterations in such a way that
you move e.g., Aw' before Br, in parallel with Aw (e.g. to pack them to a
vector instruction or statically schedule in a VLIW) requires you to know
that none of Aw, Br nor Cw access the same location of Aw'.

If we know these are parallel loop accesses we know that Aw' doesn't
alias with any of Aw, Br or Cw, thus we can perform the code motion
with a peaceful mind, and execute Aw and Aw' in parallel e.g. in a
vector instruction.

If we get some non-parallel-loop-annotated mem instructions injected in
the loop, it's up to the analyzer again to prove the code motion is legal.

For example, the reg2mem case (where Sw is produced by it and not
marked with the llvm.mem.parallel_loop_access MD).

i1: Aw, Sw, Br, Cw
i2: Aw', Sw', Br', Cw'

Cannot do the same code motion here unless the analyzer can prove Aw' and
Sw do not alias.

--
Pekka

Daniel Berlin

2013-Feb-08 15:17 UTC

head link

[LLVMdev] Parallel Loop Metadata

On Fri, Feb 8, 2013 at 10:10 AM, Pekka Jääskeläinen
<pekka.jaaskelainen at tut.fi> wrote:> On 02/08/2013 04:26 PM, Daniel Berlin wrote:
>>
>> I'd love to see example cases where the pair analysis is the
>> difficulty, rather than the access analysis of any single memory piece
>> being the difficulty.
>
>
> I'm not completely sure what you mean, but is there really a difference
> between doing "pair analysis" across multiple iterations of the
same
> instruction than doing it with any other instruction?
>
> Say, a loop with memory operations Aw, Br, Cw (w=write, r=read).
> This is unrolled just for the sake of the example. The same applies to
> widening the instructions in a loop vectorizer, similar dependency
> analysis is needed.
>
> i1: Aw  Br  Cw
> i2: Aw' Br' Cw'
>
> To do a legal code motion across the two iterations in such a way that
> you move e.g., Aw' before Br, in parallel with Aw (e.g. to pack them to
a
> vector instruction or statically schedule in a VLIW) requires you to know
> that none of Aw, Br nor Cw access the same location of Aw'.
>
> If we know these are parallel loop accesses we know that Aw'
doesn't
> alias with any of Aw, Br or Cw, thus we can perform the code motion
> with a peaceful mind, and execute Aw and Aw' in parallel e.g. in a
> vector instruction.
>
> If we get some non-parallel-loop-annotated mem instructions injected in
> the loop, it's up to the analyzer again to prove the code motion is
legal.
Yes, that's exactly my point.
You said "This is an interesting alternative! Do you mean that we
would still add
the llvm.mem.parallel_loop_access metadata, but only to such mem accesses
that are assumed to be "hard or impossible to analyze" (to prove to be
no
alias cases)? Then we'd forget about the "parallel loop metadata"
as is.

Then we would rely on the regular loop carried dependency analyzer by
default, but let those (mem) annotations just *help* in the "tricky
cases".

The llvm.mem.parallel_loop_access metadata would only communicate "this
instruction does not alias with any other similarly annotated instruction
from any other iteration in this loop"."

IE we wouldn't need to annotate every memory access in the loop, only
"tricky" ones.
I don't see how this is possible, given the suggested semantics.

Let me give a motivating example myself, to explain what I mean.


Loop 1:

for ...
   ...
   load %w    <--- A
   ...
   store ..., %x  <---B
   ...
   load %y <---C
   ...
   store ..., %z,  <---D


Assume that only %w and %z are actually complicated access functions
that we could not prove anything about without metadata.

In practice, we are going to ask about pairs (D, C), (D, B), (D, A)

If "llvm.mem.parallel_loop_access" means that, per above, things
annotated with it do not conflict, in the above loop, you stil need to
annotate all of A, B, C, D with it to get any impact. It is *not*
sufficient to just annotate D, or even D and A, because the problem is
*D* is hard to analyze and *A* is hard to analyze, not that D,
<something> is hard to analyze.   Without annotating everything, we
are still going to fail to know anything about D, C, even if we know
something about D, A.

So i don't see how you can simply annotate *certain* memory accesses,
and get any value out of it, given the semantics suggested.

>
> For example, the reg2mem case (where Sw is produced by it and not
> marked with the llvm.mem.parallel_loop_access MD).
>
> i1: Aw,  Sw,  Br,  Cw
> i2: Aw', Sw', Br', Cw'
>
> Cannot do the same code motion here unless the analyzer can prove Aw'
and
> Sw do not alias.
Yes, I agree, which is why i'm trying to understand how you don't end
up having to mark *every* memory in the loop with this instruction in
practice to get anywhere.
>
> --
> Pekka

Pekka Jääskeläinen

2013-Feb-08 15:48 UTC

head link

[LLVMdev] Parallel Loop Metadata

On 02/08/2013 05:17 PM, Daniel Berlin wrote:> Yes, I agree, which is why i'm trying to understand how you don't
end
> up having to mark*every*  memory in the loop with this instruction in
> practice to get anywhere.
OK. Got what you mean now, thanks.

You are right, when the parallel loop information is added to the loop
the first time, all of the memory instructions need to be annotated,
so this doesn't actually reduce the amount of metadata needed afterall :-I

However, it might help the fallback case. Instead of making
a single unannotated access ruin the whole parallelism data, the
remaining metadata can still be used to shortcut some checks.

E.g., in the example

i1: Aw,  Sw,  Br,  Cw
i2: Aw', Sw', Br', Cw'

The MD can be still exploited to do a code motion of Aw' before Br.
Vectorization might be ruined but some ILP can be still exploited.
To get A vectorized, the analyzer needs to prove !(Sw -> Aw'),
but not the other pairs.

-- 
Pekka

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Feb 2013 - [LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

Possibly Parallel Threads