thr3ads.net - llvm dev - [LLVMdev] Parallel Loop Metadata [Feb 2013]

If this information is useful, please help other people find it:
Share via:

Nadav Rotem

2013-Feb-11 21:31 UTC

[LLVMdev] Parallel Loop Metadata

Now that we have a better understanding of the proposal for using
per-instruction metadata, I think that we need to revisit the "single
metedata" approach (Pekka's original suggestion).

Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than
one way (detect or fix). The example pass that you mentioned below (the
instrumentation pass), can be taught to handle the parallelism pragmas.

Can you think of other passes that we will need to modify ? 

On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es> wrote:
> On 02/08/2013 06:35 AM, Nadav Rotem wrote:
>> Hi Tobi,
>> 
>> Thanks for reviewing the proposal. I imagine that it may also affects
your parallelization work in Polly.
> 
> Sure. I am interested in using it.
> 
>>> I am not sure if I am able to follow your reasoning. How could the
-loop-vectorizer detect parallelism violations? I had the feeling that we
introduce the llvm.loop meta-data for the case where we want to inform the loop
vectorizer that it can assume the absence of dependences even though it can not
prove their absence statically. Do you possibly mean that the -loop-vectorizer
should in some way detect if the llvm.loop.parallel metadata is still correct?
>> 
>> Yes, the loop vectorizer can detect the kind of violations of the
"llvm.loop.parallel" metadata that we are worried about.
> 
> OK. I see. Most of the references that -mem2reg introduces are obviously
> destroying parallelism and I can see that the loop vectorizer could detect
these obvious violations and refuse to parallelize. However, the
> question remains if the loop vectorizer can (and should) detect all
possible violations. I am still concerned that this is in general not
> possible. Here another piece of code (+ transformation), where it is
> a lot harder (impossible?) to detect the violation:
> 
> // b is always bigger than 100
> #parallel
> for (int i = 0; i < 100; i++) {
> S1:   A[i % b] += i;
> }
> 
> Depending on the values of 'b' the loop is either parallel or not.
It is impossible for the -loop-vectorizer to reason about this, but with
> the additional information the user has, he can easily annotate the loop
such that the loop vectorizer can optimize it.
> 
> Now we have some new instrumentation pass, which collects information
> and uses for this a buffer with 'Size' elements. The
instrumentation pass just adds a couple of additional instructions, which do not
> change the sequential behavior of the program.
> 
> int *B = get_buffer();
> int Size = get_buffer_size();
> 
> // b is always bigger than 100
> #parallel
> for (int i = 0; i < 100; i++) {
> s1:   A[i % b] += i;
> S2:   B[i % Size] += i
> }
> 
> However, depending on the value of Size, the parallel execution of
> the updated loop may not be legal any more. Without further information we
have to assume that the loop is not parallel anymore.
> 
> For this case, will we require the loop-vectorizer to detect the outdated
metadata? In case we do, how could this work?
> 
> Or do we require the instrumentation pass, to reason about the
llvm.loop.parallel data and remove it in case it gets invalidated?
> 
> Or can we just assume that such an instrumentation pass can or will never
exist?
> 
> Cheers
> Tobi
> 
> P.S: I know that due to the modulo, we will not be able to prove stride one
access and vectorization may not be profitable. The example was written to
demonstrate legality issues. We can assume surrounding
> instructions which would make vectorization profitable.

Tobias Grosser

2013-Feb-11 22:58 UTC

head link

[LLVMdev] Parallel Loop Metadata

On 02/11/2013 10:31 PM, Nadav Rotem wrote:> Now that we have a better understanding of the proposal for using
per-instruction metadata, I think that we need to revisit the "single
metedata" approach (Pekka's original suggestion).
Following Andrew's comments we understood that Sebastian's proposal 
causes issues with inlining and unrolling. It seems we all agree that 
his proposal is not an option we can go with.

Besides that, from my point there are still two options:

1) llvm.loop + nothing else

2) llvm.loop + llvm.mem

Is there any specific reason you only want to revisit option 1)?

I have the feeling option 2) does not work for you, but I don't yet 
understand your reasons. The last comment from you was: "I don't fully 
understand the semantics of this metadata and I am not sure why we need 
it. And even if we do need it, I think that it would require too many 
passes to change". Is this still your opinion or could Pekka's 
explanation help you to understand this better?
> Reg2mem is indeed a problem, but the loop vectorizer can solve this in more
than one way (detect or fix). The example pass that you mentioned below (the
instrumentation pass), can be taught to handle the parallelism pragmas.
When LLVM metadata was introduced [1], one of the core principles was:

"]...] we don't want the optimizers to have to know about
metadata."
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By requiring instrumentation passes like the one I explained to take
the llvm.loop meta-data into account, we do not follow this principle 
any more. To my knowledge for instruction level meta-data, this would be 
the first case where we ignore this core principle.

Are you suggesting we now require passes to reason about meta-data to 
maintain correctness?
> Can you think of other passes that we will need to modify ?
I could probably imagine other passes. Though, my main concern are 
passes that I do not understand. We would e.g. need to reason about
the thread-sanitizer, address-sanitizer and memory-sanitizer passes.
What about the (path) profiling passes? Would we need to inspect all of 
them?

I am really not a fan of adding more meta-data than necessary. However, 
I have the feeling there are some good reason to believe llvm.loop 
meta-data may not be enough. And given the fact that with tbaa meta-data 
has proven that a per memory-access meta-data can work nicely, llvm.mem 
may be a feasible option.

All the best,
Tobias

[1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html

Hal Finkel

2013-Feb-11 23:15 UTC

head link

[LLVMdev] Parallel Loop Metadata

----- Original Message -----> From: "Tobias Grosser" <tobias at grosser.es>
> To: "Nadav Rotem" <nrotem at apple.com>
> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Monday, February 11, 2013 4:58:50 PM
> Subject: Re: [LLVMdev] Parallel Loop Metadata
> 
> On 02/11/2013 10:31 PM, Nadav Rotem wrote:
> > Now that we have a better understanding of the proposal for using
> > per-instruction metadata, I think that we need to revisit the
> > "single metedata" approach (Pekka's original
suggestion).
> 
> Following Andrew's comments we understood that Sebastian's proposal
> causes issues with inlining and unrolling. It seems we all agree that
> his proposal is not an option we can go with.
> 
> Besides that, from my point there are still two options:
> 
> 1) llvm.loop + nothing else
> 
> 2) llvm.loop + llvm.mem
> 
> Is there any specific reason you only want to revisit option 1)?
> 
> I have the feeling option 2) does not work for you, but I don't yet
> understand your reasons. The last comment from you was: "I don't
> fully
> understand the semantics of this metadata and I am not sure why we
> need
> it. And even if we do need it, I think that it would require too many
> passes to change". Is this still your opinion or could Pekka's
> explanation help you to understand this better?
> 
> > Reg2mem is indeed a problem, but the loop vectorizer can solve this
> > in more than one way (detect or fix). The example pass that you
> > mentioned below (the instrumentation pass), can be taught to
> > handle the parallelism pragmas.
> 
> When LLVM metadata was introduced [1], one of the core principles
> was:
> 
> "]...] we don't want the optimizers to have to know about
metadata."
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> By requiring instrumentation passes like the one I explained to take
> the llvm.loop meta-data into account, we do not follow this principle
> any more. To my knowledge for instruction level meta-data, this would
> be
> the first case where we ignore this core principle.
> 
> Are you suggesting we now require passes to reason about meta-data to
> maintain correctness?
> 
> > Can you think of other passes that we will need to modify ?
> 
> I could probably imagine other passes. Though, my main concern are
> passes that I do not understand. We would e.g. need to reason about
> the thread-sanitizer, address-sanitizer and memory-sanitizer passes.
> What about the (path) profiling passes? Would we need to inspect all
> of
> them?
> 
> I am really not a fan of adding more meta-data than necessary.
> However,
> I have the feeling there are some good reason to believe llvm.loop
> meta-data may not be enough. And given the fact that with tbaa
> meta-data
> has proven that a per memory-access meta-data can work nicely,
> llvm.mem
> may be a feasible option.
I agree with this. We should err on the side of safety even at the cost of
larger IR. Relying on other passes to drop metadata on loop backedges, even if
those passes are making (seemingly) completely-unrelated changes is asking for
trouble; and in this case, trouble means miscompiles. Also, the changes to
support propagating this metadata in existing passes are well defined: just grep
for all places where TBAA metadata is handled. Some refactoring may be in order,
but that is not a bad thing.

 -Hal
> 
> All the best,
> Tobias
> 
> [1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Redmond, Paul

2013-Feb-12 16:18 UTC

head link

[LLVMdev] Parallel Loop Metadata

Hi,

I have a semi-working clang patch that adds support for #pragma ivdep. The
codegen is hacked in but it attaches a dummy llvm.loop.ivdep metadata. This may
be useful for experimenting with different representations (just modify the
codegen)

paul



On 2013-02-11, at 4:31 PM, Nadav Rotem wrote:
> Now that we have a better understanding of the proposal for using
per-instruction metadata, I think that we need to revisit the "single
metedata" approach (Pekka's original suggestion).
>
> Reg2mem is indeed a problem, but the loop vectorizer can solve this in more
than one way (detect or fix). The example pass that you mentioned below (the
instrumentation pass), can be taught to handle the parallelism pragmas.
>
> Can you think of other passes that we will need to modify ?
>
> On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es>
wrote:
>
>> On 02/08/2013 06:35 AM, Nadav Rotem wrote:
>>> Hi Tobi,
>>>
>>> Thanks for reviewing the proposal. I imagine that it may also
affects your parallelization work in Polly.
>>
>> Sure. I am interested in using it.
>>
>>>> I am not sure if I am able to follow your reasoning. How could
the -loop-vectorizer detect parallelism violations? I had the feeling that we
introduce the llvm.loop meta-data for the case where we want to inform the loop
vectorizer that it can assume the absence of dependences even though it can not
prove their absence statically. Do you possibly mean that the -loop-vectorizer
should in some way detect if the llvm.loop.parallel metadata is still correct?
>>>
>>> Yes, the loop vectorizer can detect the kind of violations of the
"llvm.loop.parallel" metadata that we are worried about.
>>
>> OK. I see. Most of the references that -mem2reg introduces are
obviously
>> destroying parallelism and I can see that the loop vectorizer could
detect these obvious violations and refuse to parallelize. However, the
>> question remains if the loop vectorizer can (and should) detect all
possible violations. I am still concerned that this is in general not
>> possible. Here another piece of code (+ transformation), where it is
>> a lot harder (impossible?) to detect the violation:
>>
>> // b is always bigger than 100
>> #parallel
>> for (int i = 0; i < 100; i++) {
>> S1:   A[i % b] += i;
>> }
>>
>> Depending on the values of 'b' the loop is either parallel or
not. It is impossible for the -loop-vectorizer to reason about this, but with
>> the additional information the user has, he can easily annotate the
loop such that the loop vectorizer can optimize it.
>>
>> Now we have some new instrumentation pass, which collects information
>> and uses for this a buffer with 'Size' elements. The
instrumentation pass just adds a couple of additional instructions, which do not
>> change the sequential behavior of the program.
>>
>> int *B = get_buffer();
>> int Size = get_buffer_size();
>>
>> // b is always bigger than 100
>> #parallel
>> for (int i = 0; i < 100; i++) {
>> s1:   A[i % b] += i;
>> S2:   B[i % Size] += i
>> }
>>
>> However, depending on the value of Size, the parallel execution of
>> the updated loop may not be legal any more. Without further information
we have to assume that the loop is not parallel anymore.
>>
>> For this case, will we require the loop-vectorizer to detect the
outdated metadata? In case we do, how could this work?
>>
>> Or do we require the instrumentation pass, to reason about the
llvm.loop.parallel data and remove it in case it gets invalidated?
>>
>> Or can we just assume that such an instrumentation pass can or will
never exist?
>>
>> Cheers
>> Tobi
>>
>> P.S: I know that due to the modulo, we will not be able to prove stride
one access and vectorization may not be profitable. The example was written to
demonstrate legality issues. We can assume surrounding
>> instructions which would make vectorization profitable.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ivdep.diff
Type: application/octet-stream
Size: 12525 bytes
Desc: ivdep.diff
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130212/d826146a/attachment.obj>

Nadav Rotem

2013-Feb-12 16:54 UTC

head link

[LLVMdev] Parallel Loop Metadata

> I have the feeling option 2) does not work for you, but I don't yet
understand your reasons.
My inclination to prefer #1 is due to its simplicity. But, if #1 does not work
because it creates a correctness problems then #2 is the only option that is
left on the table.

Andrew Trick

2013-Feb-17 20:32 UTC

head link

[LLVMdev] Parallel Loop Metadata

On Feb 11, 2013, at 2:58 PM, Tobias Grosser <tobias at grosser.es> wrote:
> On 02/11/2013 10:31 PM, Nadav Rotem wrote:
>> Now that we have a better understanding of the proposal for using
per-instruction metadata, I think that we need to revisit the "single
metedata" approach (Pekka's original suggestion).
> 
> Following Andrew's comments we understood that Sebastian's proposal
causes issues with inlining and unrolling. It seems we all agree that his
proposal is not an option we can go with.
Work is well on it's way now, and that's great. But just so people
don't get the wrong impression from this thread. I'm not aware of any
correctness issue with the "single metadata" approach. Inlining seems
to work fine for me. Here's a hacky little example to see what I mean.

define void @wrapper(i32* %arg) {
  %1 = call i32 @loop(i32* %arg)
  ret void
}

define i32 @loop(i32* %a) {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  %p = phi i32* [%a,%entry], [%p1, %loop]
  %p1 = getelementptr i32* %p, i64 1
  %v = load i32* %p
  %cond = icmp eq i32 %v, 0
  br i1 %cond, label %loop, label %exit, !metadata !{i32* %p, metadata !0}

exit:                                             ; preds = %loop
  ret i32 %v
}

!0 = metadata !{metadata !"loop"}

$ opt -inline -S < 

define void @wrapper(i32* %arg) {
  br label %loop.i

loop.i:                                           ; preds = %loop.i, %0
  %p.i = phi i32* [ %arg, %0 ], [ %p1.i, %loop.i ]
  %p1.i = getelementptr i32* %p.i, i64 1
  %v.i = load i32* %p.i
  %cond.i = icmp eq i32 %v.i, 0
  br i1 %cond.i, label %loop.i, label %loop.exit, !metadata !{i32* %p.i,
metadata !0}

loop.exit:                                        ; preds = %loop.i
  ret void
}

!0 = metadata !{metadata !"loop"}

---
Unrolling OTOH should be aware of and preserve any loop metadata.

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130217/28b8e6b5/attachment.html>

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Feb 2013 - [LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

[LLVMdev] Parallel Loop Metadata

Possibly Parallel Threads