thr3ads.net - llvm dev - [LLVMdev] Parallel Loop Metadata [Feb 2013]

Nadav Rotem

2013-Feb-08 05:35 UTC

[LLVMdev] Parallel Loop Metadata

Hi Tobi, 

Thanks for reviewing the proposal. I imagine that it may also affect your
parallelization work in Polly.

> I am not sure if I am able to follow your reasoning. How could the
> -loop-vectorizer detect parallelism violations? I had the feeling that we
> introduce the llvm.loop meta-data for the case where we want to inform the
> loop vectorizer that it can assume the absence of dependences even though
> it can not prove their absence statically. Do you possibly mean that the
> -loop-vectorizer should in some way detect if the llvm.loop.parallel
> metadata is still correct?

Yes, the loop vectorizer can detect the kind of violations of the
"llvm.loop.parallel" metadata that we are worried about.

> Given we go without llvm.mem meta-data. How would the -loop-vectorizer
> detect that the test case in parallel-loops-after-reg2mem.ll (see Pekka's
> patch) is not parallel any more even though the llvm.loop.parallel metadata
> is present?

We already detect this case right now. It's really easy to do. The
llvm.loop.parallel metadata should only provide information that is not easily
available within this compilation unit. For example, the assumption that the
input pointers don't overlap, or that dynamic indices are within a certain
range that allows us to vectorize.

> The llvm.mem meta-data would give the necessary information that there was
> a new memory access introduced that was not covered by the
> llvm.loop.parallel meta-data. However, without this additional information,
> I have a hard time to see how you can verify if the llvm.loop.parallel
> metadata is still up to date.

Thanks,
Nadav

Tobias Grosser

2013-Feb-08 09:56 UTC

On 02/08/2013 06:35 AM, Nadav Rotem wrote:
> Hi Tobi,
>
> Thanks for reviewing the proposal. I imagine that it may also affect your
> parallelization work in Polly.

Sure. I am interested in using it.

>> I am not sure if I am able to follow your reasoning. How could the
>> -loop-vectorizer detect parallelism violations? I had the feeling that we
>> introduce the llvm.loop meta-data for the case where we want to inform
>> the loop vectorizer that it can assume the absence of dependences even
>> though it can not prove their absence statically. Do you possibly mean
>> that the -loop-vectorizer should in some way detect if the
>> llvm.loop.parallel metadata is still correct?
>
> Yes, the loop vectorizer can detect the kind of violations of the
> "llvm.loop.parallel" metadata that we are worried about.

OK, I see. Most of the references that -reg2mem introduces obviously
destroy parallelism, and I can see that the loop vectorizer could
detect these obvious violations and refuse to parallelize. However, the
question remains whether the loop vectorizer can (and should) detect all
possible violations. I am still concerned that this is in general not
possible. Here is another piece of code (plus a transformation) where it is
a lot harder (impossible?) to detect the violation:

// b is always bigger than 100
#parallel
for (int i = 0; i < 100; i++) {
S1:   A[i % b] += i;
}

Depending on the values of 'b' the loop is either parallel or not. It is
impossible for the -loop-vectorizer to reason about this, but with
the additional information the user has, he can easily annotate the loop 
such that the loop vectorizer can optimize it.
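
To make the dependence on 'b' concrete, here is a minimal C sketch (not part
of the original mail). The helper name has_loop_carried_conflict is
hypothetical; it simply replays the index pattern of S1 and reports whether
two different iterations touch the same element of A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if two different iterations of
 *     for (i = 0; i < n; i++) A[i % b] += i;
 * access the same element of A, i.e. the loop carries a dependence.
 * Hypothetical helper for illustration only; requires n >= 0 and b > 0. */
bool has_loop_carried_conflict(int n, int b) {
    assert(n >= 0 && b > 0);

    /* seen[k] records whether an earlier iteration already accessed A[k]. */
    bool *seen = calloc((size_t)b, sizeof *seen);
    if (!seen)
        abort();

    bool conflict = false;
    for (int i = 0; i < n && !conflict; i++) {
        int idx = i % b;
        if (seen[idx])
            conflict = true;
        seen[idx] = true;
    }

    free(seen);
    return conflict;
}
```

For n = 100 the helper reports no conflict when b >= 100 and a conflict for
any smaller positive b, which is the knowledge the user has and the
vectorizer lacks.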

Now suppose we have a new instrumentation pass that collects information
and stores it in a buffer with 'Size' elements. The instrumentation
pass just adds a couple of additional instructions, which do not
change the sequential behavior of the program.

int *B = get_buffer();
int Size = get_buffer_size();

// b is always bigger than 100
#parallel
for (int i = 0; i < 100; i++) {
S1:   A[i % b] += i;
S2:   B[i % Size] += i;
}

However, depending on the value of Size, the parallel execution of
the updated loop may not be legal any more. Without further information 
we have to assume that the loop is not parallel anymore.
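
The effect of S2 can be illustrated with a small C model (a sketch, not from
the original mail). It assumes a vector width of 4 and models a vectorized
loop as "load all lanes, then store all lanes"; the function names and the
128-element buffer bound are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

enum { N = 100, VF = 4 }; /* trip count and assumed vector width */

/* Sequential reference: B[i % size] += i for i in [0, N). */
void run_sequential(int *B, int size) {
    for (int i = 0; i < N; i++)
        B[i % size] += i;
}

/* Models a vectorized execution: each group of VF iterations first loads
 * all of its operands and only then stores all of its results. */
void run_vectorized(int *B, int size) {
    assert(N % VF == 0);
    for (int i = 0; i < N; i += VF) {
        int lanes[VF];
        for (int l = 0; l < VF; l++) /* gather and add */
            lanes[l] = B[(i + l) % size] + (i + l);
        for (int l = 0; l < VF; l++) /* scatter */
            B[(i + l) % size] = lanes[l];
    }
}

/* Returns 1 if both execution models leave the same buffer contents.
 * Requires 0 < size <= 128. */
int vectorization_is_safe(int size) {
    int seq[128] = {0}, vec[128] = {0};
    assert(size > 0 && size <= 128);
    run_sequential(seq, size);
    run_vectorized(vec, size);
    return memcmp(seq, vec, sizeof seq) == 0;
}
```

With Size >= 100 every index is distinct and both models agree. With
Size = 2 several lanes of one vector hit the same element and updates are
lost, even though the sequential behavior of the instrumented program is
unchanged.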

For this case, will we require the loop-vectorizer to detect the 
outdated metadata? In case we do, how could this work?

Or do we require the instrumentation pass, to reason about the 
llvm.loop.parallel data and remove it in case it gets invalidated?

Or can we just assume that such an instrumentation pass can or will 
never exist?

Cheers
Tobi

P.S: I know that due to the modulo, we will not be able to prove stride 
one access and vectorization may not be profitable. The example was 
written to demonstrate legality issues. We can assume surrounding
instructions which would make vectorization profitable.

Pekka Jääskeläinen

2013-Feb-08 10:52 UTC

Good day Nadav and Tobias,

Yes, the fundamental idea of this metadata was that it would be
specifically used to avoid the need for any loop-carried dependency
analysis.

On 02/08/2013 07:35 AM, Nadav Rotem wrote:
> We already detect this case right now. It's really easy to do. The
> llvm.loop.parallel metadata should only provide information that is not
> easily available within this compilation unit. For example, the assumption
> that the input pointers don't overlap, or that dynamic indices are within a
> certain range that allows us to vectorize.

I see your point, but then this goes back to the similar discussion we had
with #pragma ivdep.

The risky part in this approach is to define the set of safely
ignored dependency cases in such a way that it never ignores deps that
should not be ignored.

The problematic cases are the "unknown alias" cases. Some of them might
originate from the original parallel loop (the reason why the programmer
might have added the parallelism info in the first place), and some of them
might come from parallel-loop-unaware passes and cannot be safely ignored
if they have converted the loop back to a sequential one (added a
loop-carried dependency).

I totally agree that annotating all the memory accesses in the parallel
loops feels somehow excessive (this wasn't done in my original patch),
but so far I do not see a more robust way, and I prefer playing it safe.
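
A toy C model of the "annotate every memory access" idea might look like the
following. This is only a sketch of the reasoning, not LLVM code: struct
mem_access, its fields, and loop_is_still_parallel are hypothetical names,
and an id of 0 is assumed to mean "no annotation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a load or store in a loop body. parallel_loop_id is the
 * id of the loop the access was declared parallel for; 0 means the access
 * carries no annotation (for example, it was added by a later pass). */
struct mem_access {
    const char *name;
    unsigned parallel_loop_id;
};

/* The loop may still be treated as parallel only if every memory access in
 * its body is annotated with this loop's id. */
bool loop_is_still_parallel(unsigned loop_id,
                            const struct mem_access *accesses, size_t count) {
    assert(loop_id != 0);
    for (size_t i = 0; i < count; i++)
        if (accesses[i].parallel_loop_id != loop_id)
            return false;
    return true;
}
```

In this model an access introduced by a parallel-loop-unaware pass has no
annotation, so the check fails and the consumer falls back to treating the
loop as sequential without having to analyze the new access.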

BR,
-- 
Pekka

Pekka Jääskeläinen

2013-Feb-08 11:06 UTC

On 02/08/2013 11:56 AM, Tobias Grosser wrote:
> Or do we require the instrumentation pass, to reason about the
> llvm.loop.parallel data and remove it in case it gets invalidated?

It's against the metadata guidelines.

> Or can we just assume that such an instrumentation pass can or will
> never exist?

I think it's safe to say we can't.

One additional motivational point I want to make is that the llvm.mem
metadata style might become useful for other things too. Cases like
annotating the pointer with its original function context to
preserve "restricted pointer" (noalias) information across function
call inlines, etc.

-- 
Pekka

Renato Golin

2013-Feb-08 13:07 UTC

On 8 February 2013 05:35, Nadav Rotem <nrotem at apple.com> wrote:
> For example, the assumption that the input pointers don't overlap, or that
> dynamic indices are within a certain range that allows us to vectorize.

In this case, I'd prefer metadata on the variables that are assumed not to
alias, like the restrict keyword.
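
For reference, this is how the C99 restrict qualifier expresses such a
no-alias promise at the source level. The function add_arrays is a made-up
example, not something from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* restrict is a promise by the caller that the elements written through dst
 * are not also accessed through src during the call. Breaking the promise is
 * undefined behavior, so the compiler may vectorize without a runtime
 * overlap check. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n) {
    assert(n == 0 || (dst && src));
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The annotation lives on the pointer values themselves rather than on the
loop, which is the property argued for here.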

It seems to me that having metadata on the loop basic blocks, since they
can be invalidated, will not help that much with the vectorizer more than
specific annotation on specific values (which are harder to lose). I'm not
saying we should annotate *all* memory instructions on a loop, just the
ones that make sense, or will help the vectorizer default to a sane value.

I'm not a big fan of basic block annotation, unless the BBs are created for
very specific reasons and no pass is allowed to touch them (especially
inliners).

cheers,
--renato

Pekka Jääskeläinen

2013-Feb-08 14:16 UTC

Hi Renato,

On 02/08/2013 03:07 PM, Renato Golin wrote:
> In this case, I'd prefer metadata on the variables that are assumed not
> to alias, like the restrict keyword.
>
> It seems to me that having metadata on the loop basic blocks, since they
> can be invalidated, will not help that much with the vectorizer more
> than specific annotation on specific values (which are harder to lose).
> I'm not saying we should annotate *all* memory instructions on a loop,
> just the ones that make sense, or will help the vectorizer default to a
> sane value.

This is an interesting alternative! Do you mean that we would still add
the llvm.mem.parallel_loop_access metadata, but only to those memory accesses
that are assumed to be "hard or impossible to analyze" (to prove to be
no-alias cases)? Then we'd forget about the "parallel loop metadata"
as is.

Then we would rely on the regular loop-carried dependency analyzer by
default, but let those (mem) annotations just *help* in the "tricky cases".

The llvm.mem.parallel_loop_access metadata would only communicate "this
instruction does not alias with any other similarly annotated instruction
from any other iteration in this loop".

Quickly thinking, this might work and might not lose the
parallelism info too easily. Anyway, the info still has to be
connected to a loop to avoid breakup in inlining, multi-level loops, etc.

Summarizing, the new metadata would be:

llvm.loop:
Just to mark a loop (points to a unique id metadata).

llvm.mem.parallel_loop_access:
The above mentioned new semantics, connected to the llvm.loop's id metadata.
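
The pairwise semantics summarized above can be sketched as a small C
predicate. This is a toy model, not LLVM code: struct mem_access and
may_ignore_loop_carried_dep are hypothetical names, and an id of 0 is assumed
to mean "no annotation".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a memory instruction; parallel_loop_id is the id of the
 * llvm.loop it is annotated for, or 0 if it carries no annotation. */
struct mem_access {
    unsigned parallel_loop_id;
};

/* A loop-carried dependence between a and b may be assumed absent from the
 * annotations alone only if both accesses are annotated for this loop.
 * Any other pair has to go through the regular dependence analysis. */
bool may_ignore_loop_carried_dep(unsigned loop_id,
                                 struct mem_access a, struct mem_access b) {
    assert(loop_id != 0);
    return a.parallel_loop_id == loop_id && b.parallel_loop_id == loop_id;
}
```

Under this reading, an access added later by an unaware pass is never
silently trusted: every pair that involves it is still analyzed, while the
annotated "tricky" pairs keep their user-provided guarantee.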

What do others think? Nadav?

-- 
Pekka

Nadav Rotem

2013-Feb-11 21:31 UTC

Now that we have a better understanding of the proposal for using
per-instruction metadata, I think that we need to revisit the "single
metadata" approach (Pekka's original suggestion).

Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than
one way (detect or fix). The example pass that you mentioned below (the
instrumentation pass) can be taught to handle the parallelism pragmas.

Can you think of other passes that we will need to modify?

On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es> wrote:
> On 02/08/2013 06:35 AM, Nadav Rotem wrote:
>> Hi Tobi,
>>
>> Thanks for reviewing the proposal. I imagine that it may also affect
>> your parallelization work in Polly.
>
> Sure. I am interested in using it.
>
>>> I am not sure if I am able to follow your reasoning. How could the
>>> -loop-vectorizer detect parallelism violations? I had the feeling that we
>>> introduce the llvm.loop meta-data for the case where we want to inform
>>> the loop vectorizer that it can assume the absence of dependences even
>>> though it can not prove their absence statically. Do you possibly mean
>>> that the -loop-vectorizer should in some way detect if the
>>> llvm.loop.parallel metadata is still correct?
>>
>> Yes, the loop vectorizer can detect the kind of violations of the
>> "llvm.loop.parallel" metadata that we are worried about.
>
> OK, I see. Most of the references that -reg2mem introduces obviously
> destroy parallelism, and I can see that the loop vectorizer could
> detect these obvious violations and refuse to parallelize. However, the
> question remains whether the loop vectorizer can (and should) detect all
> possible violations. I am still concerned that this is in general not
> possible. Here is another piece of code (plus a transformation) where it is
> a lot harder (impossible?) to detect the violation:
>
> // b is always bigger than 100
> #parallel
> for (int i = 0; i < 100; i++) {
> S1:   A[i % b] += i;
> }
>
> Depending on the values of 'b' the loop is either parallel or not. It is
> impossible for the -loop-vectorizer to reason about this, but with
> the additional information the user has, he can easily annotate the loop
> such that the loop vectorizer can optimize it.
>
> Now we have some new instrumentation pass, which collects information
> and uses for this a buffer with 'Size' elements. The instrumentation
> pass just adds a couple of additional instructions, which do not
> change the sequential behavior of the program.
>
> int *B = get_buffer();
> int Size = get_buffer_size();
>
> // b is always bigger than 100
> #parallel
> for (int i = 0; i < 100; i++) {
> S1:   A[i % b] += i;
> S2:   B[i % Size] += i;
> }
>
> However, depending on the value of Size, the parallel execution of
> the updated loop may not be legal any more. Without further information
> we have to assume that the loop is not parallel anymore.
>
> For this case, will we require the loop-vectorizer to detect the
> outdated metadata? In case we do, how could this work?
>
> Or do we require the instrumentation pass, to reason about the
> llvm.loop.parallel data and remove it in case it gets invalidated?
>
> Or can we just assume that such an instrumentation pass can or will
> never exist?
>
> Cheers
> Tobi
>
> P.S: I know that due to the modulo, we will not be able to prove stride
> one access and vectorization may not be profitable. The example was
> written to demonstrate legality issues. We can assume surrounding
> instructions which would make vectorization profitable.
