Hi Tobi,

Thanks for reviewing the proposal. I imagine that it may also affect your parallelization work in Polly.

> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop meta-data for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it can not prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect if the llvm.loop.parallel metadata is still correct?

Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about.

> Given we go without llvm.mem meta-data, how would the -loop-vectorizer
> detect that the test case in parallel-loops-after-reg2mem.ll (see Pekka's patch) is not parallel any more even though the llvm.loop.parallel metadata is present?

We already detect this case right now. It's really easy to do. The llvm.loop.parallel should only provide information that is not easily available within this compilation unit. For example, the assumption that the input pointers don't overlap, or that dynamic indices are within a certain range that allows us to vectorize.

> The llvm.mem meta-data would give the necessary information that there was a new memory access introduced that was not covered by the llvm.loop.parallel meta-data. However, without this additional information, I have a hard time to see how you can verify if the llvm.loop.parallel metadata is still up to date.

Thanks,
Nadav
On 02/08/2013 06:35 AM, Nadav Rotem wrote:
> Hi Tobi,
>
> Thanks for reviewing the proposal. I imagine that it may also affect your parallelization work in Polly.

Sure. I am interested in using it.

>> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop meta-data for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it can not prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect if the llvm.loop.parallel metadata is still correct?
>
> Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about.

OK. I see. Most of the references that -reg2mem introduces are obviously destroying parallelism, and I can see that the loop vectorizer could detect these obvious violations and refuse to parallelize. However, the question remains whether the loop vectorizer can (and should) detect all possible violations. I am still concerned that this is in general not possible. Here is another piece of code (+ transformation), where it is a lot harder (impossible?) to detect the violation:

    // b is always bigger than 100
    #parallel
    for (int i = 0; i < 100; i++) {
    S1:   A[i % b] += i;
    }

Depending on the value of 'b' the loop is either parallel or not. It is impossible for the -loop-vectorizer to reason about this, but with the additional information the user has, he can easily annotate the loop such that the loop vectorizer can optimize it.

Now we have some new instrumentation pass, which collects information and uses a buffer with 'Size' elements for this. The instrumentation pass just adds a couple of additional instructions, which do not change the sequential behavior of the program.
    int *B = get_buffer();
    int Size = get_buffer_size();

    // b is always bigger than 100
    #parallel
    for (int i = 0; i < 100; i++) {
    S1:   A[i % b] += i;
    S2:   B[i % Size] += i;
    }

However, depending on the value of Size, the parallel execution of the updated loop may not be legal any more. Without further information we have to assume that the loop is not parallel anymore.

For this case, will we require the loop-vectorizer to detect the outdated metadata? In case we do, how could this work?

Or do we require the instrumentation pass to reason about the llvm.loop.parallel data and remove it in case it gets invalidated?

Or can we just assume that such an instrumentation pass can or will never exist?

Cheers
Tobi

P.S.: I know that due to the modulo, we will not be able to prove stride-one access and vectorization may not be profitable. The example was written to demonstrate legality issues. We can assume surrounding instructions which would make vectorization profitable.
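The legality question in the two loops above can be illustrated by brute force. The following is a minimal sketch in C, not anything the loop vectorizer does: the helper name `modulo_loop_is_parallel` and the `MAX_ITERS` bound are assumptions made for illustration. It simply checks whether any two iterations of `X[i % modulus] += i` touch the same element, which is exactly the condition under which the loop carries a dependence.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ITERS 1024 /* illustrative upper bound on the trip count */

/* Hypothetical helper, not part of LLVM: returns true if no two of the
 * first n_iters iterations of "X[i % modulus] += i" access the same
 * element, i.e. the loop carries no dependence through X. */
static bool modulo_loop_is_parallel(int n_iters, int modulus)
{
    bool touched[MAX_ITERS];

    assert(n_iters >= 0 && n_iters <= MAX_ITERS && modulus > 0);
    memset(touched, 0, sizeof touched);

    for (int i = 0; i < n_iters; i++) {
        int idx = i % modulus;   /* element accessed by iteration i */
        if (touched[idx])
            return false;        /* two iterations collide: not parallel */
        touched[idx] = true;
    }
    return true;
}
```

With the trip count of 100 from the example, any `b` of at least 100 makes the check succeed for S1, while a buffer size such as 64 makes it fail for S2. The answer depends only on run-time values, which is why a static analysis cannot settle it without the user's annotation.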
Good day Nadav and Tobias,

Yes, the fundamental idea of this metadata was that it would be specifically used to avoid the need for any loop-carried dependency analysis.

On 02/08/2013 07:35 AM, Nadav Rotem wrote:
> We already detect this case right now. It's really easy to do. The
> llvm.loop.parallel should only provide information that is not easily
> available within this compilation unit. For example, the assumption that the
> input pointers don't overlap, or that dynamic indices are within a certain
> range that allows us to vectorize.

I see your point, but then this goes back to the similar discussion we had with #pragma ivdep. The risky part in this approach is to define the set of safely ignored dependency cases in such a way that it never ignores deps that should not be ignored.

The problematic cases are the "unknown alias" cases. Some of them might originate from the original parallel loop (the reason why the programmer might have added the parallelism info in the first place), and some of them might come from parallel-loop-unaware passes and cannot be safely ignored if they have converted the loop back to a sequential one (added a loop-carried dependency).

I totally agree that annotating all the memory accesses in the parallel loops feels somehow excessive (this wasn't done in my original patch), but so far I do not see a more robust way, and I prefer playing it safe.

BR,
--
Pekka
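The "annotate every memory access" scheme discussed here can be modelled in a few lines. This is a minimal sketch in C under stated assumptions, not LLVM code: `struct mem_access`, its fields, and `loop_still_parallel` are hypothetical names. The idea it shows is that a consumer trusts the parallel marking only while every memory access in the loop body carries the loop's id, so an access added by a pass that is unaware of parallel loops (and therefore unannotated) invalidates the marking without that pass having to do anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory access in a loop body.
 * parallel_loop_id is 0 when the access carries no annotation, otherwise
 * the id of the loop for which it was annotated as parallel. */
struct mem_access {
    const char *name;
    unsigned parallel_loop_id;
};

/* Returns true only if every access in the loop body is annotated with
 * this loop's id. A single unannotated access means the parallelism
 * information can no longer be trusted. */
static bool loop_still_parallel(unsigned loop_id,
                                const struct mem_access *accesses, size_t n)
{
    assert(loop_id != 0); /* 0 is reserved for "no annotation" */

    for (size_t i = 0; i < n; i++) {
        if (accesses[i].parallel_loop_id != loop_id)
            return false;
    }
    return true;
}
```

Applied to Tobias's example, the load and store of `A` would carry the loop's id, while the load and store of `B` inserted by the instrumentation pass would not, so the check fails and the loop falls back to being treated as sequential.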
On 02/08/2013 11:56 AM, Tobias Grosser wrote:
> Or do we require the instrumentation pass, to reason about the
> llvm.loop.parallel data and remove it in case it gets invalidated?

It's against the metadata guidelines.

> Or can we just assume that such an instrumentation pass can or will
> never exist?

I think it's safe to say we can't.

One additional motivational point I want to make is that the llvm.mem metadata style might become useful for other things too. Cases like annotating the pointer with its original function context to preserve "restricted pointer" (noalias) information across function call inlines, etc.

--
Pekka
On 8 February 2013 05:35, Nadav Rotem <nrotem at apple.com> wrote:
> For example, the assumption that the input pointers don't overlap, or that
> dynamic indices are within a certain range that allows us to vectorize.

In this case, I'd prefer metadata on the variables that are assumed not to alias, like the restrict keyword.

It seems to me that metadata on the loop basic blocks, since it can be invalidated, will not help the vectorizer much more than specific annotations on specific values (which are harder to lose). I'm not saying we should annotate *all* memory instructions in a loop, just the ones that make sense, or that will help the vectorizer default to a sane value.

I'm not a big fan of basic block annotation, unless the BBs are created for very specific reasons and no pass is allowed to touch them (especially inliners).

cheers,
--renato
Hi Renato,

On 02/08/2013 03:07 PM, Renato Golin wrote:
> In this case, I'd prefer metadata on the variables that are assumed not
> to alias, like the restrict keyword.
>
> It seems to me that metadata on the loop basic blocks, since it
> can be invalidated, will not help the vectorizer much more
> than specific annotations on specific values (which are harder to lose).
> I'm not saying we should annotate *all* memory instructions in a loop,
> just the ones that make sense, or that will help the vectorizer default to a
> sane value.

This is an interesting alternative!

Do you mean that we would still add the llvm.mem.parallel_loop_access metadata, but only to those mem accesses that are assumed to be "hard or impossible to analyze" (to prove to be no-alias cases)? Then we'd forget about the "parallel loop metadata" as is.

Then we would rely on the regular loop-carried dependency analyzer by default, but let those (mem) annotations just *help* in the "tricky cases". The llvm.mem.parallel_loop_access metadata would only communicate "this instruction does not alias with any other similarly annotated instruction from any other iteration in this loop".

Quickly thinking, this might work and might not lose the parallelism info too easily. Anyway, the info still has to be connected to a loop to avoid breakage from inlining, multi-level loops, etc.

Summarizing, the new metadata would be:

llvm.loop:
  Just to mark a loop (points to a unique id metadata).

llvm.mem.parallel_loop_access:
  The above-mentioned new semantics, connected to the llvm.loop's id metadata.

What do others think? Nadav?

--
Pekka
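One way to read the refined semantics is as a filter in front of the regular dependence analysis. The sketch below is written in C under loudly stated assumptions: `struct annotated_access`, its fields, and `pairs_needing_analysis` are hypothetical, and the pairwise model is a simplification that is not taken from the loop vectorizer. A pair of accesses is skipped when both carry the id of the loop in question, read-read pairs are skipped because they cannot form a dependence, and every remaining pair is left to the normal loop-carried dependence check. An access is also paired with itself, to stand for the same instruction in two different iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory access. parallel_loop_id is 0 when the
 * access has no annotation, otherwise the id of the loop it refers to. */
struct annotated_access {
    const char *name;
    bool is_store;
    unsigned parallel_loop_id;
};

/* Counts the access pairs that the regular dependence analysis would still
 * have to prove independent. Pairs where both accesses are annotated for
 * loop_id are trusted; pairs of two loads never conflict. */
static size_t pairs_needing_analysis(unsigned loop_id,
                                     const struct annotated_access *a,
                                     size_t n)
{
    size_t count = 0;

    assert(loop_id != 0); /* 0 is reserved for "no annotation" */

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i; j < n; j++) { /* j == i: same access, other iteration */
            bool both_annotated = a[i].parallel_loop_id == loop_id &&
                                  a[j].parallel_loop_id == loop_id;
            bool has_store = a[i].is_store || a[j].is_store;
            if (has_store && !both_annotated)
                count++;
        }
    }
    return count;
}
```

Under this model, the loop from Tobias's example with only the annotated load and store of `A` needs no analysis at all, while adding the unannotated accesses to `B` leaves five pairs for the dependence analyzer. The loop is then vectorized only if those pairs can be proven independent, so the new accesses cannot be silently ignored.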
Now that we have a better understanding of the proposal for using per-instruction metadata, I think that we need to revisit the "single metadata" approach (Pekka's original suggestion). Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than one way (detect or fix). The example pass that you mentioned below (the instrumentation pass) can be taught to handle the parallelism pragmas. Can you think of other passes that we will need to modify?

On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es> wrote:
> On 02/08/2013 06:35 AM, Nadav Rotem wrote:
>> Hi Tobi,
>>
>> Thanks for reviewing the proposal. I imagine that it may also affect your parallelization work in Polly.
>
> Sure. I am interested in using it.
>
>>> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop meta-data for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it can not prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect if the llvm.loop.parallel metadata is still correct?
>>
>> Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about.
>
> OK. I see. Most of the references that -reg2mem introduces are obviously
> destroying parallelism, and I can see that the loop vectorizer could detect these obvious violations and refuse to parallelize. However, the
> question remains whether the loop vectorizer can (and should) detect all possible violations. I am still concerned that this is in general not
> possible. Here is another piece of code (+ transformation), where it is
> a lot harder (impossible?) to detect the violation:
>
>     // b is always bigger than 100
>     #parallel
>     for (int i = 0; i < 100; i++) {
>     S1:   A[i % b] += i;
>     }
>
> Depending on the value of 'b' the loop is either parallel or not. It is impossible for the -loop-vectorizer to reason about this, but with
> the additional information the user has, he can easily annotate the loop such that the loop vectorizer can optimize it.
>
> Now we have some new instrumentation pass, which collects information
> and uses a buffer with 'Size' elements for this. The instrumentation pass just adds a couple of additional instructions, which do not
> change the sequential behavior of the program.
>
>     int *B = get_buffer();
>     int Size = get_buffer_size();
>
>     // b is always bigger than 100
>     #parallel
>     for (int i = 0; i < 100; i++) {
>     S1:   A[i % b] += i;
>     S2:   B[i % Size] += i;
>     }
>
> However, depending on the value of Size, the parallel execution of
> the updated loop may not be legal any more. Without further information we have to assume that the loop is not parallel anymore.
>
> For this case, will we require the loop-vectorizer to detect the outdated metadata? In case we do, how could this work?
>
> Or do we require the instrumentation pass to reason about the llvm.loop.parallel data and remove it in case it gets invalidated?
>
> Or can we just assume that such an instrumentation pass can or will never exist?
>
> Cheers
> Tobi
>
> P.S.: I know that due to the modulo, we will not be able to prove stride-one access and vectorization may not be profitable. The example was written to demonstrate legality issues. We can assume surrounding
> instructions which would make vectorization profitable.