Now that we have a better understanding of the proposal for using per-instruction metadata, I think that we need to revisit the "single metedata" approach (Pekka's original suggestion). Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than one way (detect or fix). The example pass that you mentioned below (the instrumentation pass), can be taught to handle the parallelism pragmas. Can you think of other passes that we will need to modify ? On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es> wrote:> On 02/08/2013 06:35 AM, Nadav Rotem wrote: >> Hi Tobi, >> >> Thanks for reviewing the proposal. I imagine that it may also affects your parallelization work in Polly. > > Sure. I am interested in using it. > >>> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop meta-data for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it can not prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect if the llvm.loop.parallel metadata is still correct? >> >> Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about. > > OK. I see. Most of the references that -mem2reg introduces are obviously > destroying parallelism and I can see that the loop vectorizer could detect these obvious violations and refuse to parallelize. However, the > question remains if the loop vectorizer can (and should) detect all possible violations. I am still concerned that this is in general not > possible. Here another piece of code (+ transformation), where it is > a lot harder (impossible?) to detect the violation: > > // b is always bigger than 100 > #parallel > for (int i = 0; i < 100; i++) { > S1: A[i % b] += i; > } > > Depending on the values of 'b' the loop is either parallel or not. It is impossible for the -loop-vectorizer to reason about this, but with > the additional information the user has, he can easily annotate the loop such that the loop vectorizer can optimize it. > > Now we have some new instrumentation pass, which collects information > and uses for this a buffer with 'Size' elements. The instrumentation pass just adds a couple of additional instructions, which do not > change the sequential behavior of the program. > > int *B = get_buffer(); > int Size = get_buffer_size(); > > // b is always bigger than 100 > #parallel > for (int i = 0; i < 100; i++) { > s1: A[i % b] += i; > S2: B[i % Size] += i > } > > However, depending on the value of Size, the parallel execution of > the updated loop may not be legal any more. Without further information we have to assume that the loop is not parallel anymore. > > For this case, will we require the loop-vectorizer to detect the outdated metadata? In case we do, how could this work? > > Or do we require the instrumentation pass, to reason about the llvm.loop.parallel data and remove it in case it gets invalidated? > > Or can we just assume that such an instrumentation pass can or will never exist? > > Cheers > Tobi > > P.S: I know that due to the modulo, we will not be able to prove stride one access and vectorization may not be profitable. The example was written to demonstrate legality issues. We can assume surrounding > instructions which would make vectorization profitable.
On 02/11/2013 10:31 PM, Nadav Rotem wrote:> Now that we have a better understanding of the proposal for using per-instruction metadata, I think that we need to revisit the "single metedata" approach (Pekka's original suggestion).Following Andrew's comments we understood that Sebastian's proposal causes issues with inlining and unrolling. It seems we all agree that his proposal is not an option we can go with. Besides that, from my point there are still two options: 1) llvm.loop + nothing else 2) llvm.loop + llvm.mem Is there any specific reason you only want to revisit option 1)? I have the feeling option 2) does not work for you, but I don't yet understand your reasons. The last comment from you was: "I don't fully understand the semantics of this metadata and I am not sure why we need it. And even if we do need it, I think that it would require too many passes to change". Is this still your opinion or could Pekka's explanation help you to understand this better?> Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than one way (detect or fix). The example pass that you mentioned below (the instrumentation pass), can be taught to handle the parallelism pragmas.When LLVM metadata was introduced [1], one of the core principles was: "]...] we don't want the optimizers to have to know about metadata." ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ By requiring instrumentation passes like the one I explained to take the llvm.loop meta-data into account, we do not follow this principle any more. To my knowledge for instruction level meta-data, this would be the first case where we ignore this core principle. Are you suggesting we now require passes to reason about meta-data to maintain correctness?> Can you think of other passes that we will need to modify ?I could probably imagine other passes. Though, my main concern are passes that I do not understand. We would e.g. need to reason about the thread-sanitizer, address-sanitizer and memory-sanitizer passes. What about the (path) profiling passes? Would we need to inspect all of them? I am really not a fan of adding more meta-data than necessary. However, I have the feeling there are some good reason to believe llvm.loop meta-data may not be enough. And given the fact that with tbaa meta-data has proven that a per memory-access meta-data can work nicely, llvm.mem may be a feasible option. All the best, Tobias [1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html
----- Original Message -----> From: "Tobias Grosser" <tobias at grosser.es> > To: "Nadav Rotem" <nrotem at apple.com> > Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > Sent: Monday, February 11, 2013 4:58:50 PM > Subject: Re: [LLVMdev] Parallel Loop Metadata > > On 02/11/2013 10:31 PM, Nadav Rotem wrote: > > Now that we have a better understanding of the proposal for using > > per-instruction metadata, I think that we need to revisit the > > "single metedata" approach (Pekka's original suggestion). > > Following Andrew's comments we understood that Sebastian's proposal > causes issues with inlining and unrolling. It seems we all agree that > his proposal is not an option we can go with. > > Besides that, from my point there are still two options: > > 1) llvm.loop + nothing else > > 2) llvm.loop + llvm.mem > > Is there any specific reason you only want to revisit option 1)? > > I have the feeling option 2) does not work for you, but I don't yet > understand your reasons. The last comment from you was: "I don't > fully > understand the semantics of this metadata and I am not sure why we > need > it. And even if we do need it, I think that it would require too many > passes to change". Is this still your opinion or could Pekka's > explanation help you to understand this better? > > > Reg2mem is indeed a problem, but the loop vectorizer can solve this > > in more than one way (detect or fix). The example pass that you > > mentioned below (the instrumentation pass), can be taught to > > handle the parallelism pragmas. > > When LLVM metadata was introduced [1], one of the core principles > was: > > "]...] we don't want the optimizers to have to know about metadata." > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > By requiring instrumentation passes like the one I explained to take > the llvm.loop meta-data into account, we do not follow this principle > any more. To my knowledge for instruction level meta-data, this would > be > the first case where we ignore this core principle. > > Are you suggesting we now require passes to reason about meta-data to > maintain correctness? > > > Can you think of other passes that we will need to modify ? > > I could probably imagine other passes. Though, my main concern are > passes that I do not understand. We would e.g. need to reason about > the thread-sanitizer, address-sanitizer and memory-sanitizer passes. > What about the (path) profiling passes? Would we need to inspect all > of > them? > > I am really not a fan of adding more meta-data than necessary. > However, > I have the feeling there are some good reason to believe llvm.loop > meta-data may not be enough. And given the fact that with tbaa > meta-data > has proven that a per memory-access meta-data can work nicely, > llvm.mem > may be a feasible option.I agree with this. We should err on the side of safety even at the cost of larger IR. Relying on other passes to drop metadata on loop backedges, even if those passes are making (seemingly) completely-unrelated changes is asking for trouble; and in this case, trouble means miscompiles. Also, the changes to support propagating this metadata in existing passes are well defined: just grep for all places where TBAA metadata is handled. Some refactoring may be in order, but that is not a bad thing. -Hal> > All the best, > Tobias > > [1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Hi, I have a semi-working clang patch that adds support for #pragma ivdep. The codegen is hacked in but it attaches a dummy llvm.loop.ivdep metadata. This may be useful for experimenting with different representations (just modify the codegen) paul On 2013-02-11, at 4:31 PM, Nadav Rotem wrote:> Now that we have a better understanding of the proposal for using per-instruction metadata, I think that we need to revisit the "single metedata" approach (Pekka's original suggestion). > > Reg2mem is indeed a problem, but the loop vectorizer can solve this in more than one way (detect or fix). The example pass that you mentioned below (the instrumentation pass), can be taught to handle the parallelism pragmas. > > Can you think of other passes that we will need to modify ? > > On Feb 8, 2013, at 1:56 AM, Tobias Grosser <tobias at grosser.es> wrote: > >> On 02/08/2013 06:35 AM, Nadav Rotem wrote: >>> Hi Tobi, >>> >>> Thanks for reviewing the proposal. I imagine that it may also affects your parallelization work in Polly. >> >> Sure. I am interested in using it. >> >>>> I am not sure if I am able to follow your reasoning. How could the -loop-vectorizer detect parallelism violations? I had the feeling that we introduce the llvm.loop meta-data for the case where we want to inform the loop vectorizer that it can assume the absence of dependences even though it can not prove their absence statically. Do you possibly mean that the -loop-vectorizer should in some way detect if the llvm.loop.parallel metadata is still correct? >>> >>> Yes, the loop vectorizer can detect the kind of violations of the "llvm.loop.parallel" metadata that we are worried about. >> >> OK. I see. Most of the references that -mem2reg introduces are obviously >> destroying parallelism and I can see that the loop vectorizer could detect these obvious violations and refuse to parallelize. However, the >> question remains if the loop vectorizer can (and should) detect all possible violations. I am still concerned that this is in general not >> possible. Here another piece of code (+ transformation), where it is >> a lot harder (impossible?) to detect the violation: >> >> // b is always bigger than 100 >> #parallel >> for (int i = 0; i < 100; i++) { >> S1: A[i % b] += i; >> } >> >> Depending on the values of 'b' the loop is either parallel or not. It is impossible for the -loop-vectorizer to reason about this, but with >> the additional information the user has, he can easily annotate the loop such that the loop vectorizer can optimize it. >> >> Now we have some new instrumentation pass, which collects information >> and uses for this a buffer with 'Size' elements. The instrumentation pass just adds a couple of additional instructions, which do not >> change the sequential behavior of the program. >> >> int *B = get_buffer(); >> int Size = get_buffer_size(); >> >> // b is always bigger than 100 >> #parallel >> for (int i = 0; i < 100; i++) { >> s1: A[i % b] += i; >> S2: B[i % Size] += i >> } >> >> However, depending on the value of Size, the parallel execution of >> the updated loop may not be legal any more. Without further information we have to assume that the loop is not parallel anymore. >> >> For this case, will we require the loop-vectorizer to detect the outdated metadata? In case we do, how could this work? >> >> Or do we require the instrumentation pass, to reason about the llvm.loop.parallel data and remove it in case it gets invalidated? >> >> Or can we just assume that such an instrumentation pass can or will never exist? >> >> Cheers >> Tobi >> >> P.S: I know that due to the modulo, we will not be able to prove stride one access and vectorization may not be profitable. The example was written to demonstrate legality issues. We can assume surrounding >> instructions which would make vectorization profitable. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- A non-text attachment was scrubbed... Name: ivdep.diff Type: application/octet-stream Size: 12525 bytes Desc: ivdep.diff URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130212/d826146a/attachment.obj>
> I have the feeling option 2) does not work for you, but I don't yet understand your reasons.My inclination to prefer #1 is due to its simplicity. But, if #1 does not work because it creates a correctness problems then #2 is the only option that is left on the table.
On Feb 11, 2013, at 2:58 PM, Tobias Grosser <tobias at grosser.es> wrote:> On 02/11/2013 10:31 PM, Nadav Rotem wrote: >> Now that we have a better understanding of the proposal for using per-instruction metadata, I think that we need to revisit the "single metedata" approach (Pekka's original suggestion). > > Following Andrew's comments we understood that Sebastian's proposal causes issues with inlining and unrolling. It seems we all agree that his proposal is not an option we can go with.Work is well on it's way now, and that's great. But just so people don't get the wrong impression from this thread. I'm not aware of any correctness issue with the "single metadata" approach. Inlining seems to work fine for me. Here's a hacky little example to see what I mean. define void @wrapper(i32* %arg) { %1 = call i32 @loop(i32* %arg) ret void } define i32 @loop(i32* %a) { entry: br label %loop loop: ; preds = %loop, %entry %p = phi i32* [%a,%entry], [%p1, %loop] %p1 = getelementptr i32* %p, i64 1 %v = load i32* %p %cond = icmp eq i32 %v, 0 br i1 %cond, label %loop, label %exit, !metadata !{i32* %p, metadata !0} exit: ; preds = %loop ret i32 %v } !0 = metadata !{metadata !"loop"} $ opt -inline -S < define void @wrapper(i32* %arg) { br label %loop.i loop.i: ; preds = %loop.i, %0 %p.i = phi i32* [ %arg, %0 ], [ %p1.i, %loop.i ] %p1.i = getelementptr i32* %p.i, i64 1 %v.i = load i32* %p.i %cond.i = icmp eq i32 %v.i, 0 br i1 %cond.i, label %loop.i, label %loop.exit, !metadata !{i32* %p.i, metadata !0} loop.exit: ; preds = %loop.i ret void } !0 = metadata !{metadata !"loop"} --- Unrolling OTOH should be aware of and preserve any loop metadata. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130217/28b8e6b5/attachment.html>