On Feb 11, 2013, at 2:58 PM, Tobias Grosser <tobias at grosser.es> wrote:

> On 02/11/2013 10:31 PM, Nadav Rotem wrote:
>> Now that we have a better understanding of the proposal for using
>> per-instruction metadata, I think that we need to revisit the
>> "single metadata" approach (Pekka's original suggestion).
>
> Following Andrew's comments we understood that Sebastian's proposal
> causes issues with inlining and unrolling. It seems we all agree
> that his proposal is not an option we can go with.

Work is well on its way now, and that's great. But just so people don't get the wrong impression from this thread: I'm not aware of any correctness issue with the "single metadata" approach. Inlining seems to work fine for me. Here's a hacky little example to show what I mean.

define void @wrapper(i32* %arg) {
  %1 = call i32 @loop(i32* %arg)
  ret void
}

define i32 @loop(i32* %a) {
entry:
  br label %loop

loop:                                   ; preds = %loop, %entry
  %p = phi i32* [%a,%entry], [%p1, %loop]
  %p1 = getelementptr i32* %p, i64 1
  %v = load i32* %p
  %cond = icmp eq i32 %v, 0
  br i1 %cond, label %loop, label %exit, !metadata !{i32* %p, metadata !0}

exit:                                   ; preds = %loop
  ret i32 %v
}

!0 = metadata !{metadata !"loop"}

$ opt -inline -S <

define void @wrapper(i32* %arg) {
  br label %loop.i

loop.i:                                 ; preds = %loop.i, %0
  %p.i = phi i32* [ %arg, %0 ], [ %p1.i, %loop.i ]
  %p1.i = getelementptr i32* %p.i, i64 1
  %v.i = load i32* %p.i
  %cond.i = icmp eq i32 %v.i, 0
  br i1 %cond.i, label %loop.i, label %loop.exit, !metadata !{i32* %p.i, metadata !0}

loop.exit:                              ; preds = %loop.i
  ret void
}

!0 = metadata !{metadata !"loop"}

---
Unrolling OTOH should be aware of and preserve any loop metadata.

-Andy
----- Original Message -----
> From: "Andrew Trick" <atrick at apple.com>
> To: "Tobias Grosser" <tobias at grosser.es>
> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Sunday, February 17, 2013 2:32:25 PM
> Subject: Re: [LLVMdev] Parallel Loop Metadata
>
> Unrolling OTOH should be aware of and preserve any loop metadata.

If the unroller somehow differentiates the metadata coming from different loop iterations, then BBVectorize can use this information as well. Even better, we could make BasicAA understand that appropriately marked loads and stores from different iterations don't alias. Then the AA-based dependency breaker in the scheduler could also make use of the information. Thoughts?

-Hal

> -Andy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
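Hal's suggestion that the unroller could differentiate metadata by iteration can be pictured with a toy model. The sketch below is not LLVM code: `Access`, `unroll`, and the `(loop_id, iteration)` tag are hypothetical names chosen only for illustration. It assumes the unroller copies the loop body `factor` times and stamps every copied memory access with the index of the iteration it came from, which is the information BBVectorize or an alias analysis would need later.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Access:
    """A toy memory access; a stand-in for a load or store, not an LLVM class."""
    name: str
    # Hypothetical per-access tag: (loop id, iteration index). None = untagged.
    tag: Optional[Tuple[str, int]] = None


def unroll(body: List[Access], loop_id: str, factor: int) -> List[Access]:
    """Copy the loop body `factor` times, tagging each copy with its iteration."""
    unrolled: List[Access] = []
    for iteration in range(factor):
        for access in body:
            # replace() builds a new Access, so the original body stays untagged.
            unrolled.append(replace(access, tag=(loop_id, iteration)))
    return unrolled
```

For instance, `unroll([Access("load"), Access("store")], "L0", 2)` yields four accesses, the first two tagged with iteration 0 and the last two with iteration 1. A real unroller would attach such information as metadata on the cloned instructions; how that metadata should be encoded is exactly what the thread is still debating.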
On Feb 17, 2013, at 1:15 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>> Unrolling OTOH should be aware of and preserve any loop metadata.
>
> If the unroller somehow differentiates the metadata coming from
> different loop iterations, then BBVectorize can use this information
> as well. Even better, we could make BasicAA understand that
> appropriately marked loads and stores from different iterations don't
> alias. Then the AA-based dependency breaker in the scheduler could
> also make use of the information. Thoughts?

That could work. Eventually the LoopVectorizer pass should be able to unroll and overlap iterations. If it is then still important for BBVectorize to disambiguate, that information could be provided either by updating the metadata or by a separate analysis preserved by LoopVectorizer. BBVectorize should run immediately afterward.

-Andy
Pekka Jääskeläinen
2013-Feb-18 08:32 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
On 02/17/2013 11:15 PM, Hal Finkel wrote:

> If the unroller somehow differentiates the metadata coming from different
> loop iterations, then BBVectorize can use this information as well. Even
> better, we could make BasicAA understand that appropriately marked loads
> and stores from different iterations don't alias. Then the AA-based
> dependency breaker in the scheduler could also make use of the information.
> Thoughts?

This is roughly what we did in our first version of work-group autovectorization in pocl, which works on "fully unrolled wi-loops" (we call it the 'replication' work-group generation method).

We forked BBVectorize into the pocl code base and added explicit knowledge of the separate work-items (which are really just parallel loop iterations), so it tries to pair the matching instructions from the different iterations (WIs) directly.

We also have an AA that exploits the independent-iteration (WI) information along with the other OpenCL features that help AA (disjoint address spaces). We use this AA all the way down to our custom instruction scheduler for the TCE target, where it helps the VLIW-style scheduling/bundling of multiple WIs.

I have been hoping to get the BBVectorizer changes and the "unrolled parallel loop AA" functionality upstreamed, as they apply to all fully parallel loops, not just the OpenCL "work-item loops", and I hate having a forked BBVectorizer in pocl.

The metadata scheme should be thought through, however, to make it cleaner than our OpenCL-specific hackish attempt, and possibly usable for other similar "context-dependent scenarios".

The earlier idea I had was to attach "context information" to the memory accesses. In this case it would communicate that the memory access belongs (or belonged, if fully unrolled) to a loop, and that it can alias only with accesses from the same iteration or with accesses without the metadata.
Something like:

  llvm.mem.parallel_loop_iteration [loopid] [iteration_id_integer]

This can help the "pairing" of the BBVectorizer: it can try to pair with the different iterations first. The ParallelLoopIterationAA can look at this metadata and, if the other instruction also has a parallel_loop_iteration MD that points to the same loopid (the self-referencing id metadata from the llvm.loop.parallel patch), check their iteration identifiers, and if they are different, return NO ALIAS.

A similar idea could be applied to preserve the 'restrict' info across function inlining:

  llvm.mem.restricted_access [funcid] [pointerid]

Similarly, if the RestrictedPointerAA finds that both of the accesses are marked with this metadata and point to the same funcid, and the [pointerid] is different, it can return NO ALIAS.

--
Pekka
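The two rules described above share one shape, which the following sketch tries to make explicit. It is a toy model under stated assumptions, not an LLVM alias analysis: `ContextTag`, `AliasResult`, and `context_alias` are hypothetical names, and a tag stands for either `[loopid] [iteration_id_integer]` or `[funcid] [pointerid]`. Each metadata kind is assumed to be queried separately, so a loop tag is never compared against a restrict tag.

```python
from enum import Enum
from typing import NamedTuple, Optional


class AliasResult(Enum):
    NO_ALIAS = "NoAlias"
    MAY_ALIAS = "MayAlias"


class ContextTag(NamedTuple):
    """Hypothetical model of one context metadata node on a memory access."""
    scope_id: str   # loopid (parallel loop) or funcid (restrict)
    member_id: int  # iteration id or pointer id within that scope


def context_alias(a: Optional[ContextTag], b: Optional[ContextTag]) -> AliasResult:
    """Answer NO_ALIAS only when both tags prove independence."""
    # An access without the metadata may alias with anything.
    if a is None or b is None:
        return AliasResult.MAY_ALIAS
    # Tags from different loops (or functions) say nothing about each other.
    if a.scope_id != b.scope_id:
        return AliasResult.MAY_ALIAS
    # Same scope, different iteration (or different restrict pointer).
    if a.member_id != b.member_id:
        return AliasResult.NO_ALIAS
    # Same scope and same member: leave the decision to other analyses.
    return AliasResult.MAY_ALIAS
```

Under this model, `context_alias(ContextTag("loop0", 0), ContextTag("loop0", 1))` gives `AliasResult.NO_ALIAS`, while any query involving an untagged access stays at `MAY_ALIAS`. Returning `MAY_ALIAS` in every uncertain case keeps the rule conservative, so it could only be chained with other analyses that might refine the answer.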