Hal Finkel
2013-Feb-19  15:51 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Nadav Rotem" <nrotem at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 2:27:09 AM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>
> Hi,
>
> On 02/19/2013 02:31 AM, Nadav Rotem wrote:
> > vectorization. However, I don't completely understand something. If we
> > already have the information that consecutive iterations of the loops are
> > independent, then the loop vectorizer should already vectorize the loop.
>
> Yes, the loop vectorizer should be a better match for the parallel
> (inner) loops.
>
> That's why I've been pushing the parallel loop metadata: to go towards
> using the generic loop vectorizer instead of the hacked bbvectorizer for
> work-group autovectorization in pocl. Unfortunately, it needs some more
> work still to be efficient for this purpose (like discussed), but a step
> towards it has been now made and it can vectorize some work-group
> functions with pocl.

That's good.

> BTW, there's at least one thing the bbvectorizer handles better now:
> intra-kernel/function vector datatypes. IIRC, it doesn't choke when it sees
> vectors already present in the input, but calmly tries to combine multiple
> vector instructions to a larger one. Thus, it might be useful in the case
> where the loop at hand is not nicely and cleanly vectorizable (e.g., nasty
> memory access patterns) to still provide some level of vector ISA
> utilization.

Indeed, that's the idea.

> Hal, this OpenCL WG autovectorization work, unfortunately, is not my first
> priority task at work currently (more like a pet project), so I cannot make
> any promises on when I might find time to work on it again. So, if you want
> to see the parallel loop iteration MD happen sooner, I'd recommend you dig
> into it. I think we'd like to start from the scratch for the bbvectorizer
> utilization in pocl anyways, that is, would add the metadata support first
> and then use it in a fresh bbvectorizer version. The current hacked version
> in pocl seems not to be upstreamable easily as it has lagged behind some
> LLVM versions and is rather dirty.

Understood. If you have some time, it seems that there are several sub-tasks:

 - Update the language reference
 - Update the loop vectorizer (to update the metadata when it unrolls)
 - Update the regular unroller
 - Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this?
 - Update the BB vectorizer to prefer pairings from different iterations

Thanks again,
Hal

> BR,
> --
> --Pekka
Pekka Jääskeläinen
2013-Feb-19  16:25 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
On 02/19/2013 05:51 PM, Hal Finkel wrote:
> Understood. If you have some time, it seems that there are several sub-tasks:
>
> - Update the language reference

Document the additional optional iteration id argument to llvm.mem.parallel_loop_access? I'll do this.

> - Update the loop vectorizer (to update the metadata when it unrolls)
> - Update the regular unroller

I'll first update pocl's work-item replicator (whose output is effectively the same as a fully unrolled parallel wiloop) to mark the iterations with the iteration argument, and see how far that gets the WG vectorization using the upstream BBVectorizer.

> - Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this?

The AA should be trivial. I can take a look at this too and try to prepare a patch. This can be the first consumer of the optional parallel_loop_access argument, so it fits in the same patch. The work-item AA implementation in pocl seems to be rather clean, so it should not require much additional work to generalize it to support parallel loop iterations of any kind:

http://bazaar.launchpad.net/~pocl/pocl/trunk/view/head:/lib/llvmopencl/WorkItemAliasAnalysis.cc

I might try to upstream the OpenCL disjoint address space AA part while I'm at it. It can use the OpenCL kernel metadata to check that the function being processed is an OCL kernel.

> - Update the BB vectorizer to prefer pairings from different iterations

I'd leave this as the last task. It might find good enough pairings solely with the additional AA, like you wrote. Let's see.

--
Pekka
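As a rough illustration of the "OpenCL disjoint address space AA" idea, here is a minimal Python model. It is not pocl's or LLVM's code. It assumes the caller has already established from the kernel metadata whether the function is an OpenCL kernel. The numeric address-space mapping in `OPENCL_ADDRESS_SPACES` is a hypothetical example, because the real numbering is target-specific.

```python
from enum import Enum
from typing import Optional


class AliasResult(Enum):
    NO_ALIAS = "NoAlias"
    MAY_ALIAS = "MayAlias"


# Hypothetical mapping from LLVM address-space numbers to OpenCL 1.x named
# address spaces. The real numbering is target-specific (an assumption here).
OPENCL_ADDRESS_SPACES = {0: "private", 1: "global", 2: "constant", 3: "local"}


def opencl_address_space_alias(is_ocl_kernel: bool, as_a: int, as_b: int) -> AliasResult:
    """Answer an alias query using only the address spaces of the two pointers.

    OpenCL 1.x has no generic address space, so pointers into two different
    named address spaces are treated as never overlapping.
    """
    if not is_ocl_kernel:
        # Outside an OpenCL kernel the address spaces carry no such guarantee.
        return AliasResult.MAY_ALIAS
    space_a: Optional[str] = OPENCL_ADDRESS_SPACES.get(as_a)
    space_b: Optional[str] = OPENCL_ADDRESS_SPACES.get(as_b)
    if space_a is None or space_b is None:
        # Unknown address space: stay conservative.
        return AliasResult.MAY_ALIAS
    if space_a != space_b:
        return AliasResult.NO_ALIAS
    # Same address space: this rule cannot decide, so defer to other analyses.
    return AliasResult.MAY_ALIAS
```

For example, `opencl_address_space_alias(True, 1, 3)` (a `__global` pointer against a `__local` one under the assumed mapping) gives `NO_ALIAS`. The same query for a non-kernel function stays `MAY_ALIAS`.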
Vladimir Guzma
2013-Feb-20  03:44 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
Hi all,

>> Hal, this OpenCL WG autovectorization work, unfortunately, is not my
>> first priority task at work currently (more like a pet project), so I
>> cannot make any promises on when I might find time to work on it again.
>> So, if you want to see the parallel loop iteration MD happen sooner, I'd
>> recommend you dig into it. I think we'd like to start from the scratch
>> for the bbvectorizer utilization in pocl anyways, that is, would add the
>> metadata support first and then use it in a fresh bbvectorizer version.
>> The current hacked version in pocl seems not to be upstreamable easily as
>> it has lagged behind some LLVM versions and is rather dirty.
>
> Understood. If you have some time, it seems that there are several sub-tasks:
>
> - Update the language reference
> - Update the loop vectorizer (to update the metadata when it unrolls)
> - Update the regular unroller
> - Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this?

The current implementation of the AA uses work-item metadata as well as 'region' metadata identifiers (regions being separated by barriers). So, to provide similar functionality, the parallel loop metadata would need both a 'loop ID' and a 'loop iteration ID'.

This is perhaps not much of an immediate concern for the current discussion, but it can become a problem when there are two consecutive loops, both marked with parallel loop metadata and both fully unrolled (or partially unrolled and then merged by some loop fusion/combining pass). In that case a 'loop ID' is needed to distinguish the origin, since the 'loop iteration ID' alone is not enough.

> - Update the BB vectorizer to prefer pairings from different iterations

Updating the BB vectorizer to use work-item metadata was a rather trivial addition: a test for a difference in identifiers, very similar to the one in the AA (though we also record the position of the instruction in the originating block). It should be just as trivial to add to BBVectorizer using the parallel metadata. It became a major mess once the complaints about speed started. For example, the pocl version does not have any search limits or a maximum number of instructions per group, so finding all candidate pairs became rather time consuming. So the candidate selection is not compatible with the BB vectorizer, and a whole bunch of code was removed…

Anyways, parts that might be interesting to integrate into BBVectorizer include the (crude) caching during replaceOutputs that is used when vectorizing phi nodes. There is also some vectorization of getelementptr instructions and the creation of vectors of allocas to get better vector memory accesses. It also has some magic for computing the addresses of strided memory accesses using vectors, and some tweaks to eliminate unneeded shuffle instructions in replacement inputs, etc. There are a lot of assumptions that the instructions to be vectorized are really identical across different work items (due to the recorded position in the originating code), which may not be the case in general BB vectorization.

Anyways, if the loop metadata gets updated, I can have a look at updating the AA and moving it from pocl to LLVM, but probably not this week (maybe Pekka can provide it sooner if there is a rush). I cannot make any promises about BBVectorization at the moment, unfortunately.

regards
Vlado

>
> Thanks again,
> Hal
>
>>
>> BR,
>> --
>> --Pekka
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
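Here is a small Python sketch of the consecutive-loop problem, with hypothetical names that are not from pocl. It compares a rule that looks only at the iteration ID with one that also requires both accesses to come from the same loop (or barrier region). It assumes the rule the thread discusses: accesses from different iterations of the same parallel loop are independent.

```python
from typing import NamedTuple


class Access(NamedTuple):
    loop_id: str       # the parallel loop (or barrier region) the access came from
    iteration_id: int  # the unrolled iteration (work-item) within that loop


def no_alias_by_iteration_only(a: Access, b: Access) -> bool:
    """Unsafe rule: trusts any difference in iteration IDs."""
    return a.iteration_id != b.iteration_id


def no_alias_by_loop_and_iteration(a: Access, b: Access) -> bool:
    """Safe rule: different iterations are independent only within the same loop."""
    return a.loop_id == b.loop_id and a.iteration_id != b.iteration_id


# Two consecutive parallel loops, both fully unrolled: iteration 0 of the
# first loop may write memory that iteration 1 of the second loop reads.
store_in_first = Access(loop_id="L1", iteration_id=0)
load_in_second = Access(loop_id="L2", iteration_id=1)
```

With these two accesses, `no_alias_by_iteration_only` wrongly reports no alias, while the loop-qualified rule draws no conclusion and leaves the query to other analyses.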
Hal Finkel
2013-Feb-20  04:24 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Vladimir Guzma" <vladimir.guzma at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 9:44:02 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>
> Hi all,
>
> >> Hal, this OpenCL WG autovectorization work, unfortunately, is not my
> >> first priority task at work currently (more like a pet project), so I
> >> cannot make any promises on when I might find time to work on it
> >> again. So, if you want to see the parallel loop iteration MD happen
> >> sooner, I'd recommend you dig into it. I think we'd like to start from
> >> the scratch for the bbvectorizer utilization in pocl anyways, that is,
> >> would add the metadata support first and then use it in a fresh
> >> bbvectorizer version. The current hacked version in pocl seems not to
> >> be upstreamable easily as it has lagged behind some LLVM versions and
> >> is rather dirty.
> >
> > Understood. If you have some time, it seems that there are several
> > sub-tasks:
> >
> > - Update the language reference
> > - Update the loop vectorizer (to update the metadata when it unrolls)
> > - Update the regular unroller
> > - Update the alias analysis (maybe this is sufficient for basic
> > support in BBVectorize) - is your current code close enough for this?
>
> Current implementation of AA uses work item metadata as well as
> 'region' metadata identifiers (regions begin separated by barriers).
> So in order to provide similar functionality, parallel loop metadata
> would need both, 'loop ID' and 'loop iteration ID'.
>
> This is perhaps not much of a immediate concern with respect to
> current discussion, but can become trouble in case there are two
> consecutive loops, both marked with parallel loop metadata, both
> fully unrolled (or partially unrolled followed by some loop
> fusion/combining pass). In this case there is need for 'loop ID' to
> distinguish origin, since 'loop iteration ID' is not enough.

Agreed, but I think that the current loop.parallel metadata scheme already has that. The memory access metadata refers to the metadata marking the loop backedges.

> > - Update the BB vectorizer to prefer pairings from different
> > iterations
>
> Updating BB vectorizer to use work item metadata was rather trivial
> addition of a test for difference in identifiers, very similar to
> one in AA (though we also record position of the instruction in the
> originating block) and should be trivial to add to BBvectorizer as
> well, using parallel metadata.
> It become major mess once complains started about speed, e.g. pocl
> version does not have any search limits or maximum instr per group
> etc, so finding all candidate pairs become rather time consuming.
> So the candidate selection is not compatible with BB vectorizer, and
> whole bunch of code was removed…

Interesting.

> Anyways, perhaps interesting parts for integrating to BBVectorizer
> could be (crude) caching during replaceOutputs to be used when
> vectorizing phi nodes. There is also some vectorization of
> getelementpointer instructions, creation vectors of allocas to get
> better vector memory accesses, some magic about computing addresses
> of stride memory accesses using vectors, some tweaks to eliminate
> unneeded shuffle instructions in replacement inputs etc. There are
> lot of assumptions that the instructions to be vectorized are really
> identical from different work items (due to recorded position in the
> originating code), which may not be case in general BB vectorized
> cases.

To clarify, are these features that you've implemented in your version?

> Anyways, if the loop metadata gets updated, I can have a look at
> updating AA and moving it from pocl to LLVM, but not likely this
> week (maybe Pekka can provide it sooner if there is a rush).
> I can not make any promises about BBVectorization atm, unfortunately.

Great, thanks!

 -Hal

> regards
> Vlado
>
> > Thanks again,
> > Hal
> >
> >>
> >> BR,
> >> --
> >> --Pekka
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
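Hal's point is that each memory access's metadata already points back to the loop-ID node attached to the loop backedge. The Python sketch below models that relationship. It roughly mirrors the kind of check LLVM performs when deciding whether a loop is annotated parallel (compare `Loop::isAnnotatedParallel()`). The classes and function names are hypothetical stand-ins for metadata nodes, not LLVM API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


class LoopId:
    """Stands in for a self-referential loop-ID node such as
    `!1 = metadata !{ metadata !1 }`. Equality is object identity, so two
    distinct nodes never compare equal even if they print the same."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


@dataclass
class MemoryAccess:
    text: str
    # Loop IDs listed in !llvm.mem.parallel_loop_access; empty = no metadata.
    parallel_loop_access: Tuple[LoopId, ...] = ()


def is_annotated_parallel(latch_loop_id: Optional[LoopId],
                          accesses: Sequence[MemoryAccess]) -> bool:
    """A loop counts as parallel only if its backedge carries a loop ID and
    every memory access in the loop body refers back to that same node."""
    if latch_loop_id is None:
        return False
    return all(latch_loop_id in acc.parallel_loop_access for acc in accesses)
```

Because the check is by node identity, the loop ID alone tells accesses from different loops apart. An access added by a pass that does not know about the metadata (an empty tuple) makes the loop lose its parallel annotation, which is the conservative outcome.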
Hi,

> On 02/19/2013 05:51 PM, Hal Finkel wrote:
>> Understood. If you have some time, it seems that there are several
>> sub-tasks:
>>
>> - Update the language reference

On 02/19/2013 06:25 PM, Pekka Jääskeläinen wrote:
> Document the additional optional iteration id argument to
> llvm.mem.parallel_loop_access? I'll do this.

I finally found some time to think about this. The current idea for marking the unrolled parallel iterations is the following.

In the current llvm.mem.parallel_loop_access metadata format, the accesses are marked like this (a store in a 2-level nested loop):

   ...
   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
   ...
   !0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop identifiers
   !1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
   !2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop

The idea was to add an optional extra iteration identifier to the MD of the parallel_loop_access node. In essence, we want to identify the loop and the unrolled iteration of each access to help (at least) the alias analysis.

Say we unroll the inner loop once:

   ...
   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
   ...
   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !3
   ...
   !0 = metadata !{ metadata !{ metadata !1, i64 0 }, metadata !2 }
   !1 = metadata !{ metadata !1 } ; loop id inner
   !2 = metadata !{ metadata !2 } ; loop id outer
   !3 = metadata !{ metadata !{ metadata !1, i64 1 }, metadata !2 }

The llvm.loop.parallel MD of the branch will point to !1 and !2 directly, because the unrolled iterations should still not alias with accesses from any (rolled) iteration. The outer loop iteration !2 has an implicit "iteration id" of 0.

Or should we use the "unique self-referring metadata node" trick here too, to mark the iteration identifiers? The integer ids there are kind of arbitrary, and the unrollers need to find a unique int if they decide to unroll more. Could the following work more robustly?

   !0 = metadata !{ metadata !{ metadata !1, metadata !0 }, metadata !2 }
   !1 = metadata !{ metadata !1 } ; loop id inner
   !2 = metadata !{ metadata !2 } ; loop id outer
   !3 = metadata !{ metadata !{ metadata !1, metadata !3 }, metadata !2 }

Here the self-reference back to the "iteration" metadata (!0 and !3) would force the uniqueness of an iteration MD.

If we now also unroll the outer loop once, we get:

   ; for the inner loop iterations:
   !0 = metadata !{ metadata !{ metadata !1, metadata !0 }, metadata !2 }
   !1 = metadata !{ metadata !1 } ; loop id inner
   !2 = metadata !{ metadata !2 } ; loop id outer
   !3 = metadata !{ metadata !{ metadata !1, metadata !3 }, metadata !2 }

   ; for the outer loop iterations:
   !4 = metadata !{ metadata !{ metadata !1, metadata !4 }, metadata !{ metadata !2, metadata !4 } }
   !7 = metadata !{ metadata !{ metadata !1, metadata !3 }, metadata !{ metadata !2, metadata !7 } }

The MD should still fulfill the requirement that passes need not know about the metadata. If the unroller unknowingly copies the iteration MD, we just won't get the AA benefits, but nothing should break: the new unrolled instructions would look like a single parallel loop iteration whose accesses may alias each other. If it doesn't copy the MD, the loop loses the "parallel property".

The llvm.loop.parallel metadata would still work as before: it points only to the loop id metadata nodes, which the iteration id nodes also point to.

Any comments/thoughts/better ideas before I move forward with this? Next I plan to emit the MD from pocl and add an alias analyzer to LLVM that understands it.

Thanks,
--
Pekka
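The following Python sketch shows how an alias analysis might read the self-referring iteration scheme proposed above. It assumes the rule under discussion: two accesses from different unrolled iterations of the same parallel loop do not alias. It also treats a bare loop ID as implicit iteration 0, as described. `Node`, `iteration_of` and `proven_no_alias` are hypothetical. A real implementation would walk the operands of LLVM metadata nodes instead of Python dicts.

```python
from typing import Dict, Optional


class Node:
    """Stands in for a distinct metadata node (a loop ID such as !1, or an
    iteration ID such as !3). Equality is object identity, mimicking the
    uniqueness the self-reference trick is meant to guarantee."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


# Per-access metadata: loop ID -> iteration ID, or None for a bare loop ID.
AccessMD = Dict[Node, Optional[Node]]

# A bare loop ID is read as an implicit iteration 0 of that loop.
IMPLICIT_ITERATION_0 = object()


def iteration_of(md: AccessMD, loop: Node) -> object:
    iteration = md[loop]
    return IMPLICIT_ITERATION_0 if iteration is None else iteration


def proven_no_alias(a: AccessMD, b: AccessMD) -> bool:
    """True if some parallel loop lists both accesses under different iteration
    IDs, i.e. they come from different unrolled iterations of that loop.
    False means "no conclusion": fall back to the other alias analyses."""
    for loop in a.keys() & b.keys():
        if iteration_of(a, loop) is not iteration_of(b, loop):
            return True
    return False


# The four accesses from the outer-unrolled example above.
inner, outer = Node("!1"), Node("!2")
n0, n3, n4, n7 = Node("!0"), Node("!3"), Node("!4"), Node("!7")
acc0: AccessMD = {inner: n0, outer: None}
acc3: AccessMD = {inner: n3, outer: None}
acc4: AccessMD = {inner: n4, outer: n4}
acc7: AccessMD = {inner: n3, outer: n7}
```

Even though `acc3` and `acc7` share the inner iteration node `!3`, their outer iterations differ (implicit 0 versus `!7`), so the sketch still separates them. If an unaware unroller simply copies `acc3`'s metadata, both copies carry the same nodes and the check draws no conclusion. That matches the "we just won't get the AA benefits" behaviour.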