Hal Finkel
2013-Feb-20  04:24 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Vladimir Guzma" <vladimir.guzma at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 9:44:02 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>
> Hi all,
>
> >> Hal, this OpenCL WG autovectorization work is, unfortunately, not my
> >> first-priority task at work currently (more like a pet project), so I
> >> cannot make any promises on when I might find time to work on it again.
> >> So, if you want to see the parallel loop iteration MD happen sooner,
> >> I'd recommend you dig into it. I think we'd like to start from scratch
> >> for the bbvectorizer utilization in pocl anyway, that is, add the
> >> metadata support first and then use it in a fresh bbvectorizer version.
> >> The current hacked version in pocl does not seem easy to upstream, as
> >> it has lagged behind some LLVM versions and is rather dirty.
> >
> > Understood. If you have some time, it seems that there are several
> > sub-tasks:
> >
> > - Update the language reference
> > - Update the loop vectorizer (to update the metadata when it unrolls)
> > - Update the regular unroller
> > - Update the alias analysis (maybe this is sufficient for basic
> >   support in BBVectorize) - is your current code close enough for this?
>
> The current implementation of AA uses work item metadata as well as
> 'region' metadata identifiers (regions being separated by barriers).
> So in order to provide similar functionality, parallel loop metadata
> would need both a 'loop ID' and a 'loop iteration ID'.
>
> This is perhaps not much of an immediate concern with respect to the
> current discussion, but it can become trouble when there are two
> consecutive loops, both marked with parallel loop metadata, both
> fully unrolled (or partially unrolled, followed by some loop
> fusion/combining pass). In this case a 'loop ID' is needed to
> distinguish the origin, since the 'loop iteration ID' is not enough.

Agreed, but I think that the current loop.parallel metadata scheme already has that. The memory access metadata refers to the metadata marking the loop backedges.

> > - Update the BB vectorizer to prefer pairings from different
> >   iterations
>
> Updating the BB vectorizer to use work item metadata was a rather
> trivial addition of a test for a difference in identifiers, very
> similar to the one in AA (though we also record the position of the
> instruction in the originating block), and it should be trivial to add
> to BBVectorizer as well, using parallel metadata.
> It became a major mess once complaints about speed started, e.g. the
> pocl version does not have any search limits or maximum instr per
> group etc., so finding all candidate pairs became rather time consuming.
> So the candidate selection is not compatible with the BB vectorizer,
> and a whole bunch of code was removed…

Interesting.

> Anyway, perhaps the interesting parts for integrating into BBVectorizer
> could be the (crude) caching during replaceOutputs, to be used when
> vectorizing phi nodes. There is also some vectorization of
> getelementptr instructions, creation of vectors of allocas to get
> better vector memory accesses, some magic for computing addresses
> of strided memory accesses using vectors, some tweaks to eliminate
> unneeded shuffle instructions in replacement inputs, etc. There are
> a lot of assumptions that the instructions to be vectorized are really
> identical and come from different work items (due to the recorded
> position in the originating code), which may not be the case in
> general BB vectorized cases.

To clarify, are these features that you've implemented in your version?

> Anyway, if the loop metadata gets updated, I can have a look at
> updating AA and moving it from pocl to LLVM, but not likely this
> week (maybe Pekka can provide it sooner if there is a rush).
> I cannot make any promises about BBVectorization atm, unfortunately.

Great, thanks!

-Hal

> regards
> Vlado
>
> > Thanks again,
> > Hal
> >
> >> BR,
> >> --
> >> --Pekka
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
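The point in this message about needing both a 'loop ID' and a 'loop iteration ID' can be made concrete with a small sketch. The C++ below does not use any LLVM API: `ParallelAccessTag`, `provablyIndependent`, `iterationOnlyTest`, and `exampleTwoUnrolledLoops` are hypothetical names invented for illustration. It models two consecutive parallel loops after full unrolling and shows why an iteration ID alone cannot tell the accesses apart.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag carried by a memory access after unrolling.
// This is an illustrative model, not an LLVM data structure.
struct ParallelAccessTag {
  std::uint32_t loopId;       // which parallel loop the access came from
  std::uint32_t iterationId;  // which iteration of that loop
};

// The parallel-loop annotation only promises independence between
// different iterations of the SAME loop.
bool provablyIndependent(const ParallelAccessTag &a,
                         const ParallelAccessTag &b) {
  return a.loopId == b.loopId && a.iterationId != b.iterationId;
}

// A test that ignores the loop ID. It is unsound once two unrolled
// loops sit next to each other in the same block.
bool iterationOnlyTest(const ParallelAccessTag &a,
                       const ParallelAccessTag &b) {
  return a.iterationId != b.iterationId;
}

// Usage example: two consecutive parallel loops, both fully unrolled.
void exampleTwoUnrolledLoops() {
  ParallelAccessTag a0{0, 0}, a1{0, 1};  // loop 0, iterations 0 and 1
  ParallelAccessTag b1{1, 1};            // loop 1, iteration 1
  assert(provablyIndependent(a0, a1));   // same loop, different iterations
  assert(!provablyIndependent(a0, b1));  // different loops: no guarantee
  assert(iterationOnlyTest(a0, b1));     // iteration ID alone claims too much
}
```

In the loop.parallel scheme Hal describes, the reference from the memory access metadata to the loop's backedge metadata would play the role that `loopId` plays in this sketch.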
Vladimir Guzma
2013-Feb-20  04:37 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>> This is perhaps not much of a immediate concern with respect to
>> current discussion, but can become trouble in case there are two
>> consecutive loops, both marked with parallel loop metadata, both
>> fully unrolled (or partially unrolled followed by some loop
>> fusion/combining pass). In this case there is need for 'loop ID' to
>> distinguish origin, since 'loop iteration ID' is not enough.
>
> Agreed, but I think that the current loop.parallel metadata scheme
> already has that. The memory access metadata refers to the metadata
> marking the loop back edges.

Good.

>> Anyways, perhaps interesting parts for integrating to BBVectorizer
>> could be (crude) caching during replaceOutputs to be used when
>> vectorizing phi nodes. There is also some vectorization of
>> getelementpointer instructions, creation vectors of allocas to get
>> better vector memory accesses, some magic about computing addresses
>> of stride memory accesses using vectors, some tweaks to eliminate
>> unneeded shuffle instructions in replacement inputs etc. There are
>> lot of assumptions that the instructions to be vectorized are really
>> identical from different work items (due to recorded position in the
>> originating code), which may not be case in general BB vectorized
>> cases.
>
> To clarify, are these features that you've implemented in your version?

Yes, these are there. As well as some stuff to clean up after vectorizer...

From performance point of view, addition of vectors of phi nodes was most beneficial for our main target (TTA architecture).

regards
Vlado
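The pairing test described in this thread (different work-item or iteration identifiers, same recorded position in the originating block) and the effect of a search limit can be sketched as follows. This is only an illustrative model, not the pocl WIVectorize or the BBVectorize code; `TaggedInst`, `CandidatePair`, `findCandidatePairs`, and the `searchLimit` parameter are assumed names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-instruction record: the identifier of the work item
// (or loop iteration) it came from, and its position in the original,
// pre-replication block.
struct TaggedInst {
  std::uint32_t workItemId;
  std::uint32_t position;
};

using CandidatePair = std::pair<std::size_t, std::size_t>;

// Collect index pairs (i, j), i < j, whose instructions come from
// different work items but the same original position. Only the next
// `searchLimit` instructions after i are examined, which bounds the
// work per instruction.
std::vector<CandidatePair>
findCandidatePairs(const std::vector<TaggedInst> &insts,
                   std::size_t searchLimit) {
  assert(searchLimit > 0 && "a zero search limit would never find a pair");
  std::vector<CandidatePair> pairs;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    for (std::size_t j = i + 1;
         j < insts.size() && j - i <= searchLimit; ++j) {
      const bool differentWorkItem =
          insts[i].workItemId != insts[j].workItemId;
      const bool samePosition = insts[i].position == insts[j].position;
      if (differentWorkItem && samePosition)
        pairs.emplace_back(i, j);
    }
  }
  return pairs;
}
```

With a limit as large as the block, the scan is quadratic in the block size, which is consistent with the slowdown reported above. A smaller limit caps the work per instruction at the cost of missing pairs that lie far apart.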
Hal Finkel
2013-Feb-20  05:33 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Vladimir Guzma" <vladimir.guzma at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 10:37:38 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>
> > To clarify, are these features that you've implemented in your
> > version?
>
> Yes, these are there. As well as some stuff to clean up after
> vectorizer...
> From performance point of view, addition of vectors of phi nodes was
> most beneficial for our main target (TTA architecture).

It looks like the source is here:
http://bazaar.launchpad.net/~pocl/pocl/trunk/view/head:/lib/llvmopencl/WIVectorize.cc

Are there test cases for these new features? I'll look at this version; hopefully we'll be able to get most if not all of the improvements upstream. It looks like you've left your version under LLVM's license; is that correct? If not, may I have your explicit permission for relicensing?

I've made some significant performance improvements in BBVectorize recently, and you may want to adopt those changes in your version as well (especially if people have complained about speed).

Thanks again,
Hal