Tobias Grosser
2010-Jun-18 14:03 UTC
[LLVMdev] Speculative Loop Parallelization on LLVM IR
Hi Javed,

On 06/18/10 14:07, Javed Absar wrote:
> Hi:
> I worked on loop-optimization techniques previously using ORC.
> Currently I see lots of research on speculative parallelization of
> loops ... especially because multicores [for embedded systems] are
> becoming popular. In other words, because you have multiple cores, you
> can start some loops [Fast-Track] as if there is no or low
> data-dependence [Partial Parallel Loop-Nest]. The normal part is to
> check later if there was some violation and then roll back etc.
> [something like that ... I don't think people want to hear the full
> story here]. What do you guys think about the feasibility of
> "speculative run-time loop optimization" using LLVM IR as the base?
> I know there are lots of issues with it -
> 1. Loop transformation requires tools like PolyLib (polyhedral
> analysis). They are too computationally expensive for run-time
> optimization.

There is Polly (still in development):
http://wiki.llvm.org/Polyhedral_optimization_framework

And to be clear: computational complexity is not very often the current
problem. It may be, however, that we have not yet reached the point
where it becomes relevant.

> 2. I am not clear how to launch threads at the LLVM IR level (you do
> need to create new threads for some of these RT speculative loop
> parallelization techniques).

You could add calls to the POSIX functions that create a thread to the
LLVM IR, or you could use some higher-level libraries that support
thread creation. The OpenMP runtime libraries, for example, can help
you with this. Here is a link to the GNU libraries that could be used
to try this approach:
http://gcc.gnu.org/onlinedocs/libgomp/

> Is somebody already doing something like this? Does this proposal make
> sense even to start...

I do not yet understand what you want to do. You just want to execute
some loops in parallel without knowing whether they have data
dependencies, and then afterwards you would like to check whether there
were dependencies.
How would you check for violated dependencies? Can you give an example
of a loop that has no dependencies and that you would like to improve?

Tobi
Hi Tobias:

Thanks for replying. So if I understand correctly, in LLVM currently
the polyhedral model is being built (LLVM IR --> Poly Model --> LLVM
IR). This is for compile-time optimization of loop nests [e.g. loop
transformations to expose parallelism or improve locality etc.]. Yes,
that's great for optimizing loop nests.

In addition, since the real value of LLVM to me is its run-time
optimization possibilities, some loop-nest optimizations "need" to be
done at run time. I say "need" because many loops have data
dependencies that cannot be resolved at compile time [for example:

  FOR i = 1 to N
    A[ B[i] ] = A[ i ] + func(...)
  END

Can the iterations of this loop be run in parallel?]

There are two approaches for this kind of problem. One is
inspector/executor; the other is speculation. To keep the story short -
basically, you run the loop iterations in parallel and verify at the
end whether data dependencies were violated. If yes, you rewind and run
the loop sequentially. If for that particular case no data dependencies
were violated, you have gained in execution time [yes, there is a cost
involved in verifying, and the net gain is not always positive].

OK, so what's my point? To be able to do at least some loop
transformations at run time to expose parallelism etc., perhaps some
kind of LLVM IR --> Poly --> LLVM IR support at run time may be
required. Definitely a scaled-down version, since polyhedral
transformations need a lot of processing in my opinion. So if I
understand, we are not there yet ... and maybe I can come back with
some proposal/ideas and cross-check it with you guys.

Thanks
Javed

On 18 June 2010 19:33, Tobias Grosser <grosser at fim.uni-passau.de> wrote:
[quoted text clipped]
--
my homepage: http://www.javedabsar.com
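[The run-in-parallel / verify / rollback scheme Javed describes can be sketched for exactly this `A[B[i]] = A[i] + func(...)` loop. This is an illustrative model, not a real runtime: the "parallel" phase is modelled as an unordered pass over a snapshot, the conflict rule is deliberately conservative (every written index must be unique and must not collide with another iteration's read index), and all names are invented:]

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Speculatively execute  A[B[i]] = A[i] + f(i)  for i = 0..n-1.
// Reads are served from a snapshot (as an unordered parallel run would
// see them); a shadow array counts writes per element. If a dependence
// is violated, roll back to the snapshot and rerun sequentially.
// Returns true if the speculation succeeded.
bool speculative_loop(std::vector<int>& A, const std::vector<std::size_t>& B,
                      const std::function<int(std::size_t)>& f) {
    std::size_t n = B.size();              // iteration count; reads A[0..n-1]
    std::vector<int> snapshot = A;         // rollback copy + read source
    std::vector<int> writes(A.size(), 0);  // shadow: write counts per element

    bool conflict = false;
    for (std::size_t i = 0; i < n && !conflict; ++i) {  // "parallel" phase
        A[B[i]] = snapshot[i] + f(i);
        if (++writes[B[i]] > 1) conflict = true;        // output dependence
        if (B[i] < n && B[i] != i) conflict = true;     // write hits another
                                                        // iteration's read slot
    }
    if (conflict) {                        // rewind and run sequentially
        A = snapshot;
        for (std::size_t i = 0; i < n; ++i) A[B[i]] = A[i] + f(i);
        return false;                      // verify cost paid, no gain
    }
    return true;                           // speculation paid off
}
```

[The verification and copying here are exactly the "extra cost" Javed mentions: the net gain is positive only when the speculative phase is cheap relative to the loop body and conflicts are rare.]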
On 21 June 2010 06:12, Javed Absar <javed.absar at gmail.com> wrote:
> OK, so what's my point? To be able to do at least some loop
> transformations at run time to expose parallelism etc., perhaps some
> kind of LLVM IR --> Poly --> LLVM IR support at run time may be
> required. Definitely a scaled-down version, since polyhedral
> transformations need a lot of processing in my opinion.

Hi Javed,

This is quite interesting, but there are many dangers that can easily
be forgotten.

First and most obvious, you'll have to take all computation into
account when judging whether there was a net gain, not only the
vectorized execution. It might not be trivial to calculate the gain if
you don't have control of the cycles or if the pipeline is too deep. In
big loops it makes little difference, but there will be a threshold
(depending on machine configuration etc.) below which there will be a
big hit on performance, especially at program start-up, which is the
most critical to the user.

Another phantom is that, when dealing with pointers (not arrays, as you
suggest), locality is relative and depends on *each* execution. You
might tag the loop as "safe" if at first it seems so, but you still
have to check *every* time for disparities, so as not to get caught in
that trap. That also adds to the "extra cost" this mechanism brings.

Last, as you said, the polyhedral machinery is very expensive.
Stripped-down versions might not be suitable to analyse all cases, so
you might end up with a handful of sub-versions, and that adds time and
space complexity to the program. This might also be running in a JITed
environment, which adds further invisible cost, with unpredictable
cache hits, memory usage etc.

My tuppence.

cheers,
--renato
On Mon, Jun 21, 2010 at 1:12 AM, Javed Absar <javed.absar at gmail.com> wrote:
> Hi Tobias:
>
> Thanks for replying. So if I understand correctly, in LLVM currently
> the polyhedral model is being built (LLVM IR --> Poly Model --> LLVM IR).
> This is for compile-time optimization of loop nests [e.g. loop
> transformations to expose parallelism or improve locality etc.]. Yes,
> that's great for optimizing loop nests.
>
> In addition, since the real value of LLVM to me is its run-time
> optimization possibilities, some loop-nest optimizations "need" to be
> done at run time. I say "need" because many loops have data
> dependencies that cannot be resolved at compile time.

In the cases where it matters, most compilers will perform function
versioning and insert data-dependence checks to decide at run time
which version (parallel or not) can be run. The JIT can then simply
eliminate the branches it proves dead, and you are left with either the
parallelized version or the original. As you say, this is the standard
approach - why would you need to run Poly at run time?
On Sun, Jun 20, 2010 at 10:12 PM, Javed Absar <javed.absar at gmail.com> wrote:
> OK, so what's my point? To be able to do at least some loop
> transformations at run time to expose parallelism etc., perhaps some
> kind of LLVM IR --> Poly --> LLVM IR support at run time may be
> required. Definitely a scaled-down version, since polyhedral
> transformations need a lot of processing in my opinion. So if I
> understand, we are not there yet ... and maybe I can come back with
> some proposal/ideas and cross-check it with you guys.

Do you really need to do transformations at run time?
http://www.springerlink.com/content/ly5t6f8fyyr28m8y/

:)
- Devang