On Mar 18, 2008, at 9:21 AM, Wojciech Matyjewicz wrote:> Hi, > >> Cool! I think the most critical part of this is to get a good >> interface for dependence analysis. There are lots of interesting >> implementations that have various time/space tradeoffs. >> >> For example, it would be great if Omega was available as an option, >> even if the compiler didn't use it by default. This argues for >> making >> dependence analysis implementations pluggable like alias analyses. > > Yes, I also thought about it that way. I think we may look at the > dependence analysis in LLVM at three levels (from the lowest to the > highest one): > > 1) Testing for dependence given two sequences of GEP indices assuming > that base pointers are the same. The indices could have a SCEV form or > even be preprocessed to something simpler (affine expressions for > example). > > 2) Testing for dependence of two instructions (that access memory). It > would use alias analysis to answer some queries immediately, or to > check > whether two GEP base pointers equal. If the latter is the case, 1) > would > be used to check for dependences. > > 3) Complex queries (for example: does the given loop carry > dependence?). > It would use 2) and summarize its results. > > Only the first level could be pluggable allowing to interchange or > chain > core dependency testing techniques. I think there will be no use in > making pluggable the higher ones (please, correct me if I am wrong). > This approach would require to divide the analysis structure into two > parts, say; DependenceAnalysis and IndexingTester.This all sounds great to me!> That said, I must admit I haven't made it that way in my prototype. I > have it in mind, but I'm currently trying to keep things simple and > just > to check whether the precision of my implementation is worth anything.I'm fine with starting simple and generalizing it out from there. I'd actually recommend against trying to implement a maximally precise dependence analyzer without a client. With no client, there is no way to test that you're getting correct results and whether the improvements in precision are actually useful. I'd suggest starting out with a simple checker, e.g. that handles simple affine cases, and some client (a loop xform?). This should be enough to get the transformation working on simple loops, and when the client is tested and working there, you can use it to evaluate whether the precision improvements are worthwhile. If you're looking for a simple client, I'd suggest something like loop reversal. This requires a pretty simple dependence test and can be useful for some targets. The idea is to turn: for (i = 0 .. 100) A[i] = 4; into: for (i = 100 .. 0) A[i] = 4; Another transform that needs dependence analysis which would be even more useful (but still very simple) is a "loops to memset/memcpy" pass. This optimization would speed up the 'viterbi' program in the test suite a *lot* for the various 'history' loops. A simple version of this is: for (i = 0 .. 100) A[i] = 0; -> memset(A, 0, sizeof(A[0])*100);>> I'd suggest looking at: >> >> Using the chains of recurrences algebra for data dependence testing >> and induction variable substitution >> MS Thesis, Johnie Birch >> >> Array Data Dependence Testing with the Chains of Recurrences Algebra >> http://citeseer.ist.psu.edu/vanengelen04array.html >> >> An empirical evaluation of chains of recurrences for array dependence >> testing >> http://doi.acm.org/10.1145/1152154.1152198 > > I've read the second one, but am not sure if it's easy to implement if > overflow and unknown signedness are taken into account. For now, I > will > stick to the classical Banerjee test. If time allows I'll return to > the > article.Ok. Starting simple is good. Thanks again for working on this, this is a major hole in llvm, -Chris
Chris Lattner wrote:> I'm fine with starting simple and generalizing it out from there. I'd > actually recommend against trying to implement a maximally precise > dependence analyzer without a client. With no client, there is no way > to test that you're getting correct results and whether the > improvements in precision are actually useful. > > I'd suggest starting out with a simple checker, e.g. that handles > simple affine cases, and some client (a loop xform?). This should be > enough to get the transformation working on simple loops, and when the > client is tested and working there, you can use it to evaluate whether > the precision improvements are worthwhile.In fact, I've a client to test dependence analysis. It's a simple loop parallelization pass. Basically, it tries to divide the set of iterations into smaller sets and execute them in separate threads. The loop is parallelized if it has appropriate shape (i.e. is rotated, has canonical indvar), has known iteration count (at runtime) and doesn't carry any dependence. The new dependence analysis is used to check if there are no memory dependences. Register dependences are simpler to detect as they are introduced only by the loop header PHI nodes. Some register dependences can be eliminated (in case of scalar reduction, for example). This pass is almost finished with some minor FIXMEs for corner cases. If there is an interest I can share/contribute it soon. However, I wouldn't expect much from it and would rather treat it as a toy. After your advice, I am going to prepare (locally) custom tests for llvm-test testsuite. The number of loops not carrying memory dependences, or parallelizable ones can be a good measure of dependence analysis precision. Comparing output of parallelized programs to the original ones would, probably, help to check its correctness.> If you're looking for a simple client, I'd suggest something like loop > reversal.I think it would exercise dependence analysis in the same way as the above pass. Off course, it's still worthy to implement it. Unfortunately, I am not sure if I'll have time for it in the nearest future. Maybe someone else is interested?:)> Another transform that needs dependence analysis which would be even > more useful (but still very simple) is a "loops to memset/memcpy" > pass. This optimization would speed up the 'viterbi' program in the > test suite a *lot* for the various 'history' loops.Ok, but, unfortunately, as above... Wojtek
On Mar 19, 2008, at 5:35 PM, Wojciech Matyjewicz wrote:> Chris Lattner wrote: >> I'm fine with starting simple and generalizing it out from there. >> I'd >> actually recommend against trying to implement a maximally precise >> dependence analyzer without a client. With no client, there is no >> way >> to test that you're getting correct results and whether the >> improvements in precision are actually useful. >> >> I'd suggest starting out with a simple checker, e.g. that handles >> simple affine cases, and some client (a loop xform?). This should be >> enough to get the transformation working on simple loops, and when >> the >> client is tested and working there, you can use it to evaluate >> whether >> the precision improvements are worthwhile. > > In fact, I've a client to test dependence analysis. It's a simple loop > parallelization pass. Basically, it tries to divide the set of > iterations into smaller sets and execute them in separate threads. The > loop is parallelized if it has appropriate shape (i.e. is rotated, has > canonical indvar), has known iteration count (at runtime) and doesn't > carry any dependence. The new dependence analysis is used to check if > there are no memory dependences. Register dependences are simpler to > detect as they are introduced only by the loop header PHI nodes. Some > register dependences can be eliminated (in case of scalar reduction, > for > example). This pass is almost finished with some minor FIXMEs for > corner > cases. If there is an interest I can share/contribute it soon. > However, > I wouldn't expect much from it and would rather treat it as a toy. > > After your advice, I am going to prepare (locally) custom tests for > llvm-test testsuite. The number of loops not carrying memory > dependences, or parallelizable ones can be a good measure of > dependence > analysis precision. Comparing output of parallelized programs to the > original ones would, probably, help to check its correctness.Makes sense, sounds fun! -Chris