On Fri, Dec 23, 2011 at 3:53 PM, Chris Lattner <clattner at apple.com> wrote:> > > On Dec 23, 2011, at 1:35 PM, Preston Briggs wrote: > > > Reading the comments in Analysis/DominanceFrontier.h, I see a note that the structure is deprecated and > > we're not to use it for anything new. > > > > Has it been replaced with something equally useful, or shall I redo the calculation for myself, or ...? > > We're hoping to remove them, because they're inherently an N^2 data structure and there are usually better ways to do things. > What are you trying to do? SSA construction? > > -ChrisTo do dependence analysis efficiently, we want to avoid testing all memory references against each other (another O(N^2) proposition). So we build FUD chains for memory references, guided by the Alias Sets analysis. It's very like SSA construction, but must make provision testing anti dependences. I had planned to use dominance frontiers to guide placement of phi nodes, as usual. I'm surprised they cause you a space problem. In my experience (admittedly, long ago), we never noticed a problem. We certainly had much less memory available at the time. Perhaps we could profitably re-visit how they are represented? Preston
On Dec 23, 2011, at 5:48 PM, Preston Briggs wrote:> To do dependence analysis efficiently, we want to avoid testing all > memory references against each other (another O(N^2) proposition). So > we build FUD chains for memory references, guided by the Alias Sets > analysis.Ok. Having worked in similar spaces (and caring about compile time :), I've quickly given up on approaches like that. Instead, I've found that lazy approaches (e.g. what is done in the "memory dependence analysis" pass) work better, because you burn compile time only when a client is making queries (instead of eagerly analyzing every possible alias relation). YMMV of course.> It's very like SSA construction, but must make provision > testing anti dependences. I had planned to use dominance frontiers to > guide placement of phi nodes, as usual.Ok, in that case, please check out include/llvm/Transforms/Utils/SSAUpdater.h, which is the preferred API (that doesn't use dom frontiers) to build SSA.> I'm surprised they cause you a space problem. In my experience > (admittedly, long ago), we never noticed a problem. We certainly had > much less memory available at the time. Perhaps we could profitably > re-visit how they are represented?Admittedly, our implementation is brutally inefficient :). That said, dom frontiers are another great example of a non-lazy data structure that you have to fully compute even if you end up not using it - and I'd much rather rip it out of LLVM than fix it. In my experience, there are better ways to solve problems that are traditionally formulated in terms of it. That said, my experience is limited to the domains that are already implemented in LLVM, so if you're trying to do something new... then that experience may not be very useful :) -Chris
On Dec 31, 2011, at 11:04 AM, Preston Briggs wrote:> Hi Chris, > > I wish we could talk about this at a white board, over a period of > weeks, but email will have to do…That would be nice :)> I don't entirely understand your position about dominance frontiers. > In my experience, they were trivial to compute, requiring very little > time, space, or code. Rereading Cytron et al., I would expect they'd > require O(N + E) time and O(N) space in most cases, and when we > consider that N and E are in nodes and edges in the CFG (i.e., pretty > small compared to the number of instructions), then I'd think we can > compute DF fram scratch in perhaps less time than required for a > single walk over the instructions in a routine. Probably the O(N) > memory allocations to store the results will dominate.Everything that you say is correct, but it is missing the point IMO. Regardless of the constant factor of computing DF for a function, you're doing work proportional to the size of the function speculatively, assuming it will be used. Lets be more concrete: you're using DF for some purpose: say rewriting a variable into SSA form. That variable may only be used in a small subset of the function, say within a loop body or some other scope. Does it make sense to compute DF for the entire function, rewrite the variable, then throw it away?> You mentioned updating... I wouldn't update, I'd just recompute. It > seems like LLVM is set up perfectly for this sort of thing. If the > CFG changes, you notice automagically that the dominator tree is > invalid and recompute it when necessary. Why not the same for DF?In my example above, of course it would be ridiculous to compute DF each time you want to rewrite a single variable. You need to reuse it, and some "cfg listener"-based approach might make sense. However, now you're in a position where you have little control over how and when it gets updated. Lets take a bigger example: say you're doing a loop transformation (like unswitching) which can occasionally require unstructured updates of the CFG. The natural way to write this transformation would be: for each loop if it is a candidate for unswitching hack up the CFG (e.g. duplicating a loop body) rewrite the variables that are now duplicated However, if you do that, your listener based approach will recompute DF for the entire function after each loop is processed. This is particularly bad given that we're working on natural loops, so even if there is unstructured control within the loop, the loop itself provides structure that recomputing DF from scratch ignores. Another way to write this is rewrite the transformation to process all of the loops in a function before rewriting the variables back into SSA form. However, this has a number of problems, including that you have to keep track of the variable equivalencies somehow, even across other loop updates. Alternatively, you can use the LLVM trick of rewriting things temporarily in terms of load/store to allocas. This then pessimizes size estimates for other loop unswitch operations. Both of these also have the small problem that it completely destroys the concept of LoopPassManager :)> You mentioned computing more than was absolutely necessary... I use > DF to place phi nodes. Seems like computing complete DF for a CFG > doesn't waste effort; we'll need the entire set of DF to handle all > the phi nodes.You certainly don't. void foo() { ... for () { int x; if (C) x = 1 else x = 2; use(x) } ... } rewriting "x" only requires a tiny amount of the CFG to be analyzed. If you're doing the initial rewrite from imperative load/stores into SSA then you have a lot of variables all across the function to rewrite. In this case, the tradeoff is more likely in favor of using DF to do it.> We're aiming to do a number of loop xforms, including parallelization, > loop fusion, distribution, rewriting of reductions and recurrences, > etc. The usual approach is to build a dependence graph connecting all > the memory references in a routine, then examine the graph looking for > connected components and such. Building the graph incrementally, on > demand, seems wasteful since we'll want the entire graph. Similarly, > since we're going to build the entire dependence graph, building a > complete set of FUD chains is also useful since we'll use the entire > thing.I don't have extensive experience with those loop transformations, so my perspective is limited. However, each of them seem to want to know very different kind of relations - e.g. some want cross loop iteration dependencies, some want within iteration dependences, some care about forward and others care about anti-dependences, etc. You'd really want to compute all of this? Further, many loop transformations have to immediately give up: for example, it might not be possible to vectorize a loop that contains a call to an unknown function, or a loop that does pointer chasing. Why burn compile time speculatively computing a bunch of information for these loops?>>> It's very like SSA construction, but must make provision >>> testing anti dependences. I had planned to use dominance frontiers to >>> guide placement of phi nodes, as usual. >> >> Ok, in that case, please check out include/llvm/Transforms/Utils/SSAUpdater.h, >> which is the preferred API (that doesn't use dom frontiers) to build SSA. > > What approach does it use? I look at the code, but don't really see > an explanation.Here is some background info: http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html http://blog.llvm.org/2009/12/advanced-topics-in-redundant-load.html -Chris
On Jan 6, 2012, at 5:08 PM, Chris Lattner wrote:>>> >>> It's very like SSA construction, but must make provision >>> testing anti dependences. I had planned to use dominance frontiers to >>> guide placement of phi nodes, as usual. >> >> Ok, in that case, please check out include/llvm/Transforms/Utils/SSAUpdater.h, >> which is the preferred API (that doesn't use dom frontiers) to build SSA. > > What approach does it use? I look at the code, but don't really see > an explanation.Sorry, my reading comprehension skills were low, I thought you were asking about MemDep for some reason. SSAUpdater uses simple local scans across the CFG, excelling at local updates of values. -Chris
On Fri, Jan 6, 2012 at 5:08 PM, Chris Lattner <clattner at apple.com> wrote:> [many interesting things, together with contributions from Daniel Berlin and Cameron Zwarich]Thanks.>> We're aiming to do a number of loop xforms, including parallelization, >> loop fusion, distribution, rewriting of reductions and recurrences, >> etc. The usual approach is to build a dependence graph connecting all >> the memory references in a routine, then examine the graph looking for >> connected components and such. Building the graph incrementally, on >> demand, seems wasteful since we'll want the entire graph. Similarly, >> since we're going to build the entire dependence graph, building a >> complete set of FUD chains is also useful since we'll use the entire >> thing. > > I don't have extensive experience with those loop transformations, > so my perspective is limited. However, each of them seem to want > to know very different kind of relations - e.g. some want cross loop > iteration dependencies, some want within iteration dependences, > some care about forward and others care about anti-dependences, etc. > You'd really want to compute all of this?Today, yes. Later, when I understand more, perhaps I'll be able to get by with less.> Further, many loop transformations have to immediately give up: > for example, it might not be possible to vectorize a loop that contains > a call to an unknown function, or a loop that does pointer chasing. > Why burn compile time speculatively computing a bunch of information for these loops?Because xforms of larger scope may be able to take advantage. Drawing on your examples, a loop containing a call can sometimes be distributed into several loops, some of which can be vectorized. A loop that does pointer chasing might be contained in another loop that can be parallelized. I don't imagine such things will be useful to everybody, but they're useful to us and we're happy to use an extra flag and pay the extra compile time. After all, it's so much better than having to restructure the code by hand. Preston