On Dec 31, 2011, at 11:04 AM, Preston Briggs wrote:> Hi Chris,
>
> I wish we could talk about this at a white board, over a period of
> weeks, but email will have to do…
That would be nice :)
> I don't entirely understand your position about dominance frontiers.
> In my experience, they were trivial to compute, requiring very little
> time, space, or code. Rereading Cytron et al., I would expect they'd
> require O(N + E) time and O(N) space in most cases, and when we
> consider that N and E are in nodes and edges in the CFG (i.e., pretty
> small compared to the number of instructions), then I'd think we can
> compute DF fram scratch in perhaps less time than required for a
> single walk over the instructions in a routine. Probably the O(N)
> memory allocations to store the results will dominate.
Everything that you say is correct, but it is missing the point IMO. Regardless
of the constant factor of computing DF for a function, you're doing work
proportional to the size of the function speculatively, assuming it will be
used.
Lets be more concrete: you're using DF for some purpose: say rewriting a
variable into SSA form. That variable may only be used in a small subset of the
function, say within a loop body or some other scope. Does it make sense to
compute DF for the entire function, rewrite the variable, then throw it away?
> You mentioned updating... I wouldn't update, I'd just recompute.
It
> seems like LLVM is set up perfectly for this sort of thing. If the
> CFG changes, you notice automagically that the dominator tree is
> invalid and recompute it when necessary. Why not the same for DF?
In my example above, of course it would be ridiculous to compute DF each time
you want to rewrite a single variable. You need to reuse it, and some "cfg
listener"-based approach might make sense. However, now you're in a
position where you have little control over how and when it gets updated.
Lets take a bigger example: say you're doing a loop transformation (like
unswitching) which can occasionally require unstructured updates of the CFG.
The natural way to write this transformation would be:
for each loop
if it is a candidate for unswitching
hack up the CFG (e.g. duplicating a loop body)
rewrite the variables that are now duplicated
However, if you do that, your listener based approach will recompute DF for the
entire function after each loop is processed. This is particularly bad given
that we're working on natural loops, so even if there is unstructured
control within the loop, the loop itself provides structure that recomputing DF
from scratch ignores.
Another way to write this is rewrite the transformation to process all of the
loops in a function before rewriting the variables back into SSA form. However,
this has a number of problems, including that you have to keep track of the
variable equivalencies somehow, even across other loop updates.
Alternatively, you can use the LLVM trick of rewriting things temporarily in
terms of load/store to allocas. This then pessimizes size estimates for other
loop unswitch operations.
Both of these also have the small problem that it completely destroys the
concept of LoopPassManager :)
> You mentioned computing more than was absolutely necessary... I use
> DF to place phi nodes. Seems like computing complete DF for a CFG
> doesn't waste effort; we'll need the entire set of DF to handle all
> the phi nodes.
You certainly don't.
void foo() {
...
for () {
int x;
if (C)
x = 1
else
x = 2;
use(x)
}
...
}
rewriting "x" only requires a tiny amount of the CFG to be analyzed.
If you're doing the initial rewrite from imperative load/stores into SSA
then you have a lot of variables all across the function to rewrite. In this
case, the tradeoff is more likely in favor of using DF to do it.
> We're aiming to do a number of loop xforms, including parallelization,
> loop fusion, distribution, rewriting of reductions and recurrences,
> etc. The usual approach is to build a dependence graph connecting all
> the memory references in a routine, then examine the graph looking for
> connected components and such. Building the graph incrementally, on
> demand, seems wasteful since we'll want the entire graph. Similarly,
> since we're going to build the entire dependence graph, building a
> complete set of FUD chains is also useful since we'll use the entire
> thing.
I don't have extensive experience with those loop transformations, so my
perspective is limited. However, each of them seem to want to know very
different kind of relations - e.g. some want cross loop iteration dependencies,
some want within iteration dependences, some care about forward and others care
about anti-dependences, etc. You'd really want to compute all of this?
Further, many loop transformations have to immediately give up: for example, it
might not be possible to vectorize a loop that contains a call to an unknown
function, or a loop that does pointer chasing. Why burn compile time
speculatively computing a bunch of information for these loops?
>>> It's very like SSA construction, but must make provision
>>> testing anti dependences. I had planned to use dominance frontiers
to
>>> guide placement of phi nodes, as usual.
>>
>> Ok, in that case, please check out
include/llvm/Transforms/Utils/SSAUpdater.h,
>> which is the preferred API (that doesn't use dom frontiers) to
build SSA.
>
> What approach does it use? I look at the code, but don't really see
> an explanation.
Here is some background info:
http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html
http://blog.llvm.org/2009/12/advanced-topics-in-redundant-load.html
-Chris