John Brawn via llvm-dev
2018-Apr-18 15:50 UTC
[llvm-dev] [RFC] Making GVN able to visit the same block more than once
Introduction =========== I'm currently in the middle of what initially looked to be a simple change in GVN but is turning out to have unexpected consequences that are turning out to be quite difficult to resolve, so I thought I'd send out an RFC to make sure that I'm not barking up the wrong tree with how I'm trying to do this. Motivation and current behaviour =============================== The motiviating example here for what I'm trying to do is std::min_element, or rather std::min_element after it's inlined into a function where the result is dereferenced. If we have fload min_element_example(float *first, float *last) { return *std::min_element(first, last); } then after inlining we have something that looks like float min_element(float *first, float *last) { for (float *p = first; p != last; ++p) { if (*p < *first) first = p; } return *first; } What I want to happen here is eliminate those loads of *first and get something that looks like float min_element_optimized(float *first, float *last) { float best = *first; for (float *p = first; p != last; ++p) { float pval = *p; if (pval < best) best = pval; } return best; } This looks like partial redundancy elimination to me: if we look at the *first in the loop then its value is one of: * The value of *first from the entry if this is the first iteration. * The value of *p from the previous iteration if the 'if' was true. * The value of *first from the previous iteration if the 'if' was false. We have two available values (those in the loop) and one unavailable values (the one from the entry block) so the value is partially redundant. There are two problems here preventing GVN PRE from doing anything: * The 'if' becomes a select and GVN and MemoryDependenceAnalysis can't cope with selects. This isn't too hard I think, and I have a prototype for this which essentially treats them as kind of like a phi and that mostly works. * We need to look at the same block more than once, and have more than one value available in a block, and this completely breaks things. This is the part that this RFC is about. Initial attempt ============== If we turn off select generation (so we can ignore the first of the two problems) then the IR we have coming into GVN looks like this: define float @min_element_phi(float* %first, float* %last) { entry: br label %for.body for.body: %p = phi float* [ %first, %entry ], [ %incdec.ptr, %for.inc ] %first.addr.1 = phi float* [ %first, %entry ], [ %first.addr.2, %for.inc ] %pval = load float, float* %p, align 4 %firstval = load float, float* %first.addr.1, align 4 %cmp1 = fcmp olt float %pval, %firstval br i1 %cmp1, label %if.then, label %for.inc if.then: br label %for.inc for.inc: %first.addr.2 = phi float* [ %p, %if.then ], [ %first.addr.1, %for.body ] %incdec.ptr = getelementptr inbounds float, float* %p, i64 1 %cmp = icmp eq float* %incdec.ptr, %last br i1 %cmp, label %exit, label %for.body exit: %ret = load float, float* %first.addr.2, align 4 ret float %ret } GVN calls MemoryDependenceAnalysis::getNonLocalPointerDependency to find the non-local dependencies of the load %ret. What it then does is: * Starts looking for an available value of *%first.addr.2 starting in exit. * Doesn't find anything in exit, goes to for.inc. * Doesn't find anything in for.inc, phi translates to be looking for a value of *%p in if.then. * Doesn't find anything in if.then, goes to for.body. * Finds the value of pval in for.body. * Goes back to for.inc and phi-translates to be looking for the value of *%first.addr.1 in for.body. * Gives up because we're already looked at for.body for %p. The thing that keeps track of which blocks we've already looked at is DenseMap<BasicBlock *, Value *> Visited, which is a map of a block to the value that we looked at that block for. The way the logic works is that is we try to visit a block that's already been visited: * If we previously visited it for the same value that we're now visiting it for then we can safely ignore the block. * If we're visiting it for a different value then we can't handle it and give up. The approach I tried here is: * Turn Visited into a set of <BasicBlock*,Value*> pairs and keep the behaviour of ignoring blocks that we've already visited for a value but allow visiting a block more than once if it's for different values. * Adjust NonLocalDepResult to mark a result as originating in the block we started looking in (after phi-translation), not the one we eventually found it in. With this GVN then goes on to optimise this function exactly as we want. However we then get problems elsewhere. The problem =========== Let's take a look at this function: define i32 @multiple_path_test(i32* %ptr1, i32* %ptr2, i32* %ptr3) { entry: %val1 = load i32, i32* %ptr1, align 4 %val2 = load i32, i32* %ptr2, align 4 %val3 = load i32, i32* %ptr3, align 4 %cmp1 = icmp slt i32 %val1, %val2 br i1 %cmp1, label %a, label %b a: %cmp2 = icmp slt i32 %val1, %val3 br i1 %cmp2, label %end, label %b b: %bphi = phi i32* [ %ptr2, %a ], [ %ptr3, %entry ] br label %end end: %retptr = phi i32* [ %ptr1, %a ], [ %bphi, %b ] %ret = load i32, i32* %retptr, align 4 ret i32 %ret } Here this load of %ret is fully redundant: on every path from entry to end there is a load which already has the value that %ret would load. Currently in GVN after it does AnalyzeLoadAvailability what we have is: * %val1 is available in %entry (for the entry->a->end path) * %bphi is unknown in %b (we tried to visit %entry for %ptr3 but failed because we already visited it for %ptr1) What then happens is: * We have one available and one unavailable value, so PRE is done * This causes a load of %bphi to be inserted in %b * We then analyze the load of %bphi in %b * This turns out to be fully redundant and is turned into a phi of %val2 and %val3. With the above change to allow visiting the same block more than once then what we get is is: * %val1 is available in %a (for the a->end edge) * %val2 is available in %a (for the a->b edge) * %val3 is available in %entry (for the entry->b edge) What then happens is: * There are no unavailable values, so full redundancy elimination is done * ConstructSSAForLoadSet is called to do PHI construction * This then uses SSAUpdater * %val2 is set as the value for %a * %val1 is ignored as we already have a value for %a * %val3 is set as the value for %entry * Phi construction is done which completely ignores %val1 The problem here is that SSAUpdater follows the paradigm of one block = one available value, where if a value is available in a block then all successors to that block will use that value. This is not valid here, as for %a which value you get depends on which successor you go to. Currently I'm in the middle of modifying SSAUpdater and AvailableValueInBlock to be able to understand that which value you get can depend on both the source and destination block, and make it use that information in PHI construction. Questions ======== Some questions I have: * Is GVN actually the right place to do this transformation, or is there some other better place that I've missed? * I'm pretty sure I do need to do this "visit the same block more than once" thing, especially for select as the values on both sides of the select can be in the same block that the select is in so we have to visit the same block three times, but perhaps there's some other way to handle this? * Maybe instead of handling needing different values on different edges instead I should be making AnalyzeLoadAvailability (or something in MemoryDependenceAnalysis) recognise that this is happening and insert an 'unknown' into the appropriate place (%b in the above example) so that we get the same behaviour that we currently have? John
Daniel Berlin via llvm-dev
2018-Apr-19 03:02 UTC
[llvm-dev] [RFC] Making GVN able to visit the same block more than once
> > > ========> > Some questions I have: > * Is GVN actually the right place to do this transformation, or is there > some > other better place that I've missed? >Truthfully, at this point, GVN is probably not the right place for almost anything you want to do. As you've discovered, it's a mess at this point. The downside is the transform requires building some stuff from scratch elsewhere (NewGVN, etc). People often just want to get stuff done, so they don't. (i understand the mentality, for sure, i'm just stating the state of the world) On plus side, a bunch of people have posted mostly-correct, working, full pre implementations that are hooked into NewGVN. Note also that one thing you are likely hitting is that memdep's caching behavior is *badly* broken. IIRC (it's been a few weeks), it doesn't handle he case a non-local dep becomes local (or maybe it was vice versa) properly at all. See the pr's referenced by https://reviews.llvm.org/D45238 The " * The 'if' becomes a select and GVN and MemoryDependenceAnalysis can't cope with selects. This isn't too hard I think, and I have a prototype for this which essentially treats them as kind of like a phi and that mostly works." problem is well known now, and imho, this is yet another reason we should stop canonicalizing to select this early. There is a PR or two open about this, and work has been done to make it so we can. Trying to treat it like a phi, as you said, somewhat works, but there are cases it will not (detailed in the prs), and you really do have to have real control flow to compute the dataflow problem sanely (real only in the sense of seeing it as control flow. However, you would have to have fake instructions and fake blocks). In particular, the "treat as phi" situation fails very quickly when you have to insert instructions in the blocks, have select over select etc. So sorry, i don't have great answers for you here. But i don't think trying to hack this through gvn and memdep, vs work on better architecture, is the right solution. But i've also felt that for a long time, and spent quite a while on replacements :) So you may also want a second opinion. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180418/4c7d9f3f/attachment.html>