So, there are two primary ideas behind SSA form management in the loop optimizers of LLVM: - Require LCSSA form input, leverage its (very powerful) guarantees to simplify maintaining SSA form, and also maintain LCSSA form. - Don't bother with LCSSA form input, assume the worst, and use powerful incremental SSA formation utilities built on SSAUpdater to form SSA on demand when needed. (Note, there are plenty of places where SSAUpdater makes sense, so this isn't really about doing away with it at all.) In discussions, Andy had expressed a desire to move entirely away from LCSSA and Nick had expressed a desire to do the opposite, so I'd like to start a proper discusison of what people think and why. I've worked a lot of preserving both LCSSA and LoopSimplify form in all of our loop passes recently thanks to yanking off the bandaid we've been relying on for the past N years of letting the LoopPassManager simply re-create these at all loop nest levels on the fly as necessary. During the course of that I'm starting to form an opinion on the subject as well. I think that SSAUpdater works *great* for passes like GVN and friends that need to fast, incremental patching of the SSA graph. I also think it is completely incompatible with LCSSA in its current form. It just isn't built in a way that would be friendly to preserving this kind of information. I think that's OK, but it means that I think we *can't* mix the two solutions effectively long term. PR18688 is the current manifestation of this madness. Consistently, I am finding that doing incremental update and maintaining canonical form for loops is *substantially* simpler with LoopSimplify+LCSSA. The combination is just incredibly powerful. It also has a huge advantage in the way it is structures the updates: they avoid perturbing nested or enclosing loops. This property makes them ideal for working inside-out and iteratively across the loop nest as it is simplified. So I would personally want to see a really strong argument against relying on LCSSA. But I'm open to that argument existing, and sending this email to tease it out. =] If we decide to go forward with LCSSA as the canonical form for loops, I'd like to merge LCSSA and LoopSimplify into a single canonicalization pass, and then I'll work harder at re-casting the existing loop optimizations to leverage LCSSA more heavily and simplify their preservation of it as a consequence. Right now, we're just brute-force recomputing LCSSA because we don't really rely on it coming into the pass. It's kind of the worst of both worlds. =/ -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140201/54c56190/attachment.html>
----- Original Message -----> From: "Chandler Carruth" <chandlerc at gmail.com> > To: "Andy Trick" <atrick at apple.com>, nicholas at mxc.ca, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Saturday, February 1, 2014 6:33:26 AM > Subject: [LLVMdev] [RFC] LCSSA vs. SSAUpdater ... FIGHT! > > > > So, there are two primary ideas behind SSA form management in the > loop optimizers of LLVM: > > > - Require LCSSA form input, leverage its (very powerful) guarantees > to simplify maintaining SSA form, and also maintain LCSSA form. > > > - Don't bother with LCSSA form input, assume the worst, and use > powerful incremental SSA formation utilities built on SSAUpdater to > form SSA on demand when needed. > > > (Note, there are plenty of places where SSAUpdater makes sense, so > this isn't really about doing away with it at all.) > > > > > > In discussions, Andy had expressed a desire to move entirely away > from LCSSA and Nick had expressed a desire to do the opposite, so > I'd like to start a proper discusison of what people think and why. > > > > > I've worked a lot of preserving both LCSSA and LoopSimplify form in > all of our loop passes recently thanks to yanking off the bandaid > we've been relying on for the past N years of letting the > LoopPassManager simply re-create these at all loop nest levels on > the fly as necessary. During the course of that I'm starting to form > an opinion on the subject as well. > > > I think that SSAUpdater works *great* for passes like GVN and friends > that need to fast, incremental patching of the SSA graph. I also > think it is completely incompatible with LCSSA in its current form. > It just isn't built in a way that would be friendly to preserving > this kind of information. I think that's OK, but it means that I > think we *can't* mix the two solutions effectively long term. > PR18688 is the current manifestation of this madness. > > > Consistently, I am finding that doing incremental update and > maintaining canonical form for loops is *substantially* simpler with > LoopSimplify+LCSSA. The combination is just incredibly powerful. It > also has a huge advantage in the way it is structures the updates: > they avoid perturbing nested or enclosing loops. This property makes > them ideal for working inside-out and iteratively across the loop > nest as it is simplified. > > > So I would personally want to see a really strong argument against > relying on LCSSA. But I'm open to that argument existing, and > sending this email to tease it out. =] > If we decide to go forward > with LCSSA as the canonical form for loops, I'd like to merge LCSSA > and LoopSimplify into a single canonicalization pass,What is the benefit of merging them? LoopSimplify is certainly useful without LCSSA, and LCSSA is certainly useful without LoopSimplify's canonicalization. Also, and I don't know if this matters, but LoopSimplify can "fail" (for non-natural loops, at least), LCSSA cannot fail. -Hal> and then I'll > work harder at re-casting the existing loop optimizations to > leverage LCSSA more heavily and simplify their preservation of it as > a consequence. Right now, we're just brute-force recomputing LCSSA > because we don't really rely on it coming into the pass. It's kind > of the worst of both worlds. =/ > > > -Chandler > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
On Sat, Feb 1, 2014 at 10:51 PM, Hal Finkel <hfinkel at anl.gov> wrote:> What is the benefit of merging them? LoopSimplify is certainly useful > without LCSSA, and LCSSA is certainly useful without LoopSimplify's > canonicalization. Also, and I don't know if this matters, but LoopSimplify > can "fail" (for non-natural loops, at least), LCSSA cannot fail.Forming LCSSA is simpler with simplified loops, and when the loop is simplified we should take advantage of it. Also, by doing them at the same time we can have better locality and fewer trips over the various data structures. There's no important reason, it just seems a nicer organization to have a single "canonicalize" step. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140201/b6b2b990/attachment.html>
> On Feb 1, 2014, at 10:51 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- >> From: "Chandler Carruth" <chandlerc at gmail.com> >> To: "Andy Trick" <atrick at apple.com>, nicholas at mxc.ca, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> >> Sent: Saturday, February 1, 2014 6:33:26 AM >> Subject: [LLVMdev] [RFC] LCSSA vs. SSAUpdater ... FIGHT! >> >> >> >> So, there are two primary ideas behind SSA form management in the >> loop optimizers of LLVM: >> >> >> - Require LCSSA form input, leverage its (very powerful) guarantees >> to simplify maintaining SSA form, and also maintain LCSSA form. >> >> >> - Don't bother with LCSSA form input, assume the worst, and use >> powerful incremental SSA formation utilities built on SSAUpdater to >> form SSA on demand when needed. >> >> >> (Note, there are plenty of places where SSAUpdater makes sense, so >> this isn't really about doing away with it at all.) >> >> >> >> >> >> In discussions, Andy had expressed a desire to move entirely away >> from LCSSA and Nick had expressed a desire to do the opposite, so >> I'd like to start a proper discusison of what people think and why. >> >> >> >> >> I've worked a lot of preserving both LCSSA and LoopSimplify form in >> all of our loop passes recently thanks to yanking off the bandaid >> we've been relying on for the past N years of letting the >> LoopPassManager simply re-create these at all loop nest levels on >> the fly as necessary. During the course of that I'm starting to form >> an opinion on the subject as well. >> >> >> I think that SSAUpdater works *great* for passes like GVN and friends >> that need to fast, incremental patching of the SSA graph. I also >> think it is completely incompatible with LCSSA in its current form. >> It just isn't built in a way that would be friendly to preserving >> this kind of information. I think that's OK, but it means that I >> think we *can't* mix the two solutions effectively long term. >> PR18688 is the current manifestation of this madness. >> >> >> Consistently, I am finding that doing incremental update and >> maintaining canonical form for loops is *substantially* simpler with >> LoopSimplify+LCSSA. The combination is just incredibly powerful. It >> also has a huge advantage in the way it is structures the updates: >> they avoid perturbing nested or enclosing loops. This property makes >> them ideal for working inside-out and iteratively across the loop >> nest as it is simplified. >> >> >> So I would personally want to see a really strong argument against >> relying on LCSSA. But I'm open to that argument existing, and >> sending this email to tease it out. =] >> If we decide to go forward >> with LCSSA as the canonical form for loops, I'd like to merge LCSSA >> and LoopSimplify into a single canonicalization pass, > > What is the benefit of merging them? LoopSimplify is certainly useful without LCSSA, and LCSSA is certainly useful without LoopSimplify's canonicalization. Also, and I don't know if this matters, but LoopSimplify can "fail" (for non-natural loops, at least), LCSSA cannot fail.Right. One of my issues with LCSSA is that it has to be completely maintained for correctness. OTOH. It is not hard to compute LCSSA correctly and problems will probably be caught by verification. I have seen bugs in SCEV caused by LCSSA update. But not bugs in LCSSA itself (probably because it is recomputed from scratch) I'll respond to the other points later. Andy> > -Hal > >> and then I'll >> work harder at re-casting the existing loop optimizations to >> leverage LCSSA more heavily and simplify their preservation of it as >> a consequence. Right now, we're just brute-force recomputing LCSSA >> because we don't really rely on it coming into the pass. It's kind >> of the worst of both worlds. =/ >> >> >> -Chandler >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
On Feb 1, 2014, at 4:33 AM, Chandler Carruth <chandlerc at gmail.com> wrote:> So, there are two primary ideas behind SSA form management in the loop optimizers of LLVM: > > - Require LCSSA form input, leverage its (very powerful) guarantees to simplify maintaining SSA form, and also maintain LCSSA form. > > - Don't bother with LCSSA form input, assume the worst, and use powerful incremental SSA formation utilities built on SSAUpdater to form SSA on demand when needed. > > (Note, there are plenty of places where SSAUpdater makes sense, so this isn't really about doing away with it at all.) > > > In discussions, Andy had expressed a desire to move entirely away from LCSSA and Nick had expressed a desire to do the opposite, so I'd like to start a proper discusison of what people think and why. > > > I've worked a lot of preserving both LCSSA and LoopSimplify form in all of our loop passes recently thanks to yanking off the bandaid we've been relying on for the past N years of letting the LoopPassManager simply re-create these at all loop nest levels on the fly as necessary. During the course of that I'm starting to form an opinion on the subject as well. > > I think that SSAUpdater works *great* for passes like GVN and friends that need to fast, incremental patching of the SSA graph. I also think it is completely incompatible with LCSSA in its current form. It just isn't built in a way that would be friendly to preserving this kind of information. I think that's OK, but it means that I think we *can't* mix the two solutions effectively long term. PR18688 is the current manifestation of this madness. > > Consistently, I am finding that doing incremental update and maintaining canonical form for loops is *substantially* simpler with LoopSimplify+LCSSA. The combination is just incredibly powerful. It also has a huge advantage in the way it is structures the updates: they avoid perturbing nested or enclosing loops. This property makes them ideal for working inside-out and iteratively across the loop nest as it is simplified. > > So I would personally want to see a really strong argument against relying on LCSSA. But I'm open to that argument existing, and sending this email to tease it out. =] If we decide to go forward with LCSSA as the canonical form for loops, I'd like to merge LCSSA and LoopSimplify into a single canonicalization pass, and then I'll work harder at re-casting the existing loop optimizations to leverage LCSSA more heavily and simplify their preservation of it as a consequence. Right now, we're just brute-force recomputing LCSSA because we don't really rely on it coming into the pass. It's kind of the worst of both worlds. =/ > > -ChandlerI also don't want to get rid of LCSSA (if I said that before, I retract the statement). It can be useful to summarize the loop live-out values. I think the question is: do we want to be in LCSSA during the early/canonical rounds of IR loop optimization? This is, LoopSimplify, Rotate, LICM, Unswitch, full unrolling (I think full unrolling should run earlier). Why did I suggest avoiding LCSSA? (1) Unnecessary overhead. (2) Phi nodes "in the way". (3) Awkward pass dependencies. (1) We're forcing a significant analysis to run between passes to make it easier to update SSA after transformation. But we don't even know if any transformations are needed. I would prefer to use SSAUpdater to handle the transformations as they occur. We could save 1-2% of opt time [1]. (2) SSA based analysis are way simpler when they don't handle phis. They could be adapted to handle single operand phis, but we don't usually bother to check. I don't have a specific issue in mind that would impact the early loop opts. (3) As you know LCSSA needs to be rerun between passes, which in turn requires the domtree. Generally removing dependence on analysis is a good thing. So, what's the benefit of LCSSA? I'm told that it makes SSA update easier. I don't understand what could be easier than using an SSAUpdater utility. LoopRotate and LICM already use the updater. LoopUnroll requires LCSSA [2][3]. I don't understand the impact on LoopSimplify. If LCSSA actually makes writing transformations easier, then I'm all for it. Could you point to some specific transformations that are easier with LCSSA? -Andy [1] I admit that LCSSA could be greatly speeding up our implementation of SSAUpdater by restricting the use lists of loop values. [2] LoopUnroll does an incremental SSA update using LCSSA. SSAUpdater would probably be a batch update after unrolling. SSAUpdater might be a little simpler by separating concerns. [3] We could make a much more efficient SSAUpdater that works with loop unrolling and the domtree by making use of the information that the old values always dominate the new ones. It is really dumb that our unroller invalidates the domtree.
On Mon, Feb 3, 2014 at 10:42 AM, Andrew Trick <atrick at apple.com> wrote:> I also don't want to get rid of LCSSA (if I said that before, I retract > the statement). It can be useful to summarize the loop live-out values. > > I think the question is: do we want to be in LCSSA during the > early/canonical rounds of IR loop optimization? This is, LoopSimplify, > Rotate, LICM, Unswitch, full unrolling (I think full unrolling should run > earlier). >Just for clarification, LoopSimplify is not a loop pass, and just creates the canonical loop nest structure with preheaders and exit blocks. LCSSA requires this to have a place for its PHI nodes, not the other way around. Also, full unrolling already runs as part of the early rounds of the IR loop optimizations just as you say it should. Why did I suggest avoiding LCSSA?> (1) Unnecessary overhead. > (3) Awkward pass dependencies. > > (1) We're forcing a significant analysis to run between passes to make it > easier to update SSA after transformation. But we don't even know if any > transformations are needed. I would prefer to use SSAUpdater to handle the > transformations as they occur. We could save 1-2% of opt time [1]. >Do you have any measurement of this? I have no evidence that LCSSA is a remotely significant optimization time cost, but I've not gone looking.> > (3) As you know LCSSA needs to be rerun between passes, which in turn > requires the domtree. Generally removing dependence on analysis is a good > thing. >So, we never re-run domtree here. It is preserved throughout, and it is required for LoopInfo, so there is nothing to be saved on that front. AFAICT, there is no analysis dependency cost of LCSSA given the other loop optimizer. Also, LCSSA *doesn't* need to be re-run between passes. If a pass has LCSSA coming into it, then it is extremely easy to preserve it in place rather than re-running things. We just have a *lot* of bugs because the old loop-pass-manager setup just blindly re-ran LCSSA for us, papering over all manner of badness.> > (2) Phi nodes "in the way". > (2) SSA based analysis are way simpler when they don't handle phis. They > could be adapted to handle single operand phis, but we don't usually bother > to check. I don't have a specific issue in mind that would impact the early > loop opts. >Ok, I think this would be the more serious issue if it comes up in practice. I don't have any cases where it comes up in practice for early loop opts either though.> > So, what's the benefit of LCSSA? I'm told that it makes SSA update > easier. I don't understand what could be easier than using an SSAUpdater > utility. LoopRotate and LICM already use the updater. ><snip> If LCSSA actually makes writing transformations easier, then I'm all for> it. Could you point to some specific transformations that are easier with > LCSSA? >LICM. There is already a fast path in LICM that is the exact path that LCSSA guarantees exists when sinking code into the exit blocks. The only thing missing is for LICM to actually *use* LCSSA instead of just trying to patch it up if it happens to find it in the loop. I think it would dramatically simplify the entire sinking code. A good use of SSAUpdater: re-forming SSA values *within* a loop body due to mem2reg essentially. LICM has a little mini mem2reg pass in it that uses SSAUpdater. This is a great use and I can't imagine something simpler. The fact that we have LCSSA also makes it faster by bounding the space where it has to rewrite uses (to the edge of the current loop). But if we have LCSSA, we should never need SSAUpdater for updating SSA form across the loop boundary in the way LICM does. Instead we should be able to place an instruction, RAUW it, and insert a PHI node for each operand. Done. Maybe a patch would be better to explain matters? As it happens, fixing LICM this way will also fix PR18753 and a host of other latent bugs where LICM invalidates LCSSA in some far-off loop and we crash later. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140207/f42d8586/attachment.html>
On Feb 1, 2014, at 4:33 AM, Chandler Carruth <chandlerc at gmail.com> wrote:> So, there are two primary ideas behind SSA form management in the loop optimizers of LLVM: > > - Require LCSSA form input, leverage its (very powerful) guarantees to simplify maintaining SSA form, and also maintain LCSSA form. > > - Don't bother with LCSSA form input, assume the worst, and use powerful incremental SSA formation utilities built on SSAUpdater to form SSA on demand when needed. > > (Note, there are plenty of places where SSAUpdater makes sense, so this isn't really about doing away with it at all.)It’s worth noting that LCSSA predates SSAUpdater. If I went back in time and knew what I knew now, I wouldn’t have gone with LCSSA. My gripes are three fold: 1) SSAUpdater can handle anything that LCSSA simplifies, 2) that LCSSA is annoying to keep up to date, 3) LCSSA burns compile time optimistically rewriting loop values, which are then later collapsed away even if nothing cares about those values. My personal preference would be to get rid of LCSSA completely, but I don’t know how to stage that. -Chris
On Fri, Feb 7, 2014 at 11:29 AM, Chris Lattner <clattner at apple.com> wrote:> On Feb 1, 2014, at 4:33 AM, Chandler Carruth <chandlerc at gmail.com> wrote: > > > So, there are two primary ideas behind SSA form management in the loop > optimizers of LLVM: > > > > - Require LCSSA form input, leverage its (very powerful) guarantees to > simplify maintaining SSA form, and also maintain LCSSA form. > > > > - Don't bother with LCSSA form input, assume the worst, and use powerful > incremental SSA formation utilities built on SSAUpdater to form SSA on > demand when needed. > > > > (Note, there are plenty of places where SSAUpdater makes sense, so this > isn't really about doing away with it at all.) > > It’s worth noting that LCSSA predates SSAUpdater. If I went back in time > and knew what I knew now, I wouldn’t have gone with LCSSA. > > My gripes are three fold:1) SSAUpdater can handle anything that LCSSA simplifies Yes, it can, but it is *significantly* less simple. I think the simplicity of reasoning and handling things with LCSSA is not without value.> 2) that LCSSA is annoying to keep up to dateI have not found that to be the case. There are many places where we fail to keep it up to date that have to be fixed, but as far as I can tell none of these are due to the complexity, slowness, or challenge of keeping it up to date. We just "forgot" (in that the loop pass manager made everything work without any effort but with considerable compile time cost).> 3) LCSSA burns compile time optimistically rewriting loop values, which > are then later collapsed away even if nothing cares about those values. >Sure. Put another way though, LCSSA puts the SSA graph into the "canonical form" that simplifies the model of writing loop passes. I'm not really disagreeing with you here, just saying that there is a tradeoff. My suspicion having worked on this quite a bit now is that about 80% of the compile time we are burning on LCSSA is due to failures to update LCSSA appropriately in places where it is both convenient and simple to do so.> > My personal preference would be to get rid of LCSSA completely, but I > don’t know how to stage that.OK, I'm not *really* invested one way or the other. But having hacked on the loop optimizer somewhat over the past 3 weeks, I have to say LCSSA form has always been easier to reason about, debug, and transform. Perhaps I'll run out of that warm fuzzy feeling, but so far its holding. And several others have seemed to like LCSSA when I talked to them, so I don't want it to be dismissed out of hand. Either way, we *must* make a decision. The current state is untenable as passes flagrantly destroy LCSSA making it very expensive indeed to recompute. And we have to recompute it because a decent number of passes really do rely on it. Interestingly, many of the *users* of LCSSA are not rebuilding SSA form! They're using it to capture loop live-out values very quickly, which is a very useful property IMO. As I mentioned in the thread to Andy, there seem to still be clear uses for SSAUpdater to update SSA values, and it merely needs to be able to incrementally preserve LCSSA rather than breaking it. That doesn't seem like a ton of code to me, so I think we can reasonably go either way. The tradeoff is in complexity of loop passes' analysis and transforms of live-out values vs. the complexity of canonicalizing to LCSSA and preserving it through the loop pass pipeline. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140207/73e98ac0/attachment.html>
On Feb 7, 2014, at 11:29 AM, Chris Lattner <clattner at apple.com> wrote:> > On Feb 1, 2014, at 4:33 AM, Chandler Carruth <chandlerc at gmail.com> wrote: > >> So, there are two primary ideas behind SSA form management in the loop optimizers of LLVM: >> >> - Require LCSSA form input, leverage its (very powerful) guarantees to simplify maintaining SSA form, and also maintain LCSSA form. >> >> - Don't bother with LCSSA form input, assume the worst, and use powerful incremental SSA formation utilities built on SSAUpdater to form SSA on demand when needed. >> >> (Note, there are plenty of places where SSAUpdater makes sense, so this isn't really about doing away with it at all.) > > It’s worth noting that LCSSA predates SSAUpdater. If I went back in time and knew what I knew now, I wouldn’t have gone with LCSSA. > > My gripes are three fold: 1) SSAUpdater can handle anything that LCSSA simplifies, 2) that LCSSA is annoying to keep up to date, 3) LCSSA burns compile time optimistically rewriting loop values, which are then later collapsed away even if nothing cares about those values. > > My personal preference would be to get rid of LCSSA completely, but I don’t know how to stage that.Until recently I felt exactly the same way. I didn’t want LCSSA just as another mechanism for updating SSA. I’m warming up to having it run during the early loop passes if it - significantly simplifies the LICM logic - greatly speeds up SSAUpdater within loops (maybe little net compile-time increase on average?) - compartmentalizes loop transforms and SSA update so we can debug loop opts one loop at a time I still don’t particularly like that we force all LLVM clients to perform LCSSA when all they end up doing is rotating and simplifying loops (no LICM/unroll). So it is a tradeoff. -Andy