Now with even the right llvm-dev :)

On Thu, Feb 2, 2017 at 4:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> On Thu, Feb 2, 2017 at 3:52 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
>> On Wed, Feb 1, 2017 at 8:33 PM Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>>> Hey folks,
>>>
>>> After a long amount of discussion both offline and on, I put a
>>> pass/intrinsic to add extended SSA up at http://reviews.llvm.org/D29316.
>>>
>>> Sean asked me to share it more broadly on llvm-dev, so here you go :)
>>>
>>> For those not familiar with extended SSA, it's described in the paper
>>> "ABCD: Eliminating Array Bounds Checks on Demand".
>>>
>>> There is a very large amount of explanation in the summary of the review
>>> that I don't want to paste here, but the short version is:
>>>
>>> 1. We make copies of variables where they are used in comparisons that
>>> lead to assumes or branches, and it's possible to determine something
>>> about their value from the branch.
>>>
>>> IE
>>>
>>> define i32 @test1(i32 %x) {
>>>   %cmp = icmp eq i32 %x, 50
>>>   br i1 %cmp, label %true, label %false
>>> true:
>>>   ret i32 %x
>>> false:
>>>   ret i32 1
>>> }
>>>
>>> becomes
>>>
>>> define i32 @test1(i32 %x) {
>>>   %cmp = icmp eq i32 %x, 50
>>>   br i1 %cmp, label %true, label %false
>>>
>>> true:                                             ; preds = %0
>>>   ; Has predicate info
>>>   ; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
>>>   %x.0 = call i32 @llvm.predicateinfo.i32(i32 %x)
>>>   ret i32 %x.0
>>>
>>> false:                                            ; preds = %0
>>>   ret i32 1
>>> }
>>>
>>> All uses that are dominated by the predicate info are renamed to use it.
>>>
>>> 2. We do so very quickly (it takes about 600ms to do 2 million blocks
>>> with comparisons; by comparison, most passes simply crash or take
>>> forever on the same file. As an example, GVN takes 500 seconds), and we
>>> only insert when and where the operands are used.
>>>
>>> 3. The intrinsics are marked so they do not affect optimization (and
>>> currently, the passes that use them all destroy them). They can also
>>> detect when they've been moved, to make sure they are still valid if
>>> needed.
>>>
>>> 4. They could be updated pretty easily if we wanted.
>>>
>>> With one real downside:
>>>
>>> 5. We modify the IR in an analysis to do so :(
>>
>> One question that occurred to me reading this, that probably is just my
>> ignorance, but maybe you can help =]
>>
>> Any particular reason not to model this as essentially a shadow of the IR?
>
> Done already :) Check out the newgvn-predicateinfo-uses branch on github :P.
>
> The short version of why there's no point: it's complex, and you can't
> update it *anyway* at that point :P
>
>> I'm imagining creating a map from operand -> predinfo, where predinfo
>> essentially models these inserted intrinsics, and rather than RAUW-ing
>> dominated uses, you insert a mapping for dominated uses.
>
> Ah, if you could easily map from operand to predinfo, this might even work
> okayish (you now have O(number of uses in the entire program) lookups
> you've added to NewGVN, to check whether they have predicateinfo, but
> okay).
>
> But to start: you can't do this easily. The operands you need to attach to
> are Uses, because in
>
> if (a == 0)
>   ret a
> else
>   ret a
>
> each a has different info.
>
> The predicateinfo insertion accomplishes this by physical renaming, so the
> values have different names, and every use of a given predicateinfo has
> the same value.
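> As a concrete sketch, in the same notation as the test1 example above
> (and assuming copies get inserted on both edges here, since an eq compare
> tells you something on each of them), the two returns end up using
> differently named copies:
>
> define i32 @test2(i32 %a) {
>   %cmp = icmp eq i32 %a, 0
>   br i1 %cmp, label %true, label %false
>
> true:                                             ; preds = %0
>   ; true edge: %a == 0 here
>   %a.0 = call i32 @llvm.predicateinfo.i32(i32 %a)
>   ret i32 %a.0
>
> false:                                            ; preds = %0
>   ; false edge: %a != 0 here
>   %a.1 = call i32 @llvm.predicateinfo.i32(i32 %a)
>   ret i32 %a.1
> }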
> But in a virtual form, it looks like this:
>
> if (a == 0)
>   1 = PredicateDef(branch, blah blah blah)
>   PredicateUse(1), but really attached to *this use* of a, not to ret
>   ret a
> else
>   2 = PredicateDef(branch, blah blah blah)
>   PredicateUse(2), but really attached to *this use* of a, not to ret
>   ret a
>
> (Even if you really wanted to, you have to attach to uses, because each
> operand of an instruction could be a use of different predicateinfo.)
>
> You can, in fact, attach to uses. You can even do something like the above
> (IE have a PredicateUse which stores a regular use), so you know what's
> up, and have a real def/use form.
>
> It goes downhill from there, though, precisely because the info is
> attached to Uses :(
>
> First, we have two types of things that would use this pass: lazy things,
> and propagating things.
>
> Lazy things, which return a value for a thing all at once, in a single
> call (LVI if it had no caches), might be convertible to use this. You have
> the problem that getting a Use from an instruction operand is not
> currently trivial (we'd have to have a getOperandUse that mirrors
> getOperand, or something), but that's not too bad. But all of ours really
> cache stuff, so they are really lazy propagating things.
>
> Propagating things (CVP, SCCP, GVN, etc.), which are the rest of them, are
> hard. They want to store tables of info by Value, but we need to store
> per-use info. If you modify them to store per-use info for all uses, you
> take a 5-15x memory and time hit to store info about each use, where most
> have the same info. Any elimination you do also has to look up per-use
> info, not per-value info like we do now, which means the elimination
> becomes (Defs/Uses) times slower too :(
>
> This pretty much puts it out of the range of sanity.
>
> But let's say you are smarter than the average bear. You want to build a
> hybrid scheme, where you only store special use info for the things that
> are special. You realize you can make your virtual form, above, Value*'s.
> Then you have the name you need, because the PredicateDef is a name you
> can use like a normal Instruction def.
>
> But you still have to convert back and forth between Uses and Values
> everywhere, and plumb Use& through anything that does lookups, so you can
> get a Use& and turn it into the right thing above. Then, replacement of
> the original operands, to avoid doing lookups on every use in the program,
> requires storing the original uses in the PredicateUse class, so you know
> which uses in the original IR correspond to which uses in the predicate
> virtual IR.
>
> This is a *lot* of plumbing. And because Use has an operator Value*, the
> likelihood you create a lot of bugs is high, because if you use a Use
> where you meant use.getUser(), it still works :(
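> To make that footgun concrete, here is a minimal C++ sketch against the
> LLVM API (recordInfo is a made-up stand-in for whatever per-use table
> lookup you'd be doing):
>
> #include "llvm/IR/Use.h"
> #include "llvm/IR/User.h"
> #include "llvm/IR/Value.h"
> using namespace llvm;
>
> void recordInfo(Value *V); // hypothetical: insert into a table keyed by Value*
>
> void visit(Use &U) {
>   recordInfo(U.getUser()); // intended: the instruction *using* the value
>   recordInfo(U);           // bug, but compiles fine: Use's implicit
>                            // operator Value* silently hands you the
>                            // *used* value instead
> }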
> You are creative, though, and get this all working. That gets you value
> numbering (it's a lot harder in a lot of other cases), without even
> needing an O(uses in the program) hash table.
>
> Now you have to try to eliminate stuff.
>
> In everything but NewGVN, you have to rewrite the eliminators. They all
> replace as they go, which means either double the replace cost (since you
> have to replace the original uses, and, for predicates, the original uses
> and the uses in the predicate uses), or even more fun as you try to update
> utilities to understand looking up uses :(
>
> In NewGVN, we eliminate in O(uses that have the same value) time, by
> sorting congruence class instruction defs and uses by global and local
> dominator numbers, and using a stack. So we now need to order the
> predicate defs, uses, and regular defs and uses all in the right order
> (and replace the right defs with predicate defs so the lookup happens
> right), in dominator order, to know which replaces which. But you can't
> use dominates() to do this, because it doesn't understand these, etc.
>
> So you have to create a local instruction numbering, and figure out where
> in it each thing goes. Which you can do, at some non-trivial cost over
> regular instruction numbering.
>
> This is just the simple parts, mind you. After all this, you have
> something that takes pretty much double the memory, is a lot harder to
> understand, *and* still gets invalidated easily :)
>
> But it's immutable!
>
> There's kind of no upside, and lots of downsides, both in building it and
> in trying to use it :)
>
>> This would cause it to be a pure analysis again. It would make it
>> relatively fragile, of course: it would be easily invalidated. But given
>> how fast it is to compute, for several of the use cases it seems like it
>> would be sufficient (NewGVN seems like it could tolerate computing this
>> each time, etc.).
>>
>> Anyways, I'm sure you've looked at something like this, and I'm curious
>> what goes wrong with it.
>
>>> The main user at the moment is NewGVN, which uses it to do the
>>> equivalent of GVN's propagateEquality. I'd be happy to make it a utility
>>> called from NewGVN instead of an analysis. A number of folks online and
>>> off asked me to make it usable for others, so I did that :) Some folks
>>> want to use it to clean up existing passes (EarlyCSE, etc.), some want
>>> to use it in new passes.
>>
>> So, this definitely sounds useful, but since it mutates the IR itself, I
>> think it should just be a utility routine that passes call at their start
>> to put the IR into the form they need. It'll be a bit tricky if you want
>> analysis passes to observe this form: you may not be able to use the
>> caching infrastructure of the new PM at all for such analyses, and may
>> need to express them as standalone utilities as well.
>
> Yeah, I'm just going to make it a utility.
>
>> If it is reasonable to do so, I'd also suggest a cleanup utility that
>> ensures the mutated IR doesn't escape the end of the pass, to avoid
>> having to teach the rest of the optimizer about it. But if there are
>> reasons this doesn't work well, maybe we can use actual PHI nodes for
>> this to simplify what we have to teach the rest of the optimizer about?
>
> We can use single-argument phi nodes for branches, but it gets tricky with
> assume, because you need to place the phi at the beginning of the assume's
> block, and you can't do this if the operands are defined in the same
> block, so you'd have to place n of them with the same info in every
> successor block, and hope the assume didn't do anything useful in the rest
> of the assume block, or something :(
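> For branches, that would look roughly like this (a sketch of the same
> test1 example, with a one-argument phi standing in for the intrinsic;
> names are illustrative):
>
> define i32 @test1(i32 %x) {
>   %cmp = icmp eq i32 %x, 50
>   br i1 %cmp, label %true, label %false
>
> true:                                             ; preds = %0
>   ; the renamed copy of %x for the true edge, as a single-argument phi
>   %x.0 = phi i32 [ %x, %0 ]
>   ret i32 %x.0
>
> false:                                            ; preds = %0
>   ret i32 1
> }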
> I'll also write a verifier.
>
>> -Chandler
>>
>> PS: In case folks are wondering, the reason *why* I think this should not
>> be an analysis is that in the new PM we rely heavily on caching of
>> analysis results. This caching behavior means that when things are run
>> isn't very clear, and we'd like to avoid ordering dependencies between
>> two analyses running over the same IR, to make things more debuggable
>> later on. I can dive into more details here as useful, of course.
(And just as a short follow-up, since I got asked privately):

The reason MemorySSA doesn't have these issues as an IR is threefold:

1. There is no pre-existing IR for memory, really.
2. Most users use it as a version number for the memory state of
loads/stores, which works because of #3.
3. It's a 1:1 mapping of Values to Values. IE one load is one MemoryUse,
one store is one MemoryDef.

PredicateInfo's underlying info is a 1:1 mapping of *Uses* to Values, which
is ... problematic, for the reasons I mentioned.

That's what the transform does: it renames things so that the 1:1 mapping
of uses to values becomes a 1:1 mapping of values to values, so that
everything just works. :)
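To illustrate the 1:1 mapping in #3, this is roughly what MemorySSA's
annotated form looks like for a store followed by a load (a sketch of the
printed annotations, not exact printer output):

define void @f(i32* %p) {
  ; 1 = MemoryDef(liveOnEntry)
  store i32 0, i32* %p
  ; MemoryUse(1)
  %v = load i32, i32* %p
  ret void
}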
On Thu, Feb 2, 2017 at 5:01 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> [snip: full quote of the previous message]
On Thu, Feb 2, 2017 at 5:01 PM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [snip: full quote of the previous message]
>> Yeah, I'm just going to make it a utility.

I don't have much content to add to the discussion, but since I originally
asked you to bring the discussion to llvm-dev: making it a utility SGTM.

-- Sean Silva