Finkel, Hal J. via llvm-dev
2018-Sep-21 18:49 UTC
[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance
On 09/21/2018 01:30 PM, Chris Lattner via llvm-dev wrote: On Sep 19, 2018, at 1:30 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hi folks, I looked into some slow compiles and filed https://bugs.llvm.org/show_bug.cgi?id=38829. The gist of it is that we spend a lot of time iterating basic blocks to compute local dominance, i.e. given two instructions in the same BB, which comes first. LLVM already has a tool, OrderedBasicBlock, which attempts to address this problem by building a lazy mapping from Instruction* to position. The problem is that cache invalidation is hard. If we don't cache orderings at a high enough level, our transformations become O(n^2). If we cache them too much and insert instructions without renumbering the BB, we get miscompiles. My solution is to hook into the actual BB ilist modification methods, so that we can have greater confidence that our cache invalidation is correct. I created a patch for this at https://reviews.llvm.org/D51664, which adds a lazily calculated position integer to every llvm::Instruction. I stole a bit from BasicBlock's Value subclass data to indicate whether the orders are valid. Hopefully everyone agrees that this a reasonable direction. I just figured I should announce this IR data structure change to the -dev list. :) I haven’t had a chance to look at the patch in detail yet (hopefully this afternoon) but this sounds like a very invasive change to a core data structure. Indeed. Perhaps a long-overdue one ;) The inner loop of the local dominance check in DominatorTree::dominates is also not very well implemented: it does a single linear pass from the beginning of the block until it finds the def or user. A better algorithm would be to use two pointers - one at the user and def. Each time through the loop, move the user iterator “up” the block, and the def iterator “down” the block. Either the iterators meet each other (in which case return true) or you fine the beginning/end of the block. This should work a lot better for many queries, because it will be efficient when the user and def are close to each other, as well as being efficient when the value is at the end of the block. Also, my bet is that most local dom queries return true. This seems like a good idea. It doesn't change the fact, however, that local dominance queries are O(n). We've ended up using OrderedBasicBlock in an increasing number of places, but there are a number of places where this is hard because of the plumbing required, or more importantly, the ambiguity around who owns the state of the cache at any given time. We know that there are a significant number of additional places where we should be using something like OrderedBasicBlock, but adding OBB into many of these places would be quite non-trivial. As indicated by the performance results briefly described in D51664, we have significant headroom. I don't see any really feasible way around these issues except moving the ownership of that state into the BB itself. Thanks again, Hal Have you tried this approach? It should be very easy to hack up to try out on your use case. -Chris _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180921/ab5ef21f/attachment.html>
Reid Kleckner via llvm-dev
2018-Sep-24 17:19 UTC
[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance
To echo what Hal said, yes, it's a major change, but I think the improved complexity guarantees, simplicity, and elimination of certain classes of bugs is worth it. I think we have consensus that we should go forward with this. Would anyone mind formally stamping it in phab? So far everyone understandably has said "makes sense to me, but I don't consider myself to have authority to stamp this." On Fri, Sep 21, 2018 at 11:49 AM Finkel, Hal J. <hfinkel at anl.gov> wrote:> > On 09/21/2018 01:30 PM, Chris Lattner via llvm-dev wrote: > > > > On Sep 19, 2018, at 1:30 PM, Reid Kleckner via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hi folks, > > I looked into some slow compiles and filed > https://bugs.llvm.org/show_bug.cgi?id=38829. The gist of it is that we > spend a lot of time iterating basic blocks to compute local dominance, i.e. > given two instructions in the same BB, which comes first. > > LLVM already has a tool, OrderedBasicBlock, which attempts to address this > problem by building a lazy mapping from Instruction* to position. The > problem is that cache invalidation is hard. If we don't cache orderings at > a high enough level, our transformations become O(n^2). If we cache them > too much and insert instructions without renumbering the BB, we get > miscompiles. My solution is to hook into the actual BB ilist modification > methods, so that we can have greater confidence that our cache invalidation > is correct. > > I created a patch for this at https://reviews.llvm.org/D51664, which adds > a lazily calculated position integer to every llvm::Instruction. I stole a > bit from BasicBlock's Value subclass data to indicate whether the orders > are valid. > > Hopefully everyone agrees that this a reasonable direction. I just figured > I should announce this IR data structure change to the -dev list. :) > > > I haven’t had a chance to look at the patch in detail yet (hopefully this > afternoon) but this sounds like a very invasive change to a core data > structure. > > > > Indeed. Perhaps a long-overdue one ;) > > > The inner loop of the local dominance check in DominatorTree::dominates is > also not very well implemented: it does a single linear pass from the > beginning of the block until it finds the def or user. A better algorithm > would be to use two pointers - one at the user and def. Each time through > the loop, move the user iterator “up” the block, and the def iterator > “down” the block. Either the iterators meet each other (in which case > return true) or you fine the beginning/end of the block. > > This should work a lot better for many queries, because it will be > efficient when the user and def are close to each other, as well as being > efficient when the value is at the end of the block. Also, my bet is that > most local dom queries return true. > > > This seems like a good idea. > > It doesn't change the fact, however, that local dominance queries are > O(n). We've ended up using OrderedBasicBlock in an increasing number of > places, but there are a number of places where this is hard because of the > plumbing required, or more importantly, the ambiguity around who owns the > state of the cache at any given time. We know that there are a significant > number of additional places where we should be using something like > OrderedBasicBlock, but adding OBB into many of these places would be quite > non-trivial. As indicated by the performance results briefly described in > D51664, we have significant headroom. I don't see any really feasible way > around these issues except moving the ownership of that state into the BB > itself. > > Thanks again, > Hal > > > Have you tried this approach? It should be very easy to hack up to try > out on your use case. > > -Chris > > > > > _______________________________________________ > LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180924/b5fd1e15/attachment.html>
Sanjoy Das via llvm-dev
2018-Sep-24 18:31 UTC
[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance
Did you consider using the waymarking algorithm we already use for going from Use->User to store the offset of an instruction in a basic block? We could steal the two bits needed from the bb parent pointer in the instruction. -- Sanjoy On Mon, Sep 24, 2018 at 10:20 AM Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > To echo what Hal said, yes, it's a major change, but I think the improved complexity guarantees, simplicity, and elimination of certain classes of bugs is worth it. > > I think we have consensus that we should go forward with this. Would anyone mind formally stamping it in phab? So far everyone understandably has said "makes sense to me, but I don't consider myself to have authority to stamp this." > > On Fri, Sep 21, 2018 at 11:49 AM Finkel, Hal J. <hfinkel at anl.gov> wrote: >> >> >> On 09/21/2018 01:30 PM, Chris Lattner via llvm-dev wrote: >> >> >> >> On Sep 19, 2018, at 1:30 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi folks, >> >> I looked into some slow compiles and filed https://bugs.llvm.org/show_bug.cgi?id=38829. The gist of it is that we spend a lot of time iterating basic blocks to compute local dominance, i.e. given two instructions in the same BB, which comes first. >> >> LLVM already has a tool, OrderedBasicBlock, which attempts to address this problem by building a lazy mapping from Instruction* to position. The problem is that cache invalidation is hard. If we don't cache orderings at a high enough level, our transformations become O(n^2). If we cache them too much and insert instructions without renumbering the BB, we get miscompiles. My solution is to hook into the actual BB ilist modification methods, so that we can have greater confidence that our cache invalidation is correct. >> >> I created a patch for this at https://reviews.llvm.org/D51664, which adds a lazily calculated position integer to every llvm::Instruction. I stole a bit from BasicBlock's Value subclass data to indicate whether the orders are valid. >> >> Hopefully everyone agrees that this a reasonable direction. I just figured I should announce this IR data structure change to the -dev list. :) >> >> >> I haven’t had a chance to look at the patch in detail yet (hopefully this afternoon) but this sounds like a very invasive change to a core data structure. >> >> >> >> Indeed. Perhaps a long-overdue one ;) >> >> >> The inner loop of the local dominance check in DominatorTree::dominates is also not very well implemented: it does a single linear pass from the beginning of the block until it finds the def or user. A better algorithm would be to use two pointers - one at the user and def. Each time through the loop, move the user iterator “up” the block, and the def iterator “down” the block. Either the iterators meet each other (in which case return true) or you fine the beginning/end of the block. >> >> This should work a lot better for many queries, because it will be efficient when the user and def are close to each other, as well as being efficient when the value is at the end of the block. Also, my bet is that most local dom queries return true. >> >> >> This seems like a good idea. >> >> It doesn't change the fact, however, that local dominance queries are O(n). We've ended up using OrderedBasicBlock in an increasing number of places, but there are a number of places where this is hard because of the plumbing required, or more importantly, the ambiguity around who owns the state of the cache at any given time. We know that there are a significant number of additional places where we should be using something like OrderedBasicBlock, but adding OBB into many of these places would be quite non-trivial. As indicated by the performance results briefly described in D51664, we have significant headroom. I don't see any really feasible way around these issues except moving the ownership of that state into the BB itself. >> >> Thanks again, >> Hal >> >> >> Have you tried this approach? It should be very easy to hack up to try out on your use case. >> >> -Chris >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> -- >> Hal Finkel >> Lead, Compiler Technology and Programming Languages >> Leadership Computing Facility >> Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Chris Lattner via llvm-dev
2018-Sep-25 04:54 UTC
[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance
> On Sep 24, 2018, at 10:19 AM, Reid Kleckner <rnk at google.com> wrote: > > To echo what Hal said, yes, it's a major change, but I think the improved complexity guarantees, simplicity, and elimination of certain classes of bugs is worth it. > > I think we have consensus that we should go forward with this. Would anyone mind formally stamping it in phab? So far everyone understandably has said "makes sense to me, but I don't consider myself to have authority to stamp this.”Hi Reid, I have significant concerns with this patch. I’d really appreciate it if you could address the discussion feedback here before just plowing forward with such an invasive change to the core IR. Have you done any memory use analysis or compile time impact analysis of this change? Have you consider the algorithm I mentioned? I am not motivated by your rationale above, and I think it will be a huge step forward and effectively define away the problem. -Chris
Possibly Parallel Threads
- RFC Storing BB order in llvm::Instruction for faster local dominance
- RFC Storing BB order in llvm::Instruction for faster local dominance
- RFC Storing BB order in llvm::Instruction for faster local dominance
- RFC Storing BB order in llvm::Instruction for faster local dominance
- RFC Storing BB order in llvm::Instruction for faster local dominance