thr3ads.net - llvm dev - [llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance [Sep 2018]


Reid Kleckner via llvm-dev

2018-Sep-19 20:30 UTC

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

Hi folks,

I looked into some slow compiles and filed
https://bugs.llvm.org/show_bug.cgi?id=38829. The gist of it is that we
spend a lot of time iterating basic blocks to compute local dominance, i.e.
given two instructions in the same BB, which comes first.

LLVM already has a tool, OrderedBasicBlock, which attempts to address this
problem by building a lazy mapping from Instruction* to position. The
problem is that cache invalidation is hard. If we don't cache orderings at
a high enough level, our transformations become O(n^2). If we cache them
too much and insert instructions without renumbering the BB, we get
miscompiles. My solution is to hook into the actual BB ilist modification
methods, so that we can have greater confidence that our cache invalidation
is correct.

I created a patch for this at https://reviews.llvm.org/D51664, which adds a
lazily calculated position integer to every llvm::Instruction. I stole a
bit from BasicBlock's Value subclass data to indicate whether the orders
are valid.
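
A minimal sketch of the idea in the paragraph above, using a toy block type rather than the real llvm::BasicBlock. The names ToyInst, ToyBlock, orderValid and renumberCount are hypothetical and are not taken from D51664. The sketch only illustrates the shape of the scheme: each instruction carries a cached position, the block carries one validity bit, the list-modification method clears that bit, and the numbering is recomputed lazily on the next query.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Toy instruction: the cached position is only meaningful while the
// owning block's validity flag is set.
struct ToyInst {
  std::string name;
  unsigned order = 0;
};

// Toy block that owns its instruction list AND the ordering cache, so
// every insertion goes through a method that can invalidate the cache.
class ToyBlock {
public:
  using iterator = std::list<ToyInst>::iterator;

  iterator begin() { return insts.begin(); }
  iterator end() { return insts.end(); }

  // Inserting may place an instruction between two existing numbers,
  // so the cached numbering is conservatively marked invalid.
  iterator insert(iterator pos, std::string name) {
    orderValid = false;
    return insts.insert(pos, ToyInst{std::move(name), 0});
  }

  // Both iterators must point at instructions in this block.
  // Amortized O(1) once the block has been numbered.
  bool comesBefore(iterator a, iterator b) {
    assert(a != insts.end() && b != insts.end());
    if (!orderValid)
      renumber();
    return a->order < b->order;
  }

  // Exposed only so the sketch can show when renumbering happens.
  unsigned renumberCount() const { return renumbers; }

private:
  void renumber() {
    unsigned next = 0;
    for (ToyInst &inst : insts)
      inst.order = next++;
    orderValid = true;
    ++renumbers;
  }

  std::list<ToyInst> insts;
  bool orderValid = false; // the single "orders are valid" bit
  unsigned renumbers = 0;
};
```

In this toy, a run of queries with no intervening insertion pays for one O(n) renumbering and then answers each query with an integer comparison. A real implementation would have to cover every way the instruction list can change, which is what hooking the ilist modification methods is meant to guarantee.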

Hopefully everyone agrees that this is a reasonable direction. I just figured
I should announce this IR data structure change to the -dev list. :)

Reid

Philip Reames via llvm-dev

2018-Sep-19 23:44 UTC


+1 to the general direction

My concern is the invalidation/recompute cost, but I think we can manage 
that.

Philip



Finkel, Hal J. via llvm-dev

2018-Sep-20 18:20 UTC


Sounds great!

 -Hal


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Chris Lattner via llvm-dev

2018-Sep-21 18:30 UTC


I haven’t had a chance to look at the patch in detail yet (hopefully this
afternoon) but this sounds like a very invasive change to a core data structure.

The inner loop of the local dominance check in DominatorTree::dominates is also
not very well implemented: it does a single linear pass from the beginning of
the block until it finds the def or user.  A better algorithm would be to use
two pointers - one at the user and one at the def.  Each time through the loop,
move the user iterator “up” the block, and the def iterator “down” the block.
Either the iterators meet each other (in which case return true) or you find the
beginning/end of the block (in which case return false).

This should work a lot better for many queries, because it will be efficient
when the user and def are close to each other, as well as being efficient when
the value is at the end of the block.  Also, my bet is that most local dom
queries return true.
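
The two-pointer walk described above could look roughly like the following. This is a sketch over a plain std::list<int> standing in for a basic block, not a patch against DominatorTree::dominates. ToyBlock and comesBeforeTwoPointer are hypothetical names. The iterators are compared after every single step so that they cannot pass each other unnoticed when they start an odd distance apart.

```cpp
#include <cassert>
#include <list>

// Stand-in for a basic block: a doubly linked list of instruction ids.
using ToyBlock = std::list<int>;

// Returns true if `def` appears strictly before `user` in `block`.
// Both iterators must point at elements of `block`.
bool comesBeforeTwoPointer(const ToyBlock &block,
                           ToyBlock::const_iterator def,
                           ToyBlock::const_iterator user) {
  assert(def != block.end() && user != block.end() &&
         "both iterators must point at instructions in the block");
  if (def == user)
    return false;

  ToyBlock::const_iterator down = def; // walks toward the end of the block
  ToyBlock::const_iterator up = user;  // walks toward the beginning
  for (;;) {
    // Step the def side down one instruction.
    ++down;
    if (down == block.end())
      return false; // ran off the end without meeting: user is above def
    if (down == up)
      return true;  // met the user side: def is above user

    // Step the user side up one instruction.
    if (up == block.begin())
      return false; // nothing above user, so def cannot be above it
    --up;
    if (up == down)
      return true;
  }
}
```

When def precedes user, the loop stops after about as many steps as there are instructions between them. Otherwise it stops as soon as either walk reaches its end of the block. The worst case is still linear in the block size.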

Have you tried this approach?  It should be very easy to hack up to try out on
your use case.

-Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180921/9cd97000/attachment.html>

Finkel, Hal J. via llvm-dev

2018-Sep-21 18:49 UTC


On 09/21/2018 01:30 PM, Chris Lattner via llvm-dev wrote:



> I haven’t had a chance to look at the patch in detail yet (hopefully this
> afternoon) but this sounds like a very invasive change to a core data structure.


Indeed. Perhaps a long-overdue one ;)


> The inner loop of the local dominance check in DominatorTree::dominates is also
> not very well implemented: it does a single linear pass from the beginning of
> the block until it finds the def or user.  A better algorithm would be to use
> two pointers - one at the user and one at the def.  Each time through the loop,
> move the user iterator “up” the block, and the def iterator “down” the block.
> Either the iterators meet each other (in which case return true) or you find the
> beginning/end of the block.
>
> This should work a lot better for many queries, because it will be efficient
> when the user and def are close to each other, as well as being efficient when
> the value is at the end of the block.  Also, my bet is that most local dom
> queries return true.

This seems like a good idea.

It doesn't change the fact, however, that local dominance queries are O(n).
We've ended up using OrderedBasicBlock in an increasing number of places,
but there are a number of places where this is hard because of the plumbing
required, or more importantly, the ambiguity around who owns the state of the
cache at any given time. We know that there are a significant number of
additional places where we should be using something like OrderedBasicBlock, but
adding OBB into many of these places would be quite non-trivial. As indicated by
the performance results briefly described in D51664, we have significant
headroom. I don't see any really feasible way around these issues except
moving the ownership of that state into the BB itself.
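
To make the ownership problem concrete, here is a toy sketch of a position cache that lives outside the block it describes. It is not LLVM's OrderedBasicBlock. InstList, ExternalOrderCache and invalidate are hypothetical names, and the block is just a std::list<int>. Nothing ties a mutation of the list to the cache, so a caller that reorders the block and forgets to invalidate gets a stale answer.

```cpp
#include <cassert>
#include <list>
#include <unordered_map>

// Stand-in for a basic block's instruction list.
using InstList = std::list<int>;

// A lazily built position map owned by whoever created it, not by the
// block. The block has no way to tell this object that it was modified.
class ExternalOrderCache {
public:
  explicit ExternalOrderCache(const InstList &list) : insts(list) {}

  // Both pointers must refer to elements that were in the list when the
  // map was last built; otherwise at() throws std::out_of_range.
  bool comesBefore(const int *a, const int *b) {
    if (positions.empty())
      rebuild();
    return positions.at(a) < positions.at(b);
  }

  // Callers must remember to invoke this after every change to the list.
  void invalidate() { positions.clear(); }

private:
  void rebuild() {
    unsigned next = 0;
    for (const int &inst : insts)
      positions[&inst] = next++;
  }

  const InstList &insts;
  std::unordered_map<const int *, unsigned> positions;
};
```

For example, if the last element of a three-element list is moved to the front with std::list::splice after the cache has answered a query, the cache keeps reporting the old order until invalidate() is called. With the state owned by the block itself, the mutation and the invalidation can happen in the same method.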

Thanks again,
Hal


> Have you tried this approach?  It should be very easy to hack up to try out on
> your use case.
>
> -Chris



--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
