thr3ads.net - llvm dev - [llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance [Sep 2018]

If this information is useful, please help other people find it:
Share via:

Reid Kleckner via llvm-dev

2018-Sep-25 18:21 UTC

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

On Mon, Sep 24, 2018 at 10:07 PM Chris Lattner <clattner at nondot.org>
wrote:
> > On Sep 24, 2018, at 9:54 PM, Chris Lattner <clattner at
nondot.org> wrote:
> >> I think we have consensus that we should go forward with this.
Would
> anyone mind formally stamping it in phab? So far everyone understandably
> has said "makes sense to me, but I don't consider myself to have
authority
> to stamp this.”
> >
> > Hi Reid,
> >
> > I have significant concerns with this patch.  I’d really appreciate it
> if you could address the discussion feedback here before just plowing
> forward with such an invasive change to the core IR.
>
Sorry for that, I'm at a conference and I'm trying to push this along in
the background without doing any real work on it. I've convinced myself
it's the right change, and that's enough to make me impatient.

> > Have you done any memory use analysis or compile time impact analysis
of
> this change?  Have you consider the algorithm I mentioned?  I am not
> motivated by your rationale above, and I think it will be a huge step
> forward and effectively define away the problem.
>
> To be clear, I’m a superfan of your work and agree that “automatic
> invalidation and simplification” are good goals.  I’d just love to see more
> analysis of the situation before burning space on this extra field.
>
> For example, it would be very interesting to see what the typical
> distribution of local dominance queries actually are.  You could take your
> patch here and have it instrument the “Instruction::comesBefore” method to
> log the distance between the instruction numbers to a file and generate a
> histogram of them.   My bet is that the vast vast vast majority of them
> will be small (and I’d guess a HUGE number of them have a distance of
> +1/-1, whose query can be resolved without computing a numbering in the
> first place).
>
> Can you do that experiment and report back, e.g. on the SemaChecking.cpp
> testcase?
>
Fair enough. I made a patch to do that, and uploaded it here:
https://reviews.llvm.org/D52510
I put it into a spreadsheet and made a histogram here: https://goo.gl/AMLBt7
I grouped the typical query distances into natural log buckets, and the
table is small enough to paste here:

distance freq
1-2 3707
2-7 5622
7-20 25764
20-54 1726
54-148 2052
148-403 4462
403-1096 8423
1096-2980 20487
2980-8103 34930
8103-22026 63759
22026-59874 41729

I think the table shows that local dominance queries are, at least for
SemaChecking.cpp, not clustered around small distances. The average query
is somewhere out around 11,000 instructions apart.

One thing to note is that the code in question isn't using
DominatorTree::dominates. That was identified as a hotspot, so it's using
OrderedBasicBlock, but it's not caching it at a high enough level to avoid
quadratic behavior. Raising it up isn't easy, hence this patch.

Sanjoy also wanted more data and suggested that I re-run the max RSS during
full LTO of llc experiment that I did when I removed the vptr from Value. I
went ahead and did that on the review, but it's not on the dev list, so it
may not have enough visibility. I found that the max RSS before the patch
was 4816464 kb, and after, 4867836 kb. That's an increase of 1.06%, or in
absolute terms, ~50MB of memory spent on instruction positions during full
LTO of llc.

I think Sanjoy's bit-stealing suggestion is a worthwhile alternative to
growing Instruction, but I really think we should analyze it from the
position of, is the complexity of bit-stealing from Parent worth it to
shrink Instruction?

Anyway, hopefully this looks compelling. I'll put some of it in the patch
description before landing.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180925/d660c09b/attachment.html>

Sanjoy Das via llvm-dev

2018-Sep-25 19:16 UTC

head link

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

On Tue, Sep 25, 2018 at 11:22 AM Reid Kleckner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
> On Mon, Sep 24, 2018 at 10:07 PM Chris Lattner <clattner at
nondot.org> wrote:
>>
>> > On Sep 24, 2018, at 9:54 PM, Chris Lattner <clattner at
nondot.org> wrote:
>> >> I think we have consensus that we should go forward with this.
Would anyone mind formally stamping it in phab? So far everyone understandably
has said "makes sense to me, but I don't consider myself to have
authority to stamp this.”
>> >
>> > Hi Reid,
>> >
>> > I have significant concerns with this patch.  I’d really
appreciate it if you could address the discussion feedback here before just
plowing forward with such an invasive change to the core IR.
>
>
> Sorry for that, I'm at a conference and I'm trying to push this
along in the background without doing any real work on it. I've convinced
myself it's the right change, and that's enough to make me impatient.
>
>>
>> > Have you done any memory use analysis or compile time impact
analysis of this change?  Have you consider the algorithm I mentioned?  I am not
motivated by your rationale above, and I think it will be a huge step forward
and effectively define away the problem.
>>
>> To be clear, I’m a superfan of your work and agree that “automatic
invalidation and simplification” are good goals.  I’d just love to see more
analysis of the situation before burning space on this extra field.
>>
>> For example, it would be very interesting to see what the typical
distribution of local dominance queries actually are.  You could take your patch
here and have it instrument the “Instruction::comesBefore” method to log the
distance between the instruction numbers to a file and generate a histogram of
them.   My bet is that the vast vast vast majority of them will be small (and
I’d guess a HUGE number of them have a distance of +1/-1, whose query can be
resolved without computing a numbering in the first place).
>>
>> Can you do that experiment and report back, e.g. on the
SemaChecking.cpp testcase?
>
>
> Fair enough. I made a patch to do that, and uploaded it here:
https://reviews.llvm.org/D52510
> I put it into a spreadsheet and made a histogram here:
https://goo.gl/AMLBt7
> I grouped the typical query distances into natural log buckets, and the
table is small enough to paste here:
>
> distance freq
> 1-2 3707
> 2-7 5622
> 7-20 25764
> 20-54 1726
> 54-148 2052
> 148-403 4462
> 403-1096 8423
> 1096-2980 20487
> 2980-8103 34930
> 8103-22026 63759
> 22026-59874 41729
>
> I think the table shows that local dominance queries are, at least for
SemaChecking.cpp, not clustered around small distances. The average query is
somewhere out around 11,000 instructions apart.
>
> One thing to note is that the code in question isn't using
DominatorTree::dominates. That was identified as a hotspot, so it's using
OrderedBasicBlock, but it's not caching it at a high enough level to avoid
quadratic behavior. Raising it up isn't easy, hence this patch.
>
> Sanjoy also wanted more data and suggested that I re-run the max RSS during
full LTO of llc experiment that I did when I removed the vptr from Value. I went
ahead and did that on the review, but it's not on the dev list, so it may
not have enough visibility. I found that the max RSS before the patch was
4816464 kb, and after, 4867836 kb. That's an increase of 1.06%, or in
absolute terms, ~50MB of memory spent on instruction positions during full LTO
of llc.
>
> I think Sanjoy's bit-stealing suggestion is a worthwhile alternative to
growing Instruction, but I really think we should analyze it from the position
of, is the complexity of bit-stealing from Parent worth it to shrink
Instruction?
Let's not assume a dichotomy between storing a int64 in
llvm::Instruction and bitwise tricks -- they're both ways of caching
position within llvm::Instruction.  I think we first need to establish
that we need such a cache in llvm::Instruction/llvm::BasicBlock at
all.

Do you have a sense of where these expensive domtree queries are
coming from?  Are they from a couple of key places or are they
scattered throughout the pass pipeline?

-- Sanjoy


-- Sanjoy

>
> Anyway, hopefully this looks compelling. I'll put some of it in the
patch description before landing.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Reid Kleckner via llvm-dev

2018-Sep-25 21:35 UTC

head link

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

On Tue, Sep 25, 2018 at 12:16 PM Sanjoy Das <sanjoy at
playingwithpointers.com>
wrote:
> Let's not assume a dichotomy between storing a int64 in
> llvm::Instruction and bitwise tricks -- they're both ways of caching
> position within llvm::Instruction.  I think we first need to establish
> that we need such a cache in llvm::Instruction/llvm::BasicBlock at
> all.
>
> Do you have a sense of where these expensive domtree queries are
> coming from?  Are they from a couple of key places or are they
> scattered throughout the pass pipeline?
>
When I dug into the profile with WPA, most of the time was spent in DSE and
memcpyopt, which call AAResults::callCapturesBefore, which calls
llvm::PointerMayBeCapturedBefore. Neither pass in an OrderedBasicBlock, so
they rebuild the OrderedBasicBlock in linear time on every query. These
passes insert instructions, so it's not correct to simply create and reuse
an OrderedBasicBlock at this level.

As suggested in the bug, if we were to rewrite these passes to use
MemorySSA, this bottleneck would go away. I rebased a patch to do that for
DSE, but finishing it off and enabling it by default is probably out of
scope for me.

These key call sites aside, dominance seems like a pretty common query. We
have several somewhat overlapping mechanisms for checking dominance:
- DominatorTree::dominates(Instruction*,Instruction*), the original
interface, widely known as a bottleneck
- OrderedBasicBlock, as used by memdep, my motivating example
- OrderedInstructions, uses OrderedBasicBlock, used by GVN and others
- MemorySSA has a similar ordering of memory values that is also
invalidated when inserting memory defs. I may have misunderstood how this
works, though.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180925/952d0082/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Sep 2018 - RFC Storing BB order in llvm::Instruction for faster local dominance

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

Possibly Parallel Threads