thr3ads.net - llvm dev - [llvm-dev] [lld] We call SymbolBody::getVA redundantly a lot... [Mar 2017]

Sean Silva via llvm-dev

2017-Feb-28 12:19 UTC

[llvm-dev] [lld] We call SymbolBody::getVA redundantly a lot...

tl;dr: it looks like we call SymbolBody::getVA about 5x more often than we
need to.

Should we cache it or something? (Careful with threads.)


Here is a link to a PDF of my Mathematica notebook which has all the
details of my investigation:
https://drive.google.com/open?id=0B8v10qJ6EXRxVDQ3YnZtUlFtZ1k


There seem to be two main regimes in which we redundantly call
SymbolBody::getVA:

1. Most redundant calls on the same symbol (about 80%) happen in quick
succession, with few intervening calls for other symbols. Most likely we are
processing a bunch of adjacent relocations that all refer to the same symbol
(or a small set of symbols), e.g. within a TU.

2. There is a long-ish tail (about 20% of calls to SymbolBody::getVA) that
happens at a long temporal distance from any previous call to
SymbolBody::getVA on the same symbol. I don't know off the top of my head
where these come from, but they don't sound like relocations. A quick grep
shows a bunch of source locations that match getVA, so it's hard to tell at
a glance. Any ideas where these other calls are coming from?
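
For reference, the "temporal distance" behind the two regimes above can be
computed from a call trace roughly as follows. This is only a minimal sketch,
not the Mathematica analysis from the notebook: the trace format (one symbol
ID per getVA call) and the name `reuseDistances` are assumptions made for
illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical trace: Trace[I] is the ID of the symbol passed to the I-th
// getVA call. For every call that repeats an earlier symbol, record how many
// calls back the previous call on that same symbol was (1 = back-to-back).
std::vector<std::size_t>
reuseDistances(const std::vector<std::uint32_t> &Trace) {
  std::unordered_map<std::uint32_t, std::size_t> LastSeen; // symbol -> index
  std::vector<std::size_t> Distances;
  for (std::size_t I = 0; I < Trace.size(); ++I) {
    auto It = LastSeen.find(Trace[I]);
    if (It != LastSeen.end())
      Distances.push_back(I - It->second); // redundant call: record distance
    LastSeen[Trace[I]] = I;
  }
  return Distances;
}
```

Small distances would correspond to regime 1 and large ones to regime 2;
where to draw the line between the two is a judgment call.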

The particular link I was looking at was a release link without debug info,
using `-O0 --no-gc-sections --no-threads`. The test case is LLD itself.

-- Sean Silva

Rafael Avila de Espindola via llvm-dev

2017-Feb-28 17:47 UTC

Sean Silva <chisophugis at gmail.com> writes:
> tl;dr: it looks like we call SymbolBody::getVA about 5x more times than we
> need to
>
> Should we cache it or something? (careful with threads).

Maybe. It might be the case that there are multiple relocations to the
same symbol. It can also be the case that we look it up to find the
value to put in a symbol table.

The cost of the call varies a lot depending on what section the symbol
is in. One thing I think we can do is move the symbols outward in the
section merge hierarchy.

For example, a symbol initially points to a MergeInputSection, but we
could then change it to point to a SyntheticSection or even an output
section.
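
As a rough illustration of the idea, here is a minimal sketch. All types and
names (`OutputSec`, `MergeSec`, `Sym`, `redirectToOutput`) are hypothetical
stand-ins, not LLD's real classes, and the sketch ignores section symbols
entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration only.
struct OutputSec {
  std::uint64_t Addr; // final virtual address of the section
};

// A mergeable input section: pieces are deduplicated, so each input offset
// has to be translated to an offset in the output section.
struct MergeSec {
  OutputSec *Out;
  std::map<std::uint64_t, std::uint64_t> InToOut; // input off -> output off

  std::uint64_t outputOffset(std::uint64_t InOff) const {
    auto It = InToOut.find(InOff); // the comparatively expensive step
    assert(It != InToOut.end() && "offset must start a known piece");
    return It->second;
  }
};

struct Sym {
  MergeSec *Merge;     // non-null until the symbol is redirected
  OutputSec *Out;      // set once the symbol is redirected
  std::uint64_t Value; // offset within whichever section is current

  std::uint64_t getVA() const {
    if (Merge) // slow path: translate the offset on every call
      return Merge->Out->Addr + Merge->outputOffset(Value);
    return Out->Addr + Value; // fast path after redirection
  }

  // One-time rewrite, assumed to run after the output layout is final.
  void redirectToOutput() {
    if (!Merge)
      return;
    Value = Merge->outputOffset(Value);
    Out = Merge->Out;
    Merge = nullptr;
  }
};
```

After `redirectToOutput()`, every later `getVA()` call is a single addition
instead of a lookup in the merge map.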

Cheers,
Rafael

Rui Ueyama via llvm-dev

2017-Feb-28 20:10 UTC

I don't think getVA is particularly expensive, and if it is not expensive I
wouldn't cache its result. Did you experiment with caching getVA results? I
think you can do that fairly easily by adding a std::atomic_uint64_t to
SymbolBody and using it as a cache for getVA.
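
A minimal sketch of what such a cache could look like. `CachedSymbol` is a
hypothetical stand-in for SymbolBody, `computeVA` is a placeholder for the
real address computation, and the all-ones sentinel assumes that no symbol
legitimately has that VA.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for SymbolBody; not LLD's actual class.
class CachedSymbol {
public:
  explicit CachedSymbol(std::uint64_t RawValue) : RawValue(RawValue) {}

  std::uint64_t getVA() const {
    std::uint64_t VA = Cache.load(std::memory_order_relaxed);
    if (VA != NotComputed)
      return VA; // cache hit
    VA = computeVA();
    // Racing threads may both compute the value, but they store the same
    // result, so relaxed ordering is sufficient for this sketch.
    Cache.store(VA, std::memory_order_relaxed);
    return VA;
  }

  // Number of times the uncached computation actually ran.
  unsigned computeCount() const {
    return Computes.load(std::memory_order_relaxed);
  }

private:
  // Assumption: no real symbol has an all-ones virtual address.
  static constexpr std::uint64_t NotComputed = ~std::uint64_t(0);

  std::uint64_t computeVA() const {
    Computes.fetch_add(1, std::memory_order_relaxed);
    return 0x400000 + RawValue; // placeholder for the real lookup
  }

  std::uint64_t RawValue;
  mutable std::atomic_uint64_t Cache{NotComputed};
  mutable std::atomic<unsigned> Computes{0};
};
```

The cache costs 8 extra bytes per symbol, so it only pays off if the
uncached computation is noticeably more expensive than one atomic load.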

Sean Silva via llvm-dev

2017-Mar-01 07:15 UTC

On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <ruiu at google.com> wrote:
> I don't think getVA is particularly expensive, and if it is not expensive
> I wouldn't cache its result. Did you experiment to cache getVA results? I
> think you can do that fairly easily by adding a std::atomic_uint64_t to
> SymbolBody and use it as a cache for getVA.

You're right, caching it didn't have any significant effect (though I
wasn't measuring super precisely). I think I was remembering the profile
wrong. I remember measuring some very bad cache/TLB misses here, but I
guess those aren't too important in the current profile (at least, not on
this test case; the locality of these accesses depends a lot on the test
case).

Also, it seems like our performance is a lot more stable w.r.t.
InputSectionBase::relocate than it used to be (or maybe my current CPU is
just less affected; it's a desktop-class processor instead of a Xeon).


I took a quick profile of this workload and it looks like it is:

65% in the writer ("backend")
30% in the "frontend" (everything called by SymbolTable::addFile)

The frontend work seems to be largely dominated by ObjectFile::parse (as
you would expect), though there is about 10% of total runtime slipping
through the cracks here in various other "frontend" tasks.

The backend work is split about evenly between scanRelocations and
OutputSection::writeTo. InputSectionBase::relocate is only about 10% of the
total runtime (part of OutputSection::writeTo).

Some slightly cleaned up `perf report` output with some more details:
https://reviews.llvm.org/P7972

So it seems like overall, the profile is basically split 3 ways (about 30%
each):
- frontend (reading input files and building the symbol table and
associated data structures)
- scanRelocations (initial pass over relocations)
- writeTo (mostly IO and InputSectionBase::relocate)

-- Sean Silva

Sean Silva via llvm-dev

2017-Mar-01 22:33 UTC

On Tue, Feb 28, 2017 at 9:47 AM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:
> Sean Silva <chisophugis at gmail.com> writes:
>
> > tl;dr: it looks like we call SymbolBody::getVA about 5x more times than we
> > need to
> >
> > Should we cache it or something? (careful with threads).
>
> Maybe. It might be the case that there are multiple relocations to the
> same symbol. It can also be the case that we look for it to find the
> value to put in a symbol table.
>
> The cost of the call is very different depending on what section the
> symbol is in. One thing I think we can do is move the symbols out in the
> section merge hierarchy.
>
> For example, a symbol initially points to a MergeInputSection, but we
> could then change it to point to a SyntheticSection or even an output
> section.

You mean redirect the symbol to point to the finalized string table? I
kind of like that idea.

One issue with that, though, is the semantics of section symbols: it's not
possible to handle them correctly just by redirecting the symbol. (I really
don't like those semantics.)

-- Sean Silva
