thr3ads.net - llvm dev - [llvm-dev] RFC: Reducing Instr PGO size overhead [Sep 2015]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li via llvm-dev

2015-Sep-04 05:26 UTC

[llvm-dev] RFC: Reducing Instr PGO size overhead

LLVM Profile instrumentation incurs very large size (memory, storage)
overhead. For instance, the following is the binary size comparison of
the Clang binaries built  (-O2 -DNDEBUG) with different
configurations:


1) 60.9M  (built with Clang itself)
2) 280.4M (same as 1, but added -fprofile-instr-generate)
3) 54.9M (built with GCC 4.8)
4) 156.5M (same as 3, but added -fprofile-generate)

In other words, Clang's instrumentation increases binary size by 4.6X,
while GCC's instrumentation overhead is only  2.8X.

The profile data size from Clang instrumented binary is also much
larger. The following is a comparison:

1) 114.5M (raw profile data emitted by Clang instrumented by Clang)
2) 63.7M (indexed profile data emitted by Clang instrumented by Clang)
3) 33.1M (total size of GCDA files emitted by Clang instrumented by GCC).

Large size of overhead can limit the usability of PGO greatly:
a) devices with small system partition does not have much room left --
greatly bloated image due to instrumentation can prevent it from being
installed
b) Linker can be highly stressed when linking very large C++
applications -- large size increase due to instrumentation can prevent
those apps from being successfully linked
c) Large raw profile dumps may also cause problems, e.g. running out
of space. It can also profile bootstrap build really slow.


Looking at various sources of size increase caused by instrumentation,
it is clear that the biggest contributor is the __llvm_prf_names
section. The current PGO implementation uses function  assembly names
as the lookup keys in the indexed format so it needs to be emitted in
both rodata of the binary and in the raw/indexed profiles.

On the other hand, LLVM's indexed format also supports 'key
collision'
-- it allows functions with the same key share the same entry in the
hash table. Function's structural hash values will be used as a
secondary key to find a real match.

The above feature allows us to use a different key other than function
assembly names and this can reduce binary/profile data size
significantly.  Function name's MD5 hash is a good candidate, and I
have a patch (3 parts: runtime, reader/writer, clang) that does just
that. The results are very promising. With this change, the clang
instrumented binary size is now 216M (down from 280M); the raw profile
size is only 40.1M (a 2.85X size reduction); and the indexed profile
size is only 29.5M (a 2.2X size reduction).

With the change, the indexed format size is smaller than GCC's (but
the latter has value profile data).  The binary size is still much
larger than GCC's, but with the late instrumentation, we will have
more size reduction.

A couple of more details of the patch:

1) When -fcoverage-mapping is specified, the llvm_prf_names will be
emitted to the binary, but they won't be dumped into the profile data.
In other words, with -fcoverage-mapping, only profile data will be
shrinked.   The reason is that llvm-cov tool reads function names from
the section (referenced from covmap) to support name based filtering
(including regular expression) when dumping line coverage report
2) The change is backward compatible such that old version of both raw
and index formats  can still be read by the new profile reader (and
therefore clients such as clang, llvm-profdata, llvm-cov tools)


I plan to submit the patch for review after some cleanups.

Thoughts, concerns?

thanks,

David

Sean Silva via llvm-dev

2015-Sep-04 19:27 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Thu, Sep 3, 2015 at 10:26 PM, Xinliang David Li via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> LLVM Profile instrumentation incurs very large size (memory, storage)
> overhead. For instance, the following is the binary size comparison of
> the Clang binaries built  (-O2 -DNDEBUG) with different
> configurations:
>
>
> 1) 60.9M  (built with Clang itself)
> 2) 280.4M (same as 1, but added -fprofile-instr-generate)
> 3) 54.9M (built with GCC 4.8)
> 4) 156.5M (same as 3, but added -fprofile-generate)
>
> In other words, Clang's instrumentation increases binary size by 4.6X,
> while GCC's instrumentation overhead is only  2.8X.
>
> The profile data size from Clang instrumented binary is also much
> larger. The following is a comparison:
>
> 1) 114.5M (raw profile data emitted by Clang instrumented by Clang)
> 2) 63.7M (indexed profile data emitted by Clang instrumented by Clang)
> 3) 33.1M (total size of GCDA files emitted by Clang instrumented by GCC).
>
> Large size of overhead can limit the usability of PGO greatly:
> a) devices with small system partition does not have much room left --
> greatly bloated image due to instrumentation can prevent it from being
> installed
> b) Linker can be highly stressed when linking very large C++
> applications -- large size increase due to instrumentation can prevent
> those apps from being successfully linked
> c) Large raw profile dumps may also cause problems, e.g. running out
> of space. It can also profile bootstrap build really slow.
>
>
> Looking at various sources of size increase caused by instrumentation,
> it is clear that the biggest contributor is the __llvm_prf_names
> section. The current PGO implementation uses function  assembly names
> as the lookup keys in the indexed format so it needs to be emitted in
> both rodata of the binary and in the raw/indexed profiles.
>
> On the other hand, LLVM's indexed format also supports 'key
collision'
> -- it allows functions with the same key share the same entry in the
> hash table. Function's structural hash values will be used as a
> secondary key to find a real match.
>
> The above feature allows us to use a different key other than function
> assembly names and this can reduce binary/profile data size
> significantly.  Function name's MD5 hash is a good candidate, and I
> have a patch (3 parts: runtime, reader/writer, clang) that does just
> that. The results are very promising. With this change, the clang
> instrumented binary size is now 216M (down from 280M); the raw profile
> size is only 40.1M (a 2.85X size reduction); and the indexed profile
> size is only 29.5M (a 2.2X size reduction).
>
> With the change, the indexed format size is smaller than GCC's (but
> the latter has value profile data).  The binary size is still much
> larger than GCC's, but with the late instrumentation, we will have
> more size reduction.
>
> A couple of more details of the patch:
>
> 1) When -fcoverage-mapping is specified, the llvm_prf_names will be
> emitted to the binary, but they won't be dumped into the profile data.
> In other words, with -fcoverage-mapping, only profile data will be
> shrinked.   The reason is that llvm-cov tool reads function names from
> the section (referenced from covmap) to support name based filtering
> (including regular expression) when dumping line coverage report
> 2) The change is backward compatible such that old version of both raw
> and index formats  can still be read by the new profile reader (and
> therefore clients such as clang, llvm-profdata, llvm-cov tools)
>
>
> I plan to submit the patch for review after some cleanups.
>
> Thoughts, concerns?
>
I think it is reasonable to simply replace the key we currently use with
MD5(key) for getting a size reduction.  In practice for my use cases, I
have not observed any of the issues you mentioned under "Large size of
overhead can limit the usability of PGO greatly", but I can understand that
some of these issues could become problems in Google's use case. I would
personally prefer to keep the existing behavior as the default (see below),
and have MD5(key) as an option.

My primary concern is that if the function name are not kept at all stages,
then it becomes difficult to analyze the profile data in a standalone way.
Many times, I have used `llvm-profdata show -all-functions foo.profdata` on
the resulting profile data and then imported that data into Mathematica for
analysis. My understanding of your proposal is that `llvm-profdata show
-all-functions foo.profdata` will not show the actual function names but
instead MD5 hashes, which will make it more difficult for me to do this
kind of analysis (would require using nm on the original binary, hashing
everything, etc.).

btw, feel free to attach the patch even if it in a rough state. It can
still help to clarify the proposal and be a good talking point.
Fine-grained patch review for caring about the rough parts will happen on
llvm-commits; the rough parts will not distract the discussion here on
llvm-dev.

-- Sean Silva

>
> thanks,
>
> David
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150904/e13df04e/attachment.html>

Xinliang David Li via llvm-dev

2015-Sep-04 22:57 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

>
> I think it is reasonable to simply replace the key we currently use with
> MD5(key) for getting a size reduction.  In practice for my use cases, I
have
> not observed any of the issues you mentioned under "Large size of
overhead
> can limit the usability of PGO greatly", but I can understand that
some of
> these issues could become problems in Google's use case. I would
personally
> prefer to keep the existing behavior as the default (see below), and have
> MD5(key) as an option.
The problem is that this requires profile format changes. It will be
very messy to support multiple formats in instr-codegen and
instr-runtime.  For compatibility concerns, the reader is taught to
support previous format, but the changes there are isolated (also
expected to be removed in the future).
>
> My primary concern is that if the function name are not kept at all stages,
> then it becomes difficult to analyze the profile data in a standalone way.
> Many times, I have used `llvm-profdata show -all-functions foo.profdata` on
> the resulting profile data and then imported that data into Mathematica for
> analysis.
This is certainly a very valid use case.
>My understanding of your proposal is that `llvm-profdata show
> -all-functions foo.profdata` will not show the actual function names but
> instead MD5 hashes,
Yes.

To support your use case, there are two solutions:

1) user can add -fcoverage-mapping option in the build
2) introduce a new option -fprofile-instr-names that force the
emission of the name sections in the .o file. This is similar to 1),
but no covmap section is needed.

llvm-profdata tool  will be taught to read the name section and attach
function names to the profile records.

Note that with 1) or 2), the user can still benefit from the reduced
profile size.

thanks,

David



>which will make it more difficult for me to do this kind
> of analysis (would require using nm on the original binary, hashing
> everything, etc.).
>
> btw, feel free to attach the patch even if it in a rough state. It can
still
> help to clarify the proposal and be a good talking point. Fine-grained
patch
> review for caring about the rough parts will happen on llvm-commits; the
> rough parts will not distract the discussion here on llvm-dev.
>
> -- Sean Silva
>
>>
>>
>> thanks,
>>
>> David
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

Justin Bogner via llvm-dev

2015-Sep-08 18:40 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

Xinliang David Li <davidxl at google.com> writes:>>> The key type before the change is StringRef, while the after the
key
>>> type is uint64_t. Are you suggesting treating uint64_t md5 sum key
as
>>> a string of 8 bytes or storing md5 has in text form which will
double
>>> the size?
>>
>>
>> How much does this change the benefit? If most of the benefit is
avoiding
>> extraordinarily long mangled names then it may be sufficient.
>
> I implemented the same functionality using the approach suggested by you.
>
> 1) There is no format change required for raw/indexed profile data
> 2) when a user option is specified, md5hash string is used as the
> function key instead of of the raw function name
>
> In addition to that,  the option specified is also passed to the
> runtime and emitted into profile data. This allows different variants
> of the profile data encoded in the same format  to be automatically
> recognized by the client (clang -fprofile-use, llvm-profdata etc)
> without the need to specify the option again.
>
> There are some size increase, but increase seems to be in the range
> that is acceptable. Using clang as the test case. The indexed profile
> size is now 33.6 M (compared with 29.5M which is the optimal case).
> The raw profile size is 54.1M (compared with 40.1M from previous
> patch).
Have any of these patches you're referencing been posted at all? I came
to the thread late and it's a bit hard to follow which is which.

In any case, the idea sounds fine to me. Names in the profile can be
optional assuming we can safely differentiate the functions without them
(aside: on-by-default seems more user friendly than off-by-default, but
I don't have a strong opinion. Sean?).

I have a couple of high level thoughts on the approach:

Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org>
writes:> Looking at various sources of size increase caused by instrumentation,
> it is clear that the biggest contributor is the __llvm_prf_names
> section. The current PGO implementation uses function  assembly names
> as the lookup keys in the indexed format so it needs to be emitted in
> both rodata of the binary and in the raw/indexed profiles.
There's also the matter that we mangle the filename onto the symbol name
for non-ODR function names, which makes the names even larger than they
need to be. I imagine this isn't the biggest contributor (C++ mangled
names tend to be longer than filenames), but it is relatively easy to
fix independently of this change, by incorporating the filename into the
function's structural hash instead.
> On the other hand, LLVM's indexed format also supports 'key
collision'
> -- it allows functions with the same key share the same entry in the
> hash table. Function's structural hash values will be used as a
> secondary key to find a real match.
>
> The above feature allows us to use a different key other than function
> assembly names and this can reduce binary/profile data size
> significantly.  Function name's MD5 hash is a good candidate, and I
> have a patch (3 parts: runtime, reader/writer, clang) that does just
> that. The results are very promising. With this change, the clang
> instrumented binary size is now 216M (down from 280M); the raw profile
> size is only 40.1M (a 2.85X size reduction); and the indexed profile
> size is only 29.5M (a 2.2X size reduction).
It should be fairly unlikely, but we should probably consider hash
collisions briefly. If we hit a collision in function names that also
collides in the structural hash, there is no way to recover in this
scheme. So:

1. The effect is that we get bogus data in each of the affected
   functions (the sum of the counters in each set, essentially).

2. To hit this, we need two functions with basically the same control
   flow, that have names that collide. Functions having the same control
   flow is obviously valid and likely, so really we only care about MD5
   collisions on function names. Are C++ mangled names long enough that
   collisions are likely?

As long as we're comfortable that (1) isn't so bad and (2) is
sufficiently rare this should be fine. Otherwise, we should consider a
better hash - though that would probably mean changing from 64 bits to
128 and reducing the size less.

Regardless, we can probably take this to -commits and start discussing
the patches themselves. I don't think anyone's opposed to the general
idea, as long as you don't break the use case that Sean brought up.

Xinliang David Li via llvm-dev

2015-Sep-08 19:14 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Tue, Sep 8, 2015 at 11:40 AM, Justin Bogner <mail at justinbogner.com>
wrote:> Xinliang David Li <davidxl at google.com> writes:
>>>> The key type before the change is StringRef, while the after
the key
>>>> type is uint64_t. Are you suggesting treating uint64_t md5 sum
key as
>>>> a string of 8 bytes or storing md5 has in text form which will
double
>>>> the size?
>>>
>>>
>>> How much does this change the benefit? If most of the benefit is
avoiding
>>> extraordinarily long mangled names then it may be sufficient.
>>
>> I implemented the same functionality using the approach suggested by
you.
>>
>> 1) There is no format change required for raw/indexed profile data
>> 2) when a user option is specified, md5hash string is used as the
>> function key instead of of the raw function name
>>
>> In addition to that,  the option specified is also passed to the
>> runtime and emitted into profile data. This allows different variants
>> of the profile data encoded in the same format  to be automatically
>> recognized by the client (clang -fprofile-use, llvm-profdata etc)
>> without the need to specify the option again.
>>
>> There are some size increase, but increase seems to be in the range
>> that is acceptable. Using clang as the test case. The indexed profile
>> size is now 33.6 M (compared with 29.5M which is the optimal case).
>> The raw profile size is 54.1M (compared with 40.1M from previous
>> patch).
>
> Have any of these patches you're referencing been posted at all? I came
> to the thread late and it's a bit hard to follow which is which.
Not yet. I wanted to get the high level discussions done first. I will
post the patch soon.

>
> In any case, the idea sounds fine to me. Names in the profile can be
> optional assuming we can safely differentiate the functions without them
> (aside: on-by-default seems more user friendly than off-by-default, but
> I don't have a strong opinion. Sean?).
>
> I have a couple of high level thoughts on the approach:
>
> Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> Looking at various sources of size increase caused by instrumentation,
>> it is clear that the biggest contributor is the __llvm_prf_names
>> section. The current PGO implementation uses function  assembly names
>> as the lookup keys in the indexed format so it needs to be emitted in
>> both rodata of the binary and in the raw/indexed profiles.
>
> There's also the matter that we mangle the filename onto the symbol
name
> for non-ODR function names, which makes the names even larger than they
> need to be. I imagine this isn't the biggest contributor (C++ mangled
> names tend to be longer than filenames), but it is relatively easy to
> fix independently of this change, by incorporating the filename into the
> function's structural hash instead.
>
The main file name is only added for functions with local linkage -- I
expect the size impact is small.


>> On the other hand, LLVM's indexed format also supports 'key
collision'
>> -- it allows functions with the same key share the same entry in the
>> hash table. Function's structural hash values will be used as a
>> secondary key to find a real match.
>>
>> The above feature allows us to use a different key other than function
>> assembly names and this can reduce binary/profile data size
>> significantly.  Function name's MD5 hash is a good candidate, and I
>> have a patch (3 parts: runtime, reader/writer, clang) that does just
>> that. The results are very promising. With this change, the clang
>> instrumented binary size is now 216M (down from 280M); the raw profile
>> size is only 40.1M (a 2.85X size reduction); and the indexed profile
>> size is only 29.5M (a 2.2X size reduction).
>
> It should be fairly unlikely, but we should probably consider hash
> collisions briefly. If we hit a collision in function names that also
> collides in the structural hash, there is no way to recover in this
> scheme. So:
>
In an experiment done for value profiler, the number of collisions
with md5 sum (lower 64bit) is zero with a set of 3 million symbol
names.  Having two functions collide in both the name hash and
structural hash will indeed be too low to be worried about.
> 1. The effect is that we get bogus data in each of the affected
>    functions (the sum of the counters in each set, essentially).
yes -- I would classify this is a case only existing in theory though.
>
> 2. To hit this, we need two functions with basically the same control
>    flow, that have names that collide. Functions having the same control
>    flow is obviously valid and likely, so really we only care about MD5
>    collisions on function names. Are C++ mangled names long enough that
>    collisions are likely?
Our experiment shows that name collision is unlikely.    Structure
hash collision rate depends on the size of the function body (where
late instrumentation may help).
>
> As long as we're comfortable that (1) isn't so bad and (2) is
> sufficiently rare this should be fine. Otherwise, we should consider a
> better hash - though that would probably mean changing from 64 bits to
> 128 and reducing the size less.
Actually - current implementation is not collision free either .. I
think if it becomes an issue (which I doubt) in the future, we can
even introduce an option to control the length of the md5 key size
(the benefit is that user can further shrink it they desire to do so).
>
> Regardless, we can probably take this to -commits and start discussing
> the patches themselves. I don't think anyone's opposed to the
general
> idea, as long as you don't break the use case that Sean brought up.
Yes.

thanks,

David

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Sep 2015 - RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

Reasonably Related Threads