thr3ads.net - llvm dev - [llvm-dev] [InstrProfiling] Lightweight Instrumentation [Oct 2021]

Xinliang David Li via llvm-dev

2021-Oct-18 19:29 UTC

[llvm-dev] [InstrProfiling] Lightweight Instrumentation

Hi Ellis, thanks for the proposal. Improving the usability of
Instrumentation PGO is indeed very important.

From the results data below, if I understand correctly, the main savings
are from supporting the coverage mode (using boolean counters), right? If
we only enable that, the metadata-based IRPGO clang size will be 10 MB
larger (__llvm_prf_names is strippable, or can easily be made so).

About __llvm_prf_data -- it also serves the purpose of detecting CFG
mismatch (with cfg hashing). On the other hand, about half of the size is
used for value profiling purposes, so for coverage mode when value profile
is not needed, its size can be cut in half -- leaving the total overhead to
be roughly 7 MiB, very close to the debug info based matching scheme.

I support the proposal related to different profiling modes (entry only,
boolean counter).  I suggest having those features upstreamed. In addition,
changes that can reduce existing IRPGO size (e.g., strippable name section,
reduced __llvm_prf_data) are also very welcome. After those are done, we
will have a better idea if the size is still an issue (with a better
comparison with the debug info based method).

thanks,

David


On Mon, Oct 18, 2021 at 10:28 AM Ellis Hoag <ellis.sparky.hoag at
gmail.com>
wrote:
> *RFC: Lightweight Instrumentation*
>
> Hi all,
>
> Our team at Facebook would like to propose a lightweight variant of IR
> instrumentation PGO for use in the mobile space. IRPGO is a proven
> technology in LLVM that can boost performance for server workloads.
> However, the larger binary resulting from instrumentation significantly
> limits its use for mobile applications. In this proposal, we introduce a
> few changes to IRPGO to reduce the instrumented binary size, making it
> suitable for PGO on mobile devices.
>
> This proposal is driven by the same need behind the earlier MIP (machine
> IR profile) prototype <https://reviews.llvm.org/D104060>. But unlike MIP,
> where there is significant divergence from IRPGO, this proposed lightweight
> instrumentation fits into the existing IRPGO framework with a few
> extensions to achieve a smaller instrumented binary.
>
> We’d like to share the new design and results from our prototype and get
> feedback.
>
> Best,
> Ellis, Kyungwoo, and Wenlei
> Motivation
> In the mobile space, profile guided optimization can also have an outsized
> impact on performance just like PGO for server workloads, but conventional
> instrumentation comes with a large binary size and code size increase as
> high as 50%, which limits its use for mobile applications for two reasons:
>
>    - Mobile applications are very sensitive to total binary size as
>    larger binaries take longer to download and use more space on devices.
>    There could be a hard size limit for over-the-air (OTA) updates for this
>    reason.
>    - When code (.text) size increases, it takes longer for applications
>    to start up and could also degrade runtime performance due to more page
>    faults on devices with limited RAM.
>
> Reducing the size overhead from instrumentation would make IRPGO usable
> for mobile applications so we could send instrumented binaries through OTA
> updates in production environments, collect representative production
> profiles, and apply PGO.
> Overview
> The size overhead from IRPGO mainly comes from two things: 1)
> metadata for mapping raw counts back to IR/CFG, which has to stay with the
> binary. 2) the increased .text size due to insertion of instrumented code
> and less effective optimization after instrumentation. Two extensions are
> proposed to reduce the size overhead from each of the above:
>
>    - We allow the use of debug info / dwarf as alternative metadata for
>    mapping counts to IR, aka profile correlation. Debug info is extractable
>    from the binary, therefore such metadata doesn’t need to be shipped to
>    mobile devices. Debug info has been used extensively for sampling-based
>    PGO in LLVM, so it has reasonable quality to support profile correlation.
>    - We add the flexibility to allow coarse-grained instrumentation that
>    1) inserts probes only at function entry instead of at each block (or
>    blocks decided by MST placement); 2) offers an optional coverage mode
>    using one-byte booleans in addition to today’s counting mode using
>    8-byte counters.
>
> The extensions offer a spectrum of trade-off choices from the most
> accurate PGO to something very lightweight that can be used in mobile
> space. With debug info extracted and using function entry coverage mode,
> the size increase can be reduced from close to 50% down to below 5%
> (measured with clang self-build PGO).
> Extractable Metadata
> With today’s IRPGO, the instrumentation runtime dumps
> out a profraw profile at the end of training. The runtime creates a
> header and appends data from the __llvm_prf_data, __llvm_prf_cnts, and
> __llvm_prf_names sections to create a profraw profile. The __llvm_prf_data
> section contains references to each function’s profile data (in
> __llvm_prf_cnts) and name (in __llvm_prf_names) so they are needed to
> correlate profile data to the functions they instrument.
>
> Some kind of metadata to correlate counts back to IR (specifically CFG
> blocks) is unavoidable. One way to reduce binary size is to make such
> metadata extractable so they don’t have to be shipped to mobile devices. We
> could make __llvm_prf_data and __llvm_prf_names extractable, but the cost
> will be non-trivial and it will be a breaking change. On the other hand,
> debug info is extractable from binary and it already does a very good job
> of maintaining mapping between address and source location / symbols.
> Sample PGO depends entirely on debug info for profile correlation. So we
> picked debug info as the alternative for extractable metadata.
>
> In our proposed instrumentation, we create a special global struct, e.g.,
> __profc__Z3foov, to hold counters for a particular function. The
> __llvm_prf_cnts data section holds all of these structs and serves as
> placeholder for raw profile counters. In our final instrumented binary, we
> only have probe instructions and raw profile data without any
> instrumentation metadata, i.e., there are no __llvm_prf_names or
> __llvm_prf_data sections but we still have a __llvm_prf_cnts section. At
> runtime, we dump the __llvm_prf_cnts section to a file without any
> processing after profiling. To differentiate from IRPGO, the output from
> runtime is called proflite and we can add another VARIANT_MASK_ flag to
> the Version field of the profile header. At llvm-profdata post-processing
> time, we use debug info to correlate our raw profile data as follows. First
> we identify an instrumented function and look for its special global struct
> that holds counters (__profc__Z3foov) in the debug info. The debug info
> can tell us the address of that symbol in the binary and we can compute its
> offset from the __llvm_prf_cnts section. Then we can use that offset to
> read the function entry and block counters from the proflite file.
> Finally we populate profdata output for each function following the
> existing format.
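
The correlation step described above can be sketched as follows. This is only an illustration in Python, not the prototype's code: the symbol table, the addresses, and the helper `read_counters` are hypothetical stand-ins for what a tool would recover from debug info. It assumes 8-byte little-endian counters and that the proflite file is a straight dump of the counter section.

```python
import struct

COUNTER_SIZE = 8  # bytes per counter in counting mode (assumed little-endian)


def read_counters(proflite, section_start, symbol_addr, num_counters):
    """Return one function's counters from a raw dump of the counter section.

    proflite      -- bytes of the dumped counter section
    section_start -- address of the counter section in the binary
    symbol_addr   -- address of the function's counter struct (from debug info)
    num_counters  -- number of counters in that struct
    """
    offset = symbol_addr - section_start
    end = offset + num_counters * COUNTER_SIZE
    if offset < 0 or end > len(proflite):
        raise ValueError("counter struct lies outside the dumped section")
    return list(struct.unpack_from(f"<{num_counters}Q", proflite, offset))


# Hypothetical inputs: two counter structs laid out back to back.
section_start = 0x4000
symbols = {
    "__profc__Z3foov": (0x4000, 2),  # (address, number of counters)
    "__profc__Z3barv": (0x4010, 1),
}
proflite = struct.pack("<3Q", 7, 3, 42)

for name, (addr, count) in symbols.items():
    print(name, read_counters(proflite, section_start, addr, count))
```

The only per-function facts needed at post-processing time are the struct's address and its counter count, which is why nothing beyond the raw counters has to ship in the binary.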
>
> Value profile is not going to be supported with extractable metadata right
> now, though we believe it can also be added following a similar scheme.
>
> To improve debug info quality for profile correlation,
> -fdebug-info-for-profiling from AutoFDO can be used. Additionally, we
> could also use pseudo-probe from CSSPGO
> <https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s/m/hBrJaOWVAwAJ> as
> the alternative metadata which is also fully extractable.
>
> We propose a new flag
> -fprofile-generate-correlate=[profdata|debug-info|pseudo-probe] to choose
> what metadata to use for profile correlation. Either we correlate with
> today’s IRPGO metadata and keep them in their own sections (__llvm_prf_data
> and __llvm_prf_names), with debug info, or with pseudo-probe.
> Coarse-grained Instrumentation
> In addition to reducing metadata size (__llvm_prf_names and
> __llvm_prf_data), we can also tune down .text size
> and __llvm_prf_cnts size. We do this by 1) only instrumenting function
> entries instead of each block and 2) lowering precision by tracking single
> byte coverage data rather than 8 byte counters. This is a trade-off between
> profile quality and binary size.
>
> Function profile vs block profile and counting mode vs coverage mode can
> all be selected independently using our proposed flag
> -fprofile-generate-mode=[func-cov|block-cov|func-cnt|block-cnt], and they
> can work with both extractable metadata as well as IRPGO’s correlation
> method. func-cov and block-cov use single byte booleans for coverage data
> while func-cnt and block-cnt use 8 byte counters. block-cnt represents
> today’s IRPGO which is the default.
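
To make the four proposed modes concrete, here is a rough Python sketch of how they affect counter storage. The mode names come from the proposed flag; everything else (the `Mode` class, `counter_section_bytes`, and the probe counts) is made up for illustration, and the estimate ignores alignment, padding, and MST-based probe reduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    per_block: bool     # True: one probe per instrumented block; False: entry only
    counter_bytes: int  # 1 for coverage booleans, 8 for counters


MODES = {
    "func-cov": Mode(per_block=False, counter_bytes=1),
    "block-cov": Mode(per_block=True, counter_bytes=1),
    "func-cnt": Mode(per_block=False, counter_bytes=8),
    "block-cnt": Mode(per_block=True, counter_bytes=8),  # today's IRPGO
}


def counter_section_bytes(mode_name, probes_per_function):
    """Estimate counter storage for a list of per-function block-probe counts."""
    mode = MODES[mode_name]
    if mode.per_block:
        probes = sum(probes_per_function)
    else:
        probes = len(probes_per_function)  # one entry probe per function
    return probes * mode.counter_bytes


# Hypothetical program: three functions with 5, 12, and 3 instrumented blocks.
probes = [5, 12, 3]
for name in MODES:
    print(name, counter_section_bytes(name, probes))
```

This covers only the counter data; the .text savings from emitting fewer probe instructions are separate and are not modeled here.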
>
> When using a profile generated from modes other than block-cnt,
> additional profile inference is needed before the counts can be consumed by
> optimizations. Such inference is done during profile loading and so it’s
> transparent to optimizations.
>
>    - For block coverage mode, we will use coverage info to seed block
>    count inference, and leverage static branch probability at the same time
>    to produce a CFG profile that honors zero count blocks and converts live
>    block coverage data into synthetic counts.
>    - For function count mode, we will derive a CFG profile entirely from
>    static branch probability, then scale the CFG profile based on function
>    entry count.
>    - Function coverage mode is handled similarly to function count mode.
>    For covered/live functions, we will derive a CFG profile entirely from
>    static branch probability first, then scale that CFG profile by a
>    constant.
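
The function count and function coverage cases above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not LLVM's inference: the CFG is acyclic and given in topological order, the branch probabilities are invented, and `block_counts` and `COVERED_ENTRY_COUNT` are hypothetical names.

```python
def block_counts(order, edges, entry_count):
    """Derive block counts from static branch probabilities and an entry count.

    order -- block names in topological order, entry block first
    edges -- block -> list of (successor, branch probability)
    """
    freq = {block: 0.0 for block in order}
    freq[order[0]] = 1.0  # entry executes once per call
    for block in order:
        for succ, prob in edges.get(block, []):
            freq[succ] += freq[block] * prob
    # Scale relative frequencies by how often the function was entered.
    return {block: round(freq[block] * entry_count) for block in order}


# Hypothetical diamond-shaped CFG with a 75/25 static branch estimate.
cfg_order = ["entry", "then", "else", "exit"]
cfg_edges = {
    "entry": [("then", 0.75), ("else", 0.25)],
    "then": [("exit", 1.0)],
    "else": [("exit", 1.0)],
}

# Function count mode: scale by the measured entry count.
print(block_counts(cfg_order, cfg_edges, 1000))

# Function coverage mode: scale covered functions by an assumed constant.
COVERED_ENTRY_COUNT = 100
covered = True
print(block_counts(cfg_order, cfg_edges, COVERED_ENTRY_COUNT if covered else 0))
```

Loops would need a proper block frequency computation; the sketch skips that to keep the scaling step visible.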
>
> Experiments showed that even with coarse-grained function entry profiles,
> mobile applications can still benefit from PGO, and the smaller binary makes
> it possible for mobile to use PGO.
> Workflow
> Since these are extensions that share the same underlying PGO
> framework, the workflow for lightweight PGO is very similar to existing
> IRPGO.
>
> The diagram below has the PGO workflow today (shown in red) in comparison
> with the workflow for lightweight instrumentation (shown in green). We
> first create an instrumentation build that produces a raw profile at
> runtime. Then we use the llvm-profdata tool to convert that raw profile
> to a profile that the compiler can consume in the PGO build. The main
> difference for lightweight instrumentation is that we create an
> instrumentation build with debug info and we use that debug info to create
> our final profile.
>
> [image: image.png]
> Prototype & Results
> We have a proof of concept
> <https://github.com/ellishg/llvm-project/commits/instr-correlate-debug-info>
> using dwarf as the extractable metadata and single byte function coverage
> instrumentation. We measured code size by building Clang with and without
> instrumentation using -Oz and no value profiling. Our lightweight
> instrumented Clang binary is only +4 MB (+3.48%) larger than a
> non-instrumented binary. We compare this with today’s PGO instrumentation
> Clang binary which is +54 MB (+46.96%) larger. If we used debug info to
> correlate normal instrumentation (without value profiling) instead of just
> function coverage then we would expect to see an overhead of +43.2 MB
> (+37.5%). We don’t have performance data on clang experiments using the
> prototype since not all components are implemented. However, an earlier
> alternative implementation (similar to MIP) delivered a good performance
> boost for mobile applications.
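
As a quick sanity check on the figures quoted above, each size increase divided by its percentage should imply roughly the same uninstrumented Clang size. The baseline itself is not stated in the message, so the value computed below is inferred, and the dictionary labels are just descriptive names chosen for this sketch.

```python
# (size increase in MB, percentage increase) as quoted in the results.
measurements = {
    "lightweight (debug info + function coverage)": (4.0, 3.48),
    "today's IRPGO": (54.0, 46.96),
    "debug info + block counters (expected)": (43.2, 37.5),
}


def implied_baseline_mb(delta_mb, percent):
    """Uninstrumented size implied by an absolute and a relative increase."""
    return delta_mb / (percent / 100.0)


baselines = {
    name: implied_baseline_mb(delta, pct)
    for name, (delta, pct) in measurements.items()
}
for name, size in baselines.items():
    print(f"{name}: ~{size:.1f} MB baseline")
```

All three land close to 115 MB, so the quoted numbers are mutually consistent.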
>
> [image: table-large.jpg]

Wenlei He via llvm-dev

2021-Oct-18 20:38 UTC

Thanks for the feedback, David.

You’re right that most of the savings come from coarse-grained instrumentation.
However, the situation we’re facing for mobile (and also embedded systems) comes
with very tight size constraints. Some components are already built with -Oz,
and we’re constantly on the lookout for extra MiB to save so more “features” can
get into the components.

For the clang self-build example, 7M overhead is much better than 50M+, and
50M->7M does indeed look close to 50M->4M as an improvement. But compared to
non-PGO, this is still +7M vs +4M. The extra 3M is considered quite significant,
and could potentially be a deal breaker for some cases.

In short, this is “close” as you mentioned, but still not good enough. Using
dwarf as metadata also has a few benefits over tweaking existing metadata to be
extractable: it’s less intrusive, and it’s also more standardized metadata
compared to PGO’s own metadata.

Thanks,
Wenlei


Ellis Hoag via llvm-dev

2021-Oct-18 20:50 UTC

Hi David,

Thanks for the feedback! I'm happy to see interest in improving
instrumentation size in upstream LLVM.

As for detecting CFG mismatches with __llvm_prf_data, we can easily store
the cfg hash in Dwarf using the DW_TAG_LLVM_annotation tag, which can
store any link-time constant. This value would be included in the final
default.profdata profile without any format change, so the compiler would
consume it in the same way.

Ellis


Xinliang David Li via llvm-dev

2021-Oct-18 21:20 UTC

On Mon, Oct 18, 2021 at 1:38 PM Wenlei He <wenlei at fb.com> wrote:
> Thanks for the feedback, David.
>
> You’re right that most of the savings come from coarse-grained
> instrumentation. However, the situation we’re facing for mobile (and also
> embedded systems) comes with very tight size constraints. Some components
> are already built with -Oz, and we’re constantly on the lookout for extra
> MiB to save so more “features” can get into the components.
>
> For the clang self-build example, 7M overhead is much better than 50M+, and
> 50M->7M does indeed look close to 50M->4M as an improvement. But compared
> to non-PGO, this is still +7M vs +4M. The extra 3M is considered quite
> significant, and could potentially be a deal breaker for some cases.
>
> In short, this is “close” as you mentioned, but still not good enough.
> Using dwarf as metadata also has a few benefits over tweaking existing
> metadata to be extractable: it’s less intrusive, and it’s also more
> standardized metadata compared to PGO’s own metadata.

While dwarf is a standard way of program annotation, using it for
instrumentation PGO does mean an additional dependency (instead of being
self-contained).

This proposal requires debug_type info to be emitted, right? What is the
object size and compile time overhead? If this can be trimmed, it is a
reasonable way to emit the profile data mapping information at compile time.

David
