Xinliang David Li via llvm-dev
2015-Oct-08 06:12 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
There is no further response to this, so I will assume the general direction of solution-3 is acceptable ;)

Solution-3 can be further improved. Instead of using a static symbol table (with zero-size __llvm_prf_nm symbols) to store function names for profile display and coverage mapping, the function names can be stored compressed in a non-allocatable section. The compression ratio for function name strings can be very high (~5x). The covmapping data can also be made non-allocatable.

thanks,

David

On Tue, Sep 29, 2015 at 9:52 AM, Xinliang David Li <davidxl at google.com> wrote:
> Sorry for the late update. I finally found time to implement a solution
> (Solution-3) that has the best size savings (for both PGO and coverage
> testing) with symbolic information available. Here is a recap of what we
> have discussed so far:
>
> Solution-1:
>
> This is the original solution proposed. In this solution, a function name's
> MD5 hash is used as the primary key (combined with the IR hash) for function
> lookup. The __llvm_prf_names section won't be emitted into the binary nor
> dumped into the profile data unless -fcoverage-mapping is also specified.
>
> Pros:
>     1. maximal reduction of the instrumented binary's process image size
>     2. maximal reduction of object and unstripped binary size
>     3. maximal reduction of raw profile data size
>     4. maximal reduction of indexed profile data size
>
> Cons:
>     1. the -fcoverage-mapping use case is not handled -- the size problem
>        still exists
>     2. profile dumps from llvm-profdata no longer have function names
>        associated -- the user needs a postprocessing tool to get that functionality
>     3. function filtering with a partial function name is not supported
>     4. requires an incompatible format change
>
> Solution-2: (http://reviews.llvm.org/D12715)
>
> In this solution, the MD5 hash string is used to replace the raw name string.
>
> Pros:
>     1. very simple to implement
>     2. good reduction of all sizes for typical large C++ applications
>     3. no profile data format change is required
>
> Cons:
>     1. still requires a 16-byte per-function overhead -- this can actually
>        hurt C programs
>     2. the -fcoverage-mapping use case is still not handled
>     3. the problem with llvm-profdata still exists (no symbolic info,
>        partial filtering support)
>
> Solution-3:
>
> This is the new solution I am proposing. It is basically an enhancement of
> Solution-1 with most of the weaknesses resolved. The differences from
> Solution-1 are:
>     1. Function name symbols are emitted into the symbol table as weak
>        externs. They don't occupy any space at runtime and can be easily stripped.
>     2. -fcoverage-mapping does not need special handling -- it
>        automatically benefits from the same size saving.
>     3. llvm-cov is changed to read symbol info from the symtab instead of
>        reading it from the name section data.
>     4. llvm-profdata is enhanced to take a binary as input and dump the
>        profile with names attached. Function filtering is fully supported (an
>        option can also be introduced to force dumping names into the binary and
>        profile data, so that the llvm-profdata use case is not changed at all).
>
> Pros:
>     1. all the pros from Solution-1
>     2. size savings for the coverage-mapping case
> Cons:
>     a format change is required for the profile data and coverage mapping
>
> The initial patch is here: http://reviews.llvm.org/D13251
>
> With this patch, the size of a release clang binary with coverage mapping is
> reduced from 986M to 569M.
>
> If there are no major concerns, I will carve out the patch into smaller
> ones for review.
> thanks,
>
> David
>
> On Tue, Sep 8, 2015 at 3:47 PM, Xinliang David Li <davidxl at google.com> wrote:
>>
>> >> >> yes -- it is a fixed-length (8-byte) blob which may include a null byte
>> >> >> in the middle.
>> >> >
>> >> > For reference, an MD5 sum is 16 bytes (128-bit):
>> >> > https://en.wikipedia.org/wiki/MD5
>> >>
>> >> yes, LLVM's MD5 hash only takes the lower 64 bits.
>> >>
>> >> >> >
>> >> >> > Or to say it another way, suppose that Itanium mangling required as a
>> >> >> > final step to replace the string with its md5 sum in hex. Therefore all
>> >> >> > symbol names are "small". My understanding is that this is effectively
>> >> >> > all your patch is doing.
>> >> >>
>> >> >> The key type before the change is StringRef, while after the change the
>> >> >> key type is uint64_t. Are you suggesting treating the uint64_t md5 sum key
>> >> >> as a string of 8 bytes, or storing the md5 hash in text form, which would
>> >> >> double the size?
>> >> >
>> >> > How much does this change the benefit? If most of the benefit is avoiding
>> >> > extraordinarily long mangled names then it may be sufficient.
>> >> >
>> >> > With IR-level instrumentation like Rong is pursuing, the size may be
>> >> > reduced sufficiently that we do not need the optimization proposed in this
>> >> > thread. For example, Rong found >2x size reduction on Google's C++
>> >> > benchmarks, which I assume are representative of the extremely large
>> >> > Google binaries that are causing the problems addressed by your proposal
>> >> > in this thread. The measurements you mention for Clang in this thread
>> >> > provide similar size reductions, so Rong's approach may be sufficient
>> >> > (especially because functions with extremely large mangled names tend to
>> >> > be small inline functions in header-only template libraries).
>> >>
>> >> Late instrumentation helps many cases. In some cases (as shown in
>> >> SPEC), the reduction in size is not as large. Reducing PGO overhead
>> >> will lower the bar for its adoption.
>> >>
>> >> > Of the points you mention in "Large size of overhead can limit the
>> >> > usability of PGO greatly", many of the issues are hard limits that prevent
>> >> > the use of PGO. Do you have a lower bound on how much the size of the PGO
>> >> > data must be reduced in order to overcome the hard limits?
>> >>
>> >> This is a static view: think about the situation where application
>> >> size is ever increasing; also think about the situation where we want to
>> >> collect more types of profile data. Think about the situation where a user
>> >> wants to run PGO binaries on small devices with tiny memory/storage.
>> >
>> > If we want to reduce memory overhead at runtime and reduce the size of the
>> > raw profile data extracted from the target, there are clear solutions.
>> > Consider that debug info does not need to be loaded into the memory image of
>> > the target; why should information identifying each counter need to be?
>> > A file containing raw profile counters is a subset of a core dump; in most
>> > environments, a core dump does not need to have debug info or symbol names
>> > in it, but can still be read in full detail in conjunction with the
>> > original binary.
>>
>> Yes -- there are many alternatives:
>> 1) emit the name-to-key mapping as side data at compile time, or
>> 2) emit it into non-loadable sections of the object file.
>>
>> Compared with the above, LLVM's existing design does have its own
>> advantage -- making it easier for a tool to access 'debug' info for
>> counters.
>>
>> LLVM's coverage testing, on the other hand, takes a hybrid approach: it
>> emits the coverage map as rodata, but does not pass it to the profile
>> dumper. I think it is better to emit the covmap as side data not
>> attached to the target binary.
>>
>> > Thus, as we require that the binary be passed to llvm-profdata, there is no
>> > fundamental reason that the memory image of the program, or the raw data
>> > extracted from the program, must have any size overhead besides the raw
>> > values of the counters themselves and any text size increase for
>> > incrementing them. If we are willing to impose this requirement on users,
>> > then as far as reducing memory overhead at runtime and reducing the size of
>> > the raw profile data extracted from the target, using hashed function names
>> > is clearly the wrong direction.
>> >
>> > *Without* imposing the requirement of passing the binary to llvm-profdata, I
>> > do like the ability to use hashed function names like you are proposing. It
>> > is a simple solution for reducing the size overhead of function name strings
>> > with little complexity, as it is just swapping one string for another.
>>
>> Agree. The good news is that the overhead of hashed function names is
>> small enough to make this approach attractive.
>>
>> thanks,
>>
>> David
>>
>> >> > Obviously LLVM must be able to support the extremely large binaries in
>> >> > your configuration (otherwise what use is LLVM as a compiler ;) My
>> >> > questions are primarily aimed at establishing which tradeoffs are
>> >> > acceptable for supporting this (both for LLVM and for you guys).
>> >>
>> >> As I said, with the modified proposal (after getting your feedback),
>> >> no PGO user in LLVM land is going to lose any functionality. The
>> >> end result will be a net win for general users of LLVM (even though your
>> >> customers don't care about it), not just 'us' as you have mentioned
>> >> many times.
>> >>
>> >> > Btw, for us, the issue of PGO data size is not completely immaterial but
>> >> > is very different from your use case. For us, the primary issue is the
>> >> > additional memory use at run time, since PS4 games usually use "all"
>> >> > available memory. We had a problem with UBSan where the large amount of
>> >> > memory required for storing the UBSan diagnostic data at runtime required
>> >> > the game programmers to manually change their memory map to make room.
>> >> > +Filipe, do you remember how much memory UBSan was using that caused a
>> >> > problem?
>> >>
>> >> My proposal does help reduce rodata size significantly.
>> >
>> > Yes, that is why I think that this is a useful thing to do. I just want to
>> > be careful about existing use cases and the relevant workflow issues.
>> >
>> > -- Sean Silva
>>
>> >> >> In the raw format, the md5 sum key can be an embedded field in the
>> >> >> prf_data variable instead of a different variable referenced by
>> >> >> prf_data.
>> >> >>
>> >> >> > If this is not the case, you should show your current patch so that we
>> >> >> > can discuss things concretely.
>> >> >>
>> >> >> It is not. See above about the difference.
>> >> >>
>> >> >> >> >> It will be very messy to support multiple formats in instr-codegen
>> >> >> >> >> and instr-runtime. For compatibility concerns, the reader is taught
>> >> >> >> >> to support the previous format, but the changes there are isolated
>> >> >> >> >> (also expected to be removed in the future).
>> >> >> >> >>
>> >> >> >> >> > My primary concern is that if the function names are not kept at
>> >> >> >> >> > all stages, then it becomes difficult to analyze the profile data
>> >> >> >> >> > in a standalone way. Many times, I have used `llvm-profdata show
>> >> >> >> >> > -all-functions foo.profdata` on the resulting profile data and
>> >> >> >> >> > then imported that data into Mathematica for analysis.
>> >> >> >> >>
>> >> >> >> >> This is certainly a very valid use case.
>> >> >> >> >>
>> >> >> >> >> > My understanding of your proposal is that `llvm-profdata show
>> >> >> >> >> > -all-functions foo.profdata` will not show the actual function
>> >> >> >> >> > names but instead MD5 hashes,
>> >> >> >> >>
>> >> >> >> >> Yes.
>> >> >> >> >>
>> >> >> >> >> To support your use case, there are two solutions:
>> >> >> >> >>
>> >> >> >> >> 1) the user can add the -fcoverage-mapping option to the build
>> >> >> >> >> 2) introduce a new option, -fprofile-instr-names, that forces the
>> >> >> >> >> emission of the name sections in the .o file. This is similar to 1),
>> >> >> >> >> but no covmap section is needed.
>> >> >> >> >>
>> >> >> >> >> The llvm-profdata tool will be taught to read the name section and
>> >> >> >> >> attach function names to the profile records.
>> >> >> >> >
>> >> >> >> > Needing to pass the executable to llvm-profdata would cause
>> >> >> >> > deployment issues for my customers in practice.
>> >> >> >>
>> >> >> >> Why? The deployment needs to pass the profile data anyway, right?
>> >> >> >
>> >> >> > Yes, but not the executable.
>> >> >> >
>> >> >> > The PGO training run is likely being run by a gameplay tester
>> >> >> > (non-programmer). In general the binary will not be lying around as a
>> >> >> > loose file anywhere; it will be part of a full package of the
>> >> >> > binary+assets (think like what will end up on a Blu-ray disc). A game's
>> >> >> > binary is *completely useless* without the assets, so except locally on
>> >> >> > a programmer's machine while they iterate/debug, there is no reason for
>> >> >> > a binary to ever exist as a standalone file.
>> >> >> >
>> >> >> > I'm not saying that needing the binary is insurmountable in any
>> >> >> > particular scenario. Just that it will cause a strict increase in the
>> >> >> > number of issues to deploying PGO.
>> >> >>
>> >> >> Your concern is acknowledged.
>> >> >>
>> >> >> > These are much bigger "compatibility concerns" for me than for newer
>> >> >> > toolchains to accept the old format. For a change in format I can
>> >> >> > easily tell my users to replace an exe with a newer one and that is all
>> >> >> > they need to do and it takes 10 seconds, guaranteed. A workflow change
>> >> >> > is potentially a massive disruption and guaranteed to take more than 10
>> >> >> > seconds to fix (perhaps hours or days).
>> >> >>
>> >> >> ok.
>> >> >>
>> >> >> >> This is no different from the llvm-cov usage model.
>> >> >> >
>> >> >> > In practice, getting the performance of PGO is a higher priority for my
>> >> >> > users, so we should not assume that llvm-cov is being used.
>> >> >>
>> >> >> Glad to hear that :)
>> >> >>
>> >> >> thanks,
>> >> >>
>> >> >> David
>> >> >>
>> >> >> > -- Sean Silva
>> >> >>
>> >> >> >> >> Note that with 1) or 2), the user can still benefit from the
>> >> >> >> >> reduced profile size.
>> >> >> >> >
>> >> >> >> > Let me reiterate that the size of the profile is not a problem I
>> >> >> >> > have observed in practice (nor have I heard of this being a problem
>> >> >> >> > in practice until this thread). Therefore I'm skeptical of any
>> >> >> >> > changes to our default behavior or any new requirements that are not
>> >> >> >> > opt-in.
>> >> >> >> >
>> >> >> >> > -- Sean Silva
>> >> >> >> >>
>> >> >> >> >> thanks,
>> >> >> >> >>
>> >> >> >> >> David
>> >> >> >> >>
>> >> >> >> >> > which will make it more difficult for me to do this kind of
>> >> >> >> >> > analysis (would require using nm on the original binary, hashing
>> >> >> >> >> > everything, etc.).
>> >> >> >> >> >
>> >> >> >> >> > btw, feel free to attach the patch even if it is in a rough state.
>> >> >> >> >> > It can still help to clarify the proposal and be a good talking
>> >> >> >> >> > point. Fine-grained patch review for caring about the rough parts
>> >> >> >> >> > will happen on llvm-commits; the rough parts will not distract the
>> >> >> >> >> > discussion here on llvm-dev.
>> >> >> >> >> >
>> >> >> >> >> > -- Sean Silva
>> >> >> >> >> >>
>> >> >> >> >> >> thanks,
>> >> >> >> >> >>
>> >> >> >> >> >> David
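Solutions 1 and 3 above key each profile record by a 64-bit hash of the function name, and the quoted discussion notes that LLVM's MD5-based hash keeps only the lower 64 bits of the digest. The following is a minimal sketch of how such a key can be derived; it is not the in-tree implementation, the helper name is made up, and the exact MD5Result accessors have changed across LLVM versions, so plain byte indexing is used.

// Sketch only: derive a 64-bit lookup key from a mangled function name by
// taking the low-order 8 bytes of its MD5 digest.
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MD5.h"
#include <cstdint>

static uint64_t hashFuncName(llvm::StringRef MangledName) {
  llvm::MD5 Hash;
  Hash.update(MangledName);
  llvm::MD5::MD5Result Digest;
  Hash.final(Digest);
  // Assemble the first 8 digest bytes, little-endian, into the key.
  uint64_t Key = 0;
  for (unsigned I = 0; I < 8; ++I)
    Key |= static_cast<uint64_t>(Digest[I]) << (8 * I);
  return Key;
}

Rendering such a 64-bit key as hexadecimal yields a fixed 16-character string, which is presumably the fixed 16-byte per-function name overhead cited for Solution-2 above.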
Sean Silva via llvm-dev
2015-Oct-09 22:58 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
On Wed, Oct 7, 2015 at 11:12 PM, Xinliang David Li <davidxl at google.com> wrote:

> There is no further response to this, so I will assume the general
> direction of solution-3 is acceptable ;)

No response does not mean "LGTM".

-- Sean Silva

> Solution-3 can be further improved. Instead of using a static symbol
> table (with zero-size __llvm_prf_nm symbols) to store function names
> for profile display and coverage mapping, the function names can be
> stored compressed in a non-allocatable section. The compression ratio
> for function name strings can be very high (~5x). The covmapping data
> can also be made non-allocatable.
> [...]
Xinliang David Li via llvm-dev
2015-Oct-09 23:06 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
On Fri, Oct 9, 2015 at 3:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Wed, Oct 7, 2015 at 11:12 PM, Xinliang David Li <davidxl at google.com> wrote:
>>
>> There is no further response to this, so I will assume the general
>> direction of solution-3 is acceptable ;)
>
> No response does not mean "LGTM".

What I meant is that the discussion can be moved on to the formal code
review. I have not submitted the final patch for review yet. Before that
is ready, continue using this thread to voice your concerns.

David

> -- Sean Silva
> [...]
Bob Wilson via llvm-dev
2015-Dec-14 16:55 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
> On Dec 9, 2015, at 1:12 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> We are now very close to pushing the final stage of the PGO size
> reduction task. Here is the updated plan going forward:
>
> 1) In this round, the format of the indexed profile data won't be unchanged.

That's a double negative. Was that intentional? I.e., are you changing the format?

> 2) there will be *no* changes in user interfaces to any of the profiling
> related tools, including llvm-profdata and llvm-cov -- the change will be
> transparent in terms of PGO usability.
> 3) The implementation will use compression for the function name
> section (the compression ratio is about 5:1). As a result, the linker
> input object size, unstripped binary size, stripped binary size,
> process image size, and raw profile data size will all be greatly
> reduced.
> 4) The change will also greatly improve the usability of coverage mapping
> due to the reduced data size in memory.
>
> Before the final patch, there are a few more small preparation patches:
> 1) abstract name reading into a class (ProfSymtab) (currently the reader
> uses/assumes the raw/uncompressed object section); 2) add compression
> support in ProfSymtab.
>
> thanks,
>
> David
>
> On Wed, Oct 14, 2015 at 12:30 AM, Xinliang David Li <davidxl at google.com> wrote:
>> I plan to divide the patch into a series of small patches. The
>> preparation patches will mostly be refactoring changes with NFC. This
>> will minimize the size of the final patch(es) with functional changes to
>> help discussions.
>>
>> thanks,
>>
>> David
>> [...]
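Both the Oct 8 message at the top of this thread and the Dec 9 plan quoted above describe storing the function name strings compressed (roughly 5:1) in a non-allocatable section. Below is a minimal, self-contained sketch of that idea using the raw zlib API; the sample names and the '\1' separator are illustrative assumptions only, not what the actual ProfSymtab patch emits.

// Standalone sketch: concatenate function name strings and compress them,
// as a non-allocatable name section of the kind described above might.
#include <cstdio>
#include <string>
#include <vector>
#include <zlib.h>

int main() {
  // Hypothetical mangled names; real input would be every instrumented
  // function name in the module.
  const char *Names[] = {"_ZN4llvm9StringRef5splitEc",
                         "_ZN4llvm3MD56updateENS_9StringRefE",
                         "main"};
  std::string Concat;
  for (const char *N : Names) {
    Concat += N;
    Concat += '\1'; // separator between names (illustrative choice)
  }

  uLongf CompressedLen = compressBound(Concat.size());
  std::vector<Bytef> Compressed(CompressedLen);
  int RC = compress2(Compressed.data(), &CompressedLen,
                     reinterpret_cast<const Bytef *>(Concat.data()),
                     Concat.size(), Z_BEST_COMPRESSION);
  if (RC != Z_OK)
    return 1;

  std::printf("uncompressed: %zu bytes, compressed: %lu bytes (%.1fx)\n",
              Concat.size(), (unsigned long)CompressedLen,
              (double)Concat.size() / CompressedLen);
  return 0;
}

A toy input like this will not reach the quoted ratio; the ~5:1 figure applies to the large sets of long, highly repetitive mangled C++ names found in real binaries.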
maximal reduction of instrumentation binary process image >>>>>> size >>>>>> 2. maximal reduction of object and unstripped binary size >>>>>> 3. maximal reduction of raw profile data size >>>>>> 4. maximal reduction of indexed profile data size >>>>>> >>>>>> Cons: >>>>>> 1. -fcoverage-mapping use case is not handled -- the size >>>>>> problem >>>>>> still exist >>>>>> 2. profile dump with llvm-profdata no longer have function names >>>>>> associated -- user needs to use postprocessing tool to get the >>>>>> functionality >>>>>> 3. function filtering with partial function name is not >>>>>> supported >>>>>> 4. Requires incompatible format change >>>>>> >>>>>> >>>>>> Solution-2: (http://reviews.llvm.org/D12715) >>>>>> >>>>>> In this solution, the MD5 hash string is used to replace the raw name >>>>>> string >>>>>> Pros: >>>>>> 1. Very simple to implement >>>>>> 2. have good reduction of all sizes for typical large C++ >>>>>> applications >>>>>> 3. No profile data format change is required. >>>>>> >>>>>> Cons: >>>>>> 1. Still requires 16byte overhead per-function -- can actually >>>>>> hurt C programs >>>>>> 2. -fcoverage-mapping use case is still not handled >>>>>> 3. The problem with llvm-profdata still exists (no symbolic >>>>>> info, >>>>>> partial filtering support) >>>>>> >>>>>> >>>>>> Solution-3: >>>>>> >>>>>> This is the new solution I am proposing. It is basically an enhancement >>>>>> of >>>>>> Solution-1 with most of the weakness resolved. The difference with >>>>>> Solution-1 is >>>>>> 1. Function name symbols are emitted into the symbol table as weak >>>>>> externs. They don't occupy any space at runtime and can be easily >>>>>> stripped. >>>>>> 2. -fcoverage-mapping does not need special handling -- it >>>>>> automatically benefit from the same size saving. >>>>>> 3. llvm-cov is changed to read symbol info from the symtab instead >>>>>> of >>>>>> reading them from the name section data >>>>>> 4. llvm-profdata is enhanced to take a binary as input and dump >>>>>> profile with names attached. Function filtering is fully supported >>>>>> (option >>>>>> can also be introduced to force dumping names into binary and profile >>>>>> data, >>>>>> so that llvm-profdata use case is not changed at all). >>>>>> >>>>>> Pros: >>>>>> 1. All the pros from Solution-1 >>>>>> 2. Size savings for coverage-mapping case >>>>>> Cons: >>>>>> Format change is required for profile data and coverage mapping. >>>>>> >>>>>> The initial patch is here: http://reviews.llvm.org/D13251 >>>>>> >>>>>> With this patch, the size of a release clang binary with coverage >>>>>> mapping is >>>>>> reduced from 986M to 569M. >>>>>> >>>>>> If there are no major concerns, I will carve out the patch into smaller >>>>>> ones for review. >>>>>> >>>>>> thanks, >>>>>> >>>>>> David >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 8, 2015 at 3:47 PM, Xinliang David Li <davidxl at google.com> >>>>>> wrote: >>>>>>> >>>>>>>>>>> >>>>>>>>>>> yes -- it is fixed length (8byte) blob which may include null >>>>>>>>>>> byte >>>>>>>>>>> in >>>>>>>>>>> the middle. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For reference, MD5 sum is 16 bytes (128-bit): >>>>>>>>>> https://en.wikipedia.org/wiki/MD5 >>>>>>>>> >>>>>>>>> yes, LLVM's MD5 hash only takes the lower 64bit. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Or to say it another way, suppose that Itanium mangling >>>>>>>>>>>> required >>>>>>>>>>>> as a >>>>>>>>>>>> final >>>>>>>>>>>> step to replace the string with its md5 sum in hex. 
Therefore >>>>>>>>>>>> all >>>>>>>>>>>> symbol >>>>>>>>>>>> names are "small". My understanding is that this is effectively >>>>>>>>>>>> all >>>>>>>>>>>> your >>>>>>>>>>>> patch is doing. >>>>>>>>>>> >>>>>>>>>>> The key type before the change is StringRef, while the after the >>>>>>>>>>> key >>>>>>>>>>> type is uint64_t. Are you suggesting treating uint64_t md5 sum >>>>>>>>>>> key >>>>>>>>>>> as >>>>>>>>>>> a string of 8 bytes or storing md5 has in text form which will >>>>>>>>>>> double >>>>>>>>>>> the size? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> How much does this change the benefit? If most of the benefit is >>>>>>>>>> avoiding >>>>>>>>>> extraordinarily long mangled names then it may be sufficient. >>>>>>>>>> >>>>>>>>>> With IR-level instrumentation like Rong is pursuing the size may >>>>>>>>>> be >>>>>>>>>> reduced >>>>>>>>>> sufficiently that we do not need the optimization proposed in this >>>>>>>>>> thread. >>>>>>>>>> For example, Rong found >2x size reduction on Google's C++ >>>>>>>>>> benchmarks, >>>>>>>>>> which >>>>>>>>>> I assume are representative of the extremely large Google binaries >>>>>>>>>> that >>>>>>>>>> are >>>>>>>>>> causing the problems addressed by your proposal in this thread. >>>>>>>>>> The >>>>>>>>>> measurements you mention for Clang in this thread provide similar >>>>>>>>>> size >>>>>>>>>> reductions, so Rong's approach may be sufficient (especially >>>>>>>>>> because >>>>>>>>>> functions with extremely large mangled names tend to be small >>>>>>>>>> inline >>>>>>>>>> functions in header-only template libraries). >>>>>>>>> >>>>>>>>> Late instrumentation helps many cases. In some cases (as shown in >>>>>>>>> SPEC), the reduction in size is not as large. Reducing PGO overhead >>>>>>>>> will lower the bar for its adoption. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Of the points you mention in "Large size of overhead can limit the >>>>>>>>>> usability >>>>>>>>>> of PGO greatly", many of the issues are hard limits that prevent >>>>>>>>>> the >>>>>>>>>> use >>>>>>>>>> of >>>>>>>>>> PGO. Do you have a lower bound on how much the size of the PGO >>>>>>>>>> data >>>>>>>>>> must >>>>>>>>>> be >>>>>>>>>> reduced in order to overcome the hard limits? >>>>>>>>> >>>>>>>>> This is a static view: Think about the situation where application >>>>>>>>> size is ever increasing; also think about situation where we want to >>>>>>>>> >>>>>>>>> collect more types of profile data. Think about situation where user >>>>>>>>> want to run pgo binaries on small devices with tiny memory/storage >>>>>>>>> .. >>>>>>>> >>>>>>>> >>>>>>>> If we want to reduce memory overhead at runtime and reduce the size >>>>>>>> of >>>>>>>> the >>>>>>>> raw profile data extracted from the target, there are clear >>>>>>>> solutions. >>>>>>>> Consider that debug info does not need to be loaded into the memory >>>>>>>> image of >>>>>>>> the target; why should information identifying each counter need to >>>>>>>> be? >>>>>>>> A file containing raw profile counters is a subset of a core dump; in >>>>>>>> most >>>>>>>> environments, a core dump does not need to have debug info or symbol >>>>>>>> names >>>>>>>> in it, but can be still be read in full detail in conjunction with >>>>>>>> the >>>>>>>> original binary. >>>>>>> >>>>>>> Yes -- there are many alternatives: >>>>>>> 1) emit the name key mapping as a side data at compile time, or >>>>>>> 2) emit them into nonloadable sections of the object file. 
>>>>>>> >>>>>>> Compared with the above, LLVM's existing design does have its own >>>>>>> advantage -- making it easier for tool to access 'debug' info for >>>>>>> counters. >>>>>>> >>>>>>> LLVM's coverage testing, on the other hand, take a hybrid approach: It >>>>>>> emits the coverage map as rodata, but does not pass it to the profile >>>>>>> dumper. I think it is better to emit covmap as a side data not >>>>>>> attached to target binary. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thus, as we require that the binary be passed to llvm-profdata, there >>>>>>>> is >>>>>>>> no >>>>>>>> fundamental reason that the memory image of the program, or the raw >>>>>>>> data >>>>>>>> extracted from the program, must have any size overhead besides the >>>>>>>> raw >>>>>>>> values of the counters themselves and any text size increase for >>>>>>>> incrementing them. If we are willing to impose this requirement on >>>>>>>> users, >>>>>>>> then as far as reducing memory overhead at runtime and reducing the >>>>>>>> size >>>>>>>> of >>>>>>>> the raw profile data extracted from the target, using hashed function >>>>>>>> names >>>>>>>> is clearly the wrong direction. >>>>>>>> >>>>>>>> *Without* imposing the requirement of passing the binary to >>>>>>>> llvm-profdata, I >>>>>>>> do like the ability to use hashed function names like you are >>>>>>>> proposing. >>>>>>>> It >>>>>>>> is a simple solution for reducing size overhead of function name >>>>>>>> strings >>>>>>>> with little complexity, as it is just swapping one string for >>>>>>>> another. >>>>>>> >>>>>>> Agree. The good news is that the overhead of hashed function names is >>>>>>> small enough that makes this approach attractive. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> David >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Obviously LLVM must be able to support the extremely large >>>>>>>>>> binaries >>>>>>>>>> in >>>>>>>>>> your >>>>>>>>>> configuration (otherwise what use is LLVM as a compiler ;) My >>>>>>>>>> questions >>>>>>>>>> are >>>>>>>>>> primarily aimed at establishing which tradeoffs are acceptable for >>>>>>>>>> supporting this (both for LLVM and for you guys). >>>>>>>>> >>>>>>>>> As I said, with the modified proposal (after getting your feedback), >>>>>>>>> no PGO users in LLVM land is going to lose anything/functionality. >>>>>>>>> The >>>>>>>>> end result will be net win for general users of LLVM (even though >>>>>>>>> your >>>>>>>>> customers don't care about it), not just 'us' as you have mentioned >>>>>>>>> many times. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Btw, for us, the issue of PGO data size is not completely >>>>>>>>>> immaterial >>>>>>>>>> but >>>>>>>>>> is >>>>>>>>>> very different from your use case. For us, the primary issue is >>>>>>>>>> the >>>>>>>>>> additional memory use at run time, since PS4 games usually use >>>>>>>>>> "all" >>>>>>>>>> available memory. We had a problem with UBSan where the large >>>>>>>>>> amount >>>>>>>>>> of >>>>>>>>>> memory required for storing the UBSan diagnostic data at runtime >>>>>>>>>> required >>>>>>>>>> the game programmers to manually change their memory map to make >>>>>>>>>> room. >>>>>>>>>> +Filipe, do you remember how much memory UBSan was using that >>>>>>>>>> caused >>>>>>>>>> a >>>>>>>>>> problem? >>>>>>>>>> >>>>>>>>> >>>>>>>>> My proposal does help reducing rodata size significantly. >>>>>>>> >>>>>>>> >>>>>>>> Yes, that is why I think that this is a useful thing to do. I just >>>>>>>> want >>>>>>>> to >>>>>>>> be careful about existing use cases and the relevant workflow issues. 
>>>>>>>> >>>>>>>> -- Sean Silva >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> David >>>>>>>>> >>>>>>>>> >>>>>>>>>> -- Sean Silva >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> In the raw format, md5 sum key can be an embedded field in the >>>>>>>>>>> prf_data variable instead of as different var referenced by >>>>>>>>>>> prf_data. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> If this is not the case, you should show your current patch so >>>>>>>>>>>> that >>>>>>>>>>>> we >>>>>>>>>>>> can >>>>>>>>>>>> discuss things concretely. >>>>>>>>>>> >>>>>>>>>>> It is not. See above about the difference. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It will be >>>>>>>>>>>>>>> very messy to support multiple formats in instr-codegen and >>>>>>>>>>>>>>> instr-runtime. For compatibility concerns, the reader is >>>>>>>>>>>>>>> taught >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> support previous format, but the changes there are isolated >>>>>>>>>>>>>>> (also >>>>>>>>>>>>>>> expected to be removed in the future). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My primary concern is that if the function name are not >>>>>>>>>>>>>>>> kept >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>> stages, >>>>>>>>>>>>>>>> then it becomes difficult to analyze the profile data in >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>> standalone >>>>>>>>>>>>>>>> way. >>>>>>>>>>>>>>>> Many times, I have used `llvm-profdata show >>>>>>>>>>>>>>>> -all-functions >>>>>>>>>>>>>>>> foo.profdata` >>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>> the resulting profile data and then imported that data >>>>>>>>>>>>>>>> into >>>>>>>>>>>>>>>> Mathematica >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> analysis. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is certainly a very valid use case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My understanding of your proposal is that `llvm-profdata >>>>>>>>>>>>>>>> show >>>>>>>>>>>>>>>> -all-functions foo.profdata` will not show the actual >>>>>>>>>>>>>>>> function >>>>>>>>>>>>>>>> names >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> instead MD5 hashes, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To support your use case, there are two solutions: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) user can add -fcoverage-mapping option in the build >>>>>>>>>>>>>>> 2) introduce a new option -fprofile-instr-names that force >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> emission of the name sections in the .o file. This is >>>>>>>>>>>>>>> similar >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> 1), >>>>>>>>>>>>>>> but no covmap section is needed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> llvm-profdata tool will be taught to read the name section >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> attach >>>>>>>>>>>>>>> function names to the profile records. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Needing to pass the executable to llvm-profdata would cause >>>>>>>>>>>>>> deployment >>>>>>>>>>>>>> issues for my customers in practice. >>>>>>>>>>>>> >>>>>>>>>>>>> Why? The deployment needs to pass the profile data anyway >>>>>>>>>>>>> right? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes, but not the executable. >>>>>>>>>>>> >>>>>>>>>>>> The PGO training run is likely being run by a gameplay tester >>>>>>>>>>>> (non-programmer). In general the binary will not be lying >>>>>>>>>>>> around >>>>>>>>>>>> as a >>>>>>>>>>>> loose >>>>>>>>>>>> file anywhere, it will be part of a full package of the >>>>>>>>>>>> binary+assets >>>>>>>>>>>> (think >>>>>>>>>>>> like what will end up on a bluray disc). 
A game's binary >>>>>>>>>>>> *completely >>>>>>>>>>>> useless* without the assets, so except locally on a >>>>>>>>>>>> programmer's >>>>>>>>>>>> machine >>>>>>>>>>>> while they iterate/debug, there is no reason for a binary to >>>>>>>>>>>> ever >>>>>>>>>>>> exist >>>>>>>>>>>> as a >>>>>>>>>>>> standalone file. >>>>>>>>>>>> >>>>>>>>>>>> I'm not saying that needing the binary is insurmountable in any >>>>>>>>>>>> particular >>>>>>>>>>>> scenario. Just that it will cause a strict increase in the >>>>>>>>>>>> number >>>>>>>>>>>> of >>>>>>>>>>>> issues >>>>>>>>>>>> to deploying PGO. >>>>>>>>>>> >>>>>>>>>>> Your concern is acknowledged. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> These are much bigger "compatibility concerns" for me than for >>>>>>>>>>>> newer >>>>>>>>>>>> toolchains to accept the old format. For a change in format I >>>>>>>>>>>> can >>>>>>>>>>>> easily >>>>>>>>>>>> tell my users to replace an exe with a newer one and that is >>>>>>>>>>>> all >>>>>>>>>>>> they >>>>>>>>>>>> need >>>>>>>>>>>> to do and it takes 10 seconds, guaranteed. A workflow change is >>>>>>>>>>>> potentially >>>>>>>>>>>> a massive disruption and guaranteed to take more than 10 >>>>>>>>>>>> seconds >>>>>>>>>>>> to >>>>>>>>>>>> fix >>>>>>>>>>>> (perhaps hours or days). >>>>>>>>>>> >>>>>>>>>>> ok. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This >>>>>>>>>>>>> is no different from llvm-cov usage model. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> In practice, getting the performance of PGO is a higher >>>>>>>>>>>> priority >>>>>>>>>>>> for >>>>>>>>>>>> my >>>>>>>>>>>> users, so we should not assume that llvm-cov is being used. >>>>>>>>>>> >>>>>>>>>>> Glad to hear that :) >>>>>>>>>>> >>>>>>>>>>> thanks, >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- Sean Silva >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Note that with 1) or 2), the user can still benefit from >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> reduced >>>>>>>>>>>>>>> profile size. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Let me reiterate that the size of the profile is not a >>>>>>>>>>>>>> problem >>>>>>>>>>>>>> I >>>>>>>>>>>>>> have >>>>>>>>>>>>>> observed in practice (nor have I heard of this being a >>>>>>>>>>>>>> problem >>>>>>>>>>>>>> in >>>>>>>>>>>>>> practice >>>>>>>>>>>>>> until this thread). Therefore I'm skeptical of any changes >>>>>>>>>>>>>> to >>>>>>>>>>>>>> our >>>>>>>>>>>>>> default >>>>>>>>>>>>>> behavior or any new requirements that are not opt-in. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- Sean Silva >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> which will make it more difficult for me to do this kind >>>>>>>>>>>>>>>> of analysis (would require using nm on the original >>>>>>>>>>>>>>>> binary, >>>>>>>>>>>>>>>> hashing >>>>>>>>>>>>>>>> everything, etc.). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> btw, feel free to attach the patch even if it in a rough >>>>>>>>>>>>>>>> state. >>>>>>>>>>>>>>>> It >>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>> still >>>>>>>>>>>>>>>> help to clarify the proposal and be a good talking point. 
>>>>>>>>>>>>>>>> Fine-grained >>>>>>>>>>>>>>>> patch >>>>>>>>>>>>>>>> review for caring about the rough parts will happen on >>>>>>>>>>>>>>>> llvm-commits; >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> rough parts will not distract the discussion here on >>>>>>>>>>>>>>>> llvm-dev. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- Sean Silva >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
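A note for readers skimming the recap quoted above: the core of Solution-1 and Solution-3 is replacing each (possibly very long) mangled-name string with a fixed 8-byte key derived from the MD5 of the name, keeping only 64 bits of the digest. The sketch below illustrates that idea only. It uses OpenSSL's MD5 as a stand-in for LLVM's hashing, and the choice of which eight digest bytes to keep, the helper name functionNameKey, and the example mangled name are assumptions made for illustration, not the in-tree implementation.

// Sketch only: key profile records by a 64-bit hash of the mangled name
// instead of by the name string itself. OpenSSL's MD5 stands in for LLVM's
// hashing; which eight digest bytes form the key is an assumption here.
#include <openssl/md5.h>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Derive a fixed-length 64-bit key from an arbitrarily long mangled name.
uint64_t functionNameKey(const std::string &MangledName) {
  unsigned char Digest[MD5_DIGEST_LENGTH];
  MD5(reinterpret_cast<const unsigned char *>(MangledName.data()),
      MangledName.size(), Digest);
  uint64_t Key;
  std::memcpy(&Key, Digest, sizeof(Key)); // keep only 64 bits of the digest
  return Key;
}

int main() {
  // Hypothetical mangled name, used purely for illustration.
  const std::string Name = "_ZN7Example3fooEv";

  // Profile counters indexed by the hashed key: 8 bytes per function,
  // regardless of how long the mangled C++ name is.
  std::map<uint64_t, std::vector<uint64_t>> Counters;
  Counters[functionNameKey(Name)] = {42, 7};

  // A separate table (which can live in a strippable, non-allocatable
  // section) maps keys back to names when a tool wants symbolic output.
  std::map<uint64_t, std::string> NameTable;
  NameTable[functionNameKey(Name)] = Name;
  return 0;
}

The point of the scheme is visible in the two maps: the counters cost eight bytes of key per function no matter how long the C++ name is, while the table that restores symbolic output can be kept out of the process image and stripped from the binary.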
Xinliang David Li via llvm-dev
2015-Dec-14 17:31 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
Sorry about the typo -- I meant indexed format version won't be changed.

thanks,

David

On Mon, Dec 14, 2015 at 8:55 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>> On Dec 9, 2015, at 1:12 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> We are now very close to push the final stage of the PGO size
>> reduction task. Here is the updated plan going forward:
>>
>> 1) In this round, the format of the indexed profile data won't be unchanged.
>
> That’s a double negative. Was that intentional? I.E., are you changing the format?
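Point 3 of the plan above keeps the function names in a compressed name section (roughly 5:1 for typical C++ name strings) and only expands them when a tool needs symbolic output. The sketch below shows one way such a blob could be built and read back. It assumes the plain zlib C API, a '\1' byte as the separator between names, and a leading uncompressed-size field; all of these are illustrative assumptions, not the actual section layout used by the patches.

// Sketch only: build a compressed function-name blob and expand it on the
// reader side. Separator byte, size prefix, and use of raw zlib are
// assumptions for illustration; the real section layout may differ.
#include <zlib.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Writer side: join the mangled names into one string and deflate it.
std::vector<unsigned char> compressNames(const std::vector<std::string> &Names) {
  std::string Blob;
  for (const std::string &N : Names) {
    if (!Blob.empty())
      Blob.push_back('\1'); // assumed separator; cannot appear in a name
    Blob += N;
  }
  uint64_t RawSize = Blob.size();
  uLongf DestLen = compressBound(Blob.size());
  std::vector<unsigned char> Out(sizeof(RawSize) + DestLen);
  // Record the uncompressed size so the reader can size its buffer.
  std::memcpy(Out.data(), &RawSize, sizeof(RawSize));
  int RC = compress2(Out.data() + sizeof(RawSize), &DestLen,
                     reinterpret_cast<const Bytef *>(Blob.data()), Blob.size(),
                     Z_BEST_COMPRESSION);
  assert(RC == Z_OK && "zlib compression failed");
  (void)RC;
  Out.resize(sizeof(RawSize) + DestLen);
  return Out;
}

// Reader side: inflate the blob and split it back into individual names.
std::vector<std::string> decompressNames(const std::vector<unsigned char> &In) {
  uint64_t RawSize;
  std::memcpy(&RawSize, In.data(), sizeof(RawSize));
  std::string Blob(RawSize, '\0');
  uLongf DestLen = RawSize;
  int RC = uncompress(reinterpret_cast<Bytef *>(&Blob[0]), &DestLen,
                      In.data() + sizeof(RawSize), In.size() - sizeof(RawSize));
  assert(RC == Z_OK && DestLen == RawSize);
  (void)RC;
  std::vector<std::string> Names;
  size_t Pos = 0, Sep;
  while ((Sep = Blob.find('\1', Pos)) != std::string::npos) {
    Names.push_back(Blob.substr(Pos, Sep - Pos));
    Pos = Sep + 1;
  }
  Names.push_back(Blob.substr(Pos));
  return Names;
}

In this picture the 5:1 figure is simply the deflate ratio observed on typical mangled-name data; the exact number will vary by codebase.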
Bob Wilson via llvm-dev
2015-Dec-14 18:08 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
Great — thanks for confirming that.

> On Dec 14, 2015, at 9:31 AM, Xinliang David Li <davidxl at google.com> wrote:
>
> Sorry about the typo -- I meant indexed format version won't be changed.
>
> thanks,
>
> David
I finally found time to implement a solution >>>>>>>> (Solution-3) that has the best size savings (for both PGO and coverage >>>>>>>> testing) with symbolic information available. Here is a re-cap of what >>>>>>>> we >>>>>>>> have discussed so far: >>>>>>>> >>>>>>>> Solution-1: >>>>>>>> >>>>>>>> This is the original solution proposed. In this solution, A function >>>>>>>> name's >>>>>>>> MD5 hash is used as the primary key (combined with IR hash) for function >>>>>>>> lookup. __llvm_prf_names section won't be emitted into the binary nor >>>>>>>> dumped >>>>>>>> into the profile data unless -fcoverage-mapping is also specified. >>>>>>>> >>>>>>>> Pros: >>>>>>>> 1. maximal reduction of instrumentation binary process image >>>>>>>> size >>>>>>>> 2. maximal reduction of object and unstripped binary size >>>>>>>> 3. maximal reduction of raw profile data size >>>>>>>> 4. maximal reduction of indexed profile data size >>>>>>>> >>>>>>>> Cons: >>>>>>>> 1. -fcoverage-mapping use case is not handled -- the size >>>>>>>> problem >>>>>>>> still exist >>>>>>>> 2. profile dump with llvm-profdata no longer have function names >>>>>>>> associated -- user needs to use postprocessing tool to get the >>>>>>>> functionality >>>>>>>> 3. function filtering with partial function name is not >>>>>>>> supported >>>>>>>> 4. Requires incompatible format change >>>>>>>> >>>>>>>> >>>>>>>> Solution-2: (http://reviews.llvm.org/D12715) >>>>>>>> >>>>>>>> In this solution, the MD5 hash string is used to replace the raw name >>>>>>>> string >>>>>>>> Pros: >>>>>>>> 1. Very simple to implement >>>>>>>> 2. have good reduction of all sizes for typical large C++ >>>>>>>> applications >>>>>>>> 3. No profile data format change is required. >>>>>>>> >>>>>>>> Cons: >>>>>>>> 1. Still requires 16byte overhead per-function -- can actually >>>>>>>> hurt C programs >>>>>>>> 2. -fcoverage-mapping use case is still not handled >>>>>>>> 3. The problem with llvm-profdata still exists (no symbolic >>>>>>>> info, >>>>>>>> partial filtering support) >>>>>>>> >>>>>>>> >>>>>>>> Solution-3: >>>>>>>> >>>>>>>> This is the new solution I am proposing. It is basically an enhancement >>>>>>>> of >>>>>>>> Solution-1 with most of the weakness resolved. The difference with >>>>>>>> Solution-1 is >>>>>>>> 1. Function name symbols are emitted into the symbol table as weak >>>>>>>> externs. They don't occupy any space at runtime and can be easily >>>>>>>> stripped. >>>>>>>> 2. -fcoverage-mapping does not need special handling -- it >>>>>>>> automatically benefit from the same size saving. >>>>>>>> 3. llvm-cov is changed to read symbol info from the symtab instead >>>>>>>> of >>>>>>>> reading them from the name section data >>>>>>>> 4. llvm-profdata is enhanced to take a binary as input and dump >>>>>>>> profile with names attached. Function filtering is fully supported >>>>>>>> (option >>>>>>>> can also be introduced to force dumping names into binary and profile >>>>>>>> data, >>>>>>>> so that llvm-profdata use case is not changed at all). >>>>>>>> >>>>>>>> Pros: >>>>>>>> 1. All the pros from Solution-1 >>>>>>>> 2. Size savings for coverage-mapping case >>>>>>>> Cons: >>>>>>>> Format change is required for profile data and coverage mapping. >>>>>>>> >>>>>>>> The initial patch is here: http://reviews.llvm.org/D13251 >>>>>>>> >>>>>>>> With this patch, the size of a release clang binary with coverage >>>>>>>> mapping is >>>>>>>> reduced from 986M to 569M. 
>>>>>>>> >>>>>>>> If there are no major concerns, I will carve out the patch into smaller >>>>>>>> ones for review. >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> David >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Sep 8, 2015 at 3:47 PM, Xinliang David Li <davidxl at google.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> yes -- it is fixed length (8byte) blob which may include null >>>>>>>>>>>>> byte >>>>>>>>>>>>> in >>>>>>>>>>>>> the middle. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> For reference, MD5 sum is 16 bytes (128-bit): >>>>>>>>>>>> https://en.wikipedia.org/wiki/MD5 >>>>>>>>>>> >>>>>>>>>>> yes, LLVM's MD5 hash only takes the lower 64bit. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Or to say it another way, suppose that Itanium mangling >>>>>>>>>>>>>> required >>>>>>>>>>>>>> as a >>>>>>>>>>>>>> final >>>>>>>>>>>>>> step to replace the string with its md5 sum in hex. Therefore >>>>>>>>>>>>>> all >>>>>>>>>>>>>> symbol >>>>>>>>>>>>>> names are "small". My understanding is that this is effectively >>>>>>>>>>>>>> all >>>>>>>>>>>>>> your >>>>>>>>>>>>>> patch is doing. >>>>>>>>>>>>> >>>>>>>>>>>>> The key type before the change is StringRef, while the after the >>>>>>>>>>>>> key >>>>>>>>>>>>> type is uint64_t. Are you suggesting treating uint64_t md5 sum >>>>>>>>>>>>> key >>>>>>>>>>>>> as >>>>>>>>>>>>> a string of 8 bytes or storing md5 has in text form which will >>>>>>>>>>>>> double >>>>>>>>>>>>> the size? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> How much does this change the benefit? If most of the benefit is >>>>>>>>>>>> avoiding >>>>>>>>>>>> extraordinarily long mangled names then it may be sufficient. >>>>>>>>>>>> >>>>>>>>>>>> With IR-level instrumentation like Rong is pursuing the size may >>>>>>>>>>>> be >>>>>>>>>>>> reduced >>>>>>>>>>>> sufficiently that we do not need the optimization proposed in this >>>>>>>>>>>> thread. >>>>>>>>>>>> For example, Rong found >2x size reduction on Google's C++ >>>>>>>>>>>> benchmarks, >>>>>>>>>>>> which >>>>>>>>>>>> I assume are representative of the extremely large Google binaries >>>>>>>>>>>> that >>>>>>>>>>>> are >>>>>>>>>>>> causing the problems addressed by your proposal in this thread. >>>>>>>>>>>> The >>>>>>>>>>>> measurements you mention for Clang in this thread provide similar >>>>>>>>>>>> size >>>>>>>>>>>> reductions, so Rong's approach may be sufficient (especially >>>>>>>>>>>> because >>>>>>>>>>>> functions with extremely large mangled names tend to be small >>>>>>>>>>>> inline >>>>>>>>>>>> functions in header-only template libraries). >>>>>>>>>>> >>>>>>>>>>> Late instrumentation helps many cases. In some cases (as shown in >>>>>>>>>>> SPEC), the reduction in size is not as large. Reducing PGO overhead >>>>>>>>>>> will lower the bar for its adoption. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Of the points you mention in "Large size of overhead can limit the >>>>>>>>>>>> usability >>>>>>>>>>>> of PGO greatly", many of the issues are hard limits that prevent >>>>>>>>>>>> the >>>>>>>>>>>> use >>>>>>>>>>>> of >>>>>>>>>>>> PGO. Do you have a lower bound on how much the size of the PGO >>>>>>>>>>>> data >>>>>>>>>>>> must >>>>>>>>>>>> be >>>>>>>>>>>> reduced in order to overcome the hard limits? >>>>>>>>>>> >>>>>>>>>>> This is a static view: Think about the situation where application >>>>>>>>>>> size is ever increasing; also think about situation where we want to >>>>>>>>>>> >>>>>>>>>>> collect more types of profile data. 
Think about situation where user >>>>>>>>>>> want to run pgo binaries on small devices with tiny memory/storage >>>>>>>>>>> .. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If we want to reduce memory overhead at runtime and reduce the size >>>>>>>>>> of >>>>>>>>>> the >>>>>>>>>> raw profile data extracted from the target, there are clear >>>>>>>>>> solutions. >>>>>>>>>> Consider that debug info does not need to be loaded into the memory >>>>>>>>>> image of >>>>>>>>>> the target; why should information identifying each counter need to >>>>>>>>>> be? >>>>>>>>>> A file containing raw profile counters is a subset of a core dump; in >>>>>>>>>> most >>>>>>>>>> environments, a core dump does not need to have debug info or symbol >>>>>>>>>> names >>>>>>>>>> in it, but can be still be read in full detail in conjunction with >>>>>>>>>> the >>>>>>>>>> original binary. >>>>>>>>> >>>>>>>>> Yes -- there are many alternatives: >>>>>>>>> 1) emit the name key mapping as a side data at compile time, or >>>>>>>>> 2) emit them into nonloadable sections of the object file. >>>>>>>>> >>>>>>>>> Compared with the above, LLVM's existing design does have its own >>>>>>>>> advantage -- making it easier for tool to access 'debug' info for >>>>>>>>> counters. >>>>>>>>> >>>>>>>>> LLVM's coverage testing, on the other hand, take a hybrid approach: It >>>>>>>>> emits the coverage map as rodata, but does not pass it to the profile >>>>>>>>> dumper. I think it is better to emit covmap as a side data not >>>>>>>>> attached to target binary. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thus, as we require that the binary be passed to llvm-profdata, there >>>>>>>>>> is >>>>>>>>>> no >>>>>>>>>> fundamental reason that the memory image of the program, or the raw >>>>>>>>>> data >>>>>>>>>> extracted from the program, must have any size overhead besides the >>>>>>>>>> raw >>>>>>>>>> values of the counters themselves and any text size increase for >>>>>>>>>> incrementing them. If we are willing to impose this requirement on >>>>>>>>>> users, >>>>>>>>>> then as far as reducing memory overhead at runtime and reducing the >>>>>>>>>> size >>>>>>>>>> of >>>>>>>>>> the raw profile data extracted from the target, using hashed function >>>>>>>>>> names >>>>>>>>>> is clearly the wrong direction. >>>>>>>>>> >>>>>>>>>> *Without* imposing the requirement of passing the binary to >>>>>>>>>> llvm-profdata, I >>>>>>>>>> do like the ability to use hashed function names like you are >>>>>>>>>> proposing. >>>>>>>>>> It >>>>>>>>>> is a simple solution for reducing size overhead of function name >>>>>>>>>> strings >>>>>>>>>> with little complexity, as it is just swapping one string for >>>>>>>>>> another. >>>>>>>>> >>>>>>>>> Agree. The good news is that the overhead of hashed function names is >>>>>>>>> small enough that makes this approach attractive. >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Obviously LLVM must be able to support the extremely large >>>>>>>>>>>> binaries >>>>>>>>>>>> in >>>>>>>>>>>> your >>>>>>>>>>>> configuration (otherwise what use is LLVM as a compiler ;) My >>>>>>>>>>>> questions >>>>>>>>>>>> are >>>>>>>>>>>> primarily aimed at establishing which tradeoffs are acceptable for >>>>>>>>>>>> supporting this (both for LLVM and for you guys). >>>>>>>>>>> >>>>>>>>>>> As I said, with the modified proposal (after getting your feedback), >>>>>>>>>>> no PGO users in LLVM land is going to lose anything/functionality. 
>>>>>>>>>>> The >>>>>>>>>>> end result will be net win for general users of LLVM (even though >>>>>>>>>>> your >>>>>>>>>>> customers don't care about it), not just 'us' as you have mentioned >>>>>>>>>>> many times. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Btw, for us, the issue of PGO data size is not completely >>>>>>>>>>>> immaterial >>>>>>>>>>>> but >>>>>>>>>>>> is >>>>>>>>>>>> very different from your use case. For us, the primary issue is >>>>>>>>>>>> the >>>>>>>>>>>> additional memory use at run time, since PS4 games usually use >>>>>>>>>>>> "all" >>>>>>>>>>>> available memory. We had a problem with UBSan where the large >>>>>>>>>>>> amount >>>>>>>>>>>> of >>>>>>>>>>>> memory required for storing the UBSan diagnostic data at runtime >>>>>>>>>>>> required >>>>>>>>>>>> the game programmers to manually change their memory map to make >>>>>>>>>>>> room. >>>>>>>>>>>> +Filipe, do you remember how much memory UBSan was using that >>>>>>>>>>>> caused >>>>>>>>>>>> a >>>>>>>>>>>> problem? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> My proposal does help reducing rodata size significantly. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes, that is why I think that this is a useful thing to do. I just >>>>>>>>>> want >>>>>>>>>> to >>>>>>>>>> be careful about existing use cases and the relevant workflow issues. >>>>>>>>>> >>>>>>>>>> -- Sean Silva >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -- Sean Silva >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> In the raw format, md5 sum key can be an embedded field in the >>>>>>>>>>>>> prf_data variable instead of as different var referenced by >>>>>>>>>>>>> prf_data. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> If this is not the case, you should show your current patch so >>>>>>>>>>>>>> that >>>>>>>>>>>>>> we >>>>>>>>>>>>>> can >>>>>>>>>>>>>> discuss things concretely. >>>>>>>>>>>>> >>>>>>>>>>>>> It is not. See above about the difference. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It will be >>>>>>>>>>>>>>>>> very messy to support multiple formats in instr-codegen and >>>>>>>>>>>>>>>>> instr-runtime. For compatibility concerns, the reader is >>>>>>>>>>>>>>>>> taught >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> support previous format, but the changes there are isolated >>>>>>>>>>>>>>>>> (also >>>>>>>>>>>>>>>>> expected to be removed in the future). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> My primary concern is that if the function name are not >>>>>>>>>>>>>>>>>> kept >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>>>> stages, >>>>>>>>>>>>>>>>>> then it becomes difficult to analyze the profile data in >>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>> standalone >>>>>>>>>>>>>>>>>> way. >>>>>>>>>>>>>>>>>> Many times, I have used `llvm-profdata show >>>>>>>>>>>>>>>>>> -all-functions >>>>>>>>>>>>>>>>>> foo.profdata` >>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>> the resulting profile data and then imported that data >>>>>>>>>>>>>>>>>> into >>>>>>>>>>>>>>>>>> Mathematica >>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>> analysis. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is certainly a very valid use case. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> My understanding of your proposal is that `llvm-profdata >>>>>>>>>>>>>>>>>> show >>>>>>>>>>>>>>>>>> -all-functions foo.profdata` will not show the actual >>>>>>>>>>>>>>>>>> function >>>>>>>>>>>>>>>>>> names >>>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>> instead MD5 hashes, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes. 
>>>>>>>>>>>>>>>>> It will be very messy to support multiple formats in instr-codegen and instr-runtime. For compatibility concerns, the reader is taught to support the previous format, but the changes there are isolated (and expected to be removed in the future).

>>>>>>>>>>>>>>>>>> My primary concern is that if the function names are not kept at all stages, then it becomes difficult to analyze the profile data in a standalone way. Many times, I have used `llvm-profdata show -all-functions foo.profdata` on the resulting profile data and then imported that data into Mathematica for analysis.

>>>>>>>>>>>>>>>>> This is certainly a very valid use case.

>>>>>>>>>>>>>>>>>> My understanding of your proposal is that `llvm-profdata show -all-functions foo.profdata` will not show the actual function names but instead MD5 hashes,

>>>>>>>>>>>>>>>>> Yes.

>>>>>>>>>>>>>>>>> To support your use case, there are two solutions:
>>>>>>>>>>>>>>>>> 1) the user can add the -fcoverage-mapping option to the build, or
>>>>>>>>>>>>>>>>> 2) introduce a new option, -fprofile-instr-names, that forces the emission of the name section in the .o file. This is similar to 1), but no covmap section is needed.

>>>>>>>>>>>>>>>>> The llvm-profdata tool will be taught to read the name section and attach function names to the profile records.
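As a rough sketch of that name-attachment step (not the actual llvm-profdata implementation): given whatever name strings the tool recovers, whether from a name section, the symbol table, or a side file, it can rebuild the key-to-name mapping and use it when printing records. The sketch assumes the llvm::MD5Hash helper from llvm/Support/MD5.h; the function names below are invented for illustration.

    #include "llvm/ADT/DenseMap.h"
    #include "llvm/ADT/StringRef.h"
    #include "llvm/Support/MD5.h"
    #include <cstdint>
    #include <string>
    #include <vector>

    // Map each recovered name to the same 64-bit key the instrumentation used.
    // The StringRefs must outlive the table (they point into the binary's data).
    llvm::DenseMap<uint64_t, llvm::StringRef>
    buildNameTable(const std::vector<llvm::StringRef> &RecoveredNames) {
      llvm::DenseMap<uint64_t, llvm::StringRef> Table;
      for (llvm::StringRef Name : RecoveredNames)
        Table[llvm::MD5Hash(Name)] = Name; // low 64 bits of the MD5 digest
      return Table;
    }

    // When dumping a record, print the symbolic name if one is available and
    // fall back to the raw key otherwise (e.g. when no binary was provided).
    std::string displayName(const llvm::DenseMap<uint64_t, llvm::StringRef> &Table,
                            uint64_t NameKey) {
      auto It = Table.find(NameKey);
      return It != Table.end() ? It->second.str()
                               : "md5:" + std::to_string(NameKey);
    }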
>>>>>>>>>>>>>>>> Needing to pass the executable to llvm-profdata would cause deployment issues for my customers in practice.

>>>>>>>>>>>>>>> Why? The deployment needs to pass the profile data anyway, right?

>>>>>>>>>>>>>> Yes, but not the executable.

>>>>>>>>>>>>>> The PGO training run is likely being run by a gameplay tester (non-programmer). In general the binary will not be lying around as a loose file anywhere; it will be part of a full package of binary+assets (think of what will end up on a Blu-ray disc). A game's binary is *completely useless* without the assets, so except locally on a programmer's machine while they iterate/debug, there is no reason for a binary to ever exist as a standalone file.

>>>>>>>>>>>>>> I'm not saying that needing the binary is insurmountable in any particular scenario, just that it will cause a strict increase in the number of issues in deploying PGO.

>>>>>>>>>>>>> Your concern is acknowledged.

>>>>>>>>>>>>>> These are much bigger "compatibility concerns" for me than having newer toolchains accept the old format. For a change in format I can easily tell my users to replace an exe with a newer one; that is all they need to do, and it takes 10 seconds, guaranteed. A workflow change is potentially a massive disruption and guaranteed to take more than 10 seconds to fix (perhaps hours or days).

>>>>>>>>>>>>> ok.

>>>>>>>>>>>>>>> This is no different from the llvm-cov usage model.

>>>>>>>>>>>>>> In practice, getting the performance of PGO is a higher priority for my users, so we should not assume that llvm-cov is being used.

>>>>>>>>>>>>> Glad to hear that :)

>>>>>>>>>>>>> thanks,

>>>>>>>>>>>>> David

>>>>>>>>>>>>>> -- Sean Silva

>>>>>>>>>>>>>>> David

>>>>>>>>>>>>>>>>> Note that with 1) or 2), the user can still benefit from the reduced profile size.

>>>>>>>>>>>>>>>> Let me reiterate that the size of the profile is not a problem I have observed in practice (nor have I heard of it being a problem in practice until this thread). Therefore I'm skeptical of any changes to our default behavior or any new requirements that are not opt-in.

>>>>>>>>>>>>>>>> -- Sean Silva

>>>>>>>>>>>>>>>>> thanks,

>>>>>>>>>>>>>>>>> David

>>>>>>>>>>>>>>>>>> which will make it more difficult for me to do this kind of analysis (it would require using nm on the original binary, hashing everything, etc.).

>>>>>>>>>>>>>>>>>> btw, feel free to attach the patch even if it is in a rough state. It can still help to clarify the proposal and be a good talking point. Fine-grained patch review that cares about the rough parts will happen on llvm-commits; the rough parts will not distract the discussion here on llvm-dev.

>>>>>>>>>>>>>>>>>> -- Sean Silva

>>>>>>>>>>>>>>>>>>> thanks,

>>>>>>>>>>>>>>>>>>> David

>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev