thr3ads.net - llvm dev - [llvm-dev] RFC: Reducing Instr PGO size overhead [Sep 2015]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li via llvm-dev

2015-Sep-05 06:44 UTC

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at gmail.com>
wrote:>
>
> On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at
google.com>
> wrote:
>>
>> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at
gmail.com> wrote:
>> >
>> >
>> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at
google.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at
gmail.com>
>> >> wrote:
>> >> >
>> >> >
>> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li
>> >> > <davidxl at google.com>
>> >> > wrote:
>> >> >>
>> >> >> >
>> >> >> > I think it is reasonable to simply replace the
key we currently
>> >> >> > use
>> >> >> > with
>> >> >> > MD5(key) for getting a size reduction.  In
practice for my use
>> >> >> > cases,
>> >> >> > I
>> >> >> > have
>> >> >> > not observed any of the issues you mentioned
under "Large size of
>> >> >> > overhead
>> >> >> > can limit the usability of PGO greatly",
but I can understand that
>> >> >> > some
>> >> >> > of
>> >> >> > these issues could become problems in
Google's use case. I would
>> >> >> > personally
>> >> >> > prefer to keep the existing behavior as the
default (see below),
>> >> >> > and
>> >> >> > have
>> >> >> > MD5(key) as an option.
>> >> >>
>> >> >> The problem is that this requires profile format
changes.
>> >> >
>> >> >
>> >> > Why? AFAIK the "function name" is just an
arbitrary string. Using s
>> >> > or
>> >> > MD5(s) shouldn't matter. Of course, the user will
need to pass
>> >> > consistent
>> >> > flags to clang.
>> >>
>> >> The raw format for 64bit target can be made
'unchanged', but not for
>> >> the 32bit raw format -- the nameptr field is only 32bit.
>> >>
>> >> The indexed format can not be made the same --  The ondisk
profile
>> >> record layout totally changes. The key field changes from a
blob of
>> >> chars into an 64bit integer.
>> >
>> >
>> > An MD5 sum cannot be represented as a blob of chars?
>>
>> yes -- it is fixed length (8byte) blob which may include null byte in
>> the middle.
>
>
> For reference, MD5 sum is 16 bytes (128-bit):
> https://en.wikipedia.org/wiki/MD5
yes, LLVM's MD5 hash only takes the lower 64bit.

>
>>
>>
>> >
>> > Or to say it another way, suppose that Itanium mangling required
as a
>> > final
>> > step to replace the string with its md5 sum in hex. Therefore all
symbol
>> > names are "small". My understanding is that this is
effectively all your
>> > patch is doing.
>>
>> The key type before the change is StringRef, while the after the key
>> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as
>> a string of 8 bytes or storing md5 has in text form which will double
>> the size?
>
>
> How much does this change the benefit? If most of the benefit is avoiding
> extraordinarily long mangled names then it may be sufficient.
>
> With IR-level instrumentation like Rong is pursuing the size may be reduced
> sufficiently that we do not need the optimization proposed in this thread.
> For example, Rong found >2x size reduction on Google's C++
benchmarks, which
> I assume are representative of the extremely large Google binaries that are
> causing the problems addressed by your proposal in this thread. The
> measurements you mention for Clang in this thread provide similar size
> reductions, so Rong's approach may be sufficient (especially because
> functions with extremely large mangled names tend to be small inline
> functions in header-only template libraries).
Late instrumentation helps many cases. In some cases (as shown in
SPEC), the reduction in size is not as large. Reducing PGO overhead
will lower the bar for its adoption.
>
> Of the points you mention in "Large size of overhead can limit the
usability
> of PGO greatly", many of the issues are hard limits that prevent the
use of
> PGO. Do you have a lower bound on how much the size of the PGO data must be
> reduced in order to overcome the hard limits?
This is a static view:  Think about the situation where application
size is ever increasing; also think about situation where we want to
collect more types of profile data. Think about situation where user
want to run pgo binaries on small devices with tiny memory/storage ..
>
> Obviously LLVM must be able to support the extremely large binaries in your
> configuration (otherwise what use is LLVM as a compiler ;) My questions are
> primarily aimed at establishing which tradeoffs are acceptable for
> supporting this (both for LLVM and for you guys).
As I said, with the modified proposal (after getting your feedback),
no PGO users in LLVM land is going to lose anything/functionality. The
end result will be net win for general users of LLVM (even though your
customers don't care about it), not just 'us' as you have mentioned
many times.
>
> Btw, for us, the issue of PGO data size is not completely immaterial but is
> very different from your use case. For us, the primary issue is the
> additional memory use at run time, since PS4 games usually use
"all"
> available memory. We had a problem with UBSan where the large amount of
> memory required for storing the UBSan diagnostic data at runtime required
> the game programmers to manually change their memory map to make room.
> +Filipe, do you remember how much memory UBSan was using that caused a
> problem?
>
My proposal does help reducing rodata size significantly.

David

> -- Sean Silva
>
>>
>>
>> In the raw format, md5 sum key can be an embedded field in the
>> prf_data variable instead of as different var referenced by prf_data.
>>
>> >
>> > If this is not the case, you should show your current patch so
that we
>> > can
>> > discuss things concretely.
>>
>> It is not. See above about the difference.
>>
>> >
>> >>
>> >> >
>> >> >
>> >> >>
>> >> >> It will be
>> >> >> very messy to support multiple formats in
instr-codegen and
>> >> >> instr-runtime.  For compatibility concerns, the
reader is taught to
>> >> >> support previous format, but the changes there are
isolated (also
>> >> >> expected to be removed in the future).
>> >> >>
>> >> >> >
>> >> >> > My primary concern is that if the function name
are not kept at
>> >> >> > all
>> >> >> > stages,
>> >> >> > then it becomes difficult to analyze the profile
data in a
>> >> >> > standalone
>> >> >> > way.
>> >> >> > Many times, I have used `llvm-profdata show
-all-functions
>> >> >> > foo.profdata`
>> >> >> > on
>> >> >> > the resulting profile data and then imported
that data into
>> >> >> > Mathematica
>> >> >> > for
>> >> >> > analysis.
>> >> >>
>> >> >> This is certainly a very valid use case.
>> >> >>
>> >> >> >My understanding of your proposal is that
`llvm-profdata show
>> >> >> > -all-functions foo.profdata` will not show the
actual function
>> >> >> > names
>> >> >> > but
>> >> >> > instead MD5 hashes,
>> >> >>
>> >> >> Yes.
>> >> >>
>> >> >> To support your use case, there are two solutions:
>> >> >>
>> >> >> 1) user can add -fcoverage-mapping option in the
build
>> >> >> 2) introduce a new option -fprofile-instr-names that
force the
>> >> >> emission of the name sections in the .o file. This is
similar to 1),
>> >> >> but no covmap section is needed.
>> >> >>
>> >> >> llvm-profdata tool  will be taught to read the name
section and
>> >> >> attach
>> >> >> function names to the profile records.
>> >> >
>> >> >
>> >> > Needing to pass the executable to llvm-profdata would
cause
>> >> > deployment
>> >> > issues for my customers in practice.
>> >>
>> >> Why? The deployment needs to pass the profile data anyway
right?
>> >
>> >
>> > Yes, but not the executable.
>> >
>> > The PGO training run is likely being run by a gameplay tester
>> > (non-programmer). In general the binary will not be lying around
as a
>> > loose
>> > file anywhere, it will be part of a full package of the
binary+assets
>> > (think
>> > like what will end up on a bluray disc). A game's binary
*completely
>> > useless* without the assets, so except locally on a
programmer's machine
>> > while they iterate/debug, there is no reason for a binary to ever
exist
>> > as a
>> > standalone file.
>> >
>> > I'm not saying that needing the binary is insurmountable in
any
>> > particular
>> > scenario. Just that it will cause a strict increase in the number
of
>> > issues
>> > to deploying PGO.
>>
>>  Your concern is acknowledged.
>>
>> >
>> > These are much bigger "compatibility concerns" for me
than for newer
>> > toolchains to accept the old format. For a change in format I can
easily
>> > tell my users to replace an exe with a newer one and that is all
they
>> > need
>> > to do and it takes 10 seconds, guaranteed. A workflow change is
>> > potentially
>> > a massive disruption and guaranteed to take more than 10 seconds
to fix
>> > (perhaps hours or days).
>>
>> ok.
>>
>> >
>> >
>> >>
>> >>   This
>> >> is no different from llvm-cov usage model.
>> >
>> >
>> > In practice, getting the performance of PGO is a higher priority
for my
>> > users, so we should not assume that llvm-cov is being used.
>>
>> Glad to hear that :)
>>
>> thanks,
>>
>> David
>>
>> >
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> David
>> >>
>> >>
>> >>
>> >> >
>> >> >>
>> >> >>
>> >> >> Note that with 1) or 2), the user can still benefit
from the reduced
>> >> >> profile size.
>> >> >
>> >> >
>> >> > Let me reiterate that the size of the profile is not a
problem I have
>> >> > observed in practice (nor have I heard of this being a
problem in
>> >> > practice
>> >> > until this thread). Therefore I'm skeptical of any
changes to our
>> >> > default
>> >> > behavior or any new requirements that are not opt-in.
>> >> >
>> >> > -- Sean Silva
>> >> >
>> >> >>
>> >> >>
>> >> >> thanks,
>> >> >>
>> >> >> David
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> >which will make it more difficult for me to do
this kind
>> >> >> > of analysis (would require using nm on the
original binary,
>> >> >> > hashing
>> >> >> > everything, etc.).
>> >> >> >
>> >> >> > btw, feel free to attach the patch even if it in
a rough state. It
>> >> >> > can
>> >> >> > still
>> >> >> > help to clarify the proposal and be a good
talking point.
>> >> >> > Fine-grained
>> >> >> > patch
>> >> >> > review for caring about the rough parts will
happen on
>> >> >> > llvm-commits;
>> >> >> > the
>> >> >> > rough parts will not distract the discussion
here on llvm-dev.
>> >> >> >
>> >> >> > -- Sean Silva
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> thanks,
>> >> >> >>
>> >> >> >> David
>> >> >> >>
_______________________________________________
>> >> >> >> LLVM Developers mailing list
>> >> >> >> llvm-dev at lists.llvm.org
>> >> >> >>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>

Sean Silva via llvm-dev

2015-Sep-08 22:03 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Fri, Sep 4, 2015 at 11:44 PM, Xinliang David Li <davidxl at google.com>
wrote:
> On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at
gmail.com> wrote:
> >
> >
> > On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at
google.com>
> > wrote:
> >>
> >> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at
gmail.com>
> wrote:
> >> >
> >> >
> >> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl
at google.com
> >
> >> > wrote:
> >> >>
> >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva
<chisophugis at gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >
> >> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li
> >> >> > <davidxl at google.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> >
> >> >> >> > I think it is reasonable to simply replace
the key we currently
> >> >> >> > use
> >> >> >> > with
> >> >> >> > MD5(key) for getting a size reduction.  In
practice for my use
> >> >> >> > cases,
> >> >> >> > I
> >> >> >> > have
> >> >> >> > not observed any of the issues you
mentioned under "Large size
> of
> >> >> >> > overhead
> >> >> >> > can limit the usability of PGO
greatly", but I can understand
> that
> >> >> >> > some
> >> >> >> > of
> >> >> >> > these issues could become problems in
Google's use case. I would
> >> >> >> > personally
> >> >> >> > prefer to keep the existing behavior as the
default (see below),
> >> >> >> > and
> >> >> >> > have
> >> >> >> > MD5(key) as an option.
> >> >> >>
> >> >> >> The problem is that this requires profile format
changes.
> >> >> >
> >> >> >
> >> >> > Why? AFAIK the "function name" is just an
arbitrary string. Using s
> >> >> > or
> >> >> > MD5(s) shouldn't matter. Of course, the user
will need to pass
> >> >> > consistent
> >> >> > flags to clang.
> >> >>
> >> >> The raw format for 64bit target can be made
'unchanged', but not for
> >> >> the 32bit raw format -- the nameptr field is only 32bit.
> >> >>
> >> >> The indexed format can not be made the same --  The
ondisk profile
> >> >> record layout totally changes. The key field changes from
a blob of
> >> >> chars into an 64bit integer.
> >> >
> >> >
> >> > An MD5 sum cannot be represented as a blob of chars?
> >>
> >> yes -- it is fixed length (8byte) blob which may include null byte
in
> >> the middle.
> >
> >
> > For reference, MD5 sum is 16 bytes (128-bit):
> > https://en.wikipedia.org/wiki/MD5
>
> yes, LLVM's MD5 hash only takes the lower 64bit.
>
>
> >
> >>
> >>
> >> >
> >> > Or to say it another way, suppose that Itanium mangling
required as a
> >> > final
> >> > step to replace the string with its md5 sum in hex. Therefore
all
> symbol
> >> > names are "small". My understanding is that this is
effectively all
> your
> >> > patch is doing.
> >>
> >> The key type before the change is StringRef, while the after the
key
> >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key
as
> >> a string of 8 bytes or storing md5 has in text form which will
double
> >> the size?
> >
> >
> > How much does this change the benefit? If most of the benefit is
avoiding
> > extraordinarily long mangled names then it may be sufficient.
> >
> > With IR-level instrumentation like Rong is pursuing the size may be
> reduced
> > sufficiently that we do not need the optimization proposed in this
> thread.
> > For example, Rong found >2x size reduction on Google's C++
benchmarks,
> which
> > I assume are representative of the extremely large Google binaries
that
> are
> > causing the problems addressed by your proposal in this thread. The
> > measurements you mention for Clang in this thread provide similar size
> > reductions, so Rong's approach may be sufficient (especially
because
> > functions with extremely large mangled names tend to be small inline
> > functions in header-only template libraries).
>
> Late instrumentation helps many cases. In some cases (as shown in
> SPEC), the reduction in size is not as large. Reducing PGO overhead
> will lower the bar for its adoption.
>
> >
> > Of the points you mention in "Large size of overhead can limit
the
> usability
> > of PGO greatly", many of the issues are hard limits that prevent
the use
> of
> > PGO. Do you have a lower bound on how much the size of the PGO data
must
> be
> > reduced in order to overcome the hard limits?
>
> This is a static view:  Think about the situation where application
> size is ever increasing; also think about situation where we want to
collect more types of profile data. Think about situation where
user> want to run pgo binaries on small devices with tiny memory/storage ..

If we want to reduce memory overhead at runtime and reduce the size of the
raw profile data extracted from the target, there are clear solutions.
Consider that debug info does not need to be loaded into the memory image
of the target; why should information identifying each counter need to be?
A file containing raw profile counters is a subset of a core dump; in most
environments, a core dump does not need to have debug info or symbol names
in it, but can be still be read in full detail in conjunction with the
original binary.

Thus, as we require that the binary be passed to llvm-profdata, there is no
fundamental reason that the memory image of the program, or the raw data
extracted from the program, must have any size overhead besides the raw
values of the counters themselves and any text size increase for
incrementing them. If we are willing to impose this requirement on users,
then as far as reducing memory overhead at runtime and reducing the size of
the raw profile data extracted from the target, using hashed function names
is clearly the wrong direction.

*Without* imposing the requirement of passing the binary to llvm-profdata,
I do like the ability to use hashed function names like you are proposing.
It is a simple solution for reducing size overhead of function name strings
with little complexity, as it is just swapping one string for another.

>
> >
> > Obviously LLVM must be able to support the extremely large binaries in
> your
> > configuration (otherwise what use is LLVM as a compiler ;) My
questions
> are
> > primarily aimed at establishing which tradeoffs are acceptable for
> > supporting this (both for LLVM and for you guys).
>
> As I said, with the modified proposal (after getting your feedback),
> no PGO users in LLVM land is going to lose anything/functionality. The
> end result will be net win for general users of LLVM (even though your
> customers don't care about it), not just 'us' as you have
mentioned
> many times.
>
> >
> > Btw, for us, the issue of PGO data size is not completely immaterial
but
> is
> > very different from your use case. For us, the primary issue is the
> > additional memory use at run time, since PS4 games usually use
"all"
> > available memory. We had a problem with UBSan where the large amount
of
> > memory required for storing the UBSan diagnostic data at runtime
required
> > the game programmers to manually change their memory map to make room.
> > +Filipe, do you remember how much memory UBSan was using that caused a
> > problem?
> >
>
> My proposal does help reducing rodata size significantly.
>
Yes, that is why I think that this is a useful thing to do. I just want to
be careful about existing use cases and the relevant workflow issues.

-- Sean Silva

>
> David
>
>
> > -- Sean Silva
> >
> >>
> >>
> >> In the raw format, md5 sum key can be an embedded field in the
> >> prf_data variable instead of as different var referenced by
prf_data.
> >>
> >> >
> >> > If this is not the case, you should show your current patch
so that we
> >> > can
> >> > discuss things concretely.
> >>
> >> It is not. See above about the difference.
> >>
> >> >
> >> >>
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> It will be
> >> >> >> very messy to support multiple formats in
instr-codegen and
> >> >> >> instr-runtime.  For compatibility concerns, the
reader is taught
> to
> >> >> >> support previous format, but the changes there
are isolated (also
> >> >> >> expected to be removed in the future).
> >> >> >>
> >> >> >> >
> >> >> >> > My primary concern is that if the function
name are not kept at
> >> >> >> > all
> >> >> >> > stages,
> >> >> >> > then it becomes difficult to analyze the
profile data in a
> >> >> >> > standalone
> >> >> >> > way.
> >> >> >> > Many times, I have used `llvm-profdata show
-all-functions
> >> >> >> > foo.profdata`
> >> >> >> > on
> >> >> >> > the resulting profile data and then
imported that data into
> >> >> >> > Mathematica
> >> >> >> > for
> >> >> >> > analysis.
> >> >> >>
> >> >> >> This is certainly a very valid use case.
> >> >> >>
> >> >> >> >My understanding of your proposal is that
`llvm-profdata show
> >> >> >> > -all-functions foo.profdata` will not show
the actual function
> >> >> >> > names
> >> >> >> > but
> >> >> >> > instead MD5 hashes,
> >> >> >>
> >> >> >> Yes.
> >> >> >>
> >> >> >> To support your use case, there are two
solutions:
> >> >> >>
> >> >> >> 1) user can add -fcoverage-mapping option in the
build
> >> >> >> 2) introduce a new option -fprofile-instr-names
that force the
> >> >> >> emission of the name sections in the .o file.
This is similar to
> 1),
> >> >> >> but no covmap section is needed.
> >> >> >>
> >> >> >> llvm-profdata tool  will be taught to read the
name section and
> >> >> >> attach
> >> >> >> function names to the profile records.
> >> >> >
> >> >> >
> >> >> > Needing to pass the executable to llvm-profdata
would cause
> >> >> > deployment
> >> >> > issues for my customers in practice.
> >> >>
> >> >> Why? The deployment needs to pass the profile data anyway
right?
> >> >
> >> >
> >> > Yes, but not the executable.
> >> >
> >> > The PGO training run is likely being run by a gameplay tester
> >> > (non-programmer). In general the binary will not be lying
around as a
> >> > loose
> >> > file anywhere, it will be part of a full package of the
binary+assets
> >> > (think
> >> > like what will end up on a bluray disc). A game's binary
*completely
> >> > useless* without the assets, so except locally on a
programmer's
> machine
> >> > while they iterate/debug, there is no reason for a binary to
ever
> exist
> >> > as a
> >> > standalone file.
> >> >
> >> > I'm not saying that needing the binary is insurmountable
in any
> >> > particular
> >> > scenario. Just that it will cause a strict increase in the
number of
> >> > issues
> >> > to deploying PGO.
> >>
> >>  Your concern is acknowledged.
> >>
> >> >
> >> > These are much bigger "compatibility concerns" for
me than for newer
> >> > toolchains to accept the old format. For a change in format I
can
> easily
> >> > tell my users to replace an exe with a newer one and that is
all they
> >> > need
> >> > to do and it takes 10 seconds, guaranteed. A workflow change
is
> >> > potentially
> >> > a massive disruption and guaranteed to take more than 10
seconds to
> fix
> >> > (perhaps hours or days).
> >>
> >> ok.
> >>
> >> >
> >> >
> >> >>
> >> >>   This
> >> >> is no different from llvm-cov usage model.
> >> >
> >> >
> >> > In practice, getting the performance of PGO is a higher
priority for
> my
> >> > users, so we should not assume that llvm-cov is being used.
> >>
> >> Glad to hear that :)
> >>
> >> thanks,
> >>
> >> David
> >>
> >> >
> >> > -- Sean Silva
> >> >
> >> >>
> >> >>
> >> >> David
> >> >>
> >> >>
> >> >>
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> Note that with 1) or 2), the user can still
benefit from the
> reduced
> >> >> >> profile size.
> >> >> >
> >> >> >
> >> >> > Let me reiterate that the size of the profile is not
a problem I
> have
> >> >> > observed in practice (nor have I heard of this being
a problem in
> >> >> > practice
> >> >> > until this thread). Therefore I'm skeptical of
any changes to our
> >> >> > default
> >> >> > behavior or any new requirements that are not
opt-in.
> >> >> >
> >> >> > -- Sean Silva
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> thanks,
> >> >> >>
> >> >> >> David
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> >which will make it more difficult for me to
do this kind
> >> >> >> > of analysis (would require using nm on the
original binary,
> >> >> >> > hashing
> >> >> >> > everything, etc.).
> >> >> >> >
> >> >> >> > btw, feel free to attach the patch even if
it in a rough state.
> It
> >> >> >> > can
> >> >> >> > still
> >> >> >> > help to clarify the proposal and be a good
talking point.
> >> >> >> > Fine-grained
> >> >> >> > patch
> >> >> >> > review for caring about the rough parts
will happen on
> >> >> >> > llvm-commits;
> >> >> >> > the
> >> >> >> > rough parts will not distract the
discussion here on llvm-dev.
> >> >> >> >
> >> >> >> > -- Sean Silva
> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> thanks,
> >> >> >> >>
> >> >> >> >> David
> >> >> >> >>
_______________________________________________
> >> >> >> >> LLVM Developers mailing list
> >> >> >> >> llvm-dev at lists.llvm.org
> >> >> >> >>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150908/69c3446f/attachment-0001.html>

Sean Silva via llvm-dev

2015-Sep-08 22:04 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Tue, Sep 8, 2015 at 3:03 PM, Sean Silva <chisophugis at gmail.com>
wrote:
>
>
> On Fri, Sep 4, 2015 at 11:44 PM, Xinliang David Li <davidxl at
google.com>
> wrote:
>
>> On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at
gmail.com>
>> wrote:
>> >
>> >
>> > On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at
google.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at
gmail.com>
>> wrote:
>> >> >
>> >> >
>> >> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <
>> davidxl at google.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva
<chisophugis at gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David
Li
>> >> >> > <davidxl at google.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> >
>> >> >> >> > I think it is reasonable to simply
replace the key we currently
>> >> >> >> > use
>> >> >> >> > with
>> >> >> >> > MD5(key) for getting a size reduction. 
In practice for my use
>> >> >> >> > cases,
>> >> >> >> > I
>> >> >> >> > have
>> >> >> >> > not observed any of the issues you
mentioned under "Large size
>> of
>> >> >> >> > overhead
>> >> >> >> > can limit the usability of PGO
greatly", but I can understand
>> that
>> >> >> >> > some
>> >> >> >> > of
>> >> >> >> > these issues could become problems in
Google's use case. I
>> would
>> >> >> >> > personally
>> >> >> >> > prefer to keep the existing behavior as
the default (see
>> below),
>> >> >> >> > and
>> >> >> >> > have
>> >> >> >> > MD5(key) as an option.
>> >> >> >>
>> >> >> >> The problem is that this requires profile
format changes.
>> >> >> >
>> >> >> >
>> >> >> > Why? AFAIK the "function name" is just
an arbitrary string. Using
>> s
>> >> >> > or
>> >> >> > MD5(s) shouldn't matter. Of course, the user
will need to pass
>> >> >> > consistent
>> >> >> > flags to clang.
>> >> >>
>> >> >> The raw format for 64bit target can be made
'unchanged', but not for
>> >> >> the 32bit raw format -- the nameptr field is only
32bit.
>> >> >>
>> >> >> The indexed format can not be made the same --  The
ondisk profile
>> >> >> record layout totally changes. The key field changes
from a blob of
>> >> >> chars into an 64bit integer.
>> >> >
>> >> >
>> >> > An MD5 sum cannot be represented as a blob of chars?
>> >>
>> >> yes -- it is fixed length (8byte) blob which may include null
byte in
>> >> the middle.
>> >
>> >
>> > For reference, MD5 sum is 16 bytes (128-bit):
>> > https://en.wikipedia.org/wiki/MD5
>>
>> yes, LLVM's MD5 hash only takes the lower 64bit.
>>
>>
>> >
>> >>
>> >>
>> >> >
>> >> > Or to say it another way, suppose that Itanium mangling
required as a
>> >> > final
>> >> > step to replace the string with its md5 sum in hex.
Therefore all
>> symbol
>> >> > names are "small". My understanding is that
this is effectively all
>> your
>> >> > patch is doing.
>> >>
>> >> The key type before the change is StringRef, while the after
the key
>> >> type is uint64_t. Are you suggesting treating uint64_t md5 sum
key as
>> >> a string of 8 bytes or storing md5 has in text form which will
double
>> >> the size?
>> >
>> >
>> > How much does this change the benefit? If most of the benefit is
>> avoiding
>> > extraordinarily long mangled names then it may be sufficient.
>> >
>> > With IR-level instrumentation like Rong is pursuing the size may
be
>> reduced
>> > sufficiently that we do not need the optimization proposed in this
>> thread.
>> > For example, Rong found >2x size reduction on Google's C++
benchmarks,
>> which
>> > I assume are representative of the extremely large Google binaries
that
>> are
>> > causing the problems addressed by your proposal in this thread.
The
>> > measurements you mention for Clang in this thread provide similar
size
>> > reductions, so Rong's approach may be sufficient (especially
because
>> > functions with extremely large mangled names tend to be small
inline
>> > functions in header-only template libraries).
>>
>> Late instrumentation helps many cases. In some cases (as shown in
>> SPEC), the reduction in size is not as large. Reducing PGO overhead
>> will lower the bar for its adoption.
>>
>> >
>> > Of the points you mention in "Large size of overhead can
limit the
>> usability
>> > of PGO greatly", many of the issues are hard limits that
prevent the
>> use of
>> > PGO. Do you have a lower bound on how much the size of the PGO
data
>> must be
>> > reduced in order to overcome the hard limits?
>>
>> This is a static view:  Think about the situation where application
>> size is ever increasing; also think about situation where we want to
>
> collect more types of profile data. Think about situation where user
>> want to run pgo binaries on small devices with tiny memory/storage ..
>
>
> If we want to reduce memory overhead at runtime and reduce the size of the
> raw profile data extracted from the target, there are clear solutions.
> Consider that debug info does not need to be loaded into the memory image
> of the target; why should information identifying each counter need to be?
> A file containing raw profile counters is a subset of a core dump; in most
> environments, a core dump does not need to have debug info or symbol names
> in it, but can be still be read in full detail in conjunction with the
> original binary.
>
> Thus, as we require
>
this should read "if we ant to require".

-- Sean Silva

> that the binary be passed to llvm-profdata, there is no fundamental reason
> that the memory image of the program, or the raw data extracted from the
> program, must have any size overhead besides the raw values of the counters
> themselves and any text size increase for incrementing them. If we are
> willing to impose this requirement on users, then as far as reducing memory
> overhead at runtime and reducing the size of the raw profile data extracted
> from the target, using hashed function names is clearly the wrong
direction.
>
> *Without* imposing the requirement of passing the binary to llvm-profdata,
> I do like the ability to use hashed function names like you are proposing.
> It is a simple solution for reducing size overhead of function name strings
> with little complexity, as it is just swapping one string for another.
>
>
>>
>> >
>> > Obviously LLVM must be able to support the extremely large
binaries in
>> your
>> > configuration (otherwise what use is LLVM as a compiler ;) My
questions
>> are
>> > primarily aimed at establishing which tradeoffs are acceptable for
>> > supporting this (both for LLVM and for you guys).
>>
>> As I said, with the modified proposal (after getting your feedback),
>> no PGO users in LLVM land is going to lose anything/functionality. The
>> end result will be net win for general users of LLVM (even though your
>> customers don't care about it), not just 'us' as you have
mentioned
>> many times.
>>
>> >
>> > Btw, for us, the issue of PGO data size is not completely
immaterial
>> but is
>> > very different from your use case. For us, the primary issue is
the
>> > additional memory use at run time, since PS4 games usually use
"all"
>> > available memory. We had a problem with UBSan where the large
amount of
>> > memory required for storing the UBSan diagnostic data at runtime
>> required
>> > the game programmers to manually change their memory map to make
room.
>> > +Filipe, do you remember how much memory UBSan was using that
caused a
>> > problem?
>> >
>>
>> My proposal does help reducing rodata size significantly.
>>
>
> Yes, that is why I think that this is a useful thing to do. I just want to
> be careful about existing use cases and the relevant workflow issues.
>
> -- Sean Silva
>
>
>>
>> David
>>
>>
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> In the raw format, md5 sum key can be an embedded field in the
>> >> prf_data variable instead of as different var referenced by
prf_data.
>> >>
>> >> >
>> >> > If this is not the case, you should show your current
patch so that
>> we
>> >> > can
>> >> > discuss things concretely.
>> >>
>> >> It is not. See above about the difference.
>> >>
>> >> >
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> It will be
>> >> >> >> very messy to support multiple formats in
instr-codegen and
>> >> >> >> instr-runtime.  For compatibility concerns,
the reader is taught
>> to
>> >> >> >> support previous format, but the changes
there are isolated (also
>> >> >> >> expected to be removed in the future).
>> >> >> >>
>> >> >> >> >
>> >> >> >> > My primary concern is that if the
function name are not kept at
>> >> >> >> > all
>> >> >> >> > stages,
>> >> >> >> > then it becomes difficult to analyze
the profile data in a
>> >> >> >> > standalone
>> >> >> >> > way.
>> >> >> >> > Many times, I have used `llvm-profdata
show -all-functions
>> >> >> >> > foo.profdata`
>> >> >> >> > on
>> >> >> >> > the resulting profile data and then
imported that data into
>> >> >> >> > Mathematica
>> >> >> >> > for
>> >> >> >> > analysis.
>> >> >> >>
>> >> >> >> This is certainly a very valid use case.
>> >> >> >>
>> >> >> >> >My understanding of your proposal is
that `llvm-profdata show
>> >> >> >> > -all-functions foo.profdata` will not
show the actual function
>> >> >> >> > names
>> >> >> >> > but
>> >> >> >> > instead MD5 hashes,
>> >> >> >>
>> >> >> >> Yes.
>> >> >> >>
>> >> >> >> To support your use case, there are two
solutions:
>> >> >> >>
>> >> >> >> 1) user can add -fcoverage-mapping option in
the build
>> >> >> >> 2) introduce a new option
-fprofile-instr-names that force the
>> >> >> >> emission of the name sections in the .o
file. This is similar to
>> 1),
>> >> >> >> but no covmap section is needed.
>> >> >> >>
>> >> >> >> llvm-profdata tool  will be taught to read
the name section and
>> >> >> >> attach
>> >> >> >> function names to the profile records.
>> >> >> >
>> >> >> >
>> >> >> > Needing to pass the executable to llvm-profdata
would cause
>> >> >> > deployment
>> >> >> > issues for my customers in practice.
>> >> >>
>> >> >> Why? The deployment needs to pass the profile data
anyway right?
>> >> >
>> >> >
>> >> > Yes, but not the executable.
>> >> >
>> >> > The PGO training run is likely being run by a gameplay
tester
>> >> > (non-programmer). In general the binary will not be lying
around as a
>> >> > loose
>> >> > file anywhere, it will be part of a full package of the
binary+assets
>> >> > (think
>> >> > like what will end up on a bluray disc). A game's
binary *completely
>> >> > useless* without the assets, so except locally on a
programmer's
>> machine
>> >> > while they iterate/debug, there is no reason for a binary
to ever
>> exist
>> >> > as a
>> >> > standalone file.
>> >> >
>> >> > I'm not saying that needing the binary is
insurmountable in any
>> >> > particular
>> >> > scenario. Just that it will cause a strict increase in
the number of
>> >> > issues
>> >> > to deploying PGO.
>> >>
>> >>  Your concern is acknowledged.
>> >>
>> >> >
>> >> > These are much bigger "compatibility concerns"
for me than for newer
>> >> > toolchains to accept the old format. For a change in
format I can
>> easily
>> >> > tell my users to replace an exe with a newer one and that
is all they
>> >> > need
>> >> > to do and it takes 10 seconds, guaranteed. A workflow
change is
>> >> > potentially
>> >> > a massive disruption and guaranteed to take more than 10
seconds to
>> fix
>> >> > (perhaps hours or days).
>> >>
>> >> ok.
>> >>
>> >> >
>> >> >
>> >> >>
>> >> >>   This
>> >> >> is no different from llvm-cov usage model.
>> >> >
>> >> >
>> >> > In practice, getting the performance of PGO is a higher
priority for
>> my
>> >> > users, so we should not assume that llvm-cov is being
used.
>> >>
>> >> Glad to hear that :)
>> >>
>> >> thanks,
>> >>
>> >> David
>> >>
>> >> >
>> >> > -- Sean Silva
>> >> >
>> >> >>
>> >> >>
>> >> >> David
>> >> >>
>> >> >>
>> >> >>
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> Note that with 1) or 2), the user can still
benefit from the
>> reduced
>> >> >> >> profile size.
>> >> >> >
>> >> >> >
>> >> >> > Let me reiterate that the size of the profile is
not a problem I
>> have
>> >> >> > observed in practice (nor have I heard of this
being a problem in
>> >> >> > practice
>> >> >> > until this thread). Therefore I'm skeptical
of any changes to our
>> >> >> > default
>> >> >> > behavior or any new requirements that are not
opt-in.
>> >> >> >
>> >> >> > -- Sean Silva
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> thanks,
>> >> >> >>
>> >> >> >> David
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> >which will make it more difficult for me
to do this kind
>> >> >> >> > of analysis (would require using nm on
the original binary,
>> >> >> >> > hashing
>> >> >> >> > everything, etc.).
>> >> >> >> >
>> >> >> >> > btw, feel free to attach the patch even
if it in a rough
>> state. It
>> >> >> >> > can
>> >> >> >> > still
>> >> >> >> > help to clarify the proposal and be a
good talking point.
>> >> >> >> > Fine-grained
>> >> >> >> > patch
>> >> >> >> > review for caring about the rough parts
will happen on
>> >> >> >> > llvm-commits;
>> >> >> >> > the
>> >> >> >> > rough parts will not distract the
discussion here on llvm-dev.
>> >> >> >> >
>> >> >> >> > -- Sean Silva
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> thanks,
>> >> >> >> >>
>> >> >> >> >> David
>> >> >> >> >>
_______________________________________________
>> >> >> >> >> LLVM Developers mailing list
>> >> >> >> >> llvm-dev at lists.llvm.org
>> >> >> >> >>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150908/0e0ebf86/attachment.html>

Xinliang David Li via llvm-dev

2015-Sep-08 22:47 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

>> >>
>> >> yes -- it is fixed length (8byte) blob which may include null
byte in
>> >> the middle.
>> >
>> >
>> > For reference, MD5 sum is 16 bytes (128-bit):
>> > https://en.wikipedia.org/wiki/MD5
>>
>> yes, LLVM's MD5 hash only takes the lower 64bit.
>>
>>
>> >
>> >>
>> >>
>> >> >
>> >> > Or to say it another way, suppose that Itanium mangling
required as a
>> >> > final
>> >> > step to replace the string with its md5 sum in hex.
Therefore all
>> >> > symbol
>> >> > names are "small". My understanding is that
this is effectively all
>> >> > your
>> >> > patch is doing.
>> >>
>> >> The key type before the change is StringRef, while the after
the key
>> >> type is uint64_t. Are you suggesting treating uint64_t md5 sum
key as
>> >> a string of 8 bytes or storing md5 has in text form which will
double
>> >> the size?
>> >
>> >
>> > How much does this change the benefit? If most of the benefit is
>> > avoiding
>> > extraordinarily long mangled names then it may be sufficient.
>> >
>> > With IR-level instrumentation like Rong is pursuing the size may
be
>> > reduced
>> > sufficiently that we do not need the optimization proposed in this
>> > thread.
>> > For example, Rong found >2x size reduction on Google's C++
benchmarks,
>> > which
>> > I assume are representative of the extremely large Google binaries
that
>> > are
>> > causing the problems addressed by your proposal in this thread.
The
>> > measurements you mention for Clang in this thread provide similar
size
>> > reductions, so Rong's approach may be sufficient (especially
because
>> > functions with extremely large mangled names tend to be small
inline
>> > functions in header-only template libraries).
>>
>> Late instrumentation helps many cases. In some cases (as shown in
>> SPEC), the reduction in size is not as large. Reducing PGO overhead
>> will lower the bar for its adoption.
>>
>> >
>> > Of the points you mention in "Large size of overhead can
limit the
>> > usability
>> > of PGO greatly", many of the issues are hard limits that
prevent the use
>> > of
>> > PGO. Do you have a lower bound on how much the size of the PGO
data must
>> > be
>> > reduced in order to overcome the hard limits?
>>
>> This is a static view:  Think about the situation where application
>> size is ever increasing; also think about situation where we want to
>>
>> collect more types of profile data. Think about situation where user
>> want to run pgo binaries on small devices with tiny memory/storage ..
>
>
> If we want to reduce memory overhead at runtime and reduce the size of the
> raw profile data extracted from the target, there are clear solutions.
> Consider that debug info does not need to be loaded into the memory image
of
> the target; why should information identifying each counter need to be?
> A file containing raw profile counters is a subset of a core dump; in most
> environments, a core dump does not need to have debug info or symbol names
> in it, but can be still be read in full detail in conjunction with the
> original binary.
Yes -- there are many alternatives:
1) emit the name key mapping as a side data at compile time, or
2) emit them into nonloadable sections of the object file.

Compared with the above, LLVM's existing design does have its own
advantage -- making it easier for tool to access 'debug' info for
counters.

LLVM's coverage testing, on the other hand, take a hybrid approach: It
emits the coverage map as rodata, but does not pass it to the profile
dumper. I think it is better to emit covmap as a side data not
attached to target binary.

>
> Thus, as we require that the binary be passed to llvm-profdata, there is no
> fundamental reason that the memory image of the program, or the raw data
> extracted from the program, must have any size overhead besides the raw
> values of the counters themselves and any text size increase for
> incrementing them. If we are willing to impose this requirement on users,
> then as far as reducing memory overhead at runtime and reducing the size of
> the raw profile data extracted from the target, using hashed function names
> is clearly the wrong direction.
>
> *Without* imposing the requirement of passing the binary to llvm-profdata,
I
> do like the ability to use hashed function names like you are proposing. It
> is a simple solution for reducing size overhead of function name strings
> with little complexity, as it is just swapping one string for another.
Agree. The good news is that the overhead of hashed function names is
small enough that makes this approach attractive.

thanks,

David>
>>
>>
>> >
>> > Obviously LLVM must be able to support the extremely large
binaries in
>> > your
>> > configuration (otherwise what use is LLVM as a compiler ;) My
questions
>> > are
>> > primarily aimed at establishing which tradeoffs are acceptable for
>> > supporting this (both for LLVM and for you guys).
>>
>> As I said, with the modified proposal (after getting your feedback),
>> no PGO users in LLVM land is going to lose anything/functionality. The
>> end result will be net win for general users of LLVM (even though your
>> customers don't care about it), not just 'us' as you have
mentioned
>> many times.
>>
>> >
>> > Btw, for us, the issue of PGO data size is not completely
immaterial but
>> > is
>> > very different from your use case. For us, the primary issue is
the
>> > additional memory use at run time, since PS4 games usually use
"all"
>> > available memory. We had a problem with UBSan where the large
amount of
>> > memory required for storing the UBSan diagnostic data at runtime
>> > required
>> > the game programmers to manually change their memory map to make
room.
>> > +Filipe, do you remember how much memory UBSan was using that
caused a
>> > problem?
>> >
>>
>> My proposal does help reducing rodata size significantly.
>
>
> Yes, that is why I think that this is a useful thing to do. I just want to
> be careful about existing use cases and the relevant workflow issues.
>
> -- Sean Silva
>
>>
>>
>> David
>>
>>
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> In the raw format, md5 sum key can be an embedded field in the
>> >> prf_data variable instead of as different var referenced by
prf_data.
>> >>
>> >> >
>> >> > If this is not the case, you should show your current
patch so that
>> >> > we
>> >> > can
>> >> > discuss things concretely.
>> >>
>> >> It is not. See above about the difference.
>> >>
>> >> >
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> It will be
>> >> >> >> very messy to support multiple formats in
instr-codegen and
>> >> >> >> instr-runtime.  For compatibility concerns,
the reader is taught
>> >> >> >> to
>> >> >> >> support previous format, but the changes
there are isolated (also
>> >> >> >> expected to be removed in the future).
>> >> >> >>
>> >> >> >> >
>> >> >> >> > My primary concern is that if the
function name are not kept at
>> >> >> >> > all
>> >> >> >> > stages,
>> >> >> >> > then it becomes difficult to analyze
the profile data in a
>> >> >> >> > standalone
>> >> >> >> > way.
>> >> >> >> > Many times, I have used `llvm-profdata
show -all-functions
>> >> >> >> > foo.profdata`
>> >> >> >> > on
>> >> >> >> > the resulting profile data and then
imported that data into
>> >> >> >> > Mathematica
>> >> >> >> > for
>> >> >> >> > analysis.
>> >> >> >>
>> >> >> >> This is certainly a very valid use case.
>> >> >> >>
>> >> >> >> >My understanding of your proposal is
that `llvm-profdata show
>> >> >> >> > -all-functions foo.profdata` will not
show the actual function
>> >> >> >> > names
>> >> >> >> > but
>> >> >> >> > instead MD5 hashes,
>> >> >> >>
>> >> >> >> Yes.
>> >> >> >>
>> >> >> >> To support your use case, there are two
solutions:
>> >> >> >>
>> >> >> >> 1) user can add -fcoverage-mapping option in
the build
>> >> >> >> 2) introduce a new option
-fprofile-instr-names that force the
>> >> >> >> emission of the name sections in the .o
file. This is similar to
>> >> >> >> 1),
>> >> >> >> but no covmap section is needed.
>> >> >> >>
>> >> >> >> llvm-profdata tool  will be taught to read
the name section and
>> >> >> >> attach
>> >> >> >> function names to the profile records.
>> >> >> >
>> >> >> >
>> >> >> > Needing to pass the executable to llvm-profdata
would cause
>> >> >> > deployment
>> >> >> > issues for my customers in practice.
>> >> >>
>> >> >> Why? The deployment needs to pass the profile data
anyway right?
>> >> >
>> >> >
>> >> > Yes, but not the executable.
>> >> >
>> >> > The PGO training run is likely being run by a gameplay
tester
>> >> > (non-programmer). In general the binary will not be lying
around as a
>> >> > loose
>> >> > file anywhere, it will be part of a full package of the
binary+assets
>> >> > (think
>> >> > like what will end up on a bluray disc). A game's
binary *completely
>> >> > useless* without the assets, so except locally on a
programmer's
>> >> > machine
>> >> > while they iterate/debug, there is no reason for a binary
to ever
>> >> > exist
>> >> > as a
>> >> > standalone file.
>> >> >
>> >> > I'm not saying that needing the binary is
insurmountable in any
>> >> > particular
>> >> > scenario. Just that it will cause a strict increase in
the number of
>> >> > issues
>> >> > to deploying PGO.
>> >>
>> >>  Your concern is acknowledged.
>> >>
>> >> >
>> >> > These are much bigger "compatibility concerns"
for me than for newer
>> >> > toolchains to accept the old format. For a change in
format I can
>> >> > easily
>> >> > tell my users to replace an exe with a newer one and that
is all they
>> >> > need
>> >> > to do and it takes 10 seconds, guaranteed. A workflow
change is
>> >> > potentially
>> >> > a massive disruption and guaranteed to take more than 10
seconds to
>> >> > fix
>> >> > (perhaps hours or days).
>> >>
>> >> ok.
>> >>
>> >> >
>> >> >
>> >> >>
>> >> >>   This
>> >> >> is no different from llvm-cov usage model.
>> >> >
>> >> >
>> >> > In practice, getting the performance of PGO is a higher
priority for
>> >> > my
>> >> > users, so we should not assume that llvm-cov is being
used.
>> >>
>> >> Glad to hear that :)
>> >>
>> >> thanks,
>> >>
>> >> David
>> >>
>> >> >
>> >> > -- Sean Silva
>> >> >
>> >> >>
>> >> >>
>> >> >> David
>> >> >>
>> >> >>
>> >> >>
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> Note that with 1) or 2), the user can still
benefit from the
>> >> >> >> reduced
>> >> >> >> profile size.
>> >> >> >
>> >> >> >
>> >> >> > Let me reiterate that the size of the profile is
not a problem I
>> >> >> > have
>> >> >> > observed in practice (nor have I heard of this
being a problem in
>> >> >> > practice
>> >> >> > until this thread). Therefore I'm skeptical
of any changes to our
>> >> >> > default
>> >> >> > behavior or any new requirements that are not
opt-in.
>> >> >> >
>> >> >> > -- Sean Silva
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> thanks,
>> >> >> >>
>> >> >> >> David
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> >which will make it more difficult for me
to do this kind
>> >> >> >> > of analysis (would require using nm on
the original binary,
>> >> >> >> > hashing
>> >> >> >> > everything, etc.).
>> >> >> >> >
>> >> >> >> > btw, feel free to attach the patch even
if it in a rough state.
>> >> >> >> > It
>> >> >> >> > can
>> >> >> >> > still
>> >> >> >> > help to clarify the proposal and be a
good talking point.
>> >> >> >> > Fine-grained
>> >> >> >> > patch
>> >> >> >> > review for caring about the rough parts
will happen on
>> >> >> >> > llvm-commits;
>> >> >> >> > the
>> >> >> >> > rough parts will not distract the
discussion here on llvm-dev.
>> >> >> >> >
>> >> >> >> > -- Sean Silva
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> thanks,
>> >> >> >> >>
>> >> >> >> >> David
>> >> >> >> >>
_______________________________________________
>> >> >> >> >> LLVM Developers mailing list
>> >> >> >> >> llvm-dev at lists.llvm.org
>> >> >> >> >>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>

llvm dev - Sep 2015 - RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead