thr3ads.net - llvm dev - [llvm-dev] RFC: Reducing Instr PGO size overhead [Sep 2015]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li via llvm-dev

2015-Sep-04 22:57 UTC

[llvm-dev] RFC: Reducing Instr PGO size overhead

>
> I think it is reasonable to simply replace the key we currently use with
> MD5(key) for getting a size reduction.  In practice for my use cases, I
have
> not observed any of the issues you mentioned under "Large size of
overhead
> can limit the usability of PGO greatly", but I can understand that
some of
> these issues could become problems in Google's use case. I would
personally
> prefer to keep the existing behavior as the default (see below), and have
> MD5(key) as an option.
The problem is that this requires profile format changes. It will be
very messy to support multiple formats in instr-codegen and
instr-runtime.  For compatibility concerns, the reader is taught to
support previous format, but the changes there are isolated (also
expected to be removed in the future).
>
> My primary concern is that if the function name are not kept at all stages,
> then it becomes difficult to analyze the profile data in a standalone way.
> Many times, I have used `llvm-profdata show -all-functions foo.profdata` on
> the resulting profile data and then imported that data into Mathematica for
> analysis.
This is certainly a very valid use case.
>My understanding of your proposal is that `llvm-profdata show
> -all-functions foo.profdata` will not show the actual function names but
> instead MD5 hashes,
Yes.

To support your use case, there are two solutions:

1) user can add -fcoverage-mapping option in the build
2) introduce a new option -fprofile-instr-names that force the
emission of the name sections in the .o file. This is similar to 1),
but no covmap section is needed.

llvm-profdata tool  will be taught to read the name section and attach
function names to the profile records.

Note that with 1) or 2), the user can still benefit from the reduced
profile size.

thanks,

David



>which will make it more difficult for me to do this kind
> of analysis (would require using nm on the original binary, hashing
> everything, etc.).
>
> btw, feel free to attach the patch even if it in a rough state. It can
still
> help to clarify the proposal and be a good talking point. Fine-grained
patch
> review for caring about the rough parts will happen on llvm-commits; the
> rough parts will not distract the discussion here on llvm-dev.
>
> -- Sean Silva
>
>>
>>
>> thanks,
>>
>> David
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

Sean Silva via llvm-dev

2015-Sep-05 00:21 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <davidxl at google.com>
wrote:
> >
> > I think it is reasonable to simply replace the key we currently use
with
> > MD5(key) for getting a size reduction.  In practice for my use cases,
I
> have
> > not observed any of the issues you mentioned under "Large size of
> overhead
> > can limit the usability of PGO greatly", but I can understand
that some
> of
> > these issues could become problems in Google's use case. I would
> personally
> > prefer to keep the existing behavior as the default (see below), and
have
> > MD5(key) as an option.
>
> The problem is that this requires profile format changes.

Why? AFAIK the "function name" is just an arbitrary string. Using s or
MD5(s) shouldn't matter. Of course, the user will need to pass consistent
flags to clang.


> It will be
> very messy to support multiple formats in instr-codegen and
> instr-runtime.  For compatibility concerns, the reader is taught to
> support previous format, but the changes there are isolated (also
> expected to be removed in the future).
>
> >
> > My primary concern is that if the function name are not kept at all
> stages,
> > then it becomes difficult to analyze the profile data in a standalone
> way.
> > Many times, I have used `llvm-profdata show -all-functions
foo.profdata`
> on
> > the resulting profile data and then imported that data into
Mathematica
> for
> > analysis.
>
> This is certainly a very valid use case.
>
> >My understanding of your proposal is that `llvm-profdata show
> > -all-functions foo.profdata` will not show the actual function names
but
> > instead MD5 hashes,
>
> Yes.
>
> To support your use case, there are two solutions:
>
> 1) user can add -fcoverage-mapping option in the build
> 2) introduce a new option -fprofile-instr-names that force the
> emission of the name sections in the .o file. This is similar to 1),
> but no covmap section is needed.
>
> llvm-profdata tool  will be taught to read the name section and attach
> function names to the profile records.
>
Needing to pass the executable to llvm-profdata would cause deployment
issues for my customers in practice.

>
> Note that with 1) or 2), the user can still benefit from the reduced
> profile size.
>
Let me reiterate that the size of the profile is not a problem I have
observed in practice (nor have I heard of this being a problem in practice
until this thread). Therefore I'm skeptical of any changes to our default
behavior or any new requirements that are not opt-in.

-- Sean Silva

>
> thanks,
>
> David
>
>
>
>
> >which will make it more difficult for me to do this kind
> > of analysis (would require using nm on the original binary, hashing
> > everything, etc.).
> >
> > btw, feel free to attach the patch even if it in a rough state. It can
> still
> > help to clarify the proposal and be a good talking point. Fine-grained
> patch
> > review for caring about the rough parts will happen on llvm-commits;
the
> > rough parts will not distract the discussion here on llvm-dev.
> >
> > -- Sean Silva
> >
> >>
> >>
> >> thanks,
> >>
> >> David
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150904/a5f30100/attachment.html>

Xinliang David Li via llvm-dev

2015-Sep-05 00:42 UTC

head link

[llvm-dev] RFC: Reducing Instr PGO size overhead

On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com>
wrote:>
>
> On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <davidxl at
google.com>
> wrote:
>>
>> >
>> > I think it is reasonable to simply replace the key we currently
use with
>> > MD5(key) for getting a size reduction.  In practice for my use
cases, I
>> > have
>> > not observed any of the issues you mentioned under "Large
size of
>> > overhead
>> > can limit the usability of PGO greatly", but I can understand
that some
>> > of
>> > these issues could become problems in Google's use case. I
would
>> > personally
>> > prefer to keep the existing behavior as the default (see below),
and
>> > have
>> > MD5(key) as an option.
>>
>> The problem is that this requires profile format changes.
>
>
> Why? AFAIK the "function name" is just an arbitrary string. Using
s or
> MD5(s) shouldn't matter. Of course, the user will need to pass
consistent
> flags to clang.
The raw format for 64bit target can be made 'unchanged', but not for
the 32bit raw format -- the nameptr field is only 32bit.

The indexed format can not be made the same --  The ondisk profile
record layout totally changes. The key field changes from a blob of
chars into an 64bit integer.>
>
>>
>> It will be
>> very messy to support multiple formats in instr-codegen and
>> instr-runtime.  For compatibility concerns, the reader is taught to
>> support previous format, but the changes there are isolated (also
>> expected to be removed in the future).
>>
>> >
>> > My primary concern is that if the function name are not kept at
all
>> > stages,
>> > then it becomes difficult to analyze the profile data in a
standalone
>> > way.
>> > Many times, I have used `llvm-profdata show -all-functions
foo.profdata`
>> > on
>> > the resulting profile data and then imported that data into
Mathematica
>> > for
>> > analysis.
>>
>> This is certainly a very valid use case.
>>
>> >My understanding of your proposal is that `llvm-profdata show
>> > -all-functions foo.profdata` will not show the actual function
names but
>> > instead MD5 hashes,
>>
>> Yes.
>>
>> To support your use case, there are two solutions:
>>
>> 1) user can add -fcoverage-mapping option in the build
>> 2) introduce a new option -fprofile-instr-names that force the
>> emission of the name sections in the .o file. This is similar to 1),
>> but no covmap section is needed.
>>
>> llvm-profdata tool  will be taught to read the name section and attach
>> function names to the profile records.
>
>
> Needing to pass the executable to llvm-profdata would cause deployment
> issues for my customers in practice.
Why? The deployment needs to pass the profile data anyway right?  This
is no different from llvm-cov usage model.

David


>
>>
>>
>> Note that with 1) or 2), the user can still benefit from the reduced
>> profile size.
>
>
> Let me reiterate that the size of the profile is not a problem I have
> observed in practice (nor have I heard of this being a problem in practice
> until this thread). Therefore I'm skeptical of any changes to our
default
> behavior or any new requirements that are not opt-in.
>
> -- Sean Silva
>
>>
>>
>> thanks,
>>
>> David
>>
>>
>>
>>
>> >which will make it more difficult for me to do this kind
>> > of analysis (would require using nm on the original binary,
hashing
>> > everything, etc.).
>> >
>> > btw, feel free to attach the patch even if it in a rough state. It
can
>> > still
>> > help to clarify the proposal and be a good talking point.
Fine-grained
>> > patch
>> > review for caring about the rough parts will happen on
llvm-commits; the
>> > rough parts will not distract the discussion here on llvm-dev.
>> >
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> thanks,
>> >>
>> >> David
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> >
>
>

llvm dev - Sep 2015 - RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead

[llvm-dev] RFC: Reducing Instr PGO size overhead