thr3ads.net - llvm dev - [LLVMdev] RFC: Binary format for instrumentation based profiling data [Mar 2014]


Robinson, Paul

2014-Mar-24 17:08 UTC

[LLVMdev] RFC: Binary format for instrumentation based profiling data

> We seem to have some agreement that two formats for instrumentation
> based profiling is worthwhile. These are that emitted by compiler-rt in
> the instrumented program at runtime (format 1), and that which is
> consumed by clang when compiling the program with PGO (format 2).
> 
> Format 1
> --------
> 
> This format should be efficient to write, since the instrumented program
> should run with as little overhead as possible. This also doesn't need
> to be stable, and we can assume the same version of LLVM that was used
> to instrument the program will read the counter data. As such, the file
> format is versioned (so we can easily reject versions we don't
> understand) and consists basically of a memory dump of the relevant
> profiling counters.
The "same version" assertion isn't completely true; at a previous job
we had clients who preferred not to regenerate profile data unless they
actually had to (because it was a big pain and took a long time).  But
as long as the versioning is based on actual format changes, not just
repurposing the current LLVM version number (making the previous data
unusable for no technical reason), that's okay.

As long as I'm bothering to say something, is there some way that the
tools will figure out that you're trying to apply old data to new files
that have changed in ways that make the old data inapplicable?  Sorry
if this has been brought up elsewhere and I just missed it.
--paulr
> 
> Format 2
> --------
> 
> This format should be efficient to read and preferably reasonably
> compact. We'll convert from format 1 to format 2 using llvm-profdata,
> and clang will use format 2 for PGO.
> 
> Since the only particularly important operation in this use case is fast
> lookup, I propose using the on disk hash table that's currently used in
> clang for AST serialization/PTH/etc with a small amount of metadata in a
> header.
> 
> The hash table implementation currently lives in include/clang/Basic and
> consists of a single header. Moving it to llvm and updating the clients
> in clang should be easy. I'll send a brief RFC separately to see if
> anyone's opposed to moving it.
> 
> 
> Thoughts?
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
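
To make the Format 1 idea concrete, here is a minimal Python sketch of a versioned raw counter dump: a small header carrying a magic number and a format version, followed by the 64-bit counters written out as a flat array. The magic value, version number, and header layout are hypothetical and are not the actual .profraw layout used by compiler-rt. The only point is that the writer does almost no work and the reader can cheaply reject a version it does not understand.

```python
import struct

# Hypothetical header layout for this sketch only; NOT the real
# compiler-rt .profraw layout.
MAGIC = 0x70726F66          # arbitrary magic value chosen for the sketch
FORMAT_VERSION = 1          # bumped only when the on-disk layout changes
HEADER = struct.Struct("<IIQ")  # little-endian: u32 magic, u32 version, u64 count


def write_raw(counters):
    """Serialize counters as a header plus a flat dump of u64 values."""
    header = HEADER.pack(MAGIC, FORMAT_VERSION, len(counters))
    body = struct.pack(f"<{len(counters)}Q", *counters)
    return header + body


def read_raw(data):
    """Parse a raw dump, rejecting unknown magic numbers or versions."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a raw profile")
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported raw profile version {version}")
    return list(struct.unpack_from(f"<{count}Q", data, HEADER.size))
```

In this sketch the version tracks the layout rather than the LLVM release number, so an older dump stays readable for as long as its layout is unchanged, which is the distinction drawn in the reply above.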

Duncan P. N. Exon Smith

2014-Mar-24 22:03 UTC


[LLVMdev] RFC: Binary format for instrumentation based profiling data

On Mar 24, 2014, at 10:08 AM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>> We seem to have some agreement that two formats for instrumentation
>> based profiling is worthwhile. These are that emitted by compiler-rt in
>> the instrumented program at runtime (format 1), and that which is
>> consumed by clang when compiling the program with PGO (format 2).
>> 
>> Format 1
>> --------
>> 
>> This format should be efficient to write, since the instrumented program
>> should run with as little overhead as possible. This also doesn't need
>> to be stable, and we can assume the same version of LLVM that was used
>> to instrument the program will read the counter data. As such, the file
>> format is versioned (so we can easily reject versions we don't
>> understand) and consists basically of a memory dump of the relevant
>> profiling counters.
> 
> The "same version" assertion isn't completely true; at a previous job
> we had clients who preferred not to regenerate profile data unless they
> actually had to (because it was a big pain and took a long time).  But
> as long as the versioning is based on actual format changes, not just
> repurposing the current LLVM version number (making the previous data
> unusable for no technical reason), that's okay.
Format 1 (extension .profraw since r204676) should be run immediately through
llvm-profdata to generate format 2 (extension .profdata).  The only profiles
that should be kept around are format 2.
> As long as I'm bothering to say something, is there some way that the
> tools will figure out that you're trying to apply old data to new files
> that have changed in ways that make the old data inapplicable?  Sorry
> if this has been brought up elsewhere and I just missed it.
> —paulr
There’s a hash for each function based on the layout of the counters assigned to
it.  If the hash from the data doesn’t match the current frontend, the data is
ignored.  Currently, the hash is extremely naive:  the number of counters.
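
A minimal Python sketch of the check described here, assuming a profile already converted to format 2 and loaded into a dict keyed by function name (standing in for the on-disk hash table). The names `FunctionRecord`, `naive_hash`, and `lookup_counts` are hypothetical. The hash is the naive one described above, the number of counters, so any source change that keeps the counter count the same goes undetected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FunctionRecord:
    func_hash: int       # hash recorded when the profile was generated
    counts: List[int]    # one execution count per counter


def naive_hash(num_counters: int) -> int:
    """The naive per-function hash: just the number of counters."""
    return num_counters


def lookup_counts(profile: Dict[str, FunctionRecord], name: str,
                  num_counters: int) -> Optional[List[int]]:
    """Return counts for `name`, or None if it is missing or stale."""
    record = profile.get(name)
    if record is None:
        return None
    if record.func_hash != naive_hash(num_counters):
        return None  # counter layout changed since profiling: ignore data
    return record.counts
```

For example, with `profile = {"main": FunctionRecord(3, [10, 4, 6])}`, asking for `main` with 3 counters returns `[10, 4, 6]`, while asking with 4 counters returns `None`.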

Robinson, Paul

2014-Mar-25 17:46 UTC


[LLVMdev] RFC: Binary format for instrumentation based profiling data

> On Mar 24, 2014, at 10:08 AM, Robinson, Paul
> <Paul_Robinson at playstation.sony.com> wrote:
> 
> >> We seem to have some agreement that two formats for instrumentation
> >> based profiling is worthwhile. These are that emitted by compiler-rt in
> >> the instrumented program at runtime (format 1), and that which is
> >> consumed by clang when compiling the program with PGO (format 2).
> >>
> >> Format 1
> >> --------
> >>
> >> This format should be efficient to write, since the instrumented program
> >> should run with as little overhead as possible. This also doesn't need
> >> to be stable, and we can assume the same version of LLVM that was used
> >> to instrument the program will read the counter data. As such, the file
> >> format is versioned (so we can easily reject versions we don't
> >> understand) and consists basically of a memory dump of the relevant
> >> profiling counters.
> >
> > The "same version" assertion isn't completely true; at a previous job
> > we had clients who preferred not to regenerate profile data unless they
> > actually had to (because it was a big pain and took a long time).  But
> > as long as the versioning is based on actual format changes, not just
> > repurposing the current LLVM version number (making the previous data
> > unusable for no technical reason), that's okay.
> 
> Format 1 (extension .profraw since r204676) should be run immediately
> through llvm-profdata to generate format 2 (extension .profdata).  The
> only profiles that should be kept around are format 2.
Okay, but the version comment still applies to format 2 then.
> 
> > As long as I'm bothering to say something, is there some way that the
> > tools will figure out that you're trying to apply old data to new files
> > that have changed in ways that make the old data inapplicable?  Sorry
> > if this has been brought up elsewhere and I just missed it.
> > -paulr
> 
> There's a hash for each function based on the layout of the counters
> assigned to it.  If the hash from the data doesn't match the current
> frontend, the data is ignored.  Currently, the hash is extremely naive:
> the number of counters.
Eww.  Should be CFG-based.  I think merge-similar-functions has a way
to compute this?
--paulr
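
To illustrate the CFG-based suggestion, here is a minimal Python sketch of a structural hash. The CFG is represented as a list where `successors[i]` holds the indices of the blocks that block `i` branches to; that representation, the function name `cfg_hash`, and the use of SHA-1 truncated to 64 bits are all assumptions for the sketch. It is not the algorithm used by any LLVM pass. Treating each block as carrying one counter is also a simplification for illustration only.

```python
import hashlib
from typing import List


def cfg_hash(successors: List[List[int]]) -> int:
    """Hash a CFG given as successors[i] = blocks that block i branches to.

    Both block order and successor order feed the hash, so rewiring
    edges changes the result even when the block count stays the same.
    """
    h = hashlib.sha1()
    for block, succs in enumerate(successors):
        # "block:succ,succ;" keeps the encoding unambiguous.
        h.update(f"{block}:{','.join(map(str, succs))};".encode())
    return int.from_bytes(h.digest()[:8], "little")
```

An if/else diamond `[[1, 2], [3], [3], []]` and a loop `[[1], [2, 1], [3], []]` both have four blocks, so a hash that only counts them cannot tell them apart, while `cfg_hash` gives them different values.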
