thr3ads.net - llvm dev - [LLVMdev] RFC: Binary format for instrumentation based profiling data [Mar 2014]

Justin Bogner

2014-Mar-18 00:22 UTC

[LLVMdev] RFC: Binary format for instrumentation based profiling data

Chandler Carruth <chandlerc at google.com> writes:
> The other assumption here is that you want the same file format written by
> instrumentation and read back by the compiler. While I think that is an
> unsurprising goal, I think it creates quite a few limitations that I'd like
> to point out. I think it would be worthwhile to consider the alternative of
> having the profile library write out data files in a format which is
> essentially "always" transformed by a post-processing tool before being used
> during compilation.
>
> Limitations of using the same format in both places:
> - High burden on writing the file constrains the format (must be fast, must
>   not use libraries, etc...)
> - Have to write an index even though the writer doesn't really need it.
> - Have to have the function name passed through the instrumentation,
>   potentially duplicating it with debug info.
> - Can't use an extensible file format (like bitcode) to insulate readers of
>   profile data from format changes.
>
> I'm imagining it might be nicer to have something along the lines of the
> following counter proposal. Define two formats: the format written by
> instrumentation, and the format read by the compiler. Split the use cases up.
> Specialize the formats based on the use cases. It does require the user to
> post-process the results, but it isn't clear that this is really a burden.
> Historically it has been needed to merge gcov profiles from different TUs, and
> it is still required to merge them from multiple runs.
This is an interesting idea. The counter data itself without index is
dead simple, so this approach for the instrumentation written format
would certainly be nice for compiler-rt, at the small cost of needing
two readers. We'd also need two writers, but that appears inevitable
since one needs to live in compiler-rt.
> I think the results could be superior for both the writer and reader:
>
> Instrumentation written format:
> - No index, just header and counters
> - (optional) Omit function names, and use PC at a known point of the
>   function, and rely on debug info to map back to function names.
This depends a bit on whether or not the conversion tool should depend
on the debug info being available. We'd need to weigh the usability cost
against the size benefit.
> - Use a structure which can be mmap-ed directly by the instrumentation code
>   (at least on LE systems) so that "writing the file on close" is just
>   flushing the memory region to disk
If this is feasible, we could also make the format host endian and
force the post-processing to byteswap as it reads. This avoids online
work in favour of offline.
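
A minimal sketch of the host-endian idea, in Python for brevity. The magic value, the field widths, and the function names are assumptions made for illustration; nothing here is a layout proposed in this thread. The writer emits everything in its own byte order, and the offline reader works out which order that was from the magic number and swaps as it decodes.

```python
import struct
import sys

# Hypothetical magic number and layout, chosen only for illustration.
MAGIC = 0x70726F66  # reads as ASCII "prof" when packed big-endian


def write_raw(counters, byteorder=sys.byteorder):
    """Emit magic + counter count + counters in the writer's byte order."""
    prefix = "<" if byteorder == "little" else ">"
    header = struct.pack(prefix + "IQ", MAGIC, len(counters))
    body = struct.pack(prefix + "%dQ" % len(counters), *counters)
    return header + body


def read_raw(data):
    """Detect the writer's byte order from the magic, then decode."""
    for prefix in ("<", ">"):
        (magic,) = struct.unpack_from(prefix + "I", data, 0)
        if magic == MAGIC:
            break
    else:
        raise ValueError("bad magic: not a raw profile")
    # With an explicit "<" or ">" prefix struct adds no padding, so the
    # 4-byte magic is followed directly by the 8-byte count at offset 4.
    (count,) = struct.unpack_from(prefix + "Q", data, 4)
    return list(struct.unpack_from(prefix + "%dQ" % count, data, 12))
```

The writer never swaps anything; all of the byte-order handling lives in the reader, which is the offline side.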
> - Explicitly version format, and provide no stability going forward
>
> Profile reading format:
> - Use a bitcoded format much like Clang's ASTs do (or some other tagged
>   format which allows extensions)
I'm not entirely convinced a bitcoded format is going to gain us much
over a simpler on disk hash table. The variable bit rate integers might
be worthwhile, but will it be efficient to look up the counters for a
particular function name?

That said, the ASTs also make use of the on disk hash that Dmitri
mentioned for various indexes, which is definitely worth looking at.
> - Leverage the existing partial reading which has been heavily optimized for
>   modules, LLVM IR, etc.
> - Use implicit-zero semantics for missing counters within a function where we
>   have *some* instrumentation results, and remove all zero counters
> - Maybe other compression techniques
>
> Thoughts? Specific reasons to avoid this? I'm very much interested in
> minimizing the space and runtime overhead of instrumentation, as well as
> getting more advanced features in the format read by Clang itself.
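
The implicit-zero semantics quoted above can be illustrated with a short Python sketch. The (index, value) pair encoding and both function names are assumptions for illustration only; the thread does not specify an encoding.

```python
def compress_counters(counters):
    """Keep only the non-zero counters as (index, value) pairs."""
    pairs = [(i, c) for i, c in enumerate(counters) if c != 0]
    return len(counters), pairs


def expand_counters(num_counters, pairs):
    """Rebuild the full list; any index not stored is implicitly zero."""
    counters = [0] * num_counters
    for index, value in pairs:
        counters[index] = value
    return counters
```

The total counter count still has to be stored so the reader knows how many zeros to restore.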

Duncan Exon Smith

2014-Mar-18 02:00 UTC

[LLVMdev] RFC: Binary format for instrumentation based profiling data

> On Mar 17, 2014, at 17:22, Justin Bogner <mail at justinbogner.com> wrote:
>
> Chandler Carruth <chandlerc at google.com> writes:
>> The other assumption here is that you want the same file format written by
>> instrumentation and read back by the compiler. While I think that is an
>> unsurprising goal, I think it creates quite a few limitations that I'd like
>> to point out. I think it would be worthwhile to consider the alternative of
>> having the profile library write out data files in a format which is
>> essentially "always" transformed by a post-processing tool before being
>> used during compilation.
>> 
>> Limitations of using the same format in both places:
>> - High burden on writing the file constrains the format (must be fast, must
>>   not use libraries, etc...)
>> - Have to write an index even though the writer doesn't really need it.
>> - Have to have the function name passed through the instrumentation,
>>   potentially duplicating it with debug info.
>> - Can't use an extensible file format (like bitcode) to insulate readers of
>>   profile data from format changes.
>> 
>> I'm imagining it might be nicer to have something along the lines of the
>> following counter proposal. Define two formats: the format written by
>> instrumentation, and the format read by the compiler. Split the use cases
>> up. Specialize the formats based on the use cases. It does require the user
>> to post-process the results, but it isn't clear that this is really a
>> burden. Historically it has been needed to merge gcov profiles from
>> different TUs, and it is still required to merge them from multiple runs.
> 
> This is an interesting idea. The counter data itself without index is
> dead simple, so this approach for the instrumentation written format
> would certainly be nice for compiler-rt, at the small cost of needing
> two readers. We'd also need two writers, but that appears inevitable
> since one needs to live in compiler-rt.
I'm in favour of two formats.  Simplifying compiler-rt is a worthwhile goal.

Nevertheless, the current proposal with a naive index is straightforward to
produce, especially after the changes I committed today.  I think moving to that
is a good incremental change.

Moving forward we can split the format in two and evolve them independently.  In
particular, compiler-rt's write could be coded as a few memcpy calls plus a
header, if there's some freedom around the format.

Justin Bogner

2014-Mar-22 08:18 UTC

[LLVMdev] RFC: Binary format for instrumentation based profiling data

Chandler Carruth <chandlerc at google.com> writes:
> I think it would be worthwhile to consider the alternative of having
> the profile library write out data files in a format which is
> essentially "always" transformed by a post-processing tool before
> being used during compilation.
We seem to have some agreement that two formats for instrumentation
based profiling is worthwhile. These are that emitted by compiler-rt in
the instrumented program at runtime (format 1), and that which is
consumed by clang when compiling the program with PGO (format 2).

Format 1
--------

This format should be efficient to write, since the instrumented program
should run with as little overhead as possible. This also doesn't need
to be stable, and we can assume the same version of LLVM that was used
to instrument the program will read the counter data. As such, the file
format is versioned (so we can easily reject versions we don't
understand) and consists basically of a memory dump of the relevant
profiling counters.
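
As a rough illustration of "versioned header plus memory dump", here is a Python sketch. The magic bytes, the version number, and the header fields are hypothetical; the real layout is not defined in this message.

```python
import struct
from array import array

RAW_MAGIC = b"RAWPROF\0"  # hypothetical 8-byte magic
RAW_VERSION = 1           # hypothetical version of this layout


def dump_counters(counters):
    """Header (magic, version, count) followed by the counters as-is."""
    buf = array("Q", counters)  # contiguous 64-bit counters, native order
    header = RAW_MAGIC + struct.pack("=QQ", RAW_VERSION, len(buf))
    return header + buf.tobytes()


def load_counters(data):
    """Reject files we do not understand, then read the dump back."""
    if data[:8] != RAW_MAGIC:
        raise ValueError("not a raw profile")
    version, count = struct.unpack_from("=QQ", data, 8)
    if version != RAW_VERSION:
        raise ValueError("unsupported raw profile version %d" % version)
    buf = array("Q")
    buf.frombytes(data[24:24 + 8 * count])
    return list(buf)
```

The writer does no encoding work beyond the fixed-size header; the version check is the only thing standing between the reader and a layout it does not know.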

Format 2
--------

This format should be efficient to read and preferably reasonably
compact. We'll convert from format 1 to format 2 using llvm-profdata,
and clang will use format 2 for PGO.

Since the only particularly important operation in this use case is fast
lookup, I propose using the on disk hash table that's currently used in
clang for AST serialization/PTH/etc with a small amount of metadata in a
header.
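
To make the lookup pattern concrete, here is a small Python sketch of a bucketed table serialized into one blob. It is not clang's on disk hash table: the layout, the use of CRC32 as the hash, and the function names are all assumptions chosen for illustration.

```python
import struct
import zlib


def build_table(profiles, num_buckets=8):
    """Serialize {function name: [counters]} as a bucketed blob."""
    buckets = [[] for _ in range(num_buckets)]
    for name, counters in profiles.items():
        key = name.encode("utf-8")
        buckets[zlib.crc32(key) % num_buckets].append((key, counters))

    header_size = 4 + 4 * num_buckets
    offsets, payload = [], b""
    for entries in buckets:
        if not entries:
            offsets.append(0)  # 0 marks an empty bucket
            continue
        offsets.append(header_size + len(payload))
        payload += struct.pack("<H", len(entries))
        for key, counters in entries:
            payload += struct.pack("<HH", len(key), len(counters)) + key
            payload += struct.pack("<%dQ" % len(counters), *counters)
    header = struct.pack("<I%dI" % num_buckets, num_buckets, *offsets)
    return header + payload


def lookup(blob, name):
    """Return the counters for `name`, reading only one bucket."""
    key = name.encode("utf-8")
    (num_buckets,) = struct.unpack_from("<I", blob, 0)
    slot = zlib.crc32(key) % num_buckets
    (pos,) = struct.unpack_from("<I", blob, 4 + 4 * slot)
    if pos == 0:
        return None
    (num_entries,) = struct.unpack_from("<H", blob, pos)
    pos += 2
    for _ in range(num_entries):
        key_len, num_counters = struct.unpack_from("<HH", blob, pos)
        pos += 4
        found = blob[pos:pos + key_len] == key
        pos += key_len
        if found:
            fmt = "<%dQ" % num_counters
            return list(struct.unpack_from(fmt, blob, pos))
        pos += 8 * num_counters  # skip this entry's counters
    return None
```

A lookup reads the bucket count, one offset, and one bucket, so a reader that maps the file never has to parse entries for functions it does not ask about.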

The hash table implementation currently lives in include/clang/Basic and
consists of a single header. Moving it to llvm and updating the clients
in clang should be easy. I'll send a brief RFC separately to see if
anyone's opposed to moving it.

Thoughts?

Robinson, Paul

2014-Mar-24 17:08 UTC

[LLVMdev] RFC: Binary format for instrumentation based profiling data

> We seem to have some agreement that two formats for instrumentation
> based profiling is worthwhile. These are that emitted by compiler-rt in
> the instrumented program at runtime (format 1), and that which is
> consumed by clang when compiling the program with PGO (format 2).
> 
> Format 1
> --------
> 
> This format should be efficient to write, since the instrumented program
> should run with as little overhead as possible. This also doesn't need
> to be stable, and we can assume the same version of LLVM that was used
> to instrument the program will read the counter data. As such, the file
> format is versioned (so we can easily reject versions we don't
> understand) and consists basically of a memory dump of the relevant
> profiling counters.
The "same version" assertion isn't completely true; at a previous job
we had clients who preferred not to regenerate profile data unless they
actually had to (because it was a big pain and took a long time).  But
as long as the versioning is based on actual format changes, not just
repurposing the current LLVM version number (making the previous data
unusable for no technical reason), that's okay.

As long as I'm bothering to say something, is there some way that the
tools will figure out that you're trying to apply old data to new files
that have changed in ways that make the old data inapplicable?  Sorry
if this has been brought up elsewhere and I just missed it.
--paulr

Chandler Carruth

2014-Mar-24 19:29 UTC

[LLVMdev] RFC: Binary format for instrumentation based profiling data

On Sat, Mar 22, 2014 at 1:18 AM, Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > I think it would be worthwhile to consider the alternative of having
> > the profile library write out data files in a format which is
> > essentially "always" transformed by a post-processing tool before
> > being used during compilation.
>
> We seem to have some agreement that two formats for instrumentation
> based profiling is worthwhile. These are that emitted by compiler-rt in
> the instrumented program at runtime (format 1), and that which is
> consumed by clang when compiling the program with PGO (format 2).
>
> Format 1
> --------
>
> This format should be efficient to write, since the instrumented program
> should run with as little overhead as possible. This also doesn't need
> to be stable, and we can assume the same version of LLVM that was used
> to instrument the program will read the counter data. As such, the file
> format is versioned (so we can easily reject versions we don't
> understand) and consists basically of a memory dump of the relevant
> profiling counters.
>
This makes perfect sense to me.

>
> Format 2
> --------
>
> This format should be efficient to read and preferably reasonably
> compact. We'll convert from format 1 to format 2 using llvm-profdata,
> and clang will use format 2 for PGO.
>
> Since the only particularly important operation in this use case is fast
> lookup, I propose using the on disk hash table that's currently used in
> clang for AST serialization/PTH/etc with a small amount of metadata in a
> header.
>
> The hash table implementation currently lives in include/clang/Basic and
> consists of a single header. Moving it to llvm and updating the clients
> in clang should be easy. I'll send a brief RFC separately to see if
> anyone's opposed to moving it.
>
I can mention this and we can discuss this on the other thread if you would
rather, but I'm not a huge fan of this code. My vague memory was that this
was a quick hack by Doug that he never really expected to live long-term.

I have a general preference for from-disk lookups to use tries (for
strings, prefix tries) or other fast, sorted lookup structures. They have
the nice property of being inherently stable and unambiguous, and not
baking any hashing algorithm into it.

I've not thought enough about how to make a general purpose one of these to
have a stronger opinion though; perhaps I should do so...
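
For comparison, a sorted lookup structure can be sketched in a few lines of Python. This uses a plain sorted array with binary search rather than a trie, and the function names are hypothetical; it is only meant to show the "stable, no hash function baked in" property.

```python
from bisect import bisect_left


def build_sorted_index(profiles):
    """Return parallel lists: names in sorted order and their counters."""
    names = sorted(profiles)
    return names, [profiles[n] for n in names]


def find(names, counters, name):
    """Binary-search the sorted names; no hash function is involved."""
    i = bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return counters[i]
    return None
```

The index depends only on the set of names, not on insertion order or on any hashing algorithm, so two writers given the same profile produce the same table.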