Justin Bogner
2014-Mar-13 01:09 UTC
[LLVMdev] RFC: Binary format for instrumentation based profiling data
Instrumentation-based profiling data is generated by instrumented binaries through library functions in compiler-rt, and read by the clang frontend to feed PGO. The current data format is a naive textual format. The files consist of a number of counters for each function in a program. Use cases ======== There are a few use cases that drive our requirements: A. Looking up the counters for a particular function needs to be efficient, so as to not slow down compilation too much when using the data for PGO. This is also presumably how this data would be accessed if using it for coverage. B. The file should be relatively quick to generate, and shouldn't require a large amount of memory overhead to write out. Instrumented programs will write this file out on termination or on demand. C. Iterating over all of the data must be reasonably simple, to support tools that summarize or merge data files. D. The format should be relatively compact. Programs with a large number of functions may be instrumented, and we don't want to create unnecessarily large files because of it. E. The format should be portable between systems. Ie, if Alice generates profile data for some program P using Machine A, Bob should be able to use that same profile data to instrument a copy of that same program on machine B. Partially updating the file is not a goal. Instrumented programs write out a profile for a single run, and multiple runs can be merged by a separate tool. Semantics ======== Semantically, we have a collection of functions, each with a number of counters associated with them. Functions are represented by strings, determined by the part of the frontend that both generates and uses this data. In our case, these are generally whatever clang thinks of as the function's name, with minor details added to disambiguate names that aren't necessarily unique. The counters are a list of unsigned 64 bit values. The frontend decides what constructs these belong to, but the first counter is for the function itself, and always exists. In order to validate that the mapping is correct, we need a hash for the frontend to compare against. Initially, we’re using the number of counters as the hash, but that won't be sufficient as time goes on. Format ===== The file should be a binary format in order to satisfy use case (D). As such, we need to choose an endianness in order to satisfy (E). A fixed endianness is simpler than writing in native-endian and byte-swapping if necessary on read, so we arbitrarily use little-endian. The file consists of three parts: 1. Header 2. Index 3. Counter data Header ------ The header contains metadata for parsing the file, and summarizing data that's expensive to recompute. Specifically: - A magic number to identify the file format. The character string "LPRF" has a nice ring to it. - A version number, to accomodate future changes to the file format. - An offset to the end of the index. This could alternatively be a size of the index, to which the header size would need to be added to get the offset, but that's slightly less convenient. - The maximum value of any function counter. This is easy to calculate when writing the file, but would require reading the entire file to calculate later. Index ----- An index is necessary in order to get fast look up for particular functions (use case A), and so that we can avoid paging in the entire data file when we're only interested in a small subset of the data. The index is used as an associative array from the function name to the data offset where the function's counter data can be found. The first implementation uses the naive approach of an unordered list of names and offsets. Each element will consist of: <string length> <character data> <data offset> This obviously has an obvious drawback in the "startup" cost of reading these into an associative data structure, but it's relatively compact and is an reasonable baseline to evaluate better solutions. Future directions may include: 1. An on-disk hash table. Questions like whether to use separate chaining or open addressing would need to be addressed. This solution takes at least as much space as the naive approach, but avoids the startup cost. 1a. A format like LLVM's dwarf accelerator tables. This is optimized for a different case than we're interested in (failed lookups rather than successful), so it's probably not appropriate. 2. An on-disk trie. This should save a fair amount of space due to the fact that C language names and C++ mangling tend to lead to a large number of names with common prefixes. This format also allows iterating over the functions in alphabetical order. Counter data ------------ The counter data is simply an array of unsigned 64 bit values. Given an offset found in the index, a sequence follows: <function hash> <number of counters> <counters...> This is all of the data needed for a given function. A potential future direction would be to represent counters with a variable length encoding. Many counters in a program are likely to be 0 or 1 for a particular run. Using 8B for each is expensive. However, there’s a natural tradeoff against use case B, as the size of the file is harder to predict when writing it out.
Diego Novillo
2014-Mar-13 12:48 UTC
[LLVMdev] RFC: Binary format for instrumentation based profiling data
On Wed, Mar 12, 2014 at 9:09 PM, Justin Bogner <mail at justinbogner.com> wrote:> Functions are represented by strings, determined by the part of the > frontend that both generates and uses this data. In our case, these are > generally whatever clang thinks of as the function's name, with minor > details added to disambiguate names that aren't necessarily unique.Why not just use mangled names here? No need to add minor details, nor ad-hoc disambiguation. If you're already using mangled names, then I'm not sure why we need disambiguating details.> The counter data is simply an array of unsigned 64 bit values. Given an > offset found in the index, a sequence follows: > > <function hash> <number of counters> <counters...> > > This is all of the data needed for a given function.How are counters represented? Are these line numbers together with the counter? Basic blocks? Edges? I wonder if it would make sense to use the existing gcov format for this. OTOH, we could provide a converter in the Profile library. Thanks. Diego.
Bob Wilson
2014-Mar-13 15:51 UTC
[LLVMdev] RFC: Binary format for instrumentation based profiling data
On Mar 13, 2014, at 5:48 AM, Diego Novillo <dnovillo at google.com> wrote:> On Wed, Mar 12, 2014 at 9:09 PM, Justin Bogner <mail at justinbogner.com> wrote: > >> Functions are represented by strings, determined by the part of the >> frontend that both generates and uses this data. In our case, these are >> generally whatever clang thinks of as the function's name, with minor >> details added to disambiguate names that aren't necessarily unique. > > Why not just use mangled names here? No need to add minor details, nor > ad-hoc disambiguation. If you're already using mangled names, then I'm > not sure why we need disambiguating details.We need to prepend the file name to distinguish functions with local linkage. Also, Objective-C methods do not get mangled, so we don’t want to say that we just use the mangled names.> >> The counter data is simply an array of unsigned 64 bit values. Given an >> offset found in the index, a sequence follows: >> >> <function hash> <number of counters> <counters...> >> >> This is all of the data needed for a given function. > > How are counters represented? Are these line numbers together with the > counter? Basic blocks? Edges?There are no line numbers, basic blocks, or edges. It is just a sequence of counters that the front-end knows how to map to the code (the same as with our current textual file format).> > I wonder if it would make sense to use the existing gcov format for > this. OTOH, we could provide a converter in the Profile library.This is pretty clearly a different format from gcov, and I don’t see how we could convert between them. But, I agree that it would be nice to collect the code for handling different kinds of profile formats in one library, even if those file formats are not interchangeable.
Dmitri Gribenko
2014-Mar-13 19:52 UTC
[LLVMdev] RFC: Binary format for instrumentation based profiling data
On Thu, Mar 13, 2014 at 1:09 AM, Justin Bogner <mail at justinbogner.com> wrote:> 1. An on-disk hash table. Questions like whether to use separate > chaining or open addressing would need to be addressed. This > solution takes at least as much space as the naive approach, but > avoids the startup cost.Just wanted to point out that Clang already has an on-disk hash table implementation in: include/clang/Basic/OnDiskHashTable.h Dmitri -- main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
Chandler Carruth
2014-Mar-17 21:07 UTC
[LLVMdev] RFC: Binary format for instrumentation based profiling data
On Wed, Mar 12, 2014 at 6:09 PM, Justin Bogner <mail at justinbogner.com>wrote:> Instrumentation-based profiling data is generated by instrumented > binaries through library functions in compiler-rt, and read by the clang > frontend to feed PGO. >Note that compiler-rt *cannot* link in LLVM libraries for many reasons: 1) The compiler-rt must be built for the target, the LLVM libraries are built for the host. 2) The compiler-rt source is under a different license 3) We typically want to avoid the large dependencies of LLVM from runtime libraries that need to be extremely light weight such as the profile runtime. So I think you will at least need to have the header file with basic structs, and endian-aware writing code duplicated. =/ This may explain some of the trouble you had with build systems at least. Use cases> ========> > There are a few use cases that drive our requirements: > > A. Looking up the counters for a particular function needs to be > efficient, so as to not slow down compilation too much when using the > data for PGO. This is also presumably how this data would be > accessed if using it for coverage. > > B. The file should be relatively quick to generate, and shouldn't > require a large amount of memory overhead to write out. Instrumented > programs will write this file out on termination or on demand. > > C. Iterating over all of the data must be reasonably simple, to support > tools that summarize or merge data files. > > D. The format should be relatively compact. Programs with a large > number of functions may be instrumented, and we don't want to create > unnecessarily large files because of it. > > E. The format should be portable between systems. Ie, if Alice > generates profile data for some program P using Machine A, Bob should > be able to use that same profile data to instrument a copy of that > same program on machine B. > > Partially updating the file is not a goal. Instrumented programs write > out a profile for a single run, and multiple runs can be merged by a > separate tool. >The other assumption here is that you want the same file format written by instrumentation and read back by the compiler. While I think that is an unsurprising goal, I think it creates quite a few limitations that I'd like to point out. I think it would be worthwhile to consider the alternative of having the profile library write out data files in a format which is essentially "always" transformed by a post-processing tool before being used during compilation. Limitations of using the same format in both places: - High burden on writing the file constrains the format (must be fast, must not use libraries, etc...) - Have to write and index even though the writer doesn't really need it. - Have to have the function name passed through the instrumentation, potentially duplicating it with debug info. - Can't use an extensible file format (like bitcode) to insulate readers of profile data from format changes. I'm imagining it might be nicer to have something along the lines of the following counter proposal. Define two formats: the format written by instrumentation, and the format read by the compiler. Split the use cases up. Specialize the formats based on the use cases. It does require the user to post-process the results, but it isn't clear that this is really a burden. Historically it has been needed to merge gcov profiles from different TUs, and it is still required to merge them from multiple runs. I think the results could be superior for both the writer and reader: Instrumentation written format: - No index, just header and counters - (optional) Omit function names, and use PC at a known point of the function, and rely on debug info to map back to function names. - Use a structure which can be mmap-ed directly by the instrumentation code (at least on LE systems) so that "writing the file on close" is just flushing the memory region to disk - Explicitly version format, and provide no stability going forward Profile reading format: - Use a bitcoded format much like Clang's ASTs do (or some other tagged format which allows extensions) - Leverage the existing partial reading which has been heavily optimized for modules, LLVM IR, etc. - Use implicit-zero semantics for missing counters within a function where we have *some* instrumentation results, and remove all zero counters - Maybe other compression techniques Thoughts? Specific reasons to avoid this? I'm very much interested in minimizing the space and runtime overhead of instrumentation, as well as getting more advanced features in the format read by Clang itself. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140317/d516e1c2/attachment.html>
Justin Bogner
2014-Mar-18 00:22 UTC
[LLVMdev] RFC: Binary format for instrumentation based profiling data
Chandler Carruth <chandlerc at google.com> writes:> The other assumption here is that you want the same file format written by > instrumentation and read back by the compiler. While I think that is an > unsurprising goal, I think it creates quite a few limitations that I'd like to > point out. I think it would be worthwhile to consider the alternative of > having the profile library write out data files in a format which is > essentially "always" transformed by a post-processing tool before being used > during compilation. > > Limitations of using the same format in both places: > - High burden on writing the file constrains the format (must be fast, must > not use libraries, etc...) > - Have to write and index even though the writer doesn't really need it. > - Have to have the function name passed through the instrumentation, > potentially duplicating it with debug info. > - Can't use an extensible file format (like bitcode) to insulate readers of > profile data from format changes. > > I'm imagining it might be nicer to have something along the lines of the > following counter proposal. Define two formats: the format written by > instrumentation, and the format read by the compiler. Split the use cases up. > Specialize the formats based on the use cases. It does require the user to > post-process the results, but it isn't clear that this is really a burden. > Historically it has been needed to merge gcov profiles from different TUs, and > it is still required to merge them from multiple runs.This is an interesting idea. The counter data itself without index is dead simple, so this approach for the instrumentation written format would certainly be nice for compiler-rt, at the small cost of needing two readers. We'd also need two writers, but that appears inevitable since one needs to live in compiler-rt.> I think the results could be superior for both the writer and reader: > > Instrumentation written format: > - No index, just header and counters > - (optional) Omit function names, and use PC at a known point of the function, > and rely on debug info to map back to function names.This depends a bit on whether or not the conversion tool should depend on the debug info being available. We'd need to weigh the usability cost against the size benefit.> - Use a structure which can be mmap-ed directly by the instrumentation code > (at least on LE systems) so that "writing the file on close" is just > flushing the memory region to diskIf this is feasible, we could also make the format is host endian and force the post-processing to byteswap as it reads. This avoids online work in favour of offline.> - Explicitly version format, and provide no stability going forward > > Profile reading format: > - Use a bitcoded format much like Clang's ASTs do (or some other tagged format > which allows extensions)I'm not entirely convinced a bitcoded format is going to gain us much over a simpler on disk hash table. The variable bit rate integers might be worthwhile, but will it be efficient to look up the counters for a particular function name? That said, the ASTs also make use of the on disk hash that Dmitri mentioned for various indexes, which is definitely worth looking at.> - Leverage the existing partial reading which has been heavily optimized for > modules, LLVM IR, etc. > - Use implicit-zero semantics for missing counters within a function where we > have *some* instrumentation results, and remove all zero counters > - Maybe other compression techniques > > Thoughts? Specific reasons to avoid this? I'm very much interested in > minimizing the space and runtime overhead of instrumentation, as well as > getting more advanced features in the format read by Clang itself.