Francis Visoiu Mistrih via llvm-dev
2019-Jun-18 20:43 UTC
[llvm-dev] [RFC] Optimization remarks: LLVM bitstream format and future plans
Hello,

We have been looking into making optimization remarks more scalable. We looked for a format that satisfies the following requirements:

1. allows streaming to a file: we want to avoid keeping all the remarks in memory
2. allows string deduplication: most of the strings are repeated [1]
3. is fast to parse: building clang with remarks results in 24,205,892 remarks
4. is compact to save on disk: building clang with YAML remarks results in 17.6GB of remarks
5. supports some kind of key-value pairing: we need to have arbitrary remark "arguments" [2]

We took a look at a few formats:

* YAML: requirements 3 and 4 are very far from being reasonable using this format.
* MessagePack [3]: having support for this in LLVM is an advantage for this format. It allowed us to make parsing 5.5x faster and remark files more than 2x smaller.
* clangd's RIFF-based format [4]: requirements 1 and 5 are not satisfied here.
* .dia: parsing this format (using libclang) is not fast enough for us.
* custom format: we managed to make remarks 11x smaller, and parsing fast enough. The main concern with a custom format is the maintenance and versioning of the format.
* LLVM bitstream:
  1. by emitting a block per remark, we can stream to a file
  2. by using a string table that is stored separately in the metadata, we can deduplicate strings
  3. llvm-bcanalyzer runs in 20s over all the remark files for clang
  4. total size of remarks for clang is 1.3GB -> 13.4x smaller
  5. we can have an arbitrary number of records and describe them using abbreviations to provide a key-value-like pairing

We decided to go ahead with LLVM bitstream since it satisfies all our requirements and it is well-known by the community.

The remark generation part of the format is available for review at: https://reviews.llvm.org/D63466.

Another goal is to make it easy to find remarks for a given object file or binary.
The way we want to do this on Darwin is to follow the debug info model: add a section to the object file, make the linker ignore it, let dsymutil pack it up and put the final result in the .dSYM bundle.

For that, I plan on making a few more changes:

* Emit the bitstream metadata in the __LLVM,__remarks/.remarks section
* Add the parsing logic to lib/Remarks/RemarksParser and make it usable through the C API
* Add a tool, llvm-remarkutil, to merge the remarks from the object files into a standalone remark file
* Add support to dsymutil to merge and generate a standalone remark file in the .dSYM bundle
* Add support to llvm-remarkutil to convert from YAML to bitstream, to extract metadata from sections, and other utilities

Please let me know what you think!

Thanks,

-- Francis

[1] 2x size reduction with https://reviews.llvm.org/rG7fee2b89fd6e5101bc590e0741f4d7a82b7715e1

[2] Usually, remarks have arbitrary arguments, like the "Args" part of:

```
--- !Missed
Pass: inline
Name: NoDefinition
DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c', Line: 7, Column: 3 }
Function: printArgsNoRet
Args:
  - Callee: printf
  - String: ' will not be inlined into '
  - Caller: printArgsNoRet
    DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c', Line: 6, Column: 0 }
  - String: ' because its definition is unavailable'
...
```

[3] https://msgpack.org/index.html

[4] https://reviews.llvm.org/rG50f3631057f717448ba34b4175daaa81215fbd5e
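
For readers less familiar with the string-table approach, here is a minimal Python sketch of the two ideas discussed above: emitting one self-contained record per remark so that nothing has to stay in memory, and deduplicating strings through a separate table. It also shows how per-object string tables could be merged when producing a standalone file. This is not the LLVM bitstream encoding and not the code under review: JSON lines stand in for bitstream blocks, and `StringTable`, `emit_remark`, and `merge_tables` are hypothetical names made up for this illustration.

```python
import json
from io import StringIO


class StringTable:
    """Maps each distinct string to a small integer ID, in first-seen order."""

    def __init__(self):
        self._ids = {}
        self.strings = []

    def intern(self, s):
        # Only the first occurrence of a string is stored.
        if s not in self._ids:
            self._ids[s] = len(self.strings)
            self.strings.append(s)
        return self._ids[s]


def emit_remark(out, table, remark):
    """Write one remark as a single record of string-table IDs.

    The record is written immediately, so only the table (not the remarks)
    has to be kept in memory. A JSON line stands in for a bitstream block.
    """
    record = {
        "pass": table.intern(remark["Pass"]),
        "name": table.intern(remark["Name"]),
        "function": table.intern(remark["Function"]),
        # Arbitrary arguments stay as ordered (key, value) pairs.
        "args": [[table.intern(k), table.intern(v)] for k, v in remark["Args"]],
    }
    out.write(json.dumps(record) + "\n")


def merge_tables(tables):
    """Merge several string tables (e.g. one per object file).

    Returns the merged table and, for each input table, a list that maps
    its old IDs to IDs in the merged table.
    """
    merged = StringTable()
    remaps = [[merged.intern(s) for s in t.strings] for t in tables]
    return merged, remaps


if __name__ == "__main__":
    remark = {
        "Pass": "inline",
        "Name": "NoDefinition",
        "Function": "printArgsNoRet",
        "Args": [
            ("Callee", "printf"),
            ("String", " will not be inlined into "),
            ("Caller", "printArgsNoRet"),
            ("String", " because its definition is unavailable"),
        ],
    }
    table = StringTable()
    out = StringIO()  # stands in for the remark file
    emit_remark(out, table, remark)
    emit_remark(out, table, remark)  # a repeated remark adds no new strings
    print(len(table.strings), "distinct strings for 2 remarks")
```

In this sketch the string table lives apart from the per-remark records, which mirrors the idea of keeping the table in separately stored metadata; the actual record layout and abbreviations are defined in the patch linked above.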