Gulfem Savrun Yeniceri via llvm-dev
2021-Jun-15 00:47 UTC
[llvm-dev] [RFC] Adding Binary ID into LLVM Profiles
Motivation

There is no direct way of associating binaries with the corresponding profiles in LLVM. Therefore, source code coverage processing requires an additional post-processing step to match the executables to their associated profiles. To improve this, we propose embedding binary IDs into profiles, so that we can uniquely identify a profile and easily find the relevant binary.

Background

Binary ID

We use the name binary ID to refer to the unique identifiers used in binaries in different file formats. Build ID <https://fedoraproject.org/wiki/Releases/FeatureBuildId> is a unique identifier for the build that is included in the ELF file format. It was originally introduced in the GNU toolchain, and is used for various purposes, such as associating binaries with core dumps. Build ID is optional, and can be enabled by using the -Wl,--build-id option. To the best of our knowledge, similar unique identifiers are used in other file formats. For example, a unique identifier called LC_UUID is used in Mach-O, and similarly a GUID (Globally Unique Identifier) is used in COFF.

Profiling

Clang supports profiling with instrumentation <https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation> for two main purposes:

1. Front-end instrumentation, where the compiler front-end inserts instrumentation for collecting source code coverage.
2. IR-level instrumentation, where LLVM inserts instrumentation during optimizations for PGO (Profile-Guided Optimization).

Profiling inserts instrumentation code into binaries, which is used by compiler-rt (the compiler runtime) during execution. When the instrumented binary executes, it writes a raw profile (.profraw). Multiple raw profiles are merged together using the llvm-profdata <https://llvm.org/docs/CommandGuide/llvm-profdata.html> tool. At the end, a single indexed profile (.profdata) is created, which is used to generate source code coverage reports.

The profile format consists of two major parts:

1. The profile header includes the magic number and version (and, in the raw profile, the padding and size of each section).
2. The profile data includes, per function, the function name and hash, and pointers to three sections: counters, names, and value profiling counters.

Proposal

We propose adding the build ID, which is the unique binary ID in ELF, into profiles to improve the source code coverage post-processing step. Although we target the ELF file format, we are proposing a design that can be leveraged and extended for other file formats, such as Mach-O and COFF.

Extending profile format

We need to extend both the raw and indexed profile formats to include the build ID. Since the build ID does not have a fixed length, we will add a variable-length byte array at the end of the profile formats. We will also change the compiler-rt profiling runtime for ELF platforms to read build IDs from ELF data in memory and write them into the raw profile.

Extending profiling tools

Since the profile format changes, we also need to extend the tools that process profiles. We need to extend the ProfileData library functions that the llvm-profdata tool uses to operate on profiles, and add support for printing the binary IDs in the profiles.

Future Work

Embedding binary IDs into profiles would also enable implementing support for the debuginfod <https://sourceware.org/elfutils/Debuginfod.html> library in llvm-cov <https://lists.llvm.org/pipermail/llvm-dev/2020-August/144708.html>, where the tool would automatically download the binaries corresponding to the input profile.

References

- https://fedoraproject.org/wiki/Releases/FeatureBuildId
- https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
- https://llvm.org/docs/CommandGuide/llvm-profdata.html
- https://lists.llvm.org/pipermail/llvm-dev/2020-August/144708.html
- https://sourceware.org/elfutils/Debuginfod.html

Please let us know if you have any suggestions or questions.

Thanks,
Gülfem
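To make the "Extending profile format" step above more concrete, here is a minimal Python sketch of the two pieces it describes: locating the build ID in raw ELF note data, and carrying it as a variable-length byte array at the end of a profile. The note layout (namesz, descsz, type, then 4-byte-aligned name and descriptor, with owner "GNU" and type 3 for the build ID) follows the ELF note format. Everything about the profile side is an assumption made only for illustration: the RFC does not specify a layout, and the helper names find_build_id, append_binary_id and read_binary_id, as well as the trailing 64-bit length field, are hypothetical and not part of compiler-rt or the ProfileData library.

```python
import struct

NT_GNU_BUILD_ID = 3  # ELF note type that carries the GNU build ID


def _align4(n):
    """Round n up to the next multiple of 4 (ELF note field alignment)."""
    return (n + 3) & ~3


def find_build_id(notes, little_endian=True):
    """Scan raw ELF note data and return the build ID bytes, or None."""
    fmt = "<III" if little_endian else ">III"
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += _align4(namesz)
        desc = notes[off:off + descsz]
        off += _align4(descsz)
        if len(desc) != descsz:
            return None  # truncated note data
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc
    return None


def append_binary_id(profile, binary_id):
    """Hypothetical layout: ID bytes, then a little-endian u64 length,
    appended after the existing profile bytes."""
    return profile + binary_id + struct.pack("<Q", len(binary_id))


def read_binary_id(profile):
    """Read back an ID stored by append_binary_id (same assumed layout)."""
    if len(profile) < 8:
        raise ValueError("profile too short to hold a binary ID length")
    (size,) = struct.unpack_from("<Q", profile, len(profile) - 8)
    end = len(profile) - 8
    if size > end:
        raise ValueError("binary ID length exceeds profile size")
    return profile[end - size:end]
```

Storing an explicit length next to the bytes is one simple way to cope with the fact that the ID has no fixed size (for example, a 20-byte SHA-1 or a shorter hash, depending on the linker and the --build-id style); a size field in the profile header would work equally well.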
Xinliang David Li via llvm-dev
2021-Jun-24 21:39 UTC
[llvm-dev] [RFC] Adding Binary ID into LLVM Profiles
Hi Gulfem,

The current profile matching scheme supports function-level mismatch detection, which is at a finer level of granularity than the executable-level build ID. What is the use case for this level of identification?

David