Hi Diego,

thanks for clarifying the difference between the two formats. I have noticed the new note in the "Sample Profile Format" section of the Clang guide clarifying that it is different from the coverage format.

So, my further question is... Am I right in understanding that both formats can be used for PGO purposes then?
I have tried the following, as in the Clang user guide:

$ clang++ -O2 -fprofile-instr-generate code.cc -o code
$ LLVM_PROFILE_FILE="code-%p.profraw" ./code
$ llvm-profdata merge -output=code.profdata code-*.profraw
$ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code

This produces a PGOptimized executable which performs differently (in fact, better!) than a normal O2 build, so I think the "code.profdata" file produced by the commands above is valid.

If I look inside "code.profdata" with a text editor, the file is most definitely not the ASCII-based sampling profile file format. Now I know that this is to be expected because I have used the infrastructure designed for coverage to generate the file.

So, if I understand correctly:
- If you want to do PGO with a sampling profile file that you have somehow generated from data collected by an external profiler, then the format must be the ASCII text one described in the Clang guide.
- However you can also use the infrastructure for coverage, and the file produced by such infrastructure, as an input to PGO (without caring too much about the format at this point, as you don't need to look inside the file).

Is my understanding correct?

In which case I would recommend to add a note to the "Profiling with Instrumentation" section as well, to state that the format produced by "llvm-profdata merge" is not the same as the one detailed just above that section.
I now understand the difference, but I believe a reader who is approaching this for the first time could be misinterpreting the guide and they could assume the instrumentation approach also produces a sampling profile file in the ASCII format.

Cheers,
Dario Domizioli
SN Systems - Sony Computer Entertainment Group

On 22 May 2015 at 16:57, Diego Novillo <dnovillo at google.com> wrote:

> On Fri, May 22, 2015 at 11:16 AM, Dario Domizioli
> <dario.domizioli at gmail.com> wrote:
> > Hi all,
> >
> > I am a bit confused about the documentation of the format of the profile
> > data file.
> >
> > The Clang user guide here describes it as an ASCII text file:
> > http://clang.llvm.org/docs/UsersManual.html#sample-profile-format
> >
> > Whereas the posts above and the referenced link describe it as a stream of
> > bytes containing LEB128s:
> > http://www.llvm.org/docs/CoverageMappingFormat.html
> >
> > From experimenting with the latest trunk I can see the latter is correct
> > (well, at least the file I get is not ASCII text).
> > Should we update the Clang user guide documentation?
> > Or am I just getting confused? Are there two formats, one used for coverage
> > and one used for PGO?
>
> You are looking at two unrelated formats. The first URL describes the
> sampling profiling format. That is not used for coverage, only
> optimization.
>
> There are two main profilers in LLVM. The sampling profiler uses
> external profilers (e.g., Linux Perf) to produce sample information
> that is then matched to the user code. There is no option to use the
> sampling profiler for coverage (it would be a very poor match).
>
> The instrumentation profiler causes Clang to inject tracking code into
> the user program. This is the one used for coverage. If you are
> interested in coverage, you should read the second URL.
>
> I will clarify the documentation for sampling profiles.
>
>
> Diego.
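The text-editor check described in the message above can be approximated from the command line. What follows is only a minimal sketch and is not part of the original message: `profile_kind` is a hypothetical helper name, and the heuristic merely distinguishes ASCII text from binary data. It cannot tell which profiler produced a file, so a "binary" answer does not by itself mean the file is in the instrumentation format.

```shell
# Hypothetical helper (not an LLVM tool): classify a profile file as
# "text" or "binary" by counting bytes that are neither printable ASCII
# nor whitespace.
profile_kind() {
    # LC_ALL=C makes tr treat the input as raw bytes.
    nonprint=$(LC_ALL=C tr -d '[:print:][:space:]' < "$1" | wc -c | tr -d ' ')
    if [ "$nonprint" -eq 0 ]; then
        echo text
    else
        echo binary
    fi
}
```

For example, `profile_kind code.profdata` would be expected to print `binary` for the file produced by the commands in the message, which matches what the text editor showed.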
Dario Domizioli <dario.domizioli at gmail.com> writes:

> Hi Diego,
>
> thanks for clarifying the difference between the two formats. I have noticed
> the new note in the "Sample Profile Format" section of the Clang guide
> clarifying that it is different from the coverage format.
>
> So, my further question is... Am I right in understanding that both formats
> can be used for PGO purposes then?
> I have tried the following, as in the Clang user guide:
>
> $ clang++ -O2 -fprofile-instr-generate code.cc -o code
> $ LLVM_PROFILE_FILE="code-%p.profraw" ./code
> $ llvm-profdata merge -output=code.profdata code-*.profraw
> $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
>
> This produces a PGOptimized executable which performs differently (in fact,
> better!) than a normal O2 build, so I think the "code.profdata" file produced
> by the commands above is valid.
>
> If I look inside "code.profdata" with a text editor, the file is most
> definitely not the ASCII-based sampling profile file format. Now I know that
> this is to be expected because I have used the infrastructure designed for
> coverage to generate the file.
>
> So, if I understand correctly:
> - If you want to do PGO with a sampling profile file that you have somehow
> generated from data collected by an external profiler, then the format must be
> the ASCII text one described in the Clang guide.
> - However you can also use the infrastructure for coverage, and the file
> produced by such infrastructure, as an input to PGO (without caring too much
> about the format at this point, as you don't need to look inside the file).
>
> Is my understanding correct?

Yes, basically. There are two ways to do PGO with clang - using the sample based profiling or the instrumentation based profiling. The on disk formats for these two types of profiling have nothing to do with each other, and the instrumentation based profiling is also useful for coverage.

> In which case I would recommend to add a note to the "Profiling with
> Instrumentation" section as well, to state that the format produced by
> "llvm-profdata merge" is not the same as the one detailed just above that
> section.

It sounds to me like this documentation just needs some clean up to be clearer up front about what's going on. If nobody else gets to it first, I'll take a shot at improving the situation when I have a chance.

> I now understand the difference, but I believe a reader who is approaching
> this for the first time could be misinterpreting the guide and they could
> assume the instrumentation approach also produces a sampling profile file in
> the ASCII format.
>
> Cheers,
> Dario Domizioli
> SN Systems - Sony Computer Entertainment Group
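The two ways to do PGO mentioned in this reply can be sketched side by side. The instrumentation commands are the ones quoted in the thread. The sampling commands (`perf record -b`, `create_llvm_prof`, `-gline-tables-only`, `-fprofile-sample-use`) are assumptions drawn from the sampling section of the Clang user guide and the AutoFDO tools; they are not stated anywhere in this thread. The function names and the `RUN` dry-run variable are hypothetical.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN=${RUN:-}

# Instrumentation-based PGO, using the commands quoted in this thread.
# Usage: pgo_instr code.cc code
pgo_instr() {
    src=$1; exe=$2
    $RUN clang++ -O2 -fprofile-instr-generate "$src" -o "$exe" &&
    LLVM_PROFILE_FILE="$exe-%p.profraw" $RUN "./$exe" &&
    $RUN llvm-profdata merge -output="$exe.profdata" "$exe"-*.profraw &&
    $RUN clang++ -O2 -fprofile-instr-use="$exe.profdata" "$src" -o "$exe"
}

# Sampling-based PGO. ASSUMPTION: Linux perf and AutoFDO's create_llvm_prof
# are installed; these steps are not taken from the thread.
# Usage: pgo_sample code.cc code
pgo_sample() {
    src=$1; exe=$2
    $RUN clang++ -O2 -gline-tables-only "$src" -o "$exe" &&
    $RUN perf record -b "./$exe" &&
    $RUN create_llvm_prof --binary="./$exe" --out="$exe.prof" &&
    $RUN clang++ -O2 -gline-tables-only -fprofile-sample-use="$exe.prof" "$src" -o "$exe"
}
```

Running `RUN=echo pgo_instr code.cc code` prints the four instrumentation commands without invoking the compiler, which is a convenient way to compare the two sequences. The profile files the two functions produce (`code.profdata` and `code.prof`) are not interchangeable: each must be passed to its own `-fprofile-*-use` flag.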
On Thu, May 28, 2015 at 1:10 PM, Dario Domizioli <dario.domizioli at gmail.com> wrote:

> Hi Diego,
>
> thanks for clarifying the difference between the two formats. I have noticed
> the new note in the "Sample Profile Format" section of the Clang guide
> clarifying that it is different from the coverage format.
>
> So, my further question is... Am I right in understanding that both formats
> can be used for PGO purposes then?
> I have tried the following, as in the Clang user guide:
>
> $ clang++ -O2 -fprofile-instr-generate code.cc -o code
> $ LLVM_PROFILE_FILE="code-%p.profraw" ./code
> $ llvm-profdata merge -output=code.profdata code-*.profraw
> $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
>
> This produces a PGOptimized executable which performs differently (in fact,
> better!) than a normal O2 build, so I think the "code.profdata" file
> produced by the commands above is valid.
>
> If I look inside "code.profdata" with a text editor, the file is most
> definitely not the ASCII-based sampling profile file format. Now I know that
> this is to be expected because I have used the infrastructure designed for
> coverage to generate the file.
>
> So, if I understand correctly:
> - If you want to do PGO with a sampling profile file that you have somehow
> generated from data collected by an external profiler, then the format must
> be the ASCII text one described in the Clang guide.

Right. Note that this ASCII text format is just one of the 3 formats accepted by the sampling profiler. There is a more compact binary representation and a (yet unsubmitted) gcov variant that's used by GCC's sampling profiler.

However, the fundamental difference is still the same. Regardless of what file format you use for the sampling profiler, that data is not suitable for coverage. Only the instrumentation generated with -fprofile-instr-generate can be used for coverage.

> - However you can also use the infrastructure for coverage, and the file
> produced by such infrastructure, as an input to PGO (without caring too much
> about the format at this point, as you don't need to look inside the file).

Well, you never need to care about the format for inspection. All the formats are read by llvm-profdata. All you need to care about is that the data generated by sampling profilers is not really useful for coverage. Note that it would be ~trivial to use, but the results would be awful. Sampling is a pretty lossy approach.

> In which case I would recommend to add a note to the "Profiling with
> Instrumentation" section as well, to state that the format produced by
> "llvm-profdata merge" is not the same as the one detailed just above that
> section.
> I now understand the difference, but I believe a reader who is approaching
> this for the first time could be misinterpreting the guide and they could
> assume the instrumentation approach also produces a sampling profile file in
> the ASCII format.

I agree. Thanks for pointing this out. I'll re-work this section.

Diego.
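Since all the formats are read by llvm-profdata, a profile can be inspected without opening it in an editor. The following is a hedged sketch rather than anything given in the thread: it assumes the `show` subcommand of `llvm-profdata` with its `-all-functions`, `-counts`, and `-sample` options, and the wrapper names and the `RUN` dry-run variable are hypothetical.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN=${RUN:-}

# Dump per-function counters from an instrumentation profile
# (the file written by "llvm-profdata merge").
show_instr_profile() {
    $RUN llvm-profdata show -all-functions -counts "$1"
}

# Dump a sampling profile; -sample tells the tool which reader to use.
show_sample_profile() {
    $RUN llvm-profdata show -sample "$1"
}
```

With the file from earlier in the thread this would be called as `show_instr_profile code.profdata`. Passing a sampling profile to the instrumentation reader, or the reverse, is expected to fail, which reflects the point that the two on-disk formats are unrelated.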
On Thu, May 28, 2015 at 2:09 PM, Justin Bogner <mail at justinbogner.com> wrote:

> It sounds to me like this documentation just needs some clean up to be
> clearer up front about what's going on. If nobody else gets to it first,
> I'll take a shot at improving the situation when I have a chance.

I just reworked this section a bit in r238504. Dario, does this help? Justin, could you check me for consistency?

Thanks. Diego.