Venugopal Raghavan via llvm-dev
2020-Apr-08 05:52 UTC
[llvm-dev] Error with perf2bolt in LLVM BOLT
Hi,

I was interested in trying out LLVM BOLT and generated profile data with Linux perf using the following command:

    perf record -e cycles:u -o perf.data <command>

This is without the use of LBR, so I understand the performance improvements may not be much, but this was more for becoming familiar with BOLT's commands.

I then run:

    perf2bolt -nl -p perf.data -o perf.fdata <binary>

and I get the following:

    PERF2BOLT: Starting data aggregation job for perf.data
    PERF2BOLT: spawning perf job to read events without LBR
    PERF2BOLT: spawning perf job to read mem events
    PERF2BOLT: spawning perf job to read process events
    PERF2BOLT: spawning perf job to read task events
    BOLT-INFO: Target architecture: x86_64
    BOLT-ERROR: input file was processed by BOLT. Cannot re-optimize.

I am not sure why I get the above error. Can someone who has used BOLT help me?

Thanks.

Regards,
Venugopal Raghavan.
Rafael Auler via llvm-dev
2020-Apr-08 17:46 UTC
[llvm-dev] Error with perf2bolt in LLVM BOLT
Hi Venugopal,

perf2bolt has strict demands on its inputs when generating the profile data file. The input binary to perf2bolt must be the same one that was running when you launched perf record, and perf2bolt will try to verify that by checking the build id, if the binary has one. It also assumes this will be the binary you will later optimize. Ordinarily, BOLT doesn’t optimize a binary that was already optimized by BOLT itself. That’s the message you are getting. You should try collecting data on a binary that you did not already optimize with BOLT.

That said, if you really want it, it is possible to collect data on binaries that were already bolted, but you need some non-standard flags for that. You need to use -enable-bat when generating that binary. This flag will embed a translation table in your binary that perf2bolt uses to build profile data suitable to be consumed with the original binary. This is non-standard because it is only really necessary in some large-scale deployments where collecting the data in a special “no bolt” configuration can be inconvenient.

Best,
Rafael
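To make the ordering described in this reply concrete, here is a minimal Python sketch that only assembles the three command lines and runs nothing. The `perf record` and `perf2bolt` invocations are the ones quoted in this thread; the `llvm-bolt` line uses its `-o` and `-data=` options, and `-enable-bat` is appended only for the non-standard case mentioned above. The function name `bolt_workflow`, the `Step` tuple, and the `.bolt` output suffix are hypothetical names chosen for illustration.

```python
import shlex
from typing import List, NamedTuple


class Step(NamedTuple):
    purpose: str
    argv: List[str]


def bolt_workflow(binary: str, workload_args: List[str],
                  enable_bat: bool = False) -> List[Step]:
    """Build (but do not run) the profile -> convert -> optimize commands.

    `binary` must be the original, never-bolted executable in all three
    steps; passing an already-bolted binary to perf2bolt is what triggers
    the "input file was processed by BOLT" error from this thread.
    """
    optimized = binary + ".bolt"  # hypothetical output name
    bolt_cmd = ["llvm-bolt", binary, "-o", optimized, "-data=perf.fdata"]
    if enable_bat:
        # Only needed if profiles will later be collected on the
        # optimized binary itself.
        bolt_cmd.append("-enable-bat")
    return [
        Step("profile the original binary (no LBR)",
             ["perf", "record", "-e", "cycles:u", "-o", "perf.data",
              "--", binary, *workload_args]),
        Step("convert the profile against that same binary",
             ["perf2bolt", "-nl", "-p", "perf.data", "-o", "perf.fdata",
              binary]),
        Step("optimize the original binary with the profile", bolt_cmd),
    ]


if __name__ == "__main__":
    for step in bolt_workflow("./app", ["--bench"]):
        print(f"# {step.purpose}")
        print(shlex.join(step.argv))
```

The point of keeping `binary` as a single parameter is that the same file is named in every step, which is the constraint the reply describes.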
Rafael Auler via llvm-dev
2020-Apr-09 19:05 UTC
[llvm-dev] Error with perf2bolt in LLVM BOLT
Hi Venugopal,

Running a project with source code under a debugger is the fastest way I know to build a mental model of the most important classes and how a project is organized. I definitely recommend doing that. BOLT’s documentation amounts to the high-level view described in the 2019 CGO paper and some slides; the technical details are only available in the source code.

However, I can give a quick overview to get you better prepared. Here are some key points:

1. The binary format is mostly abstracted away by LLVM’s libObject (see Binary.h and ObjectFile.h). We are currently working on a new bolt-only BinaryFormat abstraction to better encapsulate the gory details of manipulating object files.
2. llvm-bolt.cpp is the main tool entry point. The class hierarchy where the real work happens is designed as a library, LLVM-style, and llvm-bolt.cpp is the main user of BOLT as a library.
3. The main control class in BOLT is RewriteInstance. It represents the concept of a single binary rewrite requested by the user.
4. RewriteInstance coordinates the entire rewrite process by first reading the binary, building BOLT’s IR to represent its contents, performing a pipeline of modifications (BinaryPasses) and then writing the result to a separate output file.

If your interest in BOLT is to write a pass, I would suggest setting breakpoints and paying special attention to BinaryPassManager. You can find the simplest, easiest-to-understand passes under Passes/BinaryPasses.cpp. Just copy one of them and register your copy in BinaryPassManager. Your pass will be exposed to BOLT’s view of the world, and you can write code to dump a snapshot of this view for a quick analysis (look for dump() methods in the objects exposed to your pass).

BOLT’s top-level IR class is BinaryFunction. A BinaryFunction holds BinaryBasicBlock instances, which hold MCInst instances. Since BOLT operates with an augmented MCInst in comparison with the regular MCInst from LLVM, we have a special class to deal with operations on instructions and abstract the target machine. You can play around with MCPlusBuilder and its subclass X86MCPlusBuilder to do all sorts of work, such as creating/checking calls, branches, etc. If you want to instrument your binary, for example, take a look at Passes/Instrumentation.cpp to see how it accomplishes branch instrumentation.

I hope this helps,
Rafael

From: Venugopal Raghavan <venur2005 at gmail.com>
Date: Thursday, April 9, 2020 at 6:10 AM
To: Rafael Auler <rafaelauler at fb.com>
Subject: Re: [llvm-dev] Error with perf2bolt in LLVM BOLT

Hi Rafael,

Thanks for your reply. I think I understand the issue now. I had run BOLT successfully a week ago, and I must have attempted perf2bolt specifying the binary that was optimized earlier.

I think I will do a fresh run.

I would like to understand the code in BOLT, the general flow and the data structures it uses. Is there any documentation on the code structure? What would you suggest as the fastest way to get some understanding of the code? My idea was to run llvm-bolt under gdb and step through the code, but I am not sure that is the best way. I am not familiar with binary formats and so on, so that is another obstacle I need to face.

Thanks.

Regards,
Venu.
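As a rough aid to the overview in this message, the containment hierarchy and the pass pipeline can be modelled in a few lines of Python. This is a toy sketch, not BOLT's C++ API: the class names are borrowed from the message only to show which piece plays which role, while the fields, `register_pass`, `run_passes`, and the `strip_nops` pass are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class MCInst:  # toy stand-in for one machine instruction
    opcode: str
    operands: List[str] = field(default_factory=list)


@dataclass
class BinaryBasicBlock:  # toy stand-in: a block holds instructions
    label: str
    insts: List[MCInst] = field(default_factory=list)


@dataclass
class BinaryFunction:  # toy stand-in: a function holds basic blocks
    name: str
    blocks: List[BinaryBasicBlock] = field(default_factory=list)

    def dump(self) -> str:
        """Return a text snapshot of the function, one instruction per line."""
        lines = [f"function {self.name}:"]
        for bb in self.blocks:
            lines.append(f"  {bb.label}:")
            for inst in bb.insts:
                lines.append("    " + " ".join([inst.opcode, *inst.operands]))
        return "\n".join(lines)


Pass = Callable[[BinaryFunction], None]


class BinaryPassManager:  # toy stand-in: runs registered passes in order
    def __init__(self) -> None:
        self._passes: List[Tuple[str, Pass]] = []

    def register_pass(self, name: str, fn: Pass) -> None:
        self._passes.append((name, fn))

    def run_passes(self, functions: List[BinaryFunction]) -> List[str]:
        """Apply every pass to every function; return the pass names run."""
        for _, fn in self._passes:
            for function in functions:
                fn(function)
        return [name for name, _ in self._passes]


def strip_nops(function: BinaryFunction) -> None:
    """Hypothetical example pass: drop every 'nop' instruction in place."""
    for bb in function.blocks:
        bb.insts = [inst for inst in bb.insts if inst.opcode != "nop"]


if __name__ == "__main__":
    main_fn = BinaryFunction("main", [
        BinaryBasicBlock(".entry", [
            MCInst("push", ["rbp"]), MCInst("nop"), MCInst("ret"),
        ]),
    ])
    manager = BinaryPassManager()
    manager.register_pass("strip-nops", strip_nops)
    manager.run_passes([main_fn])
    print(main_fn.dump())
```

Writing a new pass in this model means adding one more function like `strip_nops` and registering it, which loosely mirrors the "copy a simple pass and register your copy" advice above.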
Venugopal Raghavan via llvm-dev
2020-Apr-10 04:59 UTC
[llvm-dev] Error with perf2bolt in LLVM BOLT
Hi Rafael,

Thanks a lot for the detailed description. I think this would help me a lot.

Regards,
Venu.