I just read that the LLVM project can be used to do static analysis on C/C++ code using Clang, the front end of LLVM. I would like to know whether it is possible to extract all accesses to memory (variables, local as well as global) in the source code using LLVM.

Is there a built-in library in LLVM that I could use to extract this information? If not, please suggest how to write functions to do the same (existing source code, references, tutorials, examples...).

From what I have studied, I first need to convert the source code into LLVM IR and then write an instrumentation pass that goes over the bitcode file and inserts calls to do the analysis, but I don't know exactly how to do it.

Please suggest how to go about it.

Thanks,
Himanshu

On 9/23/11 12:24 PM, Himanshu Shekhar wrote:
> I just read that LLVM project could be used to do static analysis on
> C/C++ codes using the analyzer Clang which the front end of LLVM. I
> wanted to know if it is possible to extract all the accesses to
> memory(variables, local as well as global) in the source code using LLVM.

When doing analysis with Clang and LLVM, you first must make a choice about which IR to use: Clang's Abstract Syntax Tree (AST) or LLVM's SSA Intermediate Representation (IR). Clang takes source code and converts it into an AST; it later takes the AST and converts it to LLVM IR. LLVM then performs mid-level compiler analysis and optimization on code in LLVM IR form and then translates from LLVM IR to native code.

Clang ASTs will give you much higher-level information than LLVM IR. On the other hand, LLVM IR is probably easier to work with and is programming-language agnostic.

You might want to read the LLVM Language Reference Manual (http://llvm.org/docs/LangRef.html) to get a feel for whether it is suitable for your analysis. There may be a similar document for Clang, but I'm not familiar with it since I haven't worked with Clang ASTs myself.

> Is there any inbuilt library present in LLVM which I could use to
> extract this information. If not please suggest me how to write
> functions to do the same.(existing source code, reference, tutorial,
> example...)

It is easy to write an LLVM pass that plugs into the opt tool and searches for explicit accesses to memory. The LLVM load and store instructions access memory (similar to how loads and stores are used to access memory in a RISC instruction set).

That said, it is not clear whether this is what you want to do. Some source-level variables are translated into one or more SSA virtual registers, so you'll never see a load or store to them (as they may never exist in memory but only in registers). Additionally, some loads and stores to memory are not visible at the LLVM IR level. For example, loads and stores to stack spill slots are not visible at the LLVM IR level because they're only created during code generation (and technically, they're generated in a third IR called Machine Instructions that is used specifically for code generation).

> Of what i studied is, I need to first convert the source code into
> LLVM IR and then make an instrumenting pass which would go over this
> bitcode file and insert calls to do the analysis, but don't know
> exactly how to do it.

The first thing you need to do is figure out which representation of the program (Clang ASTs, LLVM IR, LLVM's code generation IR) is the best for solving your particular problem. If you want, you can provide more details on what you're trying to do; people on the list can then provide feedback on which representation is most suitable.

If you decide to work with LLVM IR, I then recommend reading the "How to Write an LLVM Pass" document (http://llvm.org/docs/WritingAnLLVMPass.html) as well as the Programmer's Guide (http://llvm.org/docs/ProgrammersManual.html). Doxygen is also valuable (http://llvm.org/doxygen/).

For an example of a pass that adds run-time checks to LLVM IR loads and stores, look at SAFECode's load/store instrumentation pass (http://llvm.org/viewvc/llvm-project/safecode/trunk/include/safecode/LoadStoreChecks.h?view=markup and http://llvm.org/viewvc/llvm-project/safecode/trunk/lib/InsertPoolChecks/LoadStoreChecks.cpp?view=markup). It's about as simple as an instrumentation pass gets.

-- John T.

> Please suggest me how to go about it .
> thanks
> himanshu
> --
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

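To illustrate the kind of scan John describes, here is a minimal, self-contained C++ sketch. It deliberately does not use the LLVM API: `Opcode`, `Instr`, `findMemoryAccesses`, `sampleBody`, and the variable named `counter` are all hypothetical stand-ins chosen for this example. In a real LLVM pass the same loop would walk a function's instructions and test each one for being a load or a store. In this toy lowering `%n` and `%tmp` live only in virtual registers, so nothing is reported for them, which mirrors the caveat above about source variables that never exist in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for an SSA-style IR. These are NOT LLVM types.
enum class Opcode { Load, Store, Add, Call, Ret };

struct Instr {
    Opcode op;
    std::string operand;  // memory location for Load/Store, free text otherwise
};

// Returns the indices of the instructions that explicitly access memory.
std::vector<std::size_t> findMemoryAccesses(const std::vector<Instr>& body) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i].op == Opcode::Load || body[i].op == Opcode::Store) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Hypothetical lowering of `counter = counter + n;` where `counter` is a
// global (lives in memory) and `n` is held in a virtual register.
std::vector<Instr> sampleBody() {
    return {
        {Opcode::Load,  "@counter"},
        {Opcode::Add,   "%tmp, %n"},
        {Opcode::Store, "@counter"},
        {Opcode::Ret,   ""},
    };
}

// Sanity check: only the load and the store are reported.
void checkSample() {
    const std::vector<std::size_t> hits = findMemoryAccesses(sampleBody());
    assert(hits.size() == 2);
}
```

Calling `findMemoryAccesses(sampleBody())` reports positions 0 and 2, the load and the store; the add and the return are skipped because they never touch memory in this model.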
Hey John,

Thank you for the detailed reply. I tried to work out which IR I should use for my purpose (Clang's Abstract Syntax Tree (AST) or LLVM's SSA Intermediate Representation (IR)), but couldn't decide.

Here is what I am trying to do. Given any C/C++ program (like the one below), I am trying to insert calls to some function before and after *every instruction that reads from or writes to memory*. For example, consider the following C++ program (Account.cpp):

/***********************************************************/
#include <stdio.h>

class Account {
    int balance;

public:
    Account(int b)
    {
        balance = b;
    }
    ~Account() { }

    int read() {
        int r;
        r = balance;
        return r;
    }

    void deposit(int n) {
        balance = balance + n;
    }

    void withdraw(int n) {
        int r = read();
        balance = r - n;
    }
};

int main() {
    Account* a = new Account(10);
    a->deposit(1);
    a->withdraw(2);
    delete a;
}
/***********************************************************/

After the instrumentation, my program should look like this:

/***********************************************************/
#include <stdio.h>

void foo();  // instrumentation hook, defined elsewhere

class Account {
    int balance;

public:
    Account(int b)
    {
        balance = b;
    }
    ~Account() { }

    int read() {
        int r;
        foo();
        r = balance;
        foo();
        return r;
    }

    void deposit(int n) {
        foo();
        balance = balance + n;
        foo();
    }

    void withdraw(int n) {
        foo();
        int r = read();
        foo();
        foo();
        balance = r - n;
        foo();
    }
};

int main() {
    Account* a = new Account(10);
    a->deposit(1);
    a->withdraw(2);
    delete a;
}
/***********************************************************/

where *foo()* may be any function, such as one that gets the current system time or increments a counter. I understand that to insert calls like the above I first have to get the IR and then run an instrumentation pass on it that inserts such calls, but I don't really know how to achieve it. Please suggest, with examples, how to go about it.

I also understand that once I compile the program into IR, it would be really difficult to get a 1:1 mapping between my original program and the instrumented IR. So, is it possible to reflect the changes made in the IR (because of instrumentation) back into the original program?

To get started with LLVM passes and learn how to write one of my own, I looked at an example of a pass that adds run-time checks to LLVM IR loads and stores, SAFECode's load/store instrumentation pass (http://llvm.org/viewvc/llvm-project/safecode/trunk/include/safecode/LoadStoreChecks.h?view=markup and http://llvm.org/viewvc/llvm-project/safecode/trunk/lib/InsertPoolChecks/LoadStoreChecks.cpp?view=markup). But I couldn't figure out how to run this pass. Please give me the steps to run this pass on some program, say the above Account.cpp.

Thanks,
Himanshu

On Fri, Sep 23, 2011 at 11:13 PM, John Criswell <criswell at illinois.edu> wrote:
> [...]

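The transformation Himanshu asks for, a call before and after every memory access, can be sketched on a toy instruction list. This is only an illustration under stated assumptions: `Opcode`, `Instr`, and `instrument` are hypothetical names invented here and are not part of LLVM, and the hook name is passed in as a plain string. A real pass would insert call instructions into LLVM IR rather than copy a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for an SSA-style IR. These are NOT LLVM types.
enum class Opcode { Load, Store, Add, Call, Ret };

struct Instr {
    Opcode op;
    std::string operand;  // location for Load/Store, callee name for Call
};

// Returns a copy of `body` in which every Load and Store is wrapped by a
// call to `hook`, one immediately before it and one immediately after it.
std::vector<Instr> instrument(const std::vector<Instr>& body,
                              const std::string& hook) {
    assert(!hook.empty());
    std::vector<Instr> out;
    out.reserve(body.size() * 3);  // upper bound: every instruction wrapped
    for (const Instr& in : body) {
        const bool touchesMemory =
            in.op == Opcode::Load || in.op == Opcode::Store;
        if (touchesMemory) out.push_back({Opcode::Call, hook});
        out.push_back(in);
        if (touchesMemory) out.push_back({Opcode::Call, hook});
    }
    return out;
}
```

For a body consisting of a load, an add, a store, and a return, `instrument(body, "foo")` yields eight instructions: the original four plus two calls around the load and two around the store. A single source statement such as `balance = balance + n;` contains at least one load and one store, so instrumenting at this level produces more calls than the statement-level picture in the message above.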
Hi, Himanshu.

I once wrote an LLVM IR-based memory profiling pass. Basically, I followed the code for EdgeProfiling. The source code is enclosed here; it worked with LLVM 2.8. Hope it is helpful.

MemoryProfiling.cpp: the instrumentation pass, which inserts profiling function calls into the original program
MemoryProfiling.c: the profiling library containing the profiling calls
llvm-memory-profiling.patch: the other modifications
notes.txt: some information collected when I was working on this profiling pass

Xiaoming

On Tue, Sep 27, 2011 at 7:13 PM, Himanshu Shekhar <imhimanshu91 at gmail.com> wrote:
> [...]

-------------- next part --------------
*study LLVM and add a memory access profiling pass into it

**how to write a pass
http://llvm.org/docs/WritingAnLLVMPass.html

**add a memory profiling pass to llvm
1. copy lib/Transforms/Instrumentation/EdgeProfiling.cpp to lib/Transforms/Instrumentation/MemoryProfiling.cpp
2. edit MemoryProfiling.cpp to adapt to the new pass
3. add a line to include/llvm/LinkAllPasses.h
4. add a line to include/llvm/Transforms/Instrumentation.h

**the compilation process of llvm
llvmc is the driver calling the following steps
1. llvm-gcc/llvm-g++/llvm-gfortran: frontend, C/C++/Fortran => .ll => .bc
2. opt (use -opt option): language-independent and machine-independent transformations, .bc => .bc
3. llc: code generator, .bc => .s
4. as: assembler, .s => .o
5. ld: linker, .o => executable

**call the memory profiling pass
llvmc -opt -Wo,=-insert-memory-profiling xxx.c

**The position of edge profiling pass with "-insert-edge-profiling"
Case 1: "-O3 -insert-edge-profiling" (USE THIS WAY!!!)
The separate function-level passes at the beginning are bypassed. The module pass "Edge Profiler" is called almost at the end, just before the last "Function Pass Manager" pass.
Case 2: "-insert-edge-profiling -O3"
The separate function-level passes at the beginning are retained. The module pass "Edge Profiler" is the first of the module-level passes.

**llvm edge profiling related stuff
lib/Transforms/Instrumentation/ => Debug/lib/libLLVMInstrumentation.a
lib/libprofile/ => Debug/lib/profile_rt.dylib
add "BUILD_ARCHIVE = 1" to runtime/libprofile/Makefile
lib/libprofile/ => Debug/lib/profile_rt.a

**llvm memory profiling
llvmc -v -O3 -opt -Wo,=-insert-memory-profiling -Wl,=/Users/xiaoming/Work/llvm/llvm-2.7/Debug/lib/profile_rt.a -o SOR SOR.c

**change to use gold plugin for LLVM
1. build binutils gold
   a) ./configure --prefix=/home/vax6/p28/compiler2/xiaoming/INSTALL/binutils --enable-gold=both/gold --enable-lto --enable-plugins --enable-build-with-cxx
   b) make;make install
   c) the installed ld is gold
2. build llvm-2.7
   a) ./configure --with-binutils-include=/home/vax6/p28/compiler2/xiaoming/binutils-2.20.51/include --prefix=/home/vax6/p28/compiler2/xiaoming/INSTALL/llvm-2.7 --enable-optimized
   b) make;make install
   c) the built gold plugin is /home/vax6/p28/compiler2/xiaoming/LLVM/llvm-2.7/build/Release/lib/libLLVMgold.so
3. make a symbolic link to the gold plugin in /home/vax6/p28/compiler2/xiaoming/LLVM/llvm-gcc-4.2-2.7-i686-linux/libexec/gcc/i686-pc-linux-gnu/4.2.1, which is a sub-directory of the gcc front end for llvm

-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-memory-profiling.patch
Type: application/octet-stream
Size: 3513 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/04f168cf/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MemoryProfiling.c
Type: text/x-csrc
Size: 2207 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/04f168cf/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MemoryProfiling.cpp
Type: text/x-c++src
Size: 4634 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/04f168cf/attachment.cpp>
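
The attached sources are not reproduced in the archive text, so the following is only a guess at the general shape of a profiling runtime that an instrumentation pass could call into. It is not the content of MemoryProfiling.c. Every name here (`memprof`, `Counters`, `prof_record_load`, `prof_record_store`, `report`) is hypothetical. The hooks use C linkage on the assumption that instrumented C or C++ code refers to them by an unmangled symbol name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

namespace memprof {  // hypothetical namespace for this sketch

// Running totals of the memory accesses reported by instrumented code.
struct Counters {
    std::uint64_t loads = 0;
    std::uint64_t stores = 0;
};

// Single process-wide counter set, created on first use.
inline Counters& counters() {
    static Counters c;
    return c;
}

// Clears both totals.
void reset() { counters() = Counters{}; }

// Writes the current totals to `out` as one line of text.
void report(std::FILE* out) {
    assert(out != nullptr);
    std::fprintf(out, "loads=%llu stores=%llu\n",
                 static_cast<unsigned long long>(counters().loads),
                 static_cast<unsigned long long>(counters().stores));
}

}  // namespace memprof

// Hooks an instrumentation pass could insert next to each load and store.
// The address is accepted but unused here; a fuller runtime might log it.
extern "C" void prof_record_load(const void* addr) {
    (void)addr;
    ++memprof::counters().loads;
}

extern "C" void prof_record_store(const void* addr) {
    (void)addr;
    ++memprof::counters().stores;
}
```

A runtime of this kind would be linked into the instrumented executable, much as the notes link profile_rt.a, and could call `memprof::report(stderr)` at program exit, for example from a handler registered with `std::atexit`.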