Mingliang LIU
2013-Apr-27 19:53 UTC
[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM
Hi all, This is a GSoC 2013 proposal for LLVM project. Please see the formatted version at here: http://pacman.cs.tsinghua.edu.cn/~liuml07/files/gsoc2013-proposal-program-slicing.pdf Program slicing has been used in many applications, the criteria of which is a pair of statement and variables. I would like to write an inter-procedural program slicing pass in LLVM, which is able to calculate C program slices of source code effectively. There is no previous work implemented in LLVM, which considers both the dynamic program slicing and source code of the sliced program. Program slice contains all statements in a program that directly or indirectly act the value of a variable occurrence [5]. Program slicing has been used in many applications, e.g., program verification, testing, maintenance, automatic parallelization of program execution, automatic integration of program versions. While it's straightforward to implement the slicer in the back-end of compiler using SSA form, the source code of the original program instead of intermediate representation is preferred in most cases. Moreover, we can further narrow the notion of slice, which contains statements that influence the value of a variable occurrence for specic program inputs. This is referred as dynamic program slicing [1]. However, there is no previous work implemented in LLVM which solved the two problems. There are two public projects which implement the backwards static slicing in LLVM. Giri Written by John Criswell from UIUC, a subproject of LLVM. The Giri code contains the static backwards intra-procedure slicing passes, and runs with an older version of LLVM. It also only backtracks until it hits a load. Additional code must be written to backtrack further to find potentially reaching stores. LLVMSlicer This implementation is a complete static backwards slicer from Masaryk University. It works on the well dened data and control flow equations in a white paper by F. Tip [4]. However, this code was written for special purpose, thus it's not general enough to be use by others. They implemented the Andersen's alias algorithm [2], callgraph, and modies analysis to support the slicer, instead of using the LLVM APIs. They eliminate the useless IR statements and keep the ones aect the values of the criteria. However, neither of them generates the compilable source code slice, which is heavily needed in reality. There are several ways to do this. One is to generate the source code from sliced IR using llc tool. The issue is that the IR is not concerned with high-level semantic. The generated source code is different from the original program and not suitable for human reading. An- other approach is deleting the source code according to sliced IR, with line number information (in meta-data of each instruction). The naive script deleting sliced source code one by one fails to handle tricky cases. I think a better source code slicer is to take use of the AST info. There are three main parts of this work. First, borrow an implementation of static back-wards slicing from Giri or LLVMSlicer, and use the LLVM callgraph, mod/ref and alias interfaces as much as possible. Second, implement the dynamic program slicing using the approach 3 in the paper [1]. Third, generate the source code of the sliced program. To make the sliced source code compile directly, we need to employ clang front-end The final result of this summer of code is to make this pass work eeffectively and documented well. Further, I'll write test cases and behave as the active maintainer for this project. My long-term plan is to add more features, e.g. Objective-C/C++ support, thing slicing [3], to this project. Any comment is highly appreciated. [1] H AGRAWAL. Dynamic Program Slicing. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),1990. 2] L.O. Andersen. Program analysis and specialization for the C programming language. PhD thesis, University of Cophenhagen, Germany, 1994. [3] M. Sridharan, S.J. Fink, and R. Bodik. Thin slicing. In PLDI'07, 2007. [4] F. Tip. A survey of program slicing techniques.Journal of Programming Languages, 1995. [5] M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, ICSE, pages 439{449. IEEE, 1981. -- Mingliang LIU ( in Chinese) PACMAN Group, Dept. of Computer Science & Technology Tsinghua University, Beijing 100084, China Email: liuml07 at mails.tsinghua.edu.cn Homepage: http://pacman.cs.tsinghua.edu.cn/~liuml07/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130428/c2c6b584/attachment.html>
John Criswell
2013-Apr-29 14:40 UTC
[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM
On 4/27/13 2:53 PM, Mingliang LIU wrote:> Hi all, > > This is a GSoC 2013 proposal for LLVM project. Please see the > formatted version at here: > http://pacman.cs.tsinghua.edu.cn/~liuml07/files/gsoc2013-proposal-program-slicing.pdf > <http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/files/gsoc2013-proposal-program-slicing.pdf>I wanted to comment up-front to our GSoC mentors this year that an LLVM dynamic slicing pass is one of the most often requested components we get from the research community.> > Program slicing has been used in many applications, the criteria > of which is a pair of statement and variables. I would like to write > an inter-procedural program slicing pass in LLVM, which is able > to calculate C program slices of source code effectively. There is no > previous work implemented in LLVM, which considers both the dynamic > program slicing and source code of the sliced program.Actually, there is a dynamic slicing tool built using LLVM, and it maps LLVM IR statements to source-level statements for its output using the debug metadata. Swarup Sahoo and I built it and used it in our recent ASPLOS paper (http://dl.acm.org/citation.cfm?id=2451131). The tool is named Giri. As an aside, the "Giri" code you described below is a static slicing pass that we originally intended to add to our dynamic slicing code and release as one project called "Giri." That project has stalled, and if it doesn't get revitalized, I think it should be moved elsewhere (perhaps github?). What I think would be good for your project would be to take the Giri code that we've developed, update it to LLVM mainline, and make it more robust. We can provide the code to you if you get selected for GSoC. Unfortunately, I won't be mentoring this year. Perhaps Swarup (CC'ed above) or someone else will be interested.> Program slice contains all statements in a program that directly or > indirectly act the value of a variable occurrence [5]. Program slicing > has been used in many applications, e.g., program verification, > testing, maintenance, automatic parallelization of program execution, > automatic integration of program versions. > > While it's straightforward to implement the slicer in the back-end of > compiler using SSA form, the source code of the original program > instead of intermediate > representation is preferred in most cases. Moreover, we can further > narrow the notion of slice, which contains statements that influence > the value of a > variable occurrence for specic program inputs. This is referred as> dynamic program slicing [1]. However, there is no previous work > implemented in LLVM > which solved the two problems. > > There are two public projects which implement the backwards static > slicing in LLVM. > > * Giri Written by John Criswell from UIUC, a subproject of LLVM. The > Giri code contains the static backwards intra-procedure slicing > passes, and runs with an older version of LLVM. It also only > backtracks until it hits a load. Additional code must be written > to backtrack further to find potentially reaching stores. > * LLVMSlicer This implementation is a complete static backwards > slicer from Masaryk University. It works on the well dened data> and control flow equations in a white paper by F. Tip > [4]. However, this code was written for special purpose, thus it's > not general enough to be use by others. They implemented the > Andersen's alias algorithm [2], callgraph, and modies analysis to> support the slicer, instead of using the LLVM APIs. > > > They eliminate the useless IR statements and keep the ones aect the> values of the criteria. However, neither of them generates the > compilable source code > slice, which is heavily needed in reality. There are several ways to > do this. One is to generate the source code from sliced IR using llc > tool. The issue is that > the IR is not concerned with high-level semantic. The generated source > code is different from the original program and not suitable for human > reading. An- > other approach is deleting the source code according to sliced IR, > with line number information (in meta-data of each instruction). The > naive script deleting > sliced source code one by one fails to handle tricky cases. I think a > better source code slicer is to take use of the AST info.If you want to generate *compilable* source code, then I think you either need to go with the first option (using llc with the C backend) or you need to implement part or all of the slicing code within Clang. The second option you describe (using debug meta-data) will probably be too fragile to work in practice; debug information can be removed, and it may not capture things like the inclusion of header files. That said, can you provide citations showing that being able to compile the slice is needed? Does the slice need to be readable or just compilable? It is not clear to me that a compilable and human-readable slice is needed by most applications needing dynamic slicing. I suspect only one or the other is needed.> > There are three main parts of this work. > > 1. First, borrow an implementation of static back-wards slicing from > Giri or LLVMSlicer, and use the LLVM callgraph, mod/ref and alias > interfaces as much as possible. >For what purpose would you use the static analysis? Would you use it to optimize the instrumentation needed for dynamic slicing?> 1. Second, implement the dynamic program slicing using the approach 3 > in the paper [1]. >You should take a look at Giri's algorithm in our ASPLOS paper. While not novel, it does have a nifty feature that uses the SSA graph to reduce the number of events that it needs to trace to create the dynamic slice.> 1. Third, generate the source code of the sliced program. To make the > sliced source code compile directly, we need to employ clang front-end > > > The final result of this summer of code is to make this pass work > eeffectively and documented well. Further, I'll write test cases and> behave as the active > maintainer for this project. My long-term plan is to add more > features, e.g. Objective-C/C++ support, thing slicing [3], to this > project.What is your experience with LLVM IR and/or Clang ASTs? Taking our Giri code and making it robust seems doable over the summer. Building something with Clang AST's and LLVM IR and having it solid and well documented by end of summer seems a bit ambitious.> > Any comment is highly appreciated.I think your proposal should make a case that dynamic slicing is a very useful thing. For example, you should cite some papers that make use of it. While I know that dynamic slicing is useful (we keep making one-off distributions of our dynamic slicing code for researchers), that is not the most convincing argument. If you can find real-world tools that use it (or could benefit from it), that's even better. Second, your proposal should list your qualifications for the project. Why are you the right person for this task? Do you have experience writing code? Working with dynamic slicing? Have you written LLVM/Clang passes before? Third, you should explain how all the parts fit together. If you're working on dynamic slicing, it's not clear to me why you want a static inter-procedural slicing pass. Fourth, you should discuss how your passes will integrate into the LLVM toolchain. Will you be adding an option to clang? Will you build a replacement libLTO to do the inter-procedural slicing, or do you expect people to use opt to run it? -- John T.> > > [1] H AGRAWAL. Dynamic Program Slicing. In ACM SIGPLAN Conference on > Programming Language Design and Implementation (PLDI),1990. > 2] L.O. Andersen. Program analysis and specialization for the C > programming language. PhD thesis, University of Cophenhagen, Germany, > 1994. > [3] M. Sridharan, S.J. Fink, and R. Bodik. Thin slicing. In PLDI'07, 2007. > [4] F. Tip. A survey of program slicing techniques.Journal of > Programming Languages, 1995. > [5] M. Weiser. Program slicing. In Proceedings of the 5th > International Conference on Software Engineering, ICSE, pages 439{449. > IEEE, 1981. > > -- > Mingliang LIU (??? in Chinese) > > PACMAN Group, Dept. of Computer Science & Technology > Tsinghua University, Beijing 100084, China > Email: liuml07 at mails.tsinghua.edu.cn > <mailto:liuml07 at mails.tsinghua.edu.cn> > Homepage: http://pacman.cs.tsinghua.edu.cn/~liuml07/ > <http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/> > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130429/0a2eb18a/attachment.html>
Reasonably Related Threads
- [LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM
- [LLVMdev] Interprocedural slicing using LLVM
- [LLVMdev] Interprocedural slicing using LLVM
- [LLVMdev] Interprocedural slicing using LLVM
- [LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM