John Regehr
2010-Mar-30 16:33 UTC
[LLVMdev] summer of code idea — checking bounds overflow bugs
John-- a couple questions: Can you explain the SAFECode model in a bit more detail? I am getting conflicting information. On one hand, some of the papers describe a system that is primarily designed to hide safety violations. On the other hand, the 2006 ICSE paper that I cited earlier today seems to be talking about catching violations. These are very different goals! What does the code in the SAFECode repository actually do? Can you comment on the speed of LLVM when shelling out to Omega? My guess would be that this will result in unacceptable compile times for large software, and that something fast and relatively simple like ABCD is a better choice for general usage. Finally a comment: it's a clear that a comprehensive system for trapping undefined behavior in Clang is a multi-year project. Some parts of this must live in Clang. Some parts, such as bounds check optimizations, should go into LLVM passes. Anyway I'm just saying that the project you outlines seems to fit very well into the overall vision of detecting undefined behavior in C programs. John
Adve, Vikram Sadanand
2010-Mar-31 14:38 UTC
[LLVMdev] summer of code idea — checking bounds overflow bugs
On Mar 30, 2010, at 11:33 AM, John Regehr wrote:> John-- a couple questions: > > Can you explain the SAFECode model in a bit more detail? I am getting > conflicting information. On one hand, some of the papers describe a > system that is primarily designed to hide safety violations. On the other > hand, the 2006 ICSE paper that I cited earlier today seems to be talking > about catching violations.SAFECode has both capabilities. The complete array bounds checking described in the ICSE paper is an option. So are the complete dangling pointer checks described in a DSN 2006 paper (that one has higher overhead for many programs). The guarantees in the PLDI 2006 paper -- partial type safety, control flow integrity, and a sound operational semantics -- hold regardless of whether those two options are enabled or not. The code in the public repository defaults to a debugging version that (a) turns on complete bounds checking; (b) I think turns on dangling pointer checks; but (c) does not use Automatic Pool Allocation, so it does not guarantee the soundness of pointer analysis. It also does much more information tracking to give useful feedback on bugs. Of course, it has much higher overhead than if we had made other choices, e.g., for use in production code. --Vikram> These are very different goals! What does the > code in the SAFECode repository actually do? > > Can you comment on the speed of LLVM when shelling out to Omega? My guess > would be that this will result in unacceptable compile times for large > software, and that something fast and relatively simple like ABCD is a > better choice for general usage. > > Finally a comment: it's a clear that a comprehensive system for trapping > undefined behavior in Clang is a multi-year project. Some parts of this > must live in Clang. Some parts, such as bounds check optimizations, > should go into LLVM passes. Anyway I'm just saying that the project you > outlines seems to fit very well into the overall vision of detecting > undefined behavior in C programs. > > John > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
John Criswell
2010-Mar-31 14:58 UTC
[LLVMdev] summer of code idea — checking bounds overflow bugs
Adve, Vikram Sadanand wrote:> On Mar 30, 2010, at 11:33 AM, John Regehr wrote: > > >> John-- a couple questions: >> >> Can you explain the SAFECode model in a bit more detail? I am getting >> conflicting information. On one hand, some of the papers describe a >> system that is primarily designed to hide safety violations. On the other >> hand, the 2006 ICSE paper that I cited earlier today seems to be talking >> about catching violations. >> > > SAFECode has both capabilities. The complete array bounds checking described in the ICSE paper is an option. So are the complete dangling pointer checks described in a DSN 2006 paper (that one has higher overhead for many programs). The guarantees in the PLDI 2006 paper -- partial type safety, control flow integrity, and a sound operational semantics -- hold regardless of whether those two options are enabled or not. > > The code in the public repository defaults to a debugging version that (a) turns on complete bounds checking; (b) I think turns on dangling pointer checks;Dangling pointer detection (DSN 2006) is disabled by default. A command line option in the sc tool will enable it.> but (c) does not use Automatic Pool Allocation, so it does not guarantee the soundness of pointer analysis.I have comments to John's other questions below.> It also does much more information tracking to give useful feedback on bugs. Of course, it has much higher overhead than if we had made other choices, e.g., for use in production code. > > --Vikram > > > >> These are very different goals! What does the >> code in the SAFECode repository actually do? >> >> Can you comment on the speed of LLVM when shelling out to Omega? My guess >> would be that this will result in unacceptable compile times for large >> software, and that something fast and relatively simple like ABCD is a >> better choice for general usage. >>I don't have numbers, but it was unusable for doing static array bounds checking on the Linux kernel during the Secure Virtual Architecture (SVA) work. There might be performance numbers in one of the SAFECode publications. The immediate engineering problem with using the Omega constraint solver is that SAFECode does multiple fork() and exec() calls to execute a separate binary program to do constraint solving; this occurs multiple times during the static array bounds checking pass. Performance could be greatly improved by using the Omega code as a library and linking it straight into the static array bounds checking pass. The overheads of constraint generation and solving would still be there, but calling into the Omega code would be much faster. Regarding the use of alternative algorithms, there are new, simple static array bounds checking passes within SAFECode that utilize analysis group chaining. When one analysis cannot determine that a GEP remains within bounds, it can ask a more powerful pass to return a result. This could potentially be used to reduce the use of the Omega method to just those GEPs that cannot be proven safe using simpler methods.>> Finally a comment: it's a clear that a comprehensive system for trapping >> undefined behavior in Clang is a multi-year project. Some parts of this >> must live in Clang. Some parts, such as bounds check optimizations, >> should go into LLVM passes. Anyway I'm just saying that the project you >> outlines seems to fit very well into the overall vision of detecting >> undefined behavior in C programs. >>Can you elaborate on the undefined behaviors you have in mind? SAFECode (when its Automatic Pool Allocation option is working) should provide sound operational semantics, so all undefined behavior due to memory safety should be gone. I'm sure there's other undefined behaviors (e.g., divide by zero), but I haven't thought much about them. I'm curious why you think some undefined behavior detectors need to be built in Clang. It seems to me that any static analysis could be built using either LLVM or Clang; there are just tradeoffs to each approach. What advantages does Clang provide? -- John T.
John Regehr
2010-Mar-31 16:27 UTC
[LLVMdev] summer of code idea — checking bounds overflow bugs
> SAFECode has both capabilities. The complete array bounds checking > described in the ICSE paper is an option.Thanks Vikram, this is great to hear. This would be super useful as a part of mainline LLVM. I'm happy to help with testing and mentoring for any effort to get this integrated.> pointer checks; but (c) does not use Automatic Pool Allocation, so it > does not guarantee the soundness of pointer analysis.As far as I'm concerned, unsound analysis is only a problem if it may lead to erroneous compilation of conforming programs. John
Maybe Matching Threads
- [LLVMdev] summer of code idea — checking bounds overflow bugs
- [LLVMdev] summer of code idea — checking bounds overflow bugs
- [LLVMdev] summer of code idea — checking bounds overflow bugs
- [LLVMdev] summer of code idea — checking bounds overflow bugs
- [LLVMdev] summer of code idea — checking bounds overflow bugs