thr3ads.net - llvm dev - [LLVMdev] summer of code idea — checking bounds overflow bugs [Mar 2010]

If this information is useful, please help other people find it:
Share via:

John Criswell

2010-Mar-31 14:58 UTC

[LLVMdev] summer of code idea — checking bounds overflow bugs

Adve, Vikram Sadanand wrote:> On Mar 30, 2010, at 11:33 AM, John Regehr wrote:
>
>   
>> John-- a couple questions:
>>
>> Can you explain the SAFECode model in a bit more detail?  I am getting 
>> conflicting information.  On one hand, some of the papers describe a 
>> system that is primarily designed to hide safety violations.  On the
other
>> hand, the 2006 ICSE paper that I cited earlier today seems to be
talking
>> about catching violations.  
>>     
>
> SAFECode has both capabilities.  The complete array bounds checking
described in the ICSE paper is an option.  So are the complete dangling pointer
checks described in a DSN 2006 paper (that one has higher overhead for many
programs).  The guarantees in the PLDI 2006 paper -- partial type safety,
control flow integrity, and a sound operational semantics -- hold regardless of
whether those two options are enabled or not.
>
> The code in the public repository defaults to a debugging version that (a)
turns on complete bounds checking;     (b) I think turns on dangling pointer
checks;
Dangling pointer detection (DSN 2006) is disabled by default.  A command 
line option in the sc tool will enable it.

>  but (c) does not use Automatic Pool Allocation, so it does not guarantee
the soundness of pointer analysis.
I have comments to John's other questions below.

>   It also does much more information tracking to give useful feedback on
bugs.  Of course, it has much higher overhead than if we had made other choices,
e.g., for use in production code.
>
> --Vikram
>
>
>   
>> These are very different goals!  What does the 
>> code in the SAFECode repository actually do?
>>
>> Can you comment on the speed of LLVM when shelling out to Omega?  My
guess
>> would be that this will result in unacceptable compile times for large 
>> software, and that something fast and relatively simple like ABCD is a 
>> better choice for general usage.
>>     
I don't have numbers, but it was unusable for doing static array bounds 
checking on the Linux kernel during the Secure Virtual Architecture 
(SVA) work.  There might be performance numbers in one of the SAFECode 
publications.

The immediate engineering problem with using the Omega constraint solver 
is that SAFECode does multiple fork() and exec() calls to execute a 
separate binary program to do constraint solving; this occurs multiple 
times during the static array bounds checking pass.  Performance could 
be greatly improved by using the Omega code as a library and linking it 
straight into the static array bounds checking pass.  The overheads of 
constraint generation and solving would still be there, but calling into 
the Omega code would be much faster.

Regarding the use of alternative algorithms, there are new, simple 
static array bounds checking passes within SAFECode that utilize 
analysis group chaining.  When one analysis cannot determine that a GEP 
remains within bounds, it can ask a more powerful pass to return a 
result.  This could potentially be used to reduce the use of the Omega 
method to just those GEPs that cannot be proven safe using simpler methods.
>> Finally a comment: it's a clear that a comprehensive system for
trapping
>> undefined behavior in Clang is a multi-year project.  Some parts of
this
>> must live in Clang.  Some parts, such as bounds check optimizations, 
>> should go into LLVM passes.  Anyway I'm just saying that the
project you
>> outlines seems to fit very well into the overall vision of detecting 
>> undefined behavior in C programs.
>>     
Can you elaborate on the undefined behaviors you have in mind?  SAFECode 
(when its Automatic Pool Allocation option is working) should provide 
sound operational semantics, so all undefined behavior due to memory 
safety should be gone.  I'm sure there's other undefined behaviors 
(e.g., divide by zero), but I haven't thought much about them.

I'm curious why you think some undefined behavior detectors need to be 
built in Clang.  It seems to me that any static analysis could be built 
using either LLVM or Clang; there are just tradeoffs to each approach.  
What advantages does Clang provide?

-- John T.

John Regehr

2010-Mar-31 16:35 UTC

head link

[LLVMdev] summer of code idea — checking bounds overflow bugs

John,

Thanks for the detailed response.
> Regarding the use of alternative algorithms, there are new, simple static 
> array bounds checking passes within SAFECode that utilize analysis group 
> chaining.  When one analysis cannot determine that a GEP remains within 
> bounds, it can ask a more powerful pass to return a result.  This could 
> potentially be used to reduce the use of the Omega method to just those
GEPs
> that cannot be proven safe using simpler methods.
I think this is the right engineering choice, with a command line option 
to simply disable the heavyweight decision procedure for people who want 
faster compile times.
> I'm curious why you think some undefined behavior detectors need to be
built
> in Clang.  It seems to me that any static analysis could be built using 
> either LLVM or Clang; there are just tradeoffs to each approach.  What 
> advantages does Clang provide?
Some checks must live in Clang because too much information has been lost 
by the time LLVM sees the code.  There are many examples but here is the 
canonical one.  A program has undefined behavior if "between two sequence 
points, an object is modified more than once, or is modified and the prior 
value is read other than to determine the value to be stored."

To implement this check in LLVM, we would have to answer the question: 
Where, in the LLVM code, are the sequence points?

John

John Regehr

2010-Mar-31 16:48 UTC

head link

[LLVMdev] summer of code idea — checking bounds overflow bugs

> Some checks must live in Clang because too much information has been lost
> by the time LLVM sees the code.  There are many examples but here is the
> canonical one.  A program has undefined behavior if "between two
sequence
> points, an object is modified more than once, or is modified and the prior
> value is read other than to determine the value to be stored."
>
> To implement this check in LLVM, we would have to answer the question:
> Where, in the LLVM code, are the sequence points?
By the way I can hear readers of this list saying to themselves "this does 
not seem like a useful check to implement."  Perhaps this is correct, but 
let's consider the tradeoffs:

- This is a relatively simple, localized check that should not be too hard 
to implement.

- Almost all of the added checks would be destroyed by LLVM after simple 
queries to the alias analyzer, so applications running with this check 
turned on will not slow down much.

- Common optimizing compilers change the meaning of a computation that 
makes this mistake.

My guess is that this check would find problems in real apps...

John

Adve, Vikram Sadanand

2010-Mar-31 19:53 UTC

head link

[LLVMdev] summer of code idea — checking bounds overflow bugs

On Mar 31, 2010, at 11:35 AM, John Regehr wrote:
<snip>>> I'm curious why you think some undefined behavior detectors need to
be built
>> in Clang.  It seems to me that any static analysis could be built using
>> either LLVM or Clang; there are just tradeoffs to each approach.  What 
>> advantages does Clang provide?
> 
> Some checks must live in Clang because too much information has been lost 
> by the time LLVM sees the code.  There are many examples but here is the 
> canonical one.  A program has undefined behavior if "between two
sequence
> points, an object is modified more than once, or is modified and the prior 
> value is read other than to determine the value to be stored."
I agree.  There are a number of such semantic rules that must be checked in the
front end, another common example being type checking rules for types that are
lowered down to the IR (e.g., all the class related rules in C++).  SAFECode  as
it currently stands doesn't try to address such properties.
> 
> To implement this check in LLVM, we would have to answer the question: 
> Where, in the LLVM code, are the sequence points?
> 
> John
--Vikram

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Mar 2010 - [LLVMdev] summer of code idea — checking bounds overflow bugs

[LLVMdev] summer of code idea — checking bounds overflow bugs

[LLVMdev] summer of code idea — checking bounds overflow bugs

[LLVMdev] summer of code idea — checking bounds overflow bugs

[LLVMdev] summer of code idea — checking bounds overflow bugs

Reasonably Related Threads