thr3ads.net - llvm dev - [LLVMdev] load widening conflicts with AddressSanitizer [Jan 2012]

Duncan Sands

2012-Jan-24 20:31 UTC

[LLVMdev] load widening conflicts with AddressSanitizer

Hi Kostya,

>     As far as I can see the C and C++ standards are not relevant.  ASAN
>     works on LLVM IR, not on C or C++.  Lots of different languages have
>     LLVM frontends.  I personally turn Ada and Fortran into LLVM IR all
>     the time for example.  Clearly the C standard is not relevant to LLVM
>     IR coming from such languages.  What matters is how LLVM IR is
>     defined.  As far as I know this construct is perfectly valid in LLVM
>     IR.
>
> Asan will not work for Fortran and Ada anyway (at least, out of the box).
> I am not even sure that anything like asan is needed for Ada (it has
> bounds checking built-in, the dynamic memory allocation is much more
> restrictive).  The tool is rather specific to C/C++ (and ObjectiveC
> probably, although we have almost no tests for ObjectiveC, nor much
> knowledge in it).  Yes, the IR transformations are done on the LLVM
> level, but the asan run-time library heavily depends on the C/C++
> semantics and even implementation, and you can't really separate the
> asan instrumentation pass from the run-time.

it's pretty disappointing to hear that asan is basically just for C.  But
since it is, I won't bother you anymore about this attribute (though I
still don't like it much).

Ciao, Duncan.

Kostya Serebryany

2012-Jan-24 20:45 UTC

[LLVMdev] load widening conflicts with AddressSanitizer

On Tue, Jan 24, 2012 at 12:31 PM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Kostya,
>
>>     As far as I can see the C and C++ standards are not relevant.  ASAN
>>     works on LLVM IR, not on C or C++.  Lots of different languages have
>>     LLVM frontends.  I personally turn Ada and Fortran into LLVM IR all
>>     the time for example.  Clearly the C standard is not relevant to
>>     LLVM IR coming from such languages.  What matters is how LLVM IR is
>>     defined.  As far as I know this construct is perfectly valid in
>>     LLVM IR.
>>
>> Asan will not work for Fortran and Ada anyway (at least, out of the
>> box).  I am not even sure that anything like asan is needed for Ada (it
>> has bounds checking built-in, the dynamic memory allocation is much
>> more restrictive).  The tool is rather specific to C/C++ (and
>> ObjectiveC probably, although we have almost no tests for ObjectiveC,
>> nor much knowledge in it).  Yes, the IR transformations are done on the
>> LLVM level, but the asan run-time library heavily depends on the C/C++
>> semantics and even implementation, and you can't really separate the
>> asan instrumentation pass from the run-time.
>
> it's pretty disappointing to hear that asan is basically just for C.

If someone has use cases for other languages I'd like to hear about them.
In Fortran, asan is unlikely to be required because the language makes
bounds checking simpler to implement (I am not sure about use-after-free
for dynamic memory in Fortran 95). Ditto for Ada, or e.g. Java.

--kcc

> But since it is, I won't bother you anymore about this attribute (though I
> still don't like it much).
>
> Ciao, Duncan.

John Criswell

2012-Jan-24 21:08 UTC

[LLVMdev] load widening conflicts with AddressSanitizer

On 1/24/12 2:31 PM, Duncan Sands wrote:
> Hi Kostya,
>
>>     As far as I can see the C and C++ standards are not relevant.  ASAN
>>     works on LLVM IR, not on C or C++.  Lots of different languages have
>>     LLVM frontends.  I personally turn Ada and Fortran into LLVM IR all
>>     the time for example.  Clearly the C standard is not relevant to
>>     LLVM IR coming from such languages.  What matters is how LLVM IR is
>>     defined.  As far as I know this construct is perfectly valid in
>>     LLVM IR.

The issue here is that a load that reads data past the end of an alloca 
can occur at the LLVM IR level in one of three ways:

1) Because the program at the original source-code level does it and is 
incorrect.
2) Because the program at the original source-code level does it and is 
correct (although that must be a pretty wacky language).
3) Load-widening introduces it when processing loads from allocas that 
are properly aligned.

As it is today, an analysis cannot look at the LLVM IR and know which 
condition is causing the load to read data past the end of the memory 
object.  As such, tools like SAFECode and ASAN don't know when to relax 
their run-time checks to permit such out-of-bounds reading; they either 
have to relax it for all such loads (in which case a bug in the C source 
code might slip through), or they have to report it all the time (and 
report false positives for correct C programs).
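
To make case 3 concrete, here is a small C sketch of the arithmetic behind a widened load. The 22-byte object, the offsets, and the helper names `align_down` and `bytes_past_end` are illustrative assumptions; they are not taken from this thread or from LLVM's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` down to a multiple of `width`.
 * `width` must be a power of two (hypothetical helper). */
size_t align_down(size_t offset, size_t width) {
    assert(width != 0 && (width & (width - 1)) == 0);
    return offset & ~(width - 1);
}

/* Number of bytes that a `width`-byte load starting at `offset`
 * reads beyond the end of an object of `obj_size` bytes. */
size_t bytes_past_end(size_t obj_size, size_t offset, size_t width) {
    size_t end = offset + width;
    return end > obj_size ? end - obj_size : 0;
}
```

For a 22-byte object aligned to 8 bytes, a 2-byte load at offset 20 stays in bounds: `bytes_past_end(22, 20, 2)` is 0. If that load is widened to an aligned 8-byte load, it starts at `align_down(20, 8)`, which is 16, and `bytes_past_end(22, 16, 8)` is 2. The load touches two bytes past the object but stays inside one aligned 8-byte word, so it should not fault on conventional hardware. A checker that only sees "8-byte load at offset 16" cannot tell this apart from a source-level overrun.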

I assume Kostya's new attribute is a way to permit the LLVM IR to 
specify whether such an out-of-bounds read is intentional or not.

In my opinion, I don't think we should bother with an attribute.  
Load-widening's behavior does not introduce exploitable code into the 
program on commonly-used machines and operating systems(*), and 
incorrect source code at the C source level that exhibits identical 
behavior isn't exploitable, either.  SAFECode can be enhanced so that 
the run-time checks for loads relax their guarantees for aligned allocas 
that are subject to load-widening; I imagine ASAN can be similarly modified.

We won't catch some bugs in C/C++ code, but that's a natural consequence
of deciding to permit certain out-of-bounds loads at the LLVM IR level, 
IMHO.
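
As a rough sketch of what such a relaxation could look like, the C below compares a strict bounds check with one that tolerates reads into the tail padding of an aligned object. The function names and the exact policy are assumptions made for illustration; neither SAFECode nor ASAN is claimed to implement it this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round `n` up to a multiple of `align` (a power of two). */
size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Strict policy: every byte of the load must lie inside the object. */
bool load_ok_strict(size_t obj_size, size_t offset, size_t width) {
    return offset <= obj_size && width <= obj_size - offset;
}

/* Relaxed policy (hypothetical): the load must start inside the object,
 * but may run into the padding between the end of the object and the
 * next multiple of its alignment. */
bool load_ok_relaxed(size_t obj_size, size_t obj_align,
                     size_t offset, size_t width) {
    size_t padded = round_up(obj_size, obj_align);
    return offset < obj_size && width <= padded - offset;
}
```

With a 22-byte object aligned to 8 bytes, `load_ok_strict(22, 16, 8)` is false while `load_ok_relaxed(22, 8, 16, 8)` is true, so the widened load is accepted. The cost is the one named above: a source-level 8-byte read at offset 16 is accepted as well. A read that starts past the object, such as `load_ok_relaxed(22, 8, 22, 1)`, is still rejected.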

My two cents.

-- John T.

(*) All bets are off for unconventional systems, though.

>>
>> Asan will not work for Fortran and Ada anyway (at least, out of the
>> box).  I am not even sure that anything like asan is needed for Ada (it
>> has bounds checking built-in, the dynamic memory allocation is much
>> more restrictive).  The tool is rather specific to C/C++ (and
>> ObjectiveC probably, although we have almost no tests for ObjectiveC,
>> nor much knowledge in it).  Yes, the IR transformations are done on the
>> LLVM level, but the asan run-time library heavily depends on the C/C++
>> semantics and even implementation, and you can't really separate the
>> asan instrumentation pass from the run-time.
>
> it's pretty disappointing to hear that asan is basically just for C.  But
> since it is, I won't bother you anymore about this attribute (though I
> still don't like it much).
>
> Ciao, Duncan.

Kostya Serebryany

2012-Jan-24 21:36 UTC

[LLVMdev] load widening conflicts with AddressSanitizer

On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu> wrote:
> On 1/24/12 2:31 PM, Duncan Sands wrote:
>
>> Hi Kostya,
>>
>>>     As far as I can see the C and C++ standards are not relevant.  ASAN
>>>     works on LLVM IR, not on C or C++.  Lots of different languages
>>>     have LLVM frontends.  I personally turn Ada and Fortran into LLVM
>>>     IR all the time for example.  Clearly the C standard is not
>>>     relevant to LLVM IR coming from such languages.  What matters is
>>>     how LLVM IR is defined.  As far as I know this construct is
>>>     perfectly valid in LLVM IR.
>
> The issue here is that a load that reads data past the end of an alloca
> can occur at the LLVM IR level in one of three ways:
>
> 1) Because the program at the original source-code level does it and is
> incorrect.
> 2) Because the program at the original source-code level does it and is
> correct (although that must be a pretty wacky language).
> 3) Load-widening introduces it when processing loads from allocas that are
> properly aligned.
>
> As it is today, an analysis cannot look at the LLVM IR and know which
> condition is causing the load to read data past the end of the memory
> object.  As such, tools like SAFECode and ASAN don't know when to relax
> their run-time checks to permit such out-of-bounds reading; they either
> have to relax it for all such loads (in which case a bug in the C source
> code might slip through), or they have to report it all the time (and
> report false positives for correct C programs).
>
> I assume Kostya's new attribute is a way to permit the LLVM IR to specify
> whether such an out-of-bounds read is intentional or not.
>
> In my opinion, I don't think we should bother with an attribute.
> Load-widening's behavior does not introduce exploitable code into the
> program on commonly-used machines and operating systems(*), and incorrect
> source code at the C source level that exhibits identical behavior isn't
> exploitable, either.
>
> SAFECode can be enhanced so that the run-time checks for loads relax their
> guarantees for aligned allocas that are subject to load-widening; I imagine
> ASAN can be similarly modified.
>
ASAN *can* be modified this way (it will actually make instrumentation ~10%
cheaper).
But this mode will miss some bugs that the current mode finds.
I've seen at least a couple of such *real* bugs.

And these bugs are not only about exploitability, but also about
correctness.
If a program reads garbage, there is no simple way to statically prove that
this garbage does not affect the program's behavior.
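
The conflict can be illustrated with ASAN's documented shadow encoding, where one shadow byte describes 8 application bytes: 0 means all 8 are addressable, k in 1..7 means only the first k are, and a negative value means none are. The C below is a simplified model of that check, not ASAN's actual code. In particular, `access_is_bad_relaxed` is only an assumption about what a load-widening-tolerant mode might check, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { GRANULE = 8 };  /* application bytes covered by one shadow byte */

/* Fill the shadow for an object of `obj_size` bytes that starts at
 * granule 0 and is followed by poisoned memory. */
void poison_tail(int8_t *shadow, size_t n_granules, size_t obj_size) {
    for (size_t g = 0; g < n_granules; g++) {
        size_t start = g * GRANULE;
        if (obj_size >= start + GRANULE)
            shadow[g] = 0;                            /* fully addressable */
        else if (obj_size > start)
            shadow[g] = (int8_t)(obj_size - start);   /* first k bytes only */
        else
            shadow[g] = -1;                           /* fully poisoned */
    }
}

/* Full check for an access of `width` bytes (at most 8, not crossing a
 * granule boundary): the last byte touched must be addressable. */
bool access_is_bad(const int8_t *shadow, size_t offset, size_t width) {
    int8_t k = shadow[offset / GRANULE];
    size_t last = (offset % GRANULE) + width - 1;
    return k != 0 && (k < 0 || last >= (size_t)k);
}

/* Hypothetical relaxed check: only the first byte must be addressable,
 * so reads that spill into the poisoned tail of a granule pass. */
bool access_is_bad_relaxed(const int8_t *shadow, size_t offset) {
    int8_t k = shadow[offset / GRANULE];
    return k != 0 && (k < 0 || (offset % GRANULE) >= (size_t)k);
}
```

For a 22-byte object the third shadow byte is 6. An 8-byte load at offset 16 is reported by `access_is_bad` and accepted by `access_is_bad_relaxed`, whether the optimizer created it or the programmer wrote it. That is the class of bug the relaxed mode would miss.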

--kcc


> We won't catch some bugs in C/C++ code, but that's a natural consequence
> of deciding to permit certain out-of-bounds loads at the LLVM IR level,
> IMHO.
>
> My two cents.
>
> -- John T.
>
> (*) All bets are off for unconventional systems, though.
>
>>> Asan will not work for Fortran and Ada anyway (at least, out of the
>>> box).  I am not even sure that anything like asan is needed for Ada
>>> (it has bounds checking built-in, the dynamic memory allocation is
>>> much more restrictive).  The tool is rather specific to C/C++ (and
>>> ObjectiveC probably, although we have almost no tests for ObjectiveC,
>>> nor much knowledge in it).  Yes, the IR transformations are done on
>>> the LLVM level, but the asan run-time library heavily depends on the
>>> C/C++ semantics and even implementation, and you can't really separate
>>> the asan instrumentation pass from the run-time.
>>
>> it's pretty disappointing to hear that asan is basically just for C.
>> But since it is, I won't bother you anymore about this attribute
>> (though I still don't like it much).
>>
>> Ciao, Duncan.