thr3ads.net - llvm dev - [LLVMdev] load widening conflicts with AddressSanitizer [Jan 2012]

If this information is useful, please help other people find it:
Share via:

Kostya Serebryany

2012-Jan-24 21:36 UTC

[LLVMdev] load widening conflicts with AddressSanitizer

On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at
illinois.edu>wrote:
> On 1/24/12 2:31 PM, Duncan Sands wrote:
>
>> Hi Kostya,
>>
>>      As far as I can see the C and C++ standards are not relevant. 
ASAN
>>> works on
>>>     LLVM IR, not on C or C++.  Lots of different languages have
LLVM
>>> frontends.  I
>>>     personally turn Ada and Fortran into LLVM IR all the time for
>>> example.  Clearly
>>>     the C standard is not relevant to LLVM IR coming from such
>>> languages.  What
>>>     matters is how LLVM IR is defined.  As far as I know this
construct
>>> is perfectly
>>>     valid in LLVM IR.
>>>
>>
>
> The issue here is that a load that reads data past the end of an alloca
> can occur at the LLVM IR level in one of three ways:
>
> 1) Because the program at the original source-code level does it and is
> incorrect.
> 2) Because the program at the original source-code level does it and is
> correct (although that must be a pretty wacky language).
> 3) Load-widening introduces it when processing loads from allocas that are
> properly aligned.
>
> As it is today, an analysis cannot look at the LLVM IR and know which
> condition is causing the load to read data past the end of the memory
> object.  As such, tools like SAFECode and ASAN don't know when to relax
> their run-time checks to permit such out-of-bounds reading; they either
> have to relax it for all such loads (in which case a bug in the C source
> code might slip through), or they have to report it all the time (and
> report false positives for correct C programs).
>
> I assume Kostya's new attribute is a way to permit the LLVM IR to
specify
> whether such an out-of-bounds read is intentional or not.
>
> In my opinion, I don't think we should bother with an attribute.
>  Load-widening's behavior does not introduce exploitable code into the
> program on commonly-used machines and operating systems(*), and incorrect
> source code at the C source level that exhibits identical behavior
isn't
> exploitable, either.

SAFECode can be enhanced so that the run-time checks for loads relax
their> guarantees for aligned allocas that are subject to load-widening; I imagine
> ASAN can be similarly modified.
>
ASAN *can* be modified this way (it will actually make instrumentation ~10%
cheaper).
But this mode will miss some bugs that the current mode finds.
I've seen at least a couple of such *real* bugs.

And these bugs are not only about exploitability, but also about
correctness.
If a program reads garbage, there is no simple way to statically prove that
this garbage does not affect the program's behavior.

--kcc


>
> We won't catch some bugs in C/C++ code, but that's a natural
consequence
> of deciding to permit certain out-of-bounds loads at the LLVM IR level,
> IMHO.
>
> My two cents.
>
> -- John T.
>
> (*) All bets are off for unconventional systems, though.
>
>
>
>>>
>>> Asan will not work for Fortran and Ada anyway (at least, out of the
box).
>>> I am not even sure that anything like asan is needed for Ada (it
has
>>> bounds
>>> checking built-in, the dynamic memory allocation is much more
>>> restrictive).
>>> The tool is rather specific to C/C++ (and ObjectiveC probably,
although
>>> we have
>>> almost no tests for ObjectiveC, nor much knowledge in it).
>>> Yes, the IR transformations are done on the LLVM level, but the
asan
>>> run-time
>>> library heavily depends on the C/C++ semantics and even
implementation,
>>> and you can't really separate the asan instrumentation pass
from the
>>> run-time.
>>>
>> it's pretty disappointing to hear that asan is basically just for
C.  But
>> since
>> it is, I won't bother you anymore about this attribute (though I
still
>> don't
>> like it much).
>>
>> Ciao, Duncan.
>> ______________________________**_________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>
http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/0fbd9f2f/attachment.html>

Chandler Carruth

2012-Jan-24 21:53 UTC

head link

[LLVMdev] load widening conflicts with AddressSanitizer

On Tue, Jan 24, 2012 at 1:36 PM, Kostya Serebryany <kcc at google.com>
wrote:
> ASAN *can* be modified this way (it will actually make instrumentation
> ~10% cheaper).
> But this mode will miss some bugs that the current mode finds.
> I've seen at least a couple of such *real* bugs.
>
> And these bugs are not only about exploitability, but also about
> correctness.
> If a program reads garbage, there is no simple way to statically prove
> that this garbage does not affect the program's behavior.
>
We could go back to my original proposed fix -- come up with a specific way
to model a "read past the end" in the LLVM IR. Essentially capture the
act
of load widening in the IR. I'm imagining 'load i8* %ptr as i64'.
This is a
load that reads an i64 value from memory, but only fills the low 8 bits
with a value, the high bits are undef.

Then the analysis knows that the program cannot rely on the value in the
high bits, and does not flag an error. The code generator knows that the
high bits can be undef, and can emit the widened load to memory as
appropriate. We can teach the optimizers to *try* to transform C code
forming these patterns into such a load for canonicalization, and we can
provide a __builtin_....(...) syntax for explicitly performing such a load.

PS: The syntax might equally well be "load i64* %ptr as i8'; it all
depends
on what the most natural way to form this pattern in IR is -- produce an i8
value or an i64 with undef high bits? require the pointer to be the wide
type or the narrow type? i'd have to look at how these would play out in IR
to tell what the best pattern is...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/217b3743/attachment.html>

John Criswell

2012-Jan-24 22:00 UTC

head link

[LLVMdev] load widening conflicts with AddressSanitizer

On 1/24/12 3:36 PM, Kostya Serebryany wrote:>
>
> On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu
> <mailto:criswell at illinois.edu>> wrote:
>
>     On 1/24/12 2:31 PM, Duncan Sands wrote:
>
>         Hi Kostya,
>
>                 As far as I can see the C and C++ standards are not
>             relevant.  ASAN works on
>                 LLVM IR, not on C or C++.  Lots of different languages
>             have LLVM frontends.  I
>                 personally turn Ada and Fortran into LLVM IR all the
>             time for example.  Clearly
>                 the C standard is not relevant to LLVM IR coming from
>             such languages.  What
>                 matters is how LLVM IR is defined.  As far as I know
>             this construct is perfectly
>                 valid in LLVM IR.
>
>
>
>     The issue here is that a load that reads data past the end of an
>     alloca can occur at the LLVM IR level in one of three ways:
>
>     1) Because the program at the original source-code level does it
>     and is incorrect.
>     2) Because the program at the original source-code level does it
>     and is correct (although that must be a pretty wacky language).
>     3) Load-widening introduces it when processing loads from allocas
>     that are properly aligned.
>
>     As it is today, an analysis cannot look at the LLVM IR and know
>     which condition is causing the load to read data past the end of
>     the memory object.  As such, tools like SAFECode and ASAN don't
>     know when to relax their run-time checks to permit such
>     out-of-bounds reading; they either have to relax it for all such
>     loads (in which case a bug in the C source code might slip
>     through), or they have to report it all the time (and report false
>     positives for correct C programs).
>
>     I assume Kostya's new attribute is a way to permit the LLVM IR to
>     specify whether such an out-of-bounds read is intentional or not.
>
>     In my opinion, I don't think we should bother with an attribute.
>      Load-widening's behavior does not introduce exploitable code into
>     the program on commonly-used machines and operating systems(*),
>     and incorrect source code at the C source level that exhibits
>     identical behavior isn't exploitable, either. 
>
>
>     SAFECode can be enhanced so that the run-time checks for loads
>     relax their guarantees for aligned allocas that are subject to
>     load-widening; I imagine ASAN can be similarly modified.
>
> ASAN *can* be modified this way (it will actually make instrumentation 
> ~10% cheaper).
> But this mode will miss some bugs that the current mode finds.
> I've seen at least a couple of such *real* bugs.
Yes, I understand.  My question is how many such bugs have you seen that 
involve loads *and* allocas aligned in such a way that the load-widening 
optimization triggers.
>
> And these bugs are not only about exploitability, but also about 
> correctness.
> If a program reads garbage, there is no simple way to statically prove 
> that this garbage does not affect the program's behavior.
Hrm.  Actually, by relaxing the safety guarantees, SAFECode and ASAN may 
fail to detect exploitable behavior in the original program, so I take 
back my original comment.  That said, it's a pretty obscure attack, so 
it's pretty low on my list of things to worry about.

For me, the right way to go (barring a change in opinion from Chris) is 
to either disable the load-widening transform, transform the allocas to 
be larger, or to relax the safety guarantees.  The problem with 
attributes is that they are brittle; you have to make sure they get 
added to the right instructions, then you have to make sure they don't 
get removed by optimizations.

For SAFECode, I'm alright with transforms that "force" a program
to have
memory safe behavior even if they do not report a bug (such as boosting 
the allocation size of allocas subject to load-widening).  ASAN may not 
be willing to do that (and understandably so).  I'm not sure what to 
suggest.

-- John T.
>
> --kcc
>
>
>     We won't catch some bugs in C/C++ code, but that's a natural
>     consequence of deciding to permit certain out-of-bounds loads at
>     the LLVM IR level, IMHO.
>
>     My two cents.
>
>     -- John T.
>
>     (*) All bets are off for unconventional systems, though.
>
>
>
>
>             Asan will not work for Fortran and Ada anyway (at least,
>             out of the box).
>             I am not even sure that anything like asan is needed for
>             Ada (it has bounds
>             checking built-in, the dynamic memory allocation is much
>             more restrictive).
>             The tool is rather specific to C/C++ (and ObjectiveC
>             probably, although we have
>             almost no tests for ObjectiveC, nor much knowledge in it).
>             Yes, the IR transformations are done on the LLVM level,
>             but the asan run-time
>             library heavily depends on the C/C++ semantics and even
>             implementation,
>             and you can't really separate the asan instrumentation
>             pass from the run-time.
>
>         it's pretty disappointing to hear that asan is basically just
>         for C.  But since
>         it is, I won't bother you anymore about this attribute (though
>         I still don't
>         like it much).
>
>         Ciao, Duncan.
>         _______________________________________________
>         LLVM Developers mailing list
>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>         http://llvm.cs.uiuc.edu
>         http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/4176158a/attachment.html>

Kostya Serebryany

2012-Jan-24 22:08 UTC

head link

[LLVMdev] load widening conflicts with AddressSanitizer

On Tue, Jan 24, 2012 at 1:53 PM, Chandler Carruth <chandlerc at
google.com>wrote:
> On Tue, Jan 24, 2012 at 1:36 PM, Kostya Serebryany <kcc at
google.com> wrote:
>
>> ASAN *can* be modified this way (it will actually make instrumentation
>> ~10% cheaper).
>> But this mode will miss some bugs that the current mode finds.
>> I've seen at least a couple of such *real* bugs.
>>
>> And these bugs are not only about exploitability, but also about
>> correctness.
>> If a program reads garbage, there is no simple way to statically prove
>> that this garbage does not affect the program's behavior.
>>
>
> We could go back to my original proposed fix -- come up with a specific
> way to model a "read past the end" in the LLVM IR. Essentially
capture the
> act of load widening in the IR. I'm imagining 'load i8* %ptr as
i64'. This
> is a load that reads an i64 value from memory, but only fills the low 8
> bits with a value, the high bits are undef.
>
I don't *feel* the LLVM IR too well yet, but I would expect such change to
be a very disruptive revolution.
Every transformation will have to learn about the new instruction (or
instruction attribute, whatever) -- all for a relatively little value.

--kcc


>
> Then the analysis knows that the program cannot rely on the value in the
> high bits, and does not flag an error. The code generator knows that the
> high bits can be undef, and can emit the widened load to memory as
> appropriate. We can teach the optimizers to *try* to transform C code
> forming these patterns into such a load for canonicalization, and we can
> provide a __builtin_....(...) syntax for explicitly performing such a load.
>
> PS: The syntax might equally well be "load i64* %ptr as i8'; it
all
> depends on what the most natural way to form this pattern in IR is --
> produce an i8 value or an i64 with undef high bits? require the pointer to
> be the wide type or the narrow type? i'd have to look at how these
would
> play out in IR to tell what the best pattern is...
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/235bc34e/attachment.html>

Kostya Serebryany

2012-Jan-24 22:19 UTC

head link

[LLVMdev] load widening conflicts with AddressSanitizer

On Tue, Jan 24, 2012 at 2:00 PM, John Criswell <criswell at
illinois.edu>wrote:
>  On 1/24/12 3:36 PM, Kostya Serebryany wrote:
>
>
>
> On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at
illinois.edu>wrote:
>
>> On 1/24/12 2:31 PM, Duncan Sands wrote:
>>
>>> Hi Kostya,
>>>
>>>      As far as I can see the C and C++ standards are not relevant. 
ASAN
>>>> works on
>>>>     LLVM IR, not on C or C++.  Lots of different languages have
LLVM
>>>> frontends.  I
>>>>     personally turn Ada and Fortran into LLVM IR all the time
for
>>>> example.  Clearly
>>>>     the C standard is not relevant to LLVM IR coming from such
>>>> languages.  What
>>>>     matters is how LLVM IR is defined.  As far as I know this
construct
>>>> is perfectly
>>>>     valid in LLVM IR.
>>>>
>>>
>>
>>  The issue here is that a load that reads data past the end of an
alloca
>> can occur at the LLVM IR level in one of three ways:
>>
>> 1) Because the program at the original source-code level does it and is
>> incorrect.
>> 2) Because the program at the original source-code level does it and is
>> correct (although that must be a pretty wacky language).
>> 3) Load-widening introduces it when processing loads from allocas that
>> are properly aligned.
>>
>> As it is today, an analysis cannot look at the LLVM IR and know which
>> condition is causing the load to read data past the end of the memory
>> object.  As such, tools like SAFECode and ASAN don't know when to
relax
>> their run-time checks to permit such out-of-bounds reading; they either
>> have to relax it for all such loads (in which case a bug in the C
source
>> code might slip through), or they have to report it all the time (and
>> report false positives for correct C programs).
>>
>> I assume Kostya's new attribute is a way to permit the LLVM IR to
specify
>> whether such an out-of-bounds read is intentional or not.
>>
>> In my opinion, I don't think we should bother with an attribute.
>>  Load-widening's behavior does not introduce exploitable code into
the
>> program on commonly-used machines and operating systems(*), and
incorrect
>> source code at the C source level that exhibits identical behavior
isn't
>> exploitable, either.
>
>
>  SAFECode can be enhanced so that the run-time checks for loads relax
>> their guarantees for aligned allocas that are subject to load-widening;
I
>> imagine ASAN can be similarly modified.
>>
>
> ASAN *can* be modified this way (it will actually make instrumentation
> ~10% cheaper).
> But this mode will miss some bugs that the current mode finds.
> I've seen at least a couple of such *real* bugs.
>
>
> Yes, I understand.  My question is how many such bugs have you seen that
> involve loads *and* allocas aligned in such a way that the load-widening
> optimization triggers.
>
So far I've seen two cases where the patch in MemoryDependenceAnalysis.cpp
(above) will help.
First, bug reported by Mozilla folks:
http://code.google.com/p/address-sanitizer/issues/detail?id=20#c1
Second, false warning while doing asan/clang bootstrap. There is a
struct LVFlags in clang which contains 3 bools. They've got lowered to a
32-bit load.
[Or you asked something different?]
We build a lot of our code with asan/O1 and are not moving to asan/O2
because of this problem.

BTW, I have a clang/asan 3-stage bootstrap working locally.
Once this bug is fixed, we plan to set a continuous clang/asan bootstrap
bot.




>
>
>
>  And these bugs are not only about exploitability, but also about
> correctness.
> If a program reads garbage, there is no simple way to statically prove
> that this garbage does not affect the program's behavior.
>
>
> Hrm.  Actually, by relaxing the safety guarantees, SAFECode and ASAN may
> fail to detect exploitable behavior in the original program, so I take back
> my original comment.  That said, it's a pretty obscure attack, so
it's
> pretty low on my list of things to worry about.
>
> For me, the right way to go (barring a change in opinion from Chris) is to
> either disable the load-widening transform,
>
This is what the patch does essentially. It disables a subset
of load-widening when asan is on.
We can disable all cases of load-widening, but that may cost a bit of
performance (under asan).

> transform the allocas to be larger, or to relax the safety guarantees.
> The problem with attributes is that they are brittle; you have to make sure
> they get added to the right instructions, then you have to make sure they
> don't get removed by optimizations.
>This is a function attribute, much more stable.

--kcc

>
> For SAFECode, I'm alright with transforms that "force" a
program to have
> memory safe behavior even if they do not report a bug (such as boosting the
> allocation size of allocas subject to load-widening).  ASAN may not be
> willing to do that (and understandably so).  I'm not sure what to
suggest.
>
> -- John T.
>
>
>
>  --kcc
>
>
>
>>
>> We won't catch some bugs in C/C++ code, but that's a natural
consequence
>> of deciding to permit certain out-of-bounds loads at the LLVM IR level,
>> IMHO.
>>
>> My two cents.
>>
>> -- John T.
>>
>> (*) All bets are off for unconventional systems, though.
>>
>>
>>
>>>>
>>>> Asan will not work for Fortran and Ada anyway (at least, out of
the
>>>> box).
>>>> I am not even sure that anything like asan is needed for Ada
(it has
>>>> bounds
>>>> checking built-in, the dynamic memory allocation is much more
>>>> restrictive).
>>>> The tool is rather specific to C/C++ (and ObjectiveC probably,
although
>>>> we have
>>>> almost no tests for ObjectiveC, nor much knowledge in it).
>>>> Yes, the IR transformations are done on the LLVM level, but the
asan
>>>> run-time
>>>> library heavily depends on the C/C++ semantics and even
implementation,
>>>> and you can't really separate the asan instrumentation pass
from the
>>>> run-time.
>>>>
>>> it's pretty disappointing to hear that asan is basically just
for C.
>>>  But since
>>> it is, I won't bother you anymore about this attribute (though
I still
>>> don't
>>> like it much).
>>>
>>> Ciao, Duncan.
>>>  _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/4a651d6c/attachment.html>

Duncan Sands

2012-Jan-25 08:27 UTC

head link

[LLVMdev] load widening conflicts with AddressSanitizer

Hi Chandler,
> On Tue, Jan 24, 2012 at 1:36 PM, Kostya Serebryany <kcc at google.com
> <mailto:kcc at google.com>> wrote:
>
>     ASAN *can* be modified this way (it will actually make instrumentation
~10%
>     cheaper).
>     But this mode will miss some bugs that the current mode finds.
>     I've seen at least a couple of such *real* bugs.
>
>     And these bugs are not only about exploitability, but also about
correctness.
>     If a program reads garbage, there is no simple way to statically prove
that
>     this garbage does not affect the program's behavior.
>
>
> We could go back to my original proposed fix -- come up with a specific way
to
> model a "read past the end" in the LLVM IR. Essentially capture
the act of load
> widening in the IR. I'm imagining 'load i8* %ptr as i64'. This
is a load that
> reads an i64 value from memory, but only fills the low 8 bits with a value,
the
> high bits are undef.
how about moving the load widening transformation to the code generators
instead (which is in essence where you are moving it)?  Unlike your suggestion,
this would have the disadvantage that you wouldn't get "knock on"
improvements
from the transform of the kind that the IR optimizers can do.  But are those
common/significant?

Ciao, Duncan.
>
> Then the analysis knows that the program cannot rely on the value in the
high
> bits, and does not flag an error. The code generator knows that the high
bits
> can be undef, and can emit the widened load to memory as appropriate. We
can
> teach the optimizers to *try* to transform C code forming these patterns
into
> such a load for canonicalization, and we can provide a __builtin_....(...)
> syntax for explicitly performing such a load.
>
> PS: The syntax might equally well be "load i64* %ptr as i8'; it
all depends on
> what the most natural way to form this pattern in IR is -- produce an i8
value
> or an i64 with undef high bits? require the pointer to be the wide type or
the
> narrow type? i'd have to look at how these would play out in IR to tell
what the
> best pattern is...
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Jan 2012 - [LLVMdev] load widening conflicts with AddressSanitizer

[LLVMdev] load widening conflicts with AddressSanitizer

[LLVMdev] load widening conflicts with AddressSanitizer

[LLVMdev] load widening conflicts with AddressSanitizer

[LLVMdev] load widening conflicts with AddressSanitizer

[LLVMdev] load widening conflicts with AddressSanitizer

[LLVMdev] load widening conflicts with AddressSanitizer

Reasonably Related Threads