Kostya Serebryany
2012-Jan-24  21:36 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu>wrote:> On 1/24/12 2:31 PM, Duncan Sands wrote: > >> Hi Kostya, >> >> As far as I can see the C and C++ standards are not relevant. ASAN >>> works on >>> LLVM IR, not on C or C++. Lots of different languages have LLVM >>> frontends. I >>> personally turn Ada and Fortran into LLVM IR all the time for >>> example. Clearly >>> the C standard is not relevant to LLVM IR coming from such >>> languages. What >>> matters is how LLVM IR is defined. As far as I know this construct >>> is perfectly >>> valid in LLVM IR. >>> >> > > The issue here is that a load that reads data past the end of an alloca > can occur at the LLVM IR level in one of three ways: > > 1) Because the program at the original source-code level does it and is > incorrect. > 2) Because the program at the original source-code level does it and is > correct (although that must be a pretty wacky language). > 3) Load-widening introduces it when processing loads from allocas that are > properly aligned. > > As it is today, an analysis cannot look at the LLVM IR and know which > condition is causing the load to read data past the end of the memory > object. As such, tools like SAFECode and ASAN don't know when to relax > their run-time checks to permit such out-of-bounds reading; they either > have to relax it for all such loads (in which case a bug in the C source > code might slip through), or they have to report it all the time (and > report false positives for correct C programs). > > I assume Kostya's new attribute is a way to permit the LLVM IR to specify > whether such an out-of-bounds read is intentional or not. > > In my opinion, I don't think we should bother with an attribute. > Load-widening's behavior does not introduce exploitable code into the > program on commonly-used machines and operating systems(*), and incorrect > source code at the C source level that exhibits identical behavior isn't > exploitable, either.SAFECode can be enhanced so that the run-time checks for loads relax their> guarantees for aligned allocas that are subject to load-widening; I imagine > ASAN can be similarly modified. >ASAN *can* be modified this way (it will actually make instrumentation ~10% cheaper). But this mode will miss some bugs that the current mode finds. I've seen at least a couple of such *real* bugs. And these bugs are not only about exploitability, but also about correctness. If a program reads garbage, there is no simple way to statically prove that this garbage does not affect the program's behavior. --kcc> > We won't catch some bugs in C/C++ code, but that's a natural consequence > of deciding to permit certain out-of-bounds loads at the LLVM IR level, > IMHO. > > My two cents. > > -- John T. > > (*) All bets are off for unconventional systems, though. > > > >>> >>> Asan will not work for Fortran and Ada anyway (at least, out of the box). >>> I am not even sure that anything like asan is needed for Ada (it has >>> bounds >>> checking built-in, the dynamic memory allocation is much more >>> restrictive). >>> The tool is rather specific to C/C++ (and ObjectiveC probably, although >>> we have >>> almost no tests for ObjectiveC, nor much knowledge in it). >>> Yes, the IR transformations are done on the LLVM level, but the asan >>> run-time >>> library heavily depends on the C/C++ semantics and even implementation, >>> and you can't really separate the asan instrumentation pass from the >>> run-time. >>> >> it's pretty disappointing to hear that asan is basically just for C. But >> since >> it is, I won't bother you anymore about this attribute (though I still >> don't >> like it much). >> >> Ciao, Duncan. >> ______________________________**_________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/0fbd9f2f/attachment.html>
Chandler Carruth
2012-Jan-24  21:53 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On Tue, Jan 24, 2012 at 1:36 PM, Kostya Serebryany <kcc at google.com> wrote:> ASAN *can* be modified this way (it will actually make instrumentation > ~10% cheaper). > But this mode will miss some bugs that the current mode finds. > I've seen at least a couple of such *real* bugs. > > And these bugs are not only about exploitability, but also about > correctness. > If a program reads garbage, there is no simple way to statically prove > that this garbage does not affect the program's behavior. >We could go back to my original proposed fix -- come up with a specific way to model a "read past the end" in the LLVM IR. Essentially capture the act of load widening in the IR. I'm imagining 'load i8* %ptr as i64'. This is a load that reads an i64 value from memory, but only fills the low 8 bits with a value, the high bits are undef. Then the analysis knows that the program cannot rely on the value in the high bits, and does not flag an error. The code generator knows that the high bits can be undef, and can emit the widened load to memory as appropriate. We can teach the optimizers to *try* to transform C code forming these patterns into such a load for canonicalization, and we can provide a __builtin_....(...) syntax for explicitly performing such a load. PS: The syntax might equally well be "load i64* %ptr as i8'; it all depends on what the most natural way to form this pattern in IR is -- produce an i8 value or an i64 with undef high bits? require the pointer to be the wide type or the narrow type? i'd have to look at how these would play out in IR to tell what the best pattern is... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/217b3743/attachment.html>
John Criswell
2012-Jan-24  22:00 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On 1/24/12 3:36 PM, Kostya Serebryany wrote:> > > On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu > <mailto:criswell at illinois.edu>> wrote: > > On 1/24/12 2:31 PM, Duncan Sands wrote: > > Hi Kostya, > > As far as I can see the C and C++ standards are not > relevant. ASAN works on > LLVM IR, not on C or C++. Lots of different languages > have LLVM frontends. I > personally turn Ada and Fortran into LLVM IR all the > time for example. Clearly > the C standard is not relevant to LLVM IR coming from > such languages. What > matters is how LLVM IR is defined. As far as I know > this construct is perfectly > valid in LLVM IR. > > > > The issue here is that a load that reads data past the end of an > alloca can occur at the LLVM IR level in one of three ways: > > 1) Because the program at the original source-code level does it > and is incorrect. > 2) Because the program at the original source-code level does it > and is correct (although that must be a pretty wacky language). > 3) Load-widening introduces it when processing loads from allocas > that are properly aligned. > > As it is today, an analysis cannot look at the LLVM IR and know > which condition is causing the load to read data past the end of > the memory object. As such, tools like SAFECode and ASAN don't > know when to relax their run-time checks to permit such > out-of-bounds reading; they either have to relax it for all such > loads (in which case a bug in the C source code might slip > through), or they have to report it all the time (and report false > positives for correct C programs). > > I assume Kostya's new attribute is a way to permit the LLVM IR to > specify whether such an out-of-bounds read is intentional or not. > > In my opinion, I don't think we should bother with an attribute. > Load-widening's behavior does not introduce exploitable code into > the program on commonly-used machines and operating systems(*), > and incorrect source code at the C source level that exhibits > identical behavior isn't exploitable, either. > > > SAFECode can be enhanced so that the run-time checks for loads > relax their guarantees for aligned allocas that are subject to > load-widening; I imagine ASAN can be similarly modified. > > ASAN *can* be modified this way (it will actually make instrumentation > ~10% cheaper). > But this mode will miss some bugs that the current mode finds. > I've seen at least a couple of such *real* bugs.Yes, I understand. My question is how many such bugs have you seen that involve loads *and* allocas aligned in such a way that the load-widening optimization triggers.> > And these bugs are not only about exploitability, but also about > correctness. > If a program reads garbage, there is no simple way to statically prove > that this garbage does not affect the program's behavior.Hrm. Actually, by relaxing the safety guarantees, SAFECode and ASAN may fail to detect exploitable behavior in the original program, so I take back my original comment. That said, it's a pretty obscure attack, so it's pretty low on my list of things to worry about. For me, the right way to go (barring a change in opinion from Chris) is to either disable the load-widening transform, transform the allocas to be larger, or to relax the safety guarantees. The problem with attributes is that they are brittle; you have to make sure they get added to the right instructions, then you have to make sure they don't get removed by optimizations. For SAFECode, I'm alright with transforms that "force" a program to have memory safe behavior even if they do not report a bug (such as boosting the allocation size of allocas subject to load-widening). ASAN may not be willing to do that (and understandably so). I'm not sure what to suggest. -- John T.> > --kcc > > > We won't catch some bugs in C/C++ code, but that's a natural > consequence of deciding to permit certain out-of-bounds loads at > the LLVM IR level, IMHO. > > My two cents. > > -- John T. > > (*) All bets are off for unconventional systems, though. > > > > > Asan will not work for Fortran and Ada anyway (at least, > out of the box). > I am not even sure that anything like asan is needed for > Ada (it has bounds > checking built-in, the dynamic memory allocation is much > more restrictive). > The tool is rather specific to C/C++ (and ObjectiveC > probably, although we have > almost no tests for ObjectiveC, nor much knowledge in it). > Yes, the IR transformations are done on the LLVM level, > but the asan run-time > library heavily depends on the C/C++ semantics and even > implementation, > and you can't really separate the asan instrumentation > pass from the run-time. > > it's pretty disappointing to hear that asan is basically just > for C. But since > it is, I won't bother you anymore about this attribute (though > I still don't > like it much). > > Ciao, Duncan. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/4176158a/attachment.html>
Kostya Serebryany
2012-Jan-24  22:08 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On Tue, Jan 24, 2012 at 1:53 PM, Chandler Carruth <chandlerc at google.com>wrote:> On Tue, Jan 24, 2012 at 1:36 PM, Kostya Serebryany <kcc at google.com> wrote: > >> ASAN *can* be modified this way (it will actually make instrumentation >> ~10% cheaper). >> But this mode will miss some bugs that the current mode finds. >> I've seen at least a couple of such *real* bugs. >> >> And these bugs are not only about exploitability, but also about >> correctness. >> If a program reads garbage, there is no simple way to statically prove >> that this garbage does not affect the program's behavior. >> > > We could go back to my original proposed fix -- come up with a specific > way to model a "read past the end" in the LLVM IR. Essentially capture the > act of load widening in the IR. I'm imagining 'load i8* %ptr as i64'. This > is a load that reads an i64 value from memory, but only fills the low 8 > bits with a value, the high bits are undef. >I don't *feel* the LLVM IR too well yet, but I would expect such change to be a very disruptive revolution. Every transformation will have to learn about the new instruction (or instruction attribute, whatever) -- all for a relatively little value. --kcc> > Then the analysis knows that the program cannot rely on the value in the > high bits, and does not flag an error. The code generator knows that the > high bits can be undef, and can emit the widened load to memory as > appropriate. We can teach the optimizers to *try* to transform C code > forming these patterns into such a load for canonicalization, and we can > provide a __builtin_....(...) syntax for explicitly performing such a load. > > PS: The syntax might equally well be "load i64* %ptr as i8'; it all > depends on what the most natural way to form this pattern in IR is -- > produce an i8 value or an i64 with undef high bits? require the pointer to > be the wide type or the narrow type? i'd have to look at how these would > play out in IR to tell what the best pattern is... >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/235bc34e/attachment.html>
Kostya Serebryany
2012-Jan-24  22:19 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On Tue, Jan 24, 2012 at 2:00 PM, John Criswell <criswell at illinois.edu>wrote:> On 1/24/12 3:36 PM, Kostya Serebryany wrote: > > > > On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu>wrote: > >> On 1/24/12 2:31 PM, Duncan Sands wrote: >> >>> Hi Kostya, >>> >>> As far as I can see the C and C++ standards are not relevant. ASAN >>>> works on >>>> LLVM IR, not on C or C++. Lots of different languages have LLVM >>>> frontends. I >>>> personally turn Ada and Fortran into LLVM IR all the time for >>>> example. Clearly >>>> the C standard is not relevant to LLVM IR coming from such >>>> languages. What >>>> matters is how LLVM IR is defined. As far as I know this construct >>>> is perfectly >>>> valid in LLVM IR. >>>> >>> >> >> The issue here is that a load that reads data past the end of an alloca >> can occur at the LLVM IR level in one of three ways: >> >> 1) Because the program at the original source-code level does it and is >> incorrect. >> 2) Because the program at the original source-code level does it and is >> correct (although that must be a pretty wacky language). >> 3) Load-widening introduces it when processing loads from allocas that >> are properly aligned. >> >> As it is today, an analysis cannot look at the LLVM IR and know which >> condition is causing the load to read data past the end of the memory >> object. As such, tools like SAFECode and ASAN don't know when to relax >> their run-time checks to permit such out-of-bounds reading; they either >> have to relax it for all such loads (in which case a bug in the C source >> code might slip through), or they have to report it all the time (and >> report false positives for correct C programs). >> >> I assume Kostya's new attribute is a way to permit the LLVM IR to specify >> whether such an out-of-bounds read is intentional or not. >> >> In my opinion, I don't think we should bother with an attribute. >> Load-widening's behavior does not introduce exploitable code into the >> program on commonly-used machines and operating systems(*), and incorrect >> source code at the C source level that exhibits identical behavior isn't >> exploitable, either. > > > SAFECode can be enhanced so that the run-time checks for loads relax >> their guarantees for aligned allocas that are subject to load-widening; I >> imagine ASAN can be similarly modified. >> > > ASAN *can* be modified this way (it will actually make instrumentation > ~10% cheaper). > But this mode will miss some bugs that the current mode finds. > I've seen at least a couple of such *real* bugs. > > > Yes, I understand. My question is how many such bugs have you seen that > involve loads *and* allocas aligned in such a way that the load-widening > optimization triggers. >So far I've seen two cases where the patch in MemoryDependenceAnalysis.cpp (above) will help. First, bug reported by Mozilla folks: http://code.google.com/p/address-sanitizer/issues/detail?id=20#c1 Second, false warning while doing asan/clang bootstrap. There is a struct LVFlags in clang which contains 3 bools. They've got lowered to a 32-bit load. [Or you asked something different?] We build a lot of our code with asan/O1 and are not moving to asan/O2 because of this problem. BTW, I have a clang/asan 3-stage bootstrap working locally. Once this bug is fixed, we plan to set a continuous clang/asan bootstrap bot.> > > > And these bugs are not only about exploitability, but also about > correctness. > If a program reads garbage, there is no simple way to statically prove > that this garbage does not affect the program's behavior. > > > Hrm. Actually, by relaxing the safety guarantees, SAFECode and ASAN may > fail to detect exploitable behavior in the original program, so I take back > my original comment. That said, it's a pretty obscure attack, so it's > pretty low on my list of things to worry about. > > For me, the right way to go (barring a change in opinion from Chris) is to > either disable the load-widening transform, >This is what the patch does essentially. It disables a subset of load-widening when asan is on. We can disable all cases of load-widening, but that may cost a bit of performance (under asan).> transform the allocas to be larger, or to relax the safety guarantees. > The problem with attributes is that they are brittle; you have to make sure > they get added to the right instructions, then you have to make sure they > don't get removed by optimizations. >This is a function attribute, much more stable. --kcc> > For SAFECode, I'm alright with transforms that "force" a program to have > memory safe behavior even if they do not report a bug (such as boosting the > allocation size of allocas subject to load-widening). ASAN may not be > willing to do that (and understandably so). I'm not sure what to suggest. > > -- John T. > > > > --kcc > > > >> >> We won't catch some bugs in C/C++ code, but that's a natural consequence >> of deciding to permit certain out-of-bounds loads at the LLVM IR level, >> IMHO. >> >> My two cents. >> >> -- John T. >> >> (*) All bets are off for unconventional systems, though. >> >> >> >>>> >>>> Asan will not work for Fortran and Ada anyway (at least, out of the >>>> box). >>>> I am not even sure that anything like asan is needed for Ada (it has >>>> bounds >>>> checking built-in, the dynamic memory allocation is much more >>>> restrictive). >>>> The tool is rather specific to C/C++ (and ObjectiveC probably, although >>>> we have >>>> almost no tests for ObjectiveC, nor much knowledge in it). >>>> Yes, the IR transformations are done on the LLVM level, but the asan >>>> run-time >>>> library heavily depends on the C/C++ semantics and even implementation, >>>> and you can't really separate the asan instrumentation pass from the >>>> run-time. >>>> >>> it's pretty disappointing to hear that asan is basically just for C. >>> But since >>> it is, I won't bother you anymore about this attribute (though I still >>> don't >>> like it much). >>> >>> Ciao, Duncan. >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/4a651d6c/attachment.html>
Duncan Sands
2012-Jan-25  08:27 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
Hi Chandler,> On Tue, Jan 24, 2012 at 1:36 PM, Kostya Serebryany <kcc at google.com > <mailto:kcc at google.com>> wrote: > > ASAN *can* be modified this way (it will actually make instrumentation ~10% > cheaper). > But this mode will miss some bugs that the current mode finds. > I've seen at least a couple of such *real* bugs. > > And these bugs are not only about exploitability, but also about correctness. > If a program reads garbage, there is no simple way to statically prove that > this garbage does not affect the program's behavior. > > > We could go back to my original proposed fix -- come up with a specific way to > model a "read past the end" in the LLVM IR. Essentially capture the act of load > widening in the IR. I'm imagining 'load i8* %ptr as i64'. This is a load that > reads an i64 value from memory, but only fills the low 8 bits with a value, the > high bits are undef.how about moving the load widening transformation to the code generators instead (which is in essence where you are moving it)? Unlike your suggestion, this would have the disadvantage that you wouldn't get "knock on" improvements from the transform of the kind that the IR optimizers can do. But are those common/significant? Ciao, Duncan.> > Then the analysis knows that the program cannot rely on the value in the high > bits, and does not flag an error. The code generator knows that the high bits > can be undef, and can emit the widened load to memory as appropriate. We can > teach the optimizers to *try* to transform C code forming these patterns into > such a load for canonicalization, and we can provide a __builtin_....(...) > syntax for explicitly performing such a load. > > PS: The syntax might equally well be "load i64* %ptr as i8'; it all depends on > what the most natural way to form this pattern in IR is -- produce an i8 value > or an i64 with undef high bits? require the pointer to be the wide type or the > narrow type? i'd have to look at how these would play out in IR to tell what the > best pattern is... > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Maybe Matching Threads
- [LLVMdev] load widening conflicts with AddressSanitizer
- [LLVMdev] load widening conflicts with AddressSanitizer
- [LLVMdev] load widening conflicts with AddressSanitizer
- [LLVMdev] load widening conflicts with AddressSanitizer
- [LLVMdev] load widening conflicts with AddressSanitizer