Alexander Potapenko
2013-Mar-21 14:05 UTC
[LLVMdev] (Not) instrumenting global string literals that end up in .cstrings on Mac
(forgot to CC llvmdev)

On Thu, Mar 21, 2013 at 5:54 PM, Alexander Potapenko <glider at google.com> wrote:
> Hey Anna, Nick, Ted,
>
> We've the following problem with string literals under ASan on Mac.
> Some global string constants end up being put into the .cstring section, for which the following rules apply:
> - the strings can't contain zeroes in their bodies
> - the link editor places only one copy of each literal into the output file's section
>
> ASan usually instruments the globals by adding redzones to the end of them and creating a structure that contains the size of a global with and without the redzone.
> For the aforementioned strings the linker will delete the redzones, but leave that structure untouched, which will lead to corrupt shadow memory at run time.
>
> Unfortunately at instrumentation time we can't tell for sure whether the string constant will be put into the .cstring section or not - the decision is taken at lowering time.
> https://code.google.com/p/address-sanitizer/issues/detail?id=171 contains the writeup of the problem and a couple of suggestions on how it can be solved. But we aren't sure that any of the solutions is correct.
> I wonder if it's at all possible to understand that a given string constant is going to end up in a mergeable section. Otherwise, is it possible to make every string literal live in a non-mergeable section by setting the section name explicitly?
>
> TIA,
> Alex
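For readers unfamiliar with the instrumentation described above, here is a rough C++ sketch of the layout: a global padded with a trailing redzone, plus a descriptor that records both sizes. The names (GlobalDescriptor, PaddedHello, kRedzone, stale_bytes) and the 32-byte redzone are assumptions made for illustration only; they are not ASan's actual data structures or constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor modelled on the description in the message above.
// It is NOT the structure ASan really emits.
struct GlobalDescriptor {
  std::uintptr_t begin;           // address of the global
  std::size_t size;               // size of the global without the redzone
  std::size_t size_with_redzone;  // size including the trailing redzone
};

// Assumed redzone size, chosen only for this example.
constexpr std::size_t kRedzone = 32;

// The literal "hello" (6 bytes with its NUL) followed by a trailing redzone.
struct PaddedHello {
  char data[6];
  char redzone[kRedzone];
};

static const PaddedHello g_hello = {"hello", {}};

// Builds the descriptor that would accompany the padded global.
inline GlobalDescriptor describe_hello() {
  return {reinterpret_cast<std::uintptr_t>(&g_hello), sizeof(g_hello.data),
          sizeof(g_hello)};
}

// If the linker keeps only `bytes_kept` bytes of the global (for example the
// string up to its terminating NUL), this many bytes claimed by the
// descriptor no longer belong to the global.
inline std::size_t stale_bytes(const GlobalDescriptor &d,
                               std::size_t bytes_kept) {
  return d.size_with_redzone > bytes_kept ? d.size_with_redzone - bytes_kept
                                          : 0;
}
```

In this model, a linker that keeps just the 6 string bytes leaves the descriptor claiming 32 bytes of redzone that are no longer there, which is the mismatch that corrupts the shadow memory.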
Nick Kledzik
2013-Mar-21 19:03 UTC
Alexander,

On Darwin the "__cstring" section (really a section with type S_CSTRING_LITERALS) is defined to contain zero-terminated strings of bytes that the linker can merge and re-order. If you want pad bytes before and after the string, you need to put the strings in a different section (e.g. __TEXT, __const).

But CF/NSString literals will be problematic. The compiler emits a static NS/CFString object into a data section. That object contains a pointer to its "backing" utf8 or utf16 string literal. The linker coalesces the NS/CFString objects (so that two translation units that define @"hello" will wind up using the same object). But to tell if two CF/NSString objects are the same, the linker must compare the string literals they point to. And in that check is an assertion that the string is in a __cstring or __ustring (utf16) section. So, putting the backing string for a CF/NSString into another section will cause a linker assertion.

-Nick
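As an illustration of the first suggestion (keeping a padded string out of "__cstring" by naming its section explicitly), here is a small C++ sketch using the GCC/Clang section attribute. The macro IN_TEXT_CONST, the type PaddedLiteral, the variable kGreeting and the 32-byte pad are hypothetical names and values picked for the example; the attribute is only applied when building for an Apple target, and this does nothing for the CF/NSString backing strings discussed above.

```cpp
#include <cassert>
#include <cstring>

// On Mach-O targets, ask the compiler to place the object in __TEXT,__const.
// Elsewhere the macro expands to nothing so the example still compiles.
#if defined(__APPLE__)
#define IN_TEXT_CONST __attribute__((section("__TEXT,__const")))
#else
#define IN_TEXT_CONST
#endif

// A string plus trailing pad bytes. Because it is an ordinary constant
// object rather than a bare string literal, the pad bytes are part of it.
struct PaddedLiteral {
  char text[6];   // "hello" and its terminating NUL
  char pad[32];   // pad size is an assumption for this example
};

static const PaddedLiteral kGreeting IN_TEXT_CONST = {"hello", {}};

// Accessor so that callers use the string without knowing about the padding.
inline const char *greeting() { return kGreeting.text; }
```

Whether the linker leaves such an object alone should be checked on the toolchain in question, for example by inspecting the output with otool.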
Anna Zaks
2013-Mar-21 23:22 UTC
Alex,

I think finding a superset of globals that will end up in the "__cstring" section and not adding red zones to them is reasonable. You might be able to factor out the code that makes the decision but does not involve TargetMachine (e.g. some of TargetLoweringObjectFile::getKindForGlobal). These are all constants anyway, so we are only losing checks for invalid reads, not invalid writes.

There might be other, better solutions; I am not sure.

Cheers,
Anna.
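The "superset" idea above can be modelled with a toy predicate in C++. This is only a sketch under stated assumptions: GlobalInfo, may_land_in_cstring and should_add_redzone are hypothetical names, the struct is a stand-in rather than an LLVM type, and the rules are taken from the thread (constant, zero-terminated, no zero bytes in the body) rather than from the real logic in TargetLoweringObjectFile::getKindForGlobal.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the facts known about a global at
// instrumentation time. Not an LLVM type.
struct GlobalInfo {
  bool is_constant;   // the global is never written to
  std::string bytes;  // initializer bytes, including any terminating NUL
};

// Conservative guess: could this global be lowered into "__cstring"?
// Per the thread, such strings are zero-terminated and contain no zero
// bytes in their bodies.
inline bool may_land_in_cstring(const GlobalInfo &g) {
  if (!g.is_constant) return false;
  if (g.bytes.empty() || g.bytes.back() != '\0') return false;
  // The first NUL must also be the last byte, i.e. no embedded zeroes.
  return g.bytes.find('\0') == g.bytes.size() - 1;
}

// Redzones are skipped for anything the predicate flags.
inline bool should_add_redzone(const GlobalInfo &g) {
  return !may_land_in_cstring(g);
}
```

For example, a constant "hello\0" is flagged and left without a redzone, while a constant with an embedded zero or a mutable buffer is still instrumented. Erring on the side of flagging too much only costs some read checks, as noted above.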