Dmitry Vyukov
2012-Jun-21 07:21 UTC
[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?
Hi,

Yes, stlport was a pain to deploy and maintain, and it calls the normal operator new/delete (there is no way to put them into a separate namespace).

Note that in some codebases we build the asan/tsan runtimes from source. How will the build process look with that object file mangling? How easy is it to integrate it into a custom build process?

Soon I will start integrating tsan into the Go language. For the Go language we need very simple object files. No global ctors, no thread-local storage, no weak symbols and other trickery. Basically what a portable C compiler could have produced.

On Wed, Jun 20, 2012 at 10:05 AM, Chandler Carruth <chandlerc at google.com> wrote:

>>>>>> Hello folks (and sorry if I've forgotten to CC anyone with particular
>>>>>> interest to this discussion...):
>>>>>>
>>>>>> I've been thinking a lot about how best to build advanced runtime
>>>>>> libraries like ASan, and scale them up. Note that this does *not* try to
>>>>>> address any licensing issues. For now, I'll consider those orthogonal /
>>>>>> solvable w/o technical contortions. =]
>>>>>>
>>>>>> My primary motivation: we really, *really* need runtime libraries to
>>>>>> be able to use common, shared libraries.
>>>>>>
>>>>>
>>>>> I am not sure you understand the problem as we do.
>>>>>
>>>>> In short, asan/tsan/msan/etc can not use any function which is also
>>>>> called from the instrumented binary.
>>>>>
>>>>
>>>> Well, I can't be sure, but this description certainly agrees with my
>>>> understanding -- you need *every* part of the runtime to be completely
>>>> separate from *every* part of the instrumented binary. I'm with you there.
>>>>
>>>> In particular, I think the current strategy for libc & system calls
>>>> makes perfect sense, and I'm not trying to suggest changing it.
>>>>
>>>> I think the most similar situation is this one:
>>>>
>>>>> In the previous version of ThreadSanitizer we used a private copy of
>>>>> STLport in a separate namespace and a custom libc (small subset).
>>>>>
>>>>
>>>> My proposal is very similar except without the need to modify the C++
>>>> standard library in use. Instead, I'm suggesting post-processing the
>>>> library to ensure that the standard C++ library code in the runtime is kept
>>>> completely distinct from that in the instrumented binary -- everything would
>>>> in fact be *mangled* differently.
>>>>
>>>> The goal would be to avoid the maintenance overhead of a custom C++
>>>> standard library, and instead use a normal one. My understanding is that
>>>> both GCC's libstdc++ and LLVM's libc++ are significantly higher quality
>>>> than STLport, and if we're doing static linking, the code bloat should be
>>>> greatly reduced. We could reduce it still further by doing LTO of the
>>>> runtime library, which should be very straightforward given the rest of my
>>>> proposal.
>>>>
>>>> It would still require a very small subset of libc, likely not much
>>>> more than you already have.
>>>>
>>>>> This worked, but had problems too (Dmitry was very angry at STLport
>>>>> for code bloat, stack size increase and some direct libc calls).
>>>>>
>>>>
>>>> I would be interested to know if the above addresses most of the
>>>> problems or not.
>>>>
>>>>
>>>>> Until recently this was not causing too much pain in asan/tsan, but
>>>>> our attempts to use the LLVM DWARF readers made it worse.
>>>>> When tsan finds a race, we need to symbolize it online to be able to
>>>>> match against a suppression and decide whether we want to emit the warning.
>>>>> Today we do it in a separate addr2line process (ugly and slow).
>>>>> But if we start calling the LLVM DWARF reader we end up with all
>>>>> possible dependency problems (Dmitry and Alexey will know the exact ones)
>>>>> because the LLVM code calls malloc, memcpy, etc.
>>>>>
>>>>> Frankly, I don't have any solution other than to change the code such
>>>>> that it does not call libc/libc++.
>>>>> Some of that may be solved by a private copy of STLport + a bit of
>>>>> custom libc (but see above about STLport)
>>>>>
>>>>
>>>> I think my proposal is essentially in between these two:
>>>>
>>>> - Avoid the need for a low quality STL by using a normal C++ standard
>>>> library implementation, and avoid maintenance burden by doing a link-time
>>>> mangling of the symbols.
>>>>
>>>
>>> re-linking might be too platform specific.
>>> How about compiling the library into LLVM bitcode and adding
>>> namespaces/prefixes to that bitcode?
>>>
>>
>> Re-linking is a bit platform specific...
>>
>> It would definitely work on ELF platforms, and likely on Darwin, but
>> Windows is tricky.
>>
>> On Windows we would at least need a custom tool, but such a tool would be
>> quite easy to write I suspect. We could even use the very LLVM libraries in
>> question to write it! ;] Amusingly, I think with the LLVM libraries we
>> could very easily write a custom tool just to mangle the symbol names in a
>> collection of object files and have it work on *most* platforms!
>>
>> Still, the bitcode idea is interesting. Doing this entirely in bitcode
>> has some advantages as these types of runtimes are among the best uses for
>> things like LTO: they're small, performance sensitive, can enumerate the
>> entry points easily, and are likely to have a particular need for dead code
>> elimination.
>>
>
> One reason to want to have some support for doing this w/o bitcode: we may
> not have the bitcode. Specifically, the goal would be to use the "normal"
> C++ standard library, provided it is available to link statically
> (libstdc++ and libc++ certainly are, I don't know about MSVC). That would
> be much easier if we can actually use the existing archive file, and just
> "fix" the .o files inside it.
>
> It seems likely to be the equivalent of an 'ld -r' run with a linker
> script to munge the symbol names, or potentially a custom tool written with
> the LLVM object file libraries.
Chandler Carruth
2012-Jun-21 07:52 UTC
[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?
On Thu, Jun 21, 2012 at 12:21 AM, Dmitry Vyukov <dvyukov at google.com> wrote:

> Hi,
>
> Yes, stlport was a pain to deploy and maintain + it calls normal operator
> new/delete (there is no way to put them into a separate namespace).
>

Ok, but putting the raw symbols into a "namespace" with the linker shouldn't be subject to these limitations.

> Note that in some codebases we build asan/tsan runtimes from source. How
> the build process will look with that object file mangling? How easy it is
> to integrate it into a custom build process?
>

Well, I don't know yet. ;] It was an idea, I don't have an implementation at this point. That said, I had only really imagined building the runtimes from source? Maybe I don't understand what you mean by this?

The vague strategy I am imagining for the build process is this:

1) compile runtime into a static library, just like any other static library

2) collect all the '.o' files in the static archive, and in any dependencies' static archive libraries

3) for each 'foo.o' build a 'foo_munged.o' using $tool, the _munged version has all symbols renamed except those on the whitelist for export to the instrumented binary

4) put all of the _munged '.o' files into a single runtime archive

The $tool here could be "ld -r" with a linker script, or (likely necessary on Windows) a very simple, dedicated tool built around the LLVM object libraries to copy each symbol, munging the name.

> Soon I will start integrating tsan into Go language. For the Go language we
> need very simple object files.
>

Ok... I'm not sure whether this should really constrain the way we build the core runtime system here though. If you need some logic on the tsan side factored out into a separate library for use with Go, that would seem simpler than trying to make one sanitizer runtime library to support frontends, middle ends, and programming languages with totally separate models.

> No global ctors, no thread-local storage, no weak symbols and other
> trickery. Basically what a portable C compiler could have produced.
>

These also don't seem insurmountable, even in the existing use cases. But maybe I'm not considering the actual restrictions you are, or I've misunderstood. Here is how I'm breaking down the things you've mentioned:

1) It seems reasonable to avoid global constructors, and do-able in C++ even when using the standard library and parts of LLVM. LLVM itself specifically works to avoid them.

2) TLS doesn't seem to be required by anything I'm suggesting... is there something that worries you about this?

3) I don't understand the requirement to have no weak symbols. Even a portable C compiler might produce weak symbols? Still, during the re-linking phase above, it should be possible to resolve any weak symbols?
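Step 3 of the build strategy above (rename everything that is not on the export whitelist) could be sketched roughly as follows. This is only an illustration under assumptions that are not in the thread: the `__rt_` prefix, the function names, and the sample symbol names are hypothetical, and the output format targets GNU objcopy's `--redefine-syms` option as one possible stand-in for $tool, not the "ld -r" linker-script route mentioned in the message.

```python
def build_rename_map(symbols, whitelist, prefix="__rt_"):
    """Map every symbol that is not on the export whitelist to a prefixed name.

    `prefix` is a hypothetical choice; any string that cannot collide with
    names in the instrumented binary would serve the same purpose.
    """
    rename = {}
    for name in symbols:
        # Whitelisted entry points keep their names; already-prefixed
        # symbols are skipped so the mapping is safe to apply twice.
        if name in whitelist or name.startswith(prefix):
            continue
        rename[name] = prefix + name
    return rename


def format_redefine_syms(rename):
    """Render the map as 'old new' pairs, one per line, sorted by old name."""
    return "".join(f"{old} {new}\n" for old, new in sorted(rename.items()))


if __name__ == "__main__":
    # Hypothetical inputs: one exported runtime entry point and two
    # internal symbols that should be hidden from the instrumented binary.
    symbols = ["__asan_init", "_Znwm", "internal_helper"]
    whitelist = {"__asan_init"}
    print(format_redefine_syms(build_rename_map(symbols, whitelist)), end="")
```

The resulting text could be saved to a file and applied per object with something like `objcopy --redefine-syms=map.txt foo.o foo_munged.o`, after which the munged objects are archived together as in step 4. A real tool would also have to decide which undefined references (for example to libc) are renamed and which are left alone, which this sketch does not address.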
Dmitry Vyukov
2012-Jun-21 08:04 UTC
[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?
On Thu, Jun 21, 2012 at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:

>> Hi,
>>
>> Yes, stlport was a pain to deploy and maintain + it calls normal operator
>> new/delete (there is no way to put them into a separate namespace).
>>
>
> Ok, but putting the raw symbols into a "namespace" with the linker
> shouldn't be subject to these limitations.
>

OK

>> Note that in some codebases we build asan/tsan runtimes from source. How
>> the build process will look with that object file mangling? How easy it is
>> to integrate it into a custom build process?
>>
>
> Well, I don't know yet. ;] It was an idea, I don't have an implementation
> at this point. That said, I had only really imagined building the runtimes
> from source? Maybe I don't understand what you mean by this?
>
> The vague strategy I am imagining for the build process is this:
>
> 1) compile runtime into a static library, just like any other static
> library
>
> 2) collect all the '.o' files in the static archive, and in any
> dependencies' static archive libraries
>
> 3) for each 'foo.o' build a 'foo_munged.o' using $tool, the _munged
> version has all symbols renamed except those on the whitelist for export
> to the instrumented binary
>
> 4) put all of the _munged '.o' files into a single runtime archive
>
>
> The $tool here could be "ld -r" with a linker script, or (likely necessary
> on Windows) a very simple, dedicated tool built around the LLVM object
> libraries to copy each symbol, munging the name.
>
>
>> Soon I will start integrating tsan into Go language. For the Go language
>> we need very simple object files.
>>
>
> Ok... I'm not sure whether this should really constrain the way we build
> the core runtime system here though. If you need some logic on the tsan
> side factored out into a separate library for use with Go, that would seem
> simpler than trying to make one sanitizer runtime library to support
> frontends, middle ends, and programming languages with totally separate
> models.
>

Yes, it will be a separate runtime library. But if the tsan sources are deeply dependent on the LLVM sources, this may be significantly harder to do.

>> No global ctors, no thread-local storage, no weak symbols and other
>> trickery. Basically what a portable C compiler could have produced.
>>
>
> These also don't seem insurmountable, even in the existing use cases. But
> maybe I'm not considering the actual restrictions you are, or I've
> misunderstood. Here is how I'm breaking down the things you've mentioned:
>
> 1) It seems reasonable to avoid global constructors, and do-able in C++
> even when using the standard library and parts of LLVM. LLVM itself
> specifically works to avoid them.
>

Is that the case for the C++ library that LLVM uses?

> 2) TLS doesn't seem to be required by anything I'm suggesting... is there
> something that worries you about this?
>

I suspect that the C/C++ library can use it.

> 3) I don't understand the requirement to have no weak symbols. Even a
> portable C compiler might produce weak symbols?
>

The linker does not understand them.

> Still, during the re-linking phase above, it should be possible to resolve
> any weak symbols?
>

Well, most likely yes. There may be additional limitations that I don't know yet.
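One of the constraints discussed above, that the object files handed to Go contain no weak symbols, could be checked mechanically after re-linking. The sketch below is an assumption-laden illustration, not anything proposed in the thread: it parses the default text output of GNU `nm` (where type letters `W`, `w`, `V`, and `v` mark weak symbols), the function name and sample symbols are hypothetical, and it says nothing about global constructors or thread-local storage.

```python
# Symbol type letters that GNU nm uses for weak symbols and weak objects.
WEAK_TYPES = {"W", "w", "V", "v"}


def find_weak_symbols(nm_output):
    """Return the names of weak symbols found in default `nm` text output.

    Defined symbols appear as 'address type name'; undefined ones have no
    address column and appear as 'type name'. Other lines are ignored.
    """
    weak = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            _, sym_type, name = parts
        elif len(parts) == 2:
            sym_type, name = parts
        else:
            continue
        if sym_type in WEAK_TYPES:
            weak.append(name)
    return weak


if __name__ == "__main__":
    # Hypothetical nm output for a runtime object file.
    sample = (
        "0000000000000000 T runtime_entry\n"
        "0000000000000010 W _ZdlPv\n"
        "                 w optional_hook\n"
        "                 U malloc\n"
    )
    for name in find_weak_symbols(sample):
        print(name)
```

A build script could fail if the returned list is non-empty, which would catch weak symbols that survived the re-linking step.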