thr3ads.net - llvm dev - [LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code? [Jun 2012]


Dmitry Vyukov

2012-Jun-21 07:21 UTC

[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Hi,

Yes, stlport was a pain to deploy and maintain + it calls normal operator
new/delete (there is no way to put them into a separate namespace).

Note that in some codebases we build asan/tsan runtimes from source. How
will the build process look with that object file mangling? How easy is it
to integrate it into a custom build process?

Soon I will start integrating tsan into the Go language. For the Go language we
need very simple object files. No global ctors, no thread-local storage, no
weak symbols and other trickery. Basically what a portable C compiler could
have produced.
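
The restrictions listed above (no global ctors, no thread-local storage, no weak symbols) could be checked mechanically on an ELF object. The following is only a minimal sketch: it assumes the input is the text printed by `readelf -sW` plus a list of section names, and the function names and sample symbols are hypothetical, not part of any sanitizer or Go tooling.

```python
# Hypothetical checker: flags object-file features that a very simple
# toolchain may not handle (weak symbols, TLS, global constructors).

# Sections where ELF toolchains usually register global constructors.
CTOR_SECTIONS = {".init_array", ".ctors"}


def parse_symbols(readelf_text):
    """Yield (name, sym_type, binding) from `readelf -sW`-style symbol rows."""
    for line in readelf_text.splitlines():
        fields = line.split()
        # A symbol row looks like: "5: 0000 4 TLS GLOBAL DEFAULT 6 name"
        if len(fields) >= 8 and fields[0].endswith(":") and fields[0][:-1].isdigit():
            yield fields[7], fields[3], fields[4]


def find_violations(readelf_text, section_names):
    """Return a list of human-readable problems found in one object file."""
    problems = []
    for name, sym_type, binding in parse_symbols(readelf_text):
        if binding == "WEAK":
            problems.append(f"weak symbol: {name}")
        if sym_type == "TLS":
            problems.append(f"thread-local storage: {name}")
    for section in section_names:
        if section in CTOR_SECTIONS:
            problems.append(f"global constructors: section {section}")
    return problems


# Hand-written sample input for illustration only.
SAMPLE = """\
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     1: 0000000000000000    24 FUNC    GLOBAL DEFAULT    1 __tsan_read8
     2: 0000000000000000     4 TLS     GLOBAL DEFAULT    6 cur_thread
     3: 0000000000000000    16 FUNC    WEAK   DEFAULT    1 _ZdlPv
"""

if __name__ == "__main__":
    for problem in find_violations(SAMPLE, [".text", ".init_array"]):
        print(problem)
```

A check like this would only report problems; deciding how to remove them from the runtime is a separate question.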


On Wed, Jun 20, 2012 at 10:05 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>>> Hello folks (and sorry if I've forgotten to CC anyone with particular
>>>>>> interest to this discussion...):
>>>>>>
>>>>>> I've been thinking a lot about how best to build advanced runtime
>>>>>> libraries like ASan, and scale them up. Note that this does *not* try
>>>>>> to address any licensing issues. For now, I'll consider those
>>>>>> orthogonal / solvable w/o technical contortions. =]
>>>>>>
>>>>>> My primary motivation: we really, *really* need runtime libraries to
>>>>>> be able to use common, shared libraries.
>>>>>>
>>>>>
>>>>> I am not sure you understand the problem as we do.
>>>>>
>>>>> In short, asan/tsan/msan/etc cannot use any function which is also
>>>>> called from the instrumented binary.
>>>>>
>>>>
>>>> Well, I can't be sure, but this description certainly agrees with my
>>>> understanding -- you need *every* part of the runtime to be completely
>>>> separate from *every* part of the instrumented binary. I'm with you
>>>> there.
>>>>
>>>> In particular, I think the current strategy for libc & system calls
>>>> makes perfect sense, and I'm not trying to suggest changing it.
>>>>
>>>> I think the most similar situation is this one:
>>>>
>>>>> In the previous version of ThreadSanitizer we used a private copy of
>>>>> STLport in a separate namespace and a custom libc (small subset).
>>>>>
>>>>
>>>> My proposal is very similar except without the need to modify the C++
>>>> standard library in use. Instead, I'm suggesting post-processing the
>>>> library to ensure that the standard C++ library code in the runtime is
>>>> kept completely distinct from that in the instrumented binary --
>>>> everything would in fact be *mangled* differently.
>>>>
>>>> The goal would be to avoid the maintenance overhead of a custom C++
>>>> standard library, and instead use a normal one. My understanding is
>>>> that both GCC's libstdc++ and LLVM's libc++ are significantly higher
>>>> quality than STLport, and if we're doing static linking, the code bloat
>>>> should be greatly reduced. We could reduce it still further by doing
>>>> LTO of the runtime library, which should be very straightforward given
>>>> the rest of my proposal.
>>>>
>>>> It would still require a very small subset of libc, likely not much
>>>> more than you already have.
>>>>
>>>>> This worked, but had problems too (Dmitry was very angry at STLport
>>>>> for code bloat, stack size increase and some direct libc calls).
>>>>>
>>>>
>>>> I would be interested to know if the above addresses most of the
>>>> problems or not.
>>>>
>>>>
>>>>> Until recently this was not causing too much pain in asan/tsan, but
>>>>> our attempts to use the LLVM DWARF readers made it worse.
>>>>> When tsan finds a race, we need to symbolize it online to be able to
>>>>> match against a suppression and decide whether we want to emit the
>>>>> warning.
>>>>> Today we do it in a separate addr2line process (ugly and slow).
>>>>> But if we start calling the LLVM DWARF reader we end up with all
>>>>> possible dependency problems (Dmitry and Alexey will know the exact
>>>>> ones) because the LLVM code calls malloc, memcpy, etc.
>>>>>
>>>>> Frankly, I don't have any solution other than to change the code such
>>>>> that it does not call libc/libc++.
>>>>> Some of that may be solved by a private copy of STLport + a bit of
>>>>> custom libc (but see above about STLport)
>>>>>
>>>>
>>>> I think my proposal is essentially in between these two:
>>>>
>>>> - Avoid the need for a low quality STL by using a normal C++ standard
>>>> library implementation, and avoid maintenance burden by doing a
>>>> link-time mangling of the symbols.
>>>>
>>>
>>> re-linking might be too platform specific.
>>> How about compiling the library into LLVM bitcode and adding
>>> namespaces/prefixes to that bitcode?
>>>
>>
>> Re-linking is a bit platform specific...
>>
>> It would definitely work on ELF platforms, and likely on Darwin, but
>> Windows is tricky.
>>
>> On Windows we would at least need a custom tool, but such a tool would
>> be quite easy to write I suspect. We could even use the very LLVM
>> libraries in question to write it! ;] Amusingly, I think with the LLVM
>> libraries we could very easily write a custom tool just to mangle the
>> symbol names in a collection of object files and have it work on *most*
>> platforms!
>>
>> Still, the bitcode idea is interesting. Doing this entirely in bitcode
>> has some advantages as these types of runtimes are among the best uses
>> for things like LTO: they're small, performance sensitive, can enumerate
>> the entry points easily, and are likely to have a particular need for
>> dead code elimination.
>>
>
> One reason to want to have some support for doing this w/o bitcode: we may
> not have the bitcode. Specifically, the goal would be to use the "normal"
> C++ standard library, provided it is available to link statically
> (libstdc++ and libc++ certainly are, I don't know about MSVC). That would
> be much easier if we can actually use the existing archive file, and just
> "fix" the .o files inside it.
>
> It seems likely to be the equivalent of an 'ld -r' run with a linker
> script to munge the symbol names, or potentially a custom tool written with
> the LLVM object file libraries.

Chandler Carruth

2012-Jun-21 07:52 UTC


On Thu, Jun 21, 2012 at 12:21 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> Yes, stlport was a pain to deploy and maintain + it calls normal operator
> new/delete (there is no way to put them into a separate namespace).
>
Ok, but putting the raw symbols into a "namespace" with the linker
shouldn't be subject to these limitations.

> Note that in some codebases we build asan/tsan runtimes from source. How
> will the build process look with that object file mangling? How easy is it
> to integrate it into a custom build process?
>
Well, I don't know yet. ;] It was an idea, I don't have an implementation
at this point. That said, I had only really imagined building the runtimes
from source? Maybe I don't understand what you mean by this?

The vague strategy I am imagining for the build process is this:

1) compile runtime into a static library, just like any other static library

2) collect all the '.o' files in the static archive, and in any
dependencies' static archive libraries

3) for each 'foo.o' build a 'foo_munged.o' using $tool; in the _munged
version, every symbol not on the whitelist for export to the instrumented
binary is renamed (mangled)

4) put all of the _munged '.o' files into a single runtime archive


The $tool here could be "ld -r" with a linker script, or (likely necessary
on Windows) a very simple, dedicated tool built around the LLVM object
libraries to copy each symbol, munging the name.
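
As a rough illustration of step 3, the renaming could be driven by a symbol map. The sketch below is not an implementation of the proposal: it assumes an ELF platform with GNU `objcopy`, whose `--redefine-syms=FILE` option reads one "old new" pair per line, and the `__rt_` prefix, the helper names, and the sample symbols are all hypothetical. It only builds the map and the command line; it does not run anything.

```python
import shlex


def build_rename_map(symbols, defined, whitelist, prefix="__rt_"):
    """Map every internal symbol to a prefixed name.

    symbols:   all symbol names seen in the collected object files
    defined:   names defined by at least one of those object files
    whitelist: names that must stay visible to the instrumented binary
    """
    rename = {}
    for name in sorted(set(symbols)):
        if name in whitelist:
            continue  # exported entry point, keep the name
        if name not in defined:
            continue  # resolved outside the runtime (assumption: leave alone)
        rename[name] = prefix + name
    return rename


def redefine_syms_text(rename):
    """Render the map as 'old new' lines for objcopy --redefine-syms."""
    return "".join(f"{old} {new}\n" for old, new in sorted(rename.items()))


def objcopy_command(obj, map_file):
    """Build the command that turns foo.o into foo_munged.o."""
    out = obj[:-2] + "_munged.o" if obj.endswith(".o") else obj + "_munged"
    return ["objcopy", f"--redefine-syms={map_file}", obj, out]


if __name__ == "__main__":
    # Illustrative symbol names only.
    symbols = ["__asan_init", "_Znwm", "malloc", "helper"]
    defined = {"__asan_init", "_Znwm", "helper"}
    rename = build_rename_map(symbols, defined, whitelist={"__asan_init"})
    print(redefine_syms_text(rename), end="")
    print(shlex.join(objcopy_command("foo.o", "rename.map")))
```

Because the same map is applied to every object file, a reference in one object and the definition in another are renamed consistently, so a private copy of `operator new` (`_Znwm` here) would no longer share a name with the one in the instrumented binary.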


> Soon I will start integrating tsan into the Go language. For the Go language
> we need very simple object files.
>
Ok... I'm not sure whether this should really constrain the way we build
the core runtime system here though. If you need some logic on the tsan
side factored out into a separate library for use with Go, that would seem
simpler than trying to make one sanitizer runtime library to support
frontends, middle ends, and programming languages with totally separate
models.

> No global ctors, no thread-local storage, no weak symbols and other
> trickery. Basically what a portable C compiler could have produced.
>
These also don't seem insurmountable, even in the existing use cases. But
maybe I'm not considering the actual restrictions you are, or I've
misunderstood. Here is how I'm breaking down the things you've mentioned:

1) It seems reasonable to avoid global constructors, and do-able in C++
even when using the standard library and parts of LLVM. LLVM itself
specifically works to avoid them.

2) TLS doesn't seem to be required by anything I'm suggesting... is there
something that worries you about this?

3) I don't understand the requirement to have no weak symbols. Even a
portable C compiler might produce weak symbols? Still, during the
re-linking phase above, it should be possible to resolve any weak symbols?
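
To make point 3 concrete, here is a toy model of the usual ELF rule for choosing between weak and strong definitions. It is a sketch under stated assumptions, not a description of what any particular linker does under `-r`; the `resolve` function and the sample names are hypothetical.

```python
def resolve(definitions):
    """Pick one definition per symbol.

    definitions: iterable of (symbol, object_file, binding), where binding
    is "strong" or "weak". A strong definition overrides weak ones, the
    first weak definition wins when there is no strong one, and two strong
    definitions of the same symbol are an error.
    """
    chosen = {}
    for symbol, obj, binding in definitions:
        if symbol not in chosen:
            chosen[symbol] = (obj, binding)
            continue
        _, current_binding = chosen[symbol]
        if binding == "strong" and current_binding == "strong":
            raise ValueError(f"duplicate strong definition of {symbol}")
        if binding == "strong":
            chosen[symbol] = (obj, binding)  # strong replaces weak
    return {symbol: obj for symbol, (obj, _) in chosen.items()}


if __name__ == "__main__":
    # Illustrative input only.
    defs = [
        ("alloc_hook", "default.o", "weak"),
        ("alloc_hook", "custom.o", "strong"),
        ("log", "a.o", "weak"),
        ("log", "b.o", "weak"),
    ]
    print(resolve(defs))
```

Once a single definition has been chosen for each symbol, a post-processing tool could in principle emit it as an ordinary global symbol, so that no weak binding remains in the output.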

Dmitry Vyukov

2012-Jun-21 08:04 UTC


On Thu, Jun 21, 2012 at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
>> Hi,
>>
>> Yes, stlport was a pain to deploy and maintain + it calls normal operator
>> new/delete (there is no way to put them into a separate namespace).
>>
>
> Ok, but putting the raw symbols into a "namespace" with the linker
> shouldn't be subject to these limitations.
>
OK

>
>> Note that in some codebases we build asan/tsan runtimes from source. How
>> will the build process look with that object file mangling? How easy is it
>> to integrate it into a custom build process?
>>
>
> Well, I don't know yet. ;] It was an idea, I don't have an implementation
> at this point. That said, I had only really imagined building the runtimes
> from source? Maybe I don't understand what you mean by this?
>
> The vague strategy I am imagining for the build process is this:
>
> 1) compile runtime into a static library, just like any other static
> library
>
> 2) collect all the '.o' files in the static archive, and in any
> dependencies' static archive libraries
>
> 3) for each 'foo.o' build a 'foo_munged.o' using $tool; in the _munged
> version, every symbol not on the whitelist for export to the instrumented
> binary is renamed (mangled)
>
> 4) put all of the _munged '.o' files into a single runtime archive
>
>
> The $tool here could be "ld -r" with a linker script, or (likely necessary
> on Windows) a very simple, dedicated tool built around the LLVM object
> libraries to copy each symbol, munging the name.
>
>
>> Soon I will start integrating tsan into the Go language. For the Go language
>> we need very simple object files.
>>
>
> Ok... I'm not sure whether this should really constrain the way we build
> the core runtime system here though. If you need some logic on the tsan
> side factored out into a separate library for use with Go, that would seem
> simpler than trying to make one sanitizer runtime library to support
> frontends, middle ends, and programming languages with totally separate
> models.
>
Yes, it will be a separate runtime library. But if tsan sources are deeply
dependent on llvm sources, this may be significantly harder to do.


>> No global ctors, no thread-local storage, no weak symbols and other
>> trickery. Basically what a portable C compiler could have produced.
>>
>
> These also don't seem insurmountable, even in the existing use cases. But
> maybe I'm not considering the actual restrictions you are, or I've
> misunderstood. Here is how I'm breaking down the things you've mentioned:
>
>
> 1) It seems reasonable to avoid global constructors, and do-able in C++
> even when using the standard library and parts of LLVM. LLVM itself
> specifically works to avoid them.
>
Is that the case for the C++ library that LLVM uses?

> 2) TLS doesn't seem to be required by anything I'm suggesting... is there
> something that worries you about this?
>
I suspect that the C/C++ library may use it.

> 3) I don't understand the requirement to have no weak symbols. Even a
> portable C compiler might produce weak symbols?
>
The linker does not understand them.

> Still, during the re-linking phase above, it should be possible to resolve
> any weak symbols?
>
Well, most likely yes.

There may be additional limitations that I don't know yet.