thr3ads.net - llvm dev - [LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code? [Jun 2012]

Dmitry Vyukov

2012-Jun-21 08:04 UTC

[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

On Thu, Jun 21, 2012 at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
> Hi,
>>
>> Yes, stlport was a pain to deploy and maintain + it calls normal
>> operator new/delete (there is no way to put them into a separate namespace).
>>
>
> Ok, but putting the raw symbols into a "namespace" with the linker
> shouldn't be subject to these limitations.
>
OK

>
>> Note that in some codebases we build asan/tsan runtimes from source. How
>> will the build process look with that object file mangling? How easy will
>> it be to integrate into a custom build process?
>>
>
> Well, I don't know yet. ;] It was an idea, I don't have an implementation
> at this point. That said, I had only really imagined building the runtimes
> from source? Maybe I don't understand what you mean by this?
>
> The vague strategy I am imagining for the build process is this:
>
> 1) compile runtime into a static library, just like any other static
> library
>
> 2) collect all the '.o' files in the static archive, and in any
> dependencies' static archive libraries
>
> 3) for each 'foo.o' build a 'foo_munged.o' using $tool; in the _munged
> version, every symbol not on the whitelist for export to the instrumented
> binary has its name munged
>
> 4) put all of the _munged '.o' files into a single runtime archive
>
>
> The $tool here could be "ld -r" with a linker script, or (likely necessary
> on Windows) a very simple, dedicated tool built around the LLVM object
> libraries to copy each symbol, munging the name.
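One way to prototype step 3 without writing a new tool is GNU `objcopy`, whose `--redefine-syms=FILE` option renames every "old new" symbol pair listed one per line in FILE. This is a swapped-in technique, not the `ld -r` linker script or LLVM-based tool proposed above. The sketch assumes the defined symbols have already been collected (for example with `nm --defined-only` over every object from step 2); `BuildRedefineMap` and the `__rt_private_` prefix are hypothetical. Applying the same map to every object keeps cross-object references consistent, and undefined references such as `malloc` never appear in a defined-symbol list, so they keep their global names.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical prefix that moves private runtime symbols into their own
// linker-level "namespace"; any string unlikely to clash would do.
const char kPrivatePrefix[] = "__rt_private_";

// Builds the contents of a map file for `objcopy --redefine-syms=FILE`:
// one "old new" pair per line. Every defined symbol that is NOT on the
// export whitelist is renamed; whitelisted symbols keep their names so the
// instrumented binary can still call them.
std::string BuildRedefineMap(const std::vector<std::string> &defined_symbols,
                             const std::set<std::string> &export_whitelist) {
  std::ostringstream map;
  for (const std::string &sym : defined_symbols) {
    assert(!sym.empty() && "symbol names from nm are never empty");
    if (export_whitelist.count(sym) != 0) continue;  // public entry point
    map << sym << ' ' << kPrivatePrefix << sym << '\n';
  }
  return map.str();
}
```

For example, given the defined symbols `__asan_init` and a hypothetical `llvm::foo()` (`_ZN4llvm3fooEv`) with only `__asan_init` whitelisted, the map contains a single line renaming `_ZN4llvm3fooEv`. The resulting file would be passed to `objcopy --redefine-syms=FILE` once per object before the objects are archived in step 4.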
>
>
>> Soon I will start integrating tsan into the Go language. For Go we need
>> very simple object files.
>>
>
> Ok... I'm not sure whether this should really constrain the way we build
> the core runtime system here though. If you need some logic on the tsan
> side factored out into a separate library for use with Go, that would seem
> simpler than trying to make one sanitizer runtime library to support
> frontends, middle ends, and programming languages with totally separate
> models.
>
Yes, it will be a separate runtime library. But if tsan sources are deeply
dependent on LLVM sources, this may be significantly harder to do.


>> No global ctors, no thread-local storage, no weak symbols and other
>> trickery. Basically what a portable C compiler could have produced.
>>
>
> These also don't seem insurmountable, even in the existing use cases. But
> maybe I'm not considering the actual restrictions you are, or I've
> misunderstood. Here is how I'm breaking down the things you've mentioned:
>
>
> 1) It seems reasonable to avoid global constructors, and do-able in C++
> even when using the standard library and parts of LLVM. LLVM itself
> specifically works to avoid them.
>
Is that also the case for the C++ library that LLVM uses?

> 2) TLS doesn't seem to be required by anything I'm suggesting... is there
> something that worries you about this?
>
I suspect that the C/C++ library may use them.

> 3) I don't understand the requirement to have no weak symbols. Even a
> portable C compiler might produce weak symbols?
>
The linker does not understand them.

> Still, during the re-linking phase above, it should be possible to resolve
> any weak symbols?
>
Well, most likely yes.

There may be additional limitations that I don't know yet.
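Whether any weak symbols survive the re-linking phase can be checked mechanically. A minimal sketch, assuming the check runs over `nm` output for the final runtime object: GNU `nm` documents the type letters `W`/`w` for weak symbols and `V`/`v` for weak objects, and prints no address column for undefined symbols. `FindWeakSymbols` is a hypothetical helper, not part of any existing sanitizer build.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the names of weak symbols found in `nm` output, where each line is
// "[address] type name". Per the GNU nm documentation, W/w denote weak
// symbols and V/v denote weak objects.
std::vector<std::string> FindWeakSymbols(const std::string &nm_output) {
  std::vector<std::string> weak;
  std::istringstream lines(nm_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::vector<std::string> f;
    for (std::string tok; fields >> tok;) f.push_back(tok);
    // Undefined symbols have no address column, so accept 2 or 3 fields;
    // blank lines and "file.o:" headers from archives are skipped.
    if (f.size() < 2 || f.size() > 3) continue;
    const std::string &type = f[f.size() - 2];
    if (type == "W" || type == "w" || type == "V" || type == "v")
      weak.push_back(f.back());
  }
  return weak;
}
```

An empty result would indicate the object is free of weak symbols, which is the property a linker without weak-symbol support needs.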

Chandler Carruth

2012-Jun-21 08:10 UTC

On Thu, Jun 21, 2012 at 1:04 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> On Thu, Jun 21, 2012 at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>
>>  Hi,
>>>
>>> Yes, stlport was a pain to deploy and maintain + it calls normal
>>> operator new/delete (there is no way to put them into a separate namespace).
>>>
>>
>> Ok, but putting the raw symbols into a "namespace" with the linker
>> shouldn't be subject to these limitations.
>>
>
> OK
>
>
>>
>>> Note that in some codebases we build asan/tsan runtimes from source. How
>>> will the build process look with that object file mangling? How easy will
>>> it be to integrate into a custom build process?
>>>
>>
>> Well, I don't know yet. ;] It was an idea, I don't have an implementation
>> at this point. That said, I had only really imagined building the runtimes
>> from source? Maybe I don't understand what you mean by this?
>>
>> The vague strategy I am imagining for the build process is this:
>>
>> 1) compile runtime into a static library, just like any other static
>> library
>>
>> 2) collect all the '.o' files in the static archive, and in any
>> dependencies' static archive libraries
>>
>> 3) for each 'foo.o' build a 'foo_munged.o' using $tool; in the _munged
>> version, every symbol not on the whitelist for export to the instrumented
>> binary has its name munged
>>
>> 4) put all of the _munged '.o' files into a single runtime archive
>>
>>
>> The $tool here could be "ld -r" with a linker script, or (likely necessary
>> on Windows) a very simple, dedicated tool built around the LLVM object
>> libraries to copy each symbol, munging the name.
>>
>>
>>> Soon I will start integrating tsan into the Go language. For Go we need
>>> very simple object files.
>>>
>>
>> Ok... I'm not sure whether this should really constrain the way we build
>> the core runtime system here though. If you need some logic on the tsan
>> side factored out into a separate library for use with Go, that would seem
>> simpler than trying to make one sanitizer runtime library to support
>> frontends, middle ends, and programming languages with totally separate
>> models.
>>
>
> Yes, it will be a separate runtime library. But if tsan sources are deeply
> dependent on LLVM sources, this may be significantly harder to do.
>
I think we should cross this bridge when we get there.

When we do, I suspect it will be reasonable, in a worst case situation, to
abstract the business logic into an isolated shared component. My hope is
that we won't even need to...

>
>
>>> No global ctors, no thread-local storage, no weak symbols and other
>>> trickery. Basically what a portable C compiler could have produced.
>>>
>>
>> These also don't seem insurmountable, even in the existing use cases. But
>> maybe I'm not considering the actual restrictions you are, or I've
>> misunderstood. Here is how I'm breaking down the things you've mentioned:
>>
>
>
>>
>> 1) It seems reasonable to avoid global constructors, and do-able in C++
>> even when using the standard library and parts of LLVM. LLVM itself
>> specifically works to avoid them.
>>
>
> Is that also the case for the C++ library that LLVM uses?
>
LLVM is extremely resistant to growing external dependencies, specifically
because it cannot control them. In particular, the parts that a runtime is
likely to use are very unlikely to grow any problematic dependencies here.
Essentially, it is reasonable to assert that we have control over all of
LLVM's dependencies and can arrange for them to be very conservative here.
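Avoiding global constructors in C++ mostly means keeping every global either plain data or constant-initialized, and doing real setup in an explicit init call. A sketch, assuming the hypothetical names `RuntimeFlags`, `rt_init` and `rt_verbosity`: because `RuntimeFlags` has a `constexpr` constructor, the global `flags` is constant-initialized, so the compiler should emit no dynamic initializer for it (C++20's `constinit` can turn that expectation into a compile-time check).

```cpp
#include <cassert>

// Hypothetical runtime settings. The constexpr constructor lets the compiler
// place the initial values directly in the data section (constant
// initialization), so no global constructor is emitted for `flags`.
struct RuntimeFlags {
  int verbosity;
  bool initialized;
  constexpr RuntimeFlags() : verbosity(0), initialized(false) {}
};

static RuntimeFlags flags;  // constant-initialized: no startup code needed

// Work that would otherwise live in a global constructor happens here,
// called explicitly by whoever embeds the runtime.
extern "C" void rt_init(int verbosity) {
  if (flags.initialized) return;  // idempotent: later calls are ignored
  flags.verbosity = verbosity;
  flags.initialized = true;
}

extern "C" int rt_verbosity() {
  assert(flags.initialized && "rt_init must run before other runtime calls");
  return flags.verbosity;
}
```

A host that cannot run global constructors, such as the Go toolchain mentioned above, would call `rt_init` itself before any other entry point.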

>
>> 2) TLS doesn't seem to be required by anything I'm suggesting... is there
>> something that worries you about this?
>>
>
> I suspect that the C/C++ library may use them.
>
I would be very surprised if these parts of LLVM use them. If they did, I
think it would be reasonable to make it optional and disable it in some
circumstances.
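Making TLS optional could look like a build switch that either uses `thread_local` or expects the embedder to own each thread's state and pass it in. This is a sketch under assumptions: `RT_USE_TLS`, `ThreadState`, `rt_record_event` and `rt_event_count` are hypothetical names, and keeping the state pointer in the public signature in both builds is just one possible design.

```cpp
#include <cassert>

// Hypothetical build switch: 1 = use compiler-supported thread-local storage,
// 0 = the embedder owns each thread's state and passes it in explicitly.
#ifndef RT_USE_TLS
#define RT_USE_TLS 0
#endif

struct ThreadState {
  unsigned long events;
};

#if RT_USE_TLS
static thread_local ThreadState tls_state = {0};
// TLS build: the caller-supplied pointer is ignored.
static ThreadState *CurrentState(ThreadState *) { return &tls_state; }
#else
// No-TLS build: the state always comes from the caller.
static ThreadState *CurrentState(ThreadState *caller_state) {
  assert(caller_state != nullptr && "no-TLS build needs caller-owned state");
  return caller_state;
}
#endif

// Public entry points; `caller_state` is only consulted in the no-TLS build.
extern "C" void rt_record_event(ThreadState *caller_state) {
  CurrentState(caller_state)->events++;
}

extern "C" unsigned long rt_event_count(ThreadState *caller_state) {
  return CurrentState(caller_state)->events;
}
```

Because the entry points take the pointer in both configurations, callers are written once; the TLS build simply ignores it.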

>
>> 3) I don't understand the requirement to have no weak symbols. Even a
>> portable C compiler might produce weak symbols?
>>
>
> The linker does not understand them.
>
>
>> Still, during the re-linking phase above, it should be possible to
>> resolve any weak symbols?
>>
>
> Well, most likely yes.
>
> There may be additional limitations that I don't know yet.
>
Sure, time will tell. That said, I don't think future work to support Go
should be the top priority in getting this system well integrated, and I
don't think there are any huge road blocks already clear at this stage
related to Go.

Kostya Serebryany

2012-Jun-21 08:42 UTC

Can we alter the build system so that, when building a run-time library, it
wraps every .cpp file like this:
   namespace FOO {
   <file body>
   }
This would give us essentially the same thing, but without system-dependent
object file hackery.
Maybe we can add a Clang flag to add such a namespace for us?
(This approach, as well as Chandler's original approach, will have to deal
with malloc, memset, strlen, etc., which still need to reside in the global
namespace.)

--kcc
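The namespace-wrapping idea can be tried on a single translation unit. A minimal sketch, assuming the hypothetical wrapper name `rt_private` in place of `FOO`: C++ functions inside it get mangled names that include the namespace, so they cannot collide with same-named functions in the instrumented program, while `extern "C"` functions ignore the namespace for linkage and can serve as the exported entry points. One caveat for a build-system or compiler-flag version: C++ only allows headers to be included outside of any declaration, so the `#include` lines must stay outside the wrapper.

```cpp
#include <cassert>
#include <cstring>  // includes stay OUTSIDE the wrapper namespace

namespace rt_private {  // hypothetical wrapper name ("FOO" in the proposal)

// Would clash with the program's own ::Report at link time without the
// wrapper; inside it, the mangled name includes "rt_private".
int Report(const char *msg) {
  assert(msg != nullptr);
  // libc functions keep their global names; the wrapper cannot move them.
  return static_cast<int>(std::strlen(msg));
}

// Public entry points can keep an unmangled, global link name by using C
// linkage, which ignores the enclosing namespace.
extern "C" int rt_public_entry() { return Report("hi"); }

}  // namespace rt_private

extern "C" int rt_public_entry();  // same function, declared at global scope

// Stands in for a same-named function in the instrumented program.
int Report(const char *) { return -1; }
```

Like `malloc`, a replacement `operator new`/`operator delete` must remain global, because the language does not allow allocation functions to be declared in a namespace; that matches the stlport problem raised earlier in the thread.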



Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - Jun 2012 - [LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Apparently Analagous Threads