John McCall

2011-Nov-09 21:01 UTC

[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global

On Nov 9, 2011, at 11:34 AM, Rafael Espíndola wrote:
>>> 1) [Requires ABI change] We emit dynamic initialization code for weak globals
>>> (even in TUs where static initialization is required to be performed), unless
>>> we can prove that every translation unit will use static initialization. We
>>> emit the global plus its guard variable as a single object so the linker can't
>>> separate them (this is the ABI change). If we can perform static
>>> initialization in any translation unit, then that TU emits a constant weak
>>> object (in .rodata if we want) containing the folded value and with the guard
>>> variable set to 1 (per Eli's proposal).
>> 
>> The ABI actually suggests doing exactly this, except using multiple
>> symbols linked with a COMDAT group.  Unfortunately, LLVM doesn't
>> support that COMDAT feature yet, but it could certainly be taught to.
>> This guarantees correctness as long as every translation unit emits the
>> code the same way, which is exactly what we'd get from an ABI change,
>> except without actually breaking ABI conformance.
> 
> I like this. We already have basic support for COMDATs, but yes, it
> needs to be extended. So far we just create trivial COMDATs in codegen
> for weak objects.
> 
> We also need the IL linker itself to work on COMDATs;
> otherwise this bug would still exist when doing LTO.
> 
> In the "extended" example we would output
> 
> @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE
> 
> for TU1 and
> 
> @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
> ...
> define internal void @_GLOBAL__I_a() nounwind section ".text.startup" comdat _ZN1UI1SE1kE {
> ....
> }
> 
> for TU2.

Unfortunately, making the comdat be for the entire function is not
conformant with the ABI, which says that you either put the variable
and its guard in different comdats or you put them in a single comdat
named for the variable.  It also doesn't actually help unless we disable
inlining.

So we still need to emit a guard variable (initialized to 1) into the
comdat for constant-initialized static locals, unless we can somehow
prove to our satisfaction that no translation unit needs it.
And we'd need LLVM to not throw away unused weak_odr globals
that are in a comdat with a used symbol.
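
To make the role of the guard concrete, here is a small C++ simulation of a dynamic initializer running against whichever copy of the group the linker kept. It is only a sketch: `Group`, `runDynamicInit`, and `initCrashes` are hypothetical names, an exception stands in for the real fault on a store to read-only memory, and nothing here is LLVM or ABI code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy model of the variable and its guard as they ended up after linking.
// All names are illustrative; they are not LLVM or Itanium ABI entities.
struct Group {
    bool readOnly;       // variable was emitted as `constant` (e.g. in .rodata)
    std::int32_t value;  // current contents of the variable
    std::int64_t guard;  // nonzero means "already initialized"
};

// Models the dynamic initializer that a TU like TU2 runs at startup.
// A store to a read-only variable is modeled as an exception.
void runDynamicInit(Group &g, std::int32_t computed) {
    if (g.guard != 0)
        return;  // already initialized, statically or by another TU
    if (g.readOnly)
        throw std::runtime_error("store to read-only variable");
    g.value = computed;
    g.guard = 1;
}

// Returns true if running the initializer on this group would fault.
bool initCrashes(Group g) {
    try {
        runDynamicInit(g, 42);
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

In this model, `initCrashes(Group{true, 42, 1})` (the constant copy with its guard preset to 1) and `initCrashes(Group{false, 0, 0})` (the writable copy with a zero guard) are both false, while the mixed selection `Group{true, 42, 0}` faults. That mixed case is what keeping the variable and guard in one comdat is meant to rule out.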

John.

Rafael Espíndola

2011-Nov-21 17:05 UTC

> Unfortunately, making the comdat be for the entire function is not
> conformant with the ABI, which says that you either put the variable
> and its guard in different comdats or you put them in a single comdat
> named for the variable.  It also doesn't actually help unless we disable
> inlining.

I see. Using two comdats would still cause the same problem for us,
no? So the solution in the end is to emit:

TU1:
--------------------------------
@_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE
@_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE
--------------------------------

TU2:
-----------------------------------
@_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
@_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE
...
@llvm.global_ctors = ....
define internal void @_GLOBAL__I_a() nounwind section ".text.startup"
....
-----------------------------------
>
> John.
Thanks,
Rafael

John McCall

2011-Nov-27 13:00 UTC

On Nov 21, 2011, at 9:05 AM, Rafael Espíndola wrote:
>> Unfortunately, making the comdat be for the entire function is not
>> conformant with the ABI, which says that you either put the variable
>> and its guard in different comdats or you put them in a single comdat
>> named for the variable.  It also doesn't actually help unless we disable
>> inlining.
> 
> I see. Using two comdats would still cause the same problem for us,
> no? So the solution in the end is to emit:
> 
> TU1:
> --------------------------------
> @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE
> @_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE
> --------------------------------
> 
> TU2:
> -----------------------------------
> @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
> @_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE
> ...
> @llvm.global_ctors = ....
> define internal void @_GLOBAL__I_a() nounwind section ".text.startup" ....
> -----------------------------------

Exactly.

To sketch out the proposed IR extension a bit more:
1.  We add 'comdat "name"' to the global variable and function
productions.  I have the COMDAT name in quotes only because
there's no other precedent for a bare identifier in the IR grammar.
I don't think we want to allow this on aliases;  I think I could
probably invent reasonable semantics, but it's really not worth
worrying about without cause.
2.  A symbol with a COMDAT name must be a definition.
3.  All symbols sharing the same COMDAT name are required to
share the same linkage and visibility.  Conveniently, this lets us
talk about the COMDAT's linkage / etc.
4.  A symbol with a COMDAT name is considered to be referenced
if any symbol with the same COMDAT name is referenced
(ignoring this rule).
5.  It's undefined behavior if two modules are linked and they
export different sets of symbols with a given COMDAT name.
6.  Otherwise, if two modules are linked and they both export
symbols with a given COMDAT name, all the symbols must be
taken from the same module.

I think that covers it.
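
Rules 5 and 6 can be illustrated with a toy linker in C++. This is a sketch under stated assumptions, not the LLVM linker: `Symbols`, `Module`, `names`, and `link` are hypothetical, a module is reduced to a map from COMDAT name to the symbols exported under it, and the case rule 5 leaves undefined is reported as an error purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Symbol name -> initializer value; stands in for a real global definition.
using Symbols = std::map<std::string, long>;
// COMDAT name -> the symbols a module exports under that name.
using Module = std::map<std::string, Symbols>;

// Collects just the symbol names of one COMDAT group.
std::set<std::string> names(const Symbols &s) {
    std::set<std::string> out;
    for (const auto &kv : s)
        out.insert(kv.first);
    return out;
}

// Links `src` into a copy of `dst` and returns the result.
Module link(Module dst, const Module &src) {
    for (const auto &entry : src) {
        auto it = dst.find(entry.first);
        if (it == dst.end()) {
            dst.insert(entry);  // only one module has this COMDAT
            continue;
        }
        // Rule 5: differing symbol sets are undefined behavior in the
        // proposal; this toy chooses to diagnose them instead.
        if (names(it->second) != names(entry.second))
            throw std::invalid_argument("COMDAT '" + entry.first +
                                        "' exports different symbol sets");
        // Rule 6: keep the group already in dst whole and drop src's copy,
        // so every symbol in the group comes from the same module.
    }
    return dst;
}
```

With TU1 exporting `_ZN1UI1SE1kE = 42` and `_ZGVN1UI1SE1kE = 1`, and TU2 exporting both as 0, linking in either order yields (42, 1) or (0, 0) but never a mix. Which module wins is a policy choice of this sketch (the first one seen), not something the rules above specify.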

The implementation can be optimized around the following
properties of the typical use patterns in the C++ ABI.
a) Most symbols do not need COMDAT names.  Or they
don't need "non-trivial" COMDAT names, i.e. COMDAT names
containing other symbols or not matching their own name.
b) When symbols do need COMDAT names, we'll almost
always know exactly how many symbols are going in the group.
That number will usually be two.
c) It's frequently going to be convenient to be able to add
a COMDAT name to a GV after the GV was allocated.
d) Otherwise, symbols will probably never need to change
or remove their COMDAT name, and we probably don't even
need to add API for it.
e) Many clients are going to want to be able to efficiently test
whether a symbol is in a COMDAT group.
f) Those clients will also generally care about efficiently
iterating over all the symbols in that group.

I'd suggest having a bit on GlobalValue and a side-table
on the Module mapping from GVs to COMDAT objects,
where COMDAT objects are allocated as part of the
StringMapEntry on the Module and don't really contain
any data except their name and a list of GV*s, heavily
optimized for the two-element case.
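
A rough C++ sketch of that layout, using standard containers in place of LLVM's own (`std::unordered_map` stands in for StringMap). The class and method names are hypothetical and chosen only for this illustration; this is not what LLVM implements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct GlobalValue {
    std::string name;
    bool inComdat = false;  // the "bit on GlobalValue"
};

// A COMDAT object: just a name and a member list with an inline fast path
// for the common two-symbol case (e.g. a variable and its guard).
class Comdat {
    std::array<GlobalValue *, 2> inlineMembers{};
    std::vector<GlobalValue *> overflow;  // used only beyond two members
    std::size_t count = 0;

public:
    std::string name;
    void add(GlobalValue *gv) {
        if (count < 2)
            inlineMembers[count] = gv;
        else
            overflow.push_back(gv);
        ++count;
    }
    std::size_t size() const { return count; }
    GlobalValue *member(std::size_t i) const {
        return i < 2 ? inlineMembers[i] : overflow[i - 2];
    }
};

class Module {
    // Name -> COMDAT object; element addresses stay valid across rehashing.
    std::unordered_map<std::string, Comdat> comdats;
    // Side table from GV to its COMDAT, consulted only when the bit is set.
    std::unordered_map<const GlobalValue *, Comdat *> comdatOf;

public:
    // Adds a COMDAT name to an already-allocated GV (property c). There is
    // deliberately no API to change or remove it (property d).
    void setComdat(GlobalValue &gv, const std::string &name) {
        assert(!gv.inComdat && "GV already has a COMDAT name");
        Comdat &c = comdats[name];
        c.name = name;
        c.add(&gv);
        gv.inComdat = true;
        comdatOf[&gv] = &c;
    }
    // Cheap membership test (property e); the returned object allows
    // iteration over the group's members (property f).
    const Comdat *getComdat(const GlobalValue &gv) const {
        if (!gv.inComdat)
            return nullptr;  // common case: no hashing needed
        return comdatOf.at(&gv);
    }
};
```

Checking the bit first keeps the common case (property a, most symbols have no COMDAT name) free of any table lookup.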

John.

Rafael Espíndola

2014-Sep-05 02:48 UTC

> I see. Using two comdats would still cause the same problem for us,
> no? So the solution in the end is to emit:
>
> TU1:
> --------------------------------
> @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE
> @_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE
> --------------------------------
>
> TU2:
> -----------------------------------
> @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
> @_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE
> ...
> @llvm.global_ctors = ....
> define internal void @_GLOBAL__I_a() nounwind section ".text.startup" ....
> -----------------------------------

Restarting a really old thread now that we have comdat support in the IR.

While the above idea would work, there are two problems with it:

* Existing compilers (clang and gcc) produce a comdat with just the
constant in TU1. Linking one of those with TU2 can still cause a crash
since the guard variable would be undefined.
* It requires always outputting the guard variable.

Since neither gcc nor clang implements this part of the ABI, I was
wondering whether there is a better way to do it. One interesting option is
putting the .init_array of TU2 in the comdat. That is exactly what we
do for Windows. In fact, just passing -Xclang -mllvm -Xclang
-enable-structor-comdat avoids the crash in the above example.

Given that this has been broken since forever, waiting a bit more for
https://sourceware.org/bugzilla/show_bug.cgi?id=17350 to be fixed and
then flipping -enable-structor-comdat might be the best way to fix
this. With that done we can add the function and the guard variable in
TU2 to the comdat to remove a bit of bloat (see attached patch).

What is the discussion list for the itanium abi? Should I propose a patch?

Cheers,
Rafael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t.patch
Type: text/x-patch
Size: 1372 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140904/f457599a/attachment.bin>
