John McCall
2011-Nov-08  08:12 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
On Nov 7, 2011, at 9:47 AM, Richard Smith wrote:
>> In cases where the C++ standard requires static initialization,
>> introducing a write violates the guarantees of the C++ standard for static
>> initialization. Therefore, I'm not sure the whole "make the constant
>> writable" approach is actually viable.
>
> There is another problem which afflicts all solutions presented thus far, for
> the other kind of weak global values in C++. For a static variable defined
> within an inline function, we can select the variable plus guard from a TU
> with dynamic initialization, and select the function definition from a TU with
> static initialization, with the result that the object doesn't get initialized
> at all.
>
> I have two new proposals for fixing this, which I believe actually work.
>
> 1) [Requires ABI change] We emit dynamic initialization code for weak globals
> (even in TUs where static initialization is required to be performed), unless
> we can prove that every translation unit will use static initialization. We
> emit the global plus its guard variable as a single object so the linker can't
> separate them (this is the ABI change). If we can perform static
> initialization in any translation unit, then that TU emits a constant weak
> object (in .rodata if we want) containing the folded value and with the guard
> variable set to 1 (per Eli's proposal).

The ABI actually suggests doing exactly this, except using multiple symbols linked with a COMDAT group. Unfortunately, LLVM doesn't support that COMDAT feature yet, but it could certainly be taught to. This guarantees correctness as long as every translation unit emits the code the same way, which is exactly what we'd get from an ABI change, except without actually breaking ABI conformance.

Mach-O doesn't support anything like COMDAT, but the Darwin linker apparently gives significantly stronger guarantees about which object files it will take symbols from, as long as all objects have all of the symbols.

John.
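The failure Richard describes, and the way proposal 1 avoids it, can be sketched with a small toy model. This is not LLVM or linker code: `VarAndGuard`, `FunctionCopy`, `dynamic_tu_storage`, `static_tu_storage`, and `call_function` are hypothetical names made up for illustration, and the value 42 is only a stand-in for the folded initializer. The model assumes the linker may keep the variable plus guard from one TU and the function body from another.

```cpp
#include <cassert>

// Toy model (not LLVM or linker code): the storage the linker kept for a
// static local and its guard, as emitted by one translation unit.
struct VarAndGuard {
    int value;         // initial bytes of the variable
    bool initialized;  // initial state of the guard
};

// Which copy of the inline function the linker kept.
enum class FunctionCopy {
    AssumesStaticInit,  // from a TU that folded the initializer; no guard check
    ChecksGuard         // from a TU that initializes dynamically
};

// What a TU with dynamic initialization emits: zeroed storage, guard clear.
inline VarAndGuard dynamic_tu_storage() { return {0, false}; }

// What a TU with static initialization would emit under proposal 1:
// the folded value, with the guard already set.
inline VarAndGuard static_tu_storage() { return {42, true}; }

// Simulates one call to the inline function; returns the value it observes.
inline int call_function(VarAndGuard& v, FunctionCopy f) {
    if (f == FunctionCopy::ChecksGuard && !v.initialized) {
        v.value = 42;  // run the dynamic initializer
        v.initialized = true;
    }
    return v.value;
}
```

In this model, pairing `dynamic_tu_storage()` with `FunctionCopy::AssumesStaticInit` yields 0, which is the uninitialized object from the quoted problem. If every TU emits the guard-checking code, as proposal 1 requires, either storage variant yields 42.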
Rafael Espíndola
2011-Nov-09  19:34 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
>> 1) [Requires ABI change] We emit dynamic initialization code for weak globals
>> (even in TUs where static initialization is required to be performed), unless
>> we can prove that every translation unit will use static initialization. We
>> emit the global plus its guard variable as a single object so the linker can't
>> separate them (this is the ABI change). If we can perform static
>> initialization in any translation unit, then that TU emits a constant weak
>> object (in .rodata if we want) containing the folded value and with the guard
>> variable set to 1 (per Eli's proposal).
>
> The ABI actually suggests doing exactly this, except using multiple
> symbols linked with a COMDAT group. Unfortunately, LLVM doesn't
> support that COMDAT feature yet, but it could certainly be taught to.
> This guarantees correctness as long as every translation unit emits the
> code the same way, which is exactly what we'd get from an ABI change,
> except without actually breaking ABI conformance.

I like this. We already have basic support for COMDATs, but yes, it
needs to be extended. So far we just create trivial COMDATs in codegen
for weak objects.

The IL linker itself also needs to work on COMDATs; otherwise this bug
would still exist when doing LTO.

In the "extended" example we would output

@_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE

for TU1 and

@_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
...
define internal void @_GLOBAL__I_a() nounwind section ".text.startup"
comdat _ZN1UI1SE1kE {
....
}

for TU2.

> Mach-O doesn't support anything like COMDAT, but the Darwin linker
> apparently gives significantly stronger guarantees about which object
> files it will take symbols from, as long as all objects have all of the
> symbols.
>
> John.

Cheers,
Rafael
John McCall
2011-Nov-09  21:01 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
On Nov 9, 2011, at 11:34 AM, Rafael Espíndola wrote:
>>> 1) [Requires ABI change] We emit dynamic initialization code for weak globals
>>> (even in TUs where static initialization is required to be performed), unless
>>> we can prove that every translation unit will use static initialization. We
>>> emit the global plus its guard variable as a single object so the linker can't
>>> separate them (this is the ABI change). If we can perform static
>>> initialization in any translation unit, then that TU emits a constant weak
>>> object (in .rodata if we want) containing the folded value and with the guard
>>> variable set to 1 (per Eli's proposal).
>>
>> The ABI actually suggests doing exactly this, except using multiple
>> symbols linked with a COMDAT group. Unfortunately, LLVM doesn't
>> support that COMDAT feature yet, but it could certainly be taught to.
>> This guarantees correctness as long as every translation unit emits the
>> code the same way, which is exactly what we'd get from an ABI change,
>> except without actually breaking ABI conformance.
>
> I like this. We already have basic support for COMDATs, but yes, it
> needs to be extended. So far we just create trivial COMDATs in codegen
> for weak objects.
>
> The IL linker itself also needs to work on COMDATs; otherwise this bug
> would still exist when doing LTO.
>
> In the "extended" example we would output
>
> @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE
>
> for TU1 and
>
> @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
> ...
> define internal void @_GLOBAL__I_a() nounwind section ".text.startup"
> comdat _ZN1UI1SE1kE {
> ....
> }
>
> for TU2.

Unfortunately, making the comdat be for the entire function is not conformant with the ABI, which says that you either put the variable and its guard in different comdats or you put them in a single comdat named for the variable. It also doesn't actually help unless we disable inlining.

So we still need to emit a guard variable (initialized to 1) into the comdat for constant-initialized static locals, unless we can somehow prove to our satisfaction that no translation unit needs this. And we'd need LLVM to not throw away unused weak_odr globals that are in a comdat with a used symbol.

John.
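Why the variable and its guard have to travel together in one comdat can be sketched with another toy model. It is not real linker behaviour or LLVM API: `Group`, `link_per_symbol`, `link_per_group`, and `read_k` are hypothetical names, 42 stands in for the folded initializer, and the model assumes every TU emits the same guard-checking accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: the variable and its guard as one translation unit emits them.
struct Group {
    int k;      // initial value of the variable
    int guard;  // 1 if the TU constant-initialized k, 0 otherwise
};

// Per-symbol resolution: k and guard may be taken from different objects.
inline Group link_per_symbol(const std::vector<Group>& tus,
                             std::size_t k_from, std::size_t guard_from) {
    return Group{tus.at(k_from).k, tus.at(guard_from).guard};
}

// Group resolution: the whole group comes from a single object.
inline Group link_per_group(const std::vector<Group>& tus, std::size_t chosen) {
    return tus.at(chosen);
}

// Guard-checking accessor that every TU is assumed to emit.
inline int read_k(Group& g) {
    if (g.guard == 0) {
        g.k = 42;  // dynamic initialization
        g.guard = 1;
    }
    return g.k;
}
```

With `tus = {{42, 1}, {0, 0}}`, taking `k` from the second TU and `guard` from the first gives a set guard over zeroed storage, so `read_k` returns 0. Selecting either group as a whole returns 42, which is the property the single comdat is meant to provide.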