John McCall
2011-Nov-09 21:01 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
On Nov 9, 2011, at 11:34 AM, Rafael Espíndola wrote:>>> 1) [Requires ABI change] We emit dynamic initialization code for weak globals >>> (even in TUs where static initialization is required to be performed), unless >>> we can prove that every translation unit will use static initialization. We >>> emit the global plus its guard variable as a single object so the linker can't >>> separate them (this is the ABI change). If we can perform static >>> initialization in any translation unit, then that TU emits a constant weak >>> object (in .rodata if we want) containing the folded value and with the guard >>> variable set to 1 (per Eli's proposal). >> >> The ABI actually suggests doing exactly this, except using multiple >> symbols linked with a COMDAT group. Unfortunately, LLVM doesn't >> support that COMDAT feature yet, but it could certainly be taught to. >> This guarantees correctness as long as every translation unit emits the >> code the same way, which is exactly what we'd get from an ABI change, >> except without actually breaking ABI conformance. > > I like this. We already have basic support for COMDATs, but yes, it > needs to be extended. So far we just create trivial COMDATs in codegen > for weak objects. > > We also need the IL linker itself needs to work on COMDATs too > otherwise this bug would still exist when doing LTO. > > In the "extended" example we would output > > @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE > > for TU1 and > > @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE > ... > define internal void @_GLOBAL__I_a() nounwind section ".text.startup" > comdat _ZN1UI1SE1kE { > .... > } > > for TU2.Unfortunately, making the comdat be for the entire function is not conformant with the ABI, which says that you either put the variable and its guard in different comdats or you put them in a single comdat named for the variable. It also doesn't actually help unless we disable inlining. So we still need to emit a guard variable (initialized to 1) into the comdat for constant-initialized static locals, unless we can somehow prove to our satisfaction that all translation units don't need this. And we'd need LLVM to not throw away unused weak_odr globals that are in a comdat with a used symbol. John.
Rafael Espíndola
2011-Nov-21 17:05 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
> Unfortunately, making the comdat be for the entire function is not > conformant with the ABI, which says that you either put the variable > and its guard in different comdats or you put them in a single comdat > named for the variable. It also doesn't actually help unless we disable > inlining.I see. Using two comdats would still cause the same problem for us, no? So the solution in the end is to emit: TU1: -------------------------------- @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE @_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE -------------------------------- TU2: ----------------------------------- @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE @_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE ... @llvm.global_ctors = .... define internal void @_GLOBAL__I_a() nounwind section ".text.startup" .... -----------------------------------> > John.Thanks, Rafael
John McCall
2011-Nov-27 13:00 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
On Nov 21, 2011, at 9:05 AM, Rafael Espíndola wrote:>> Unfortunately, making the comdat be for the entire function is not >> conformant with the ABI, which says that you either put the variable >> and its guard in different comdats or you put them in a single comdat >> named for the variable. It also doesn't actually help unless we disable >> inlining. > > I see. Using two comdats would still cause the same problem for us, > no? So the solution in the end is to emit: > > TU1: > -------------------------------- > @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE > @_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE > -------------------------------- > > TU2: > ----------------------------------- > @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE > @_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE > ... > @llvm.global_ctors = .... > define internal void @_GLOBAL__I_a() nounwind section ".text.startup" .... > -----------------------------------Exactly. To sketch out the proposed IR extension a bit more: 1. We add 'comdat "name"' to the global variable and function productions. I have the COMDAT name in quotes only because there's no other precedent for a bare identifier in the IR grammar. I don't think we want to allow this on aliases; I think I could probably invent reasonable semantics, but it's really not worth worrying about without cause. 2. A symbol with a COMDAT name must be a definition. 3. All symbols sharing the same COMDAT name are required to share the same linkage and visibility. Conveniently, this lets us talk about the COMDAT's linkage / etc. 4. A symbol with a COMDAT name is considered to be referenced if any symbol with the same COMDAT name is referenced (ignoring this rule). 5. It's undefined behavior if two modules are linked and they export different sets of symbols with a given COMDAT name. 6. Otherwise, if two modules are linked and they both export symbols with a given COMDAT name, all the symbols must be taken from the same module. I think that covers it. The implementation can be optimized around the following properties of the typical use patterns in the C++ ABI. a) Most symbols do not need COMDAT names. Or they don't need "non-trivial" COMDAT names, i.e. COMDAT names containing other symbols or not matching their own name. b) When symbols do need COMDAT names, we'll almost always know exactly how many symbols are going in the group. That number will usually be two. c) It's frequently going to be convenient to be able to add a COMDAT name to a GV after the GV was allocated. d) Otherwise, symbols will probably never need to change or remove their COMDAT name, and we probably don't even need to add API for it. e) Many clients are going to want to be able to efficiently test whether a symbol is in a COMDAT group. f) Those clients will also generally care about efficiently iterating over all the symbols in that group. I'd suggest having a bit on GlobalValue and a side-table on the Module mapping from GVs to COMDAT objects, where COMDAT objects are allocated as part of the StringMapEntry on the Module and don't really contain any data except their name and a list of GV*s, heavily optimized for the two-element case. John.
Rafael Espíndola
2014-Sep-05 02:48 UTC
[LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
> I see. Using two comdats would still cause the same problem for us, > no? So the solution in the end is to emit: > > TU1: > -------------------------------- > @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE > @_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE > -------------------------------- > > TU2: > ----------------------------------- > @_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE > @_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE > ... > @llvm.global_ctors = .... > define internal void @_GLOBAL__I_a() nounwind section ".text.startup" .... > -----------------------------------Restarting a really old thread now that we have comdat support in the IR. While the above idea would work, there are two problems with it * Existing compilers (clang and gcc) produce a comdat with just the constant in TU1. Linking one of those with TU2 can still cause a crash since the guard variable would be undefined. * It requires always outputting the guard variable. Since neither gcc nor clang implement this part of the ABI, I was thinking if there was a better way to do it. One interesting option is putting the .init_array of TU2 in the comdat. That is exactly what we do for windows. In fact, just passing -Xclang -mllvm -Xclang -enable-structor-comdat will avoids the crash in the above example. Given that this has been broken since forever, waiting a bit more for https://sourceware.org/bugzilla/show_bug.cgi?id=17350 to be fixed and then flipping -enable-structor-comdat might be the best way to fix this. With that done we can add the function and the guard variable in TU2 to the comdat to remove a bit of bloat (see attached patch). What is the discussion list for the itanium abi? Should I propose a patch? Cheers, Rafael -------------- next part -------------- A non-text attachment was scrubbed... Name: t.patch Type: text/x-patch Size: 1372 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140904/f457599a/attachment.bin>
Apparently Analagous Threads
- [LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
- [LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
- [LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
- [LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global
- [LLVMdev] [cfe-dev] weak_odr constant versus weak_odr global