Gaier, Bjoern via llvm-dev
2019-Nov-22 11:20 UTC
[llvm-dev] What is an address-significant symbol?
Hello LLVM- and Clang-List,

I'm not sure if this is a subject for LLVM or Clang, but there is something I don't understand. I wrote the following code in C++:

    searchPlanschi(&__stdio_common_vswprintf);

'searchPlanschi' is a function I provide. Clang generates the following assembly code for this:

    lea rcx, [rip + __stdio_common_vswprintf]
    call "?searchPlanschi@@YAXPEAX@Z"

I was surprised to see the register rip there. As far as I know, this is the instruction pointer register, right? Why do I need the rip register to get the address of the function?

I searched the assembly file for '__stdio_common_vswprintf' to get some hints about this. The only thing I found was:

    .addrsig
    .addrsig_sym __stdio_common_vswprintf

So I googled ".addrsig" and found the following text:

"This section is used to mark symbols as address-significant, i.e. the address of the symbol is used in a comparison or leaks outside the translation unit. It has the same meaning as the absence of the LLVM attributes unnamed_addr and local_unnamed_addr.

Any sections referred to by symbols that are not marked as address-significant in any object file may be safely merged by a linker without breaking the address uniqueness guarantee provided by the C and C++ language standards.

The contents of the section are a sequence of ULEB128-encoded integers referring to the symbol table indexes of the address-significant symbols."

But sadly, this is way over my head. What does that actually mean? Does that explain the code construct with the rip register? Is that a form of optimization?

Thank you in advance for any help!

Kind greetings
Björn
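The last paragraph of the quoted description says the .addrsig section holds a sequence of ULEB128-encoded symbol table indexes. The sketch below decodes such a byte sequence. It assumes the raw section bytes have already been extracted from the object file (reading the object file itself is not shown), and the function name `decode_uleb128_sequence` is hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: decodes a buffer holding back-to-back ULEB128
// integers. For an .addrsig section, each decoded value would be the
// symbol table index of one address-significant symbol.
std::vector<std::uint64_t> decode_uleb128_sequence(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint64_t> values;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint64_t value = 0;
        unsigned shift = 0;
        while (true) {
            if (i >= bytes.size())
                throw std::runtime_error("truncated ULEB128 value");
            std::uint8_t byte = bytes[i++];
            if (shift >= 64)
                throw std::runtime_error("ULEB128 value too large");
            // The low 7 bits of each byte carry payload, least-significant group first.
            value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
            shift += 7;
            // A clear high bit marks the last byte of the current value.
            if ((byte & 0x80) == 0)
                break;
        }
        values.push_back(value);
    }
    return values;
}
```

For example, the bytes `02 05 80 01 E5 8E 26` decode to the four indexes 2, 5, 128 and 624485: small indexes fit in one byte, while larger ones spill into continuation bytes.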
Peter Smith via llvm-dev
2019-Nov-22 11:34 UTC
[llvm-dev] What is an address-significant symbol?
On Fri, 22 Nov 2019 at 11:20, Gaier, Bjoern via llvm-dev <llvm-dev@lists.llvm.org> wrote:
> But sadly... this is way over my head. What does that actually mean? Does that explain the code construct with the rip register? Is that a form of optimization?

There is a linker optimization called identical code folding (ICF). The details of the initial implementation in gold are described in https://ai.google/research/pubs/pub36912 .
ICF can cause problems when the program depends on functions having a unique address. Both gold and LLD have an --icf=safe mode that limits the scope of the optimization to avoid folding sections that are "address-significant". The implementation in gold uses linker heuristics, such as relocation type, to determine address significance. The implementation in LLD uses information generated by the compiler, which is placed in .addrsig.

I can't answer the question about code generation off the top of my head. My understanding is that .addrsig is primarily used for implementing --icf=safe in linkers; I don't think it has an effect on code generation.

Hope this is of some help
Peter
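The hazard Peter describes can be sketched with a pair of hypothetical functions. This assumes an ELF target compiled with -ffunction-sections (so each function sits in its own section, which ICF needs in order to fold it) and linked with LLD; the names `add_one_a`, `add_one_b` and `same_address` are made up for illustration.

```cpp
#include <cassert>

// Two hypothetical functions with identical bodies. Compiled with
// -ffunction-sections, each lands in its own section, so a linker
// performing identical code folding (ICF) could merge them into one copy.
int add_one_a(int x) { return x + 1; }
int add_one_b(int x) { return x + 1; }

// Taking and comparing the addresses makes both functions
// address-significant: distinct functions must have distinct addresses,
// so a correctly linked program sees `false` here. The volatile pointers
// stop the compiler from folding the comparison to a constant.
bool same_address() {
    int (*volatile pa)(int) = &add_one_a;
    int (*volatile pb)(int) = &add_one_b;
    return pa == pb;
}
```

Linked normally, `same_address()` returns false. With an unrestricted --icf=all, the two sections could be folded, and the comparison could then return true, breaking the uniqueness guarantee. With --icf=safe, LLD should see both symbols listed in .addrsig (their addresses are taken) and leave them unfolded.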