Christopher Friedt via llvm-dev
2021-Nov-23 21:09 UTC
[llvm-dev] -fhash-long-section-names=N, -fhashed-section-names=map.csv
Hi list,

I'm a bit new to hacking on LLVM / Clang, and I would like to add a new command-line option, "-fhash-long-section-names=N". The change would help to overcome the 16-character limit on section names on macOS [1], which is currently a bit of a showstopper for a certain feature in one specific project.

The option itself does not necessarily need to be tied to macOS. ELF does not impose such a limit on section name length. The default would be to preserve existing behaviour: section names are not hashed, and names that are too long continue to produce errors [2]. The minimum value for N is chosen to be 16. The maximum value is arbitrary. A value of 0 indicates "no hashing".

The hashing process will consist of:

  * SHA256
  * Base64
  * Truncate to N

This is already a somewhat common approach to solving this problem on macOS.

The basic idea is this (N = 16):

    // this is a short section, so no change
    __attribute__((section("foo"))) => "foo"
    // this "long" section has been hashed
    __attribute__((section("ThisSectionNameIsTooLong"))) => "ip9RNVxH27rCS+Ix"

In the unlikely event of a section name collision, it would be good to throw an error (a good test point).

Also, since hashing is not trivially reversible, I would like to add another option, -fhashed-section-names=map.csv, which would emit the hashed section names in a format that is easy for subsequent tooling to read.

For macOS, specifically, patterns like the following would also need transformation:

    section("__DATA,phoo")
    extern struct foo foo_start[] __asm("section$start$__DATA$phoo");
    extern struct foo foo_end[] __asm("section$end$__DATA$phoo");

This is kind of a macOS parallel of the linker-generated start and stop symbols in the ELF world.

The clang frontend changes were fairly straightforward, and it was quite simple to create the transform itself in Python and LLVM. I'm a little unsure of how to proceed from here. Likely there will be some aspect of AST and some aspect of Sema involved.
I have gone over the documentation and examples [3][4], and I'm still not entirely sure. I have done some brute-forcing and have played around with MCSectionMachO.cpp and MCSymbolMachO.cpp, but I think that is definitely the wrong approach.

Finally, my questions:

1. First, is this a feature that upstream would accept?
2. Should I use the AST / Replacement approach mentioned in [5]?
3. Is there another, preferable form of "backend magic" that should be used?
4. Are there any existing tests that would be good examples to borrow from?

Would you be able to point me in the right direction?

Thanks, and hope you are well.

C

[1] See "section[16]" here: https://opensource.apple.com/source/cctools/cctools-921/include/mach-o/loader.h.auto.html
[2] error: argument to 'section' attribute is not valid for this target: mach-o section specifier requires a section whose length is between 1 and 16 characters
[3] https://clang.llvm.org/hacking.html
[4] https://clang.llvm.org/docs/InternalsManual.html#adding-new-command-line-option
[5] https://youtu.be/VqCkCDFLSsc?t=2370
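
For readers following the thread, the hashing steps proposed in the message above (SHA256, Base64, truncate to N), together with the collision error and the map.csv output, might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the function names, the choice to Base64-encode the raw digest (rather than its hex form), the use of N as the threshold for "long", and the two-column CSV layout are all guesses, so the output is not guaranteed to match the "ip9RNVxH27rCS+Ix" example.

```python
import base64
import csv
import hashlib
import io


def hash_section_name(name: str, n: int) -> str:
    """Return `name` unchanged if hashing is off (n == 0) or it fits in n
    characters; otherwise return Base64(SHA-256(name)) truncated to n."""
    if n != 0 and n < 16:
        # The message states that the minimum value for N is 16.
        raise ValueError("N must be 0 (no hashing) or at least 16")
    if n == 0 or len(name) <= n:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Assumption: the raw 32-byte digest is encoded with the standard
    # Base64 alphabet (the example in the message contains a '+').
    return base64.b64encode(digest).decode("ascii")[:n]


def build_section_map(names, n: int) -> dict:
    """Map every original name to its final name; raise on a collision."""
    mapping = {}
    owners = {}  # final name -> original name that produced it
    for name in names:
        final = hash_section_name(name, n)
        previous = owners.setdefault(final, name)
        if previous != name:
            raise ValueError(
                f"section name collision: {previous!r} and {name!r} "
                f"both map to {final!r}"
            )
        mapping[name] = final
    return mapping


def map_to_csv(mapping: dict) -> str:
    """Render only the names that were actually hashed as CSV text.
    The 'original,hashed' column layout is hypothetical."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["original", "hashed"])
    for original, final in mapping.items():
        if original != final:
            writer.writerow([original, final])
    return out.getvalue()
```

Calling build_section_map(["foo", "ThisSectionNameIsTooLong"], 16) leaves "foo" untouched and maps the long name to a 16-character string; passing that 16-character string as a second, literal section name triggers the collision error.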
Christopher Friedt via llvm-dev
2021-Nov-25 14:15 UTC
[llvm-dev] -fhash-long-section-names=N, -fhashed-section-names=map.csv
Made some progress and would still very much like to get some feedback from LLVM devs.

I ended up implementing an AST matcher via the tutorial [1] and have a PoC here [2] which works [3]. I do need to move the matcher out of the custom tool that I created and into Sema still. Also added some regression tests (although I think I might need to move that logic to unit tests instead).

One thing I've come to realize a bit more though is that I might need to target LangOptions rather than CodeGenOptions. Is anyone able to confirm that is the correct location for this option?

Working with the AST and SemaCXX has surprisingly little to do with specific machine code generation. I guess that's just how LLVM is architected. There is one exception in this case though; I still need to somehow determine if the target object file is MachO in order to decide whether to call MCSectionMachO::ParseSectionSpecifier(). Is there an easy way to determine the target object file format via the ASTContext?

[1] https://clang.llvm.org/docs/LibASTMatchersTutorial.html
[2] https://github.com/llvm/llvm-project/compare/main...cfriedt:fhash-long-section-names
[3]
    % long-section-converter clang/test/SemaCXX/attr-section-hashed-macos.cpp \
        -- -fhash-long-section-names=16
    VarDecl 0x1470cdec0 <.../attr-section-hashed-macos.cpp:1:1, line:2:5> col:5 foo 'int'
    `-SectionAttr 0x1470cdf28 <line:1:16, col:59> section "__RODATA,ip9RNVxH27rCS+Ix"
    VarDecl 0x1470ce390 parent 0x14702fc08 <.../attr-section-hashed-macos.cpp:8:3, col:24> col:14 used start_foo 'int[]' extern
    `-AsmLabelAttr 0x1470ce408 <col:32> "section$start$__RODATA$ip9RNVxH27rCS+Ix" IsLiteralLabel
    VarDecl 0x147808e00 parent 0x14702fc08 <.../attr-section-hashed-macos.cpp:9:3, col:22> col:14 used end_foo 'int[]' extern
    `-AsmLabelAttr 0x147808e78 <col:30> "section$end$__RODATA$ip9RNVxH27rCS+Ix" IsLiteralLabel
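
The AST dump above shows the segment name (__RODATA) kept as-is while the same hashed section name appears in both the section attribute and the section$start$ / section$end$ asm labels. A rough Python sketch of that string rewriting, outside of Clang, might look like the following. The helper names are hypothetical, the hashing details (raw digest, standard Base64, N as the length threshold) are assumptions, and the sketch skips validation of N; it only illustrates keeping the attribute and the labels in sync, not how the PoC does it.

```python
import base64
import hashlib
import re


def hash_section_name(name: str, n: int) -> str:
    """Base64(SHA-256(name)) truncated to n, applied only to names longer
    than n. n == 0 disables hashing. (Assumed scheme, see lead-in.)"""
    if n == 0 or len(name) <= n:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")[:n]


def rewrite_specifier(spec: str, n: int) -> str:
    """Rewrite a 'segment,section[,...]' specifier, hashing only the
    section field. A bare name without a comma is hashed as a whole."""
    parts = spec.split(",")
    if len(parts) < 2:
        return hash_section_name(spec, n)
    parts[1] = hash_section_name(parts[1].strip(), n)
    return ",".join(parts)


# Matches labels such as section$start$__DATA$phoo: group 1 is the prefix
# up to and including the segment, group 2 is the section name.
_LABEL = re.compile(r"^(section\$(?:start|end)\$[^$]+\$)(.+)$")


def rewrite_asm_label(label: str, n: int) -> str:
    """Rewrite the section part of a section$start$ / section$end$ label;
    any other label is returned unchanged."""
    match = _LABEL.match(label)
    if match is None:
        return label
    return match.group(1) + hash_section_name(match.group(2), n)
```

Because both helpers call the same hash function on the section field alone, a long name in section("__RODATA,...") and in the matching section$start$__RODATA$... label ends up with the same replacement, which is the property the start and end symbols depend on.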