Tobias Edler von Koch via llvm-dev
2018-May-11 18:12 UTC
[llvm-dev] [RFC] (Thin)LTO with Linker Scripts
RFC: (Thin)LTO with Linker Scripts

At the last US LLVM Developers' Meeting, we presented [1] a proposal for linker script support in (Thin)LTO. In this RFC, I would like to describe the proposal in more detail and invite the community's feedback, so we can build consensus on the upstream implementation.

The end goal of this effort is to extend the benefits of (Thin)LTO, including significant code size and performance improvements, to the many embedded and system-level software projects that rely on linker scripts to control (ELF) image layout.

In particular, this proposal seeks to:

1. Ensure that ELF sections emitted by LTO match the same path-based linker script rules that they would have matched if the project was compiled without LTO.

2. Make module optimization passes aware of the final output sections of symbols in order to limit inter-section (e.g. inlining) or enable intra-section (e.g. constant merging) optimizations where needed.

3. Implement these features without changing the behavior of the compiler when linker script information is *not* available, particularly on source files that contain symbols carrying explicit section attributes.

This proposal only addresses changes to Clang/LLVM. The linker also needs to be enhanced to support LTO with linker scripts; so far, this has only been done for qcld, the linker shipped with the Hexagon SDK.

The proposed implementation involves small changes throughout the compilation flow, so the rest of this document follows the progression from source file to linking. Individual changes, which could map to patches, are marked using "(X.Y)" to help with referencing.

Step 1: Compilation of individual files
=======================================

In order to determine the output section for symbols, the linker needs to be able to match symbols in bitcode files to linker script rules; however, bitcode does not naturally contain section names for symbols (except for those with explicit section attributes).

(1.1) For this reason, we run a pass immediately prior to bitcode emission that initializes a minimal backend and uses the backend to obtain a section name for each GlobalObject. The section name is then stored in the GlobalObject's explicit section attribute.

(1.2) ThinLTO currently marks all symbols with explicit sections as "not eligible for import", which is overly conservative when LTO is aware of the linker script. Since all GlobalObjects now have an explicit section, this behavior needs to be disabled.

(1.3) Items 1.1 and 1.2 assume that the linker provides linker script information to LTO. They should therefore not be the default behavior, but be guarded under a clang flag, e.g. "-flto-ls", which is passed in addition to -flto[=thin], when the user knows that their linker has this capability.

Note: In the presentation, we proposed a dedicated attribute "linker_input_section" instead of using the explicit section attribute. After discussions with Peter, I believe we don't need to introduce an additional attribute.

Step 2: Symbol resolution in the linker
=======================================

The linker loads all bitcode input files and performs symbol resolution. The IRSymtab already exposes the explicit section attributes for symbols; in our case, all symbols now carry this information.

(2.1) In addition to communicating the existing SymbolResolution flags (VisibleToRegularObj etc.), the linker also matches the linker script, determines an output section for each symbol, and passes it to LTO as part of the SymbolResolution data structure.

(2.2) The linker needs to determine an output section for *all* symbols, including those with internal linkage. lto::InputFile::symbols() currently only exposes external symbols, so an additional argument is added to include locals.

(2.3) The linker provides a unique Module Id for each input file to LTO. This is necessary for the linker to later identify the file origin of each symbol emitted from LTO.
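
To make the matching in items 2.1 and 2.2 concrete, here is a minimal sketch in Python of how a linker might pick an output section for every symbol, locals included. It is not taken from qcld or from LLVM: the `Rule` and `Symbol` types, the "first matching rule wins" policy, the glob patterns, and the file names are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """One hypothetical path-based linker script rule."""
    file_glob: str       # pattern for the input file path
    section_glob: str    # pattern for the input section name
    output_section: str  # output section chosen when both patterns match


class Symbol(NamedTuple):
    """What the linker is assumed to read from a bitcode symbol table."""
    name: str
    input_file: str
    input_section: str   # explicit section attribute stored by item 1.1
    is_local: bool       # locals are included, as item 2.2 requires


def match_output_section(sym: Symbol, rules: list) -> Optional[str]:
    """Return the output section of the first rule that matches, if any."""
    for rule in rules:
        if (fnmatchcase(sym.input_file, rule.file_glob)
                and fnmatchcase(sym.input_section, rule.section_glob)):
            return rule.output_section
    return None


def resolve(symbols: list, rules: list) -> dict:
    """Map (input file, symbol name) to an output section.

    The key includes the file because local symbols in different
    files may share a name.
    """
    return {(s.input_file, s.name): match_output_section(s, rules)
            for s in symbols}


RULES = [
    Rule("fast/*.o", ".text*", ".text_fast"),
    Rule("*", ".text*", ".text"),
    Rule("*", ".rodata*", ".rodata"),
]

SYMBOLS = [
    Symbol("isr", "fast/irq.o", ".text.isr", is_local=False),
    Symbol("helper", "fast/irq.o", ".text.helper", is_local=True),
    Symbol("main", "app/main.o", ".text.main", is_local=False),
    Symbol("table", "app/main.o", ".rodata.table", is_local=True),
]

RESOLVED = resolve(SYMBOLS, RULES)
```

In a real linker, the result would be handed to LTO alongside the other resolution flags rather than kept in a dictionary; the dictionary only stands in for that hand-off.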

Step 3: ThinLTO Import, (Thin)LTO Optimization
==============================================

(3.1) The information provided by the linker is stored in the IR prior to merging (Regular LTO) and before and after importing (ThinLTO). The goal is for every GlobalObject to have two additional attributes by the time optimizations are run:
- "linker_output_section": This is used for limiting/enabling optimizations based on knowledge of the eventual section placement of a symbol.
- "module_id": This keeps track of the file origin of each symbol and will be used during CodeGen to 'tag' symbols with their origin so the linker can (re-)match the correct linker script rules after LTO.

(3.2) To reduce 'futile' importing in ThinLTO, output section information can be taken into account when determining the import/export sets. For instance, functions whose callers are in different output sections will not be inlined (see 3.3 below), so it does not make sense to import them.

(3.3) Some optimization passes need to be modified to utilize linker script information - in some cases to enable, and some cases to disable optimizations. Passes that currently behave conservatively for GlobalObjects with explicit section attributes can be enhanced to take output section information into account. For instance, ConstantMergePass should merge global constants that are located in the same output section. On the other hand, we need to prevent inlining across output section boundaries, to name one example.

Step 4: Code Generation / ELF emission
======================================

The output file names produced by (Thin)LTO necessarily differ from the linker's input files. The linker thus wouldn't be able to match these to path-based linker script rules.

(4.1) When linker script information is available, we propose to augment the ELF section names that symbols are emitted to with the module ID.
For example,

    define void @myFun() section ".text.myFun"
        "linker_output_section"=".text" "module_id"="ABC123(f.o)" { }

would be emitted to

    .section ".text.myFun^^ABC123(f.o)","ax",@progbits

This enables the linker to then strip the part after the delimiter (^^) and override the origin file for the symbol with the original input file for the purpose of linker script matching.

Item 4.1 doesn't necessarily need to be implemented in target-independent code. The backends can override target lowering functions responsible for ELF section selection, so each backend could have its own convention of how the module ID is encoded.

Conclusion
==========

This document outlined a proposal for the implementation of LTO with linker scripts. A variant of the described approach has been in production use for some time. It successfully extended the benefits of LTO to a number of embedded applications that would have otherwise suffered correctness issues if built with LTO. Before we start implementing this upstream, I would appreciate your comments and ideas. I am particularly interested in any linker script use cases that are prevalent in projects you care about but that do not readily fit the above model.

References:
[1] Talk presented at 2017 US LLVM Developers' Meeting, San Jose, CA.
    Slides: http://llvm.org/devmtg/2017-10/slides/LTOLinkerScriptsEdlerVonKoch.pdf
    Video: https://youtu.be/hhaPAKUt35E

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
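
As an illustration of the naming convention in item 4.1, the following Python sketch shows the round trip: the code generator appends the module ID after the "^^" delimiter, and the linker later splits it off again. The function names `tag_section` and `untag_section` are hypothetical, and the sketch assumes that ordinary section names never contain the delimiter themselves.

```python
from typing import Optional, Tuple

# Delimiter used in the RFC's example; a backend could choose another one.
DELIMITER = "^^"


def tag_section(section: str, module_id: str) -> str:
    """Code generator side: append the module ID to the section name."""
    return f"{section}{DELIMITER}{module_id}"


def untag_section(tagged: str) -> Tuple[str, Optional[str]]:
    """Linker side: recover the plain section name and the module ID.

    Returns None as the module ID when the name carries no tag, e.g. for
    sections that come from regular (non-LTO) object files.
    """
    section, sep, module_id = tagged.partition(DELIMITER)
    return section, (module_id if sep else None)


TAGGED = tag_section(".text.myFun", "ABC123(f.o)")
```

With the recovered module ID, the linker can look up the original input file and match the plain section name against the path-based rules as if LTO had not been involved.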
Peter Smith via llvm-dev
2018-May-14 13:14 UTC
[llvm-dev] [RFC] (Thin)LTO with Linker Scripts
Hello Tobias,

Thanks very much for the RFC, I think that this will be useful in persuading embedded developers to use LTO in their projects.

I think the overall approach for communication between the linker and code generator sounds reasonable. I've got some questions/comments based on some experience with Arm's proprietary linker, which supports LTO but has a different linker script mechanism than GNU ld compatible linkers. I'm not hugely familiar with the details of LTO at the moment so apologies in advance for any misunderstandings on my part.

My understanding from the RFC is:
- All global objects in the bitcode file will be assigned a section name.
- A linker will communicate the output section of all global objects.
- Certain transformations won't be performed if the output section is different.

The common use cases that I can see that might not fit perfectly into that model:
- Code that is in different OutputSections but it will be logically correct and in many cases desirable to perform transformations on as if they were in the same output section.
- Output section placement rules that are not based on names, for example Arm's linker can assign sections to an output section until the output section size limit is reached, then a different output section is used. I admit that this may be more of a problem for linkers that have a different linker script model.

I think both cases are illustrative of a use case where the precise output section does not matter, but there is a vaguer goal of placing a subset of the input sections in a subset of the output sections. From what I can tell there isn't a way for the code generator to tell the difference between code that is placed in different output sections and it is not correct or beneficial to optimize and code that is placed in different output sections and it is correct and beneficial to optimize together.

I think that this kind of use case could be supported by doing something like:
- Linker informs code generator the output sections that must not use any information from another module and may not contribute any information to another module. For example an output section that is representing an overlay.
- Linker can omit the output section information for sections that the user doesn't care where they go, and let the linker decide based on some size constraint later.

I think the latter case could be made to work by assigning a "don't care" module id, assuming -ffunction-sections. The former case is a bit more difficult as we would still want some way to distinguish the module id for placement.

I think that these are mostly details rather than fundamental problems though.

Peter
Tobias von Koch via llvm-dev
2018-May-15 15:51 UTC
[llvm-dev] [RFC] (Thin)LTO with Linker Scripts
Hi Peter,

On Mon, May 14, 2018 at 8:14 AM Peter Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> My understanding from the RFC is:
> - All global objects in the bitcode file will be assigned a section name.

... which is equal to the section name that they would have been emitted to if this was a regular compilation. In addition to allowing the linker to read section names from the bitcode, this also helps support mixing -ffunction-sections and -fno-function-sections and similar options (forgot to mention that in the RFC).

> - A linker will communicate the output section of all global objects.

Correct. (Global objects in the LLVM sense, so that includes objects with local linkage).

> - Certain transformations won't be performed if the output section is
> different.

Correct. Plus, others can be enabled if they're safe to apply when we know things are going to the same output section.

> The common use cases that I can see that might not fit perfectly into
> that model:
> - Code that is in different OutputSections but it will be logically
> correct and in many cases desirable to perform transformations on as
> if they were in the same output section.

Right. The output section that the linker communicates for a symbol doesn't need to correspond to a "physical" output section. So let's say if the linker knows (or the user somehow tells it) that two output sections should be considered equivalent, the linker can communicate the same output section identifier for symbols in either of the two physical output sections. This is perfectly safe since the output section info is only ever used to enable/inhibit optimizations, not for actual symbol emission by LTO.

> - Output section placement rules that are not based on names, for
> example Arm's linker can assign sections to an output section until
> the output section size limit is reached, then a different output
> section is used.
> I admit that this may be more of a problem for
> linkers that have a different linker script model.

That should actually just work in the existing model. Before LTO runs, we don't know the size of symbols anyway, so the linker will just communicate the original output section for all of them and we apply optimizations across them as if they all fitted in the same section. After LTO, some may end up in the 'overflow' section but LTO doesn't need to know about that since it wouldn't have been correct for the user to make any assumptions about what ends up in the original section vs overflow in the first place.

> I think both cases are illustrative of a use case where the precise
> output section does not matter, but there is a vaguer goal of placing
> a subset of the input sections in a subset of the output sections.
> From what I can tell there isn't a way for the code generator to tell
> the difference between code that is placed in different output
> sections and it is not correct or beneficial to optimize and code that
> is placed in different output sections and it is correct and
> beneficial to optimize together.

Perhaps we should rename the "output section" that is communicated to LTO to something less specific to make it clear that it can be used for exactly this purpose. Optimization domain? Partition?

> I think that this kind of use case could be supported by doing something
> like:
> - Linker informs code generator the output sections that must not use
> any information from another module and may not contribute any
> information to another module. For example an output section that is
> representing an overlay.

It's not so much about other modules (files) - you could have multiple files contributing input sections to the same overlay, for instance, and you would want to optimize across them. But you wouldn't want to de-duplicate a constant from another overlay. I think the OutputSectionID-as-optimization-domain idea captures this use case, no?
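
The optimization-domain idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names, the section names, and the choice of the alphabetically smallest member as a group's identifier are all made up here, and the groups of equivalent sections are assumed not to overlap.

```python
def build_domains(output_sections, equivalent_groups):
    """Map each physical output section to an optimization domain ID.

    Sections listed together in one group share a domain; every other
    section is its own domain.
    """
    domain_of = {sec: sec for sec in output_sections}
    for group in equivalent_groups:
        representative = min(group)  # arbitrary but deterministic choice
        for sec in group:
            domain_of[sec] = representative
    return domain_of


def may_merge_constants(sec_a, sec_b, domain_of):
    """Allow de-duplication only within a single optimization domain."""
    return domain_of[sec_a] == domain_of[sec_b]


SECTIONS = [".text", ".text_overflow", ".overlay1", ".overlay2"]

# The linker (or the user) declares .text and its overflow section equivalent.
DOMAINS = build_domains(SECTIONS, [{".text", ".text_overflow"}])
```

Here symbols in `.text` and `.text_overflow` are reported under the same identifier, so LTO treats them as one domain, while the two overlays stay separate and constants are never shared between them.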

> - Linker can omit the output section information for sections that the
> user doesn't care where they go, and let the linker decide based on
> some size constraint later.

That's an interesting idea to allow a 'don't care' output section ID; we would have to be pretty careful in defining what that means on a per-optimization basis. That is, am I allowed to inline a function with a defined output section into a function without one (probably no)? Vice versa (probably yes)?

> I think that these are mostly details rather than fundamental problems
> though.

Thank you very much for your comments!

Tobias
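
One possible reading of the tentative inlining rule above, written as a Python sketch. The function name is hypothetical, `None` stands for the 'don't care' ID, and the case where neither function has an output section is not addressed in the thread; allowing it here is purely an assumption.

```python
from typing import Optional


def may_inline(callee_section: Optional[str],
               caller_section: Optional[str]) -> bool:
    """Decide whether a callee may be inlined into a caller.

    None means the linker reported no output section ('don't care').
    """
    if callee_section is None:
        # A don't-care callee can follow its caller ("probably yes").
        # This also covers the both-None case, which is an assumption.
        return True
    if caller_section is None:
        # A placed callee must not leak into an unplaced caller
        # ("probably no").
        return False
    # Both placed: inline only within the same output section.
    return callee_section == caller_section
```

Other optimizations, such as constant merging, would need their own rule for the 'don't care' case, which is the per-optimization care the reply asks for.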