Peter Smith via llvm-dev
2018-Mar-15 16:43 UTC
[llvm-dev] [RFC] Stop giving a default CPU to the LTO plugin?
Hello everyone, this is most likely Arm specific, but could affect other targets where there is a somewhat complex relationship between the triple and mcpu option. At present when clang is used as a linker driver for the gold-plugin and when using and an explicit -mcpu is not given to clang, then clang will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the default CPU is based on the triple. I think that this is causing more problems than it is worth and I'd like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the -mcpu=<CPU> is explicitly on the clang command line. I'd like to know if I'm missing something important, and if there would be any support for a patch to not pass an implicit mcpu via plugin? There may also be a way of handling an inappropriate cpu/triple combination more gracefully. The full story of a problem with the Arm target is available at https://bugs.llvm.org/show_bug.cgi?id=36542 . A short description of the problem is: - Source files are compiled with -mthumb --target=arm-linux-androideabi -mcpu=cortex-a7, this records a triple of thumbv7a-linux-androideabi in the bitcode file. - The command line for the link step only passes --targetarm-linux-androideabi. - The default cpu passed to LTO for a triple arm-linux-androideabi is the very old arm7tdmi. - When the ARMAsmBackend is created from the combination of triple thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target features only include Thumb1. Amongst other things this prevents MC from widening branch instructions to the Thumb2 wide branch. - Code generation prior to MC seems to prefer the attributes in the bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this selects a narrow branch instruction for a tail call with the expectation it will be widened by MC. End result is a relocation out of range error. In summary the combination of -mcpu=arm7tdmi and a triple from the bitcode file of thumbv7a-linux-androideabi does not make sense, but when passed to LTO seems to: - Escape all the existing warnings about nonsensical triples [*]. - On Arm targets at least; having a final link step without a cpu is likely to be quite common so I don't think that this will be a corner case. - The Build Attributes of the output ELF file get the triple ARMv4t from -mcpu=arm7tdmi which is lying to the linker as the file contains ARMv7a instructions. My personal preference is that we shouldn't pass an implicit CPU derived from the triple, at least for Arm targets. If that isn't possible I think that we should make sure we detect inappropriate cpu/triple combinations and either warn or error. Does anyone have any strong opinions on which way to go? I'm happy to work on a patch to improve this. I know that this can be worked around by explicitly passing a cpu or more specific triple to the link line, however this is often not done in existing builds. Thanks in advance for any suggestions Peter [*] If the bitcode files don't have a .o suffix a CodeGenAction is performed on the bitcode files giving the warning: warning: overriding the module target triple with armv4t--linux-androideabi [-Woverride-module]
Friedman, Eli via llvm-dev
2018-Mar-15 19:12 UTC
[llvm-dev] [RFC] Stop giving a default CPU to the LTO plugin?
On 3/15/2018 9:43 AM, Peter Smith via llvm-dev wrote:> Hello everyone, this is most likely Arm specific, but could affect > other targets where there is a somewhat complex relationship between > the triple and mcpu option. > > At present when clang is used as a linker driver for the gold-plugin > and when using and an explicit -mcpu is not given to clang, then clang > will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the > default CPU is based on the triple. > > I think that this is causing more problems than it is worth and I'd > like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the > -mcpu=<CPU> is explicitly on the clang command line. > > I'd like to know if I'm missing something important, and if there > would be any support for a patch to not pass an implicit mcpu via > plugin? There may also be a way of handling an inappropriate > cpu/triple combination more gracefully. > > The full story of a problem with the Arm target is available at > https://bugs.llvm.org/show_bug.cgi?id=36542 . A short description of > the problem is: > - Source files are compiled with -mthumb > --target=arm-linux-androideabi -mcpu=cortex-a7, > this records a triple of thumbv7a-linux-androideabi in the bitcode file. > - The command line for the link step only passes --target> arm-linux-androideabi. > - The default cpu passed to LTO for a triple arm-linux-androideabi is > the very old arm7tdmi. > - When the ARMAsmBackend is created from the combination of triple > thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target > features only include Thumb1. Amongst other things this prevents MC > from widening branch instructions to the Thumb2 wide branch. > - Code generation prior to MC seems to prefer the attributes in the > bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this > selects a narrow branch instruction for a tail call with the > expectation it will be widened by MC. End result is a relocation out > of range error. > > In summary the combination of -mcpu=arm7tdmi and a triple from the > bitcode file of thumbv7a-linux-androideabi does not make sense, but > when passed to LTO seems to: > - Escape all the existing warnings about nonsensical triples [*]. > - On Arm targets at least; having a final link step without a cpu is > likely to be quite common so I don't think that this will be a corner > case. > - The Build Attributes of the output ELF file get the triple ARMv4t > from -mcpu=arm7tdmi which is lying to the linker as the file contains > ARMv7a instructions.Having ARMv7a instructions in an ARMv4t file shouldn't be a problem: a function should be allowed to override the CPU attributes to generate code for a newer target. This is generally done using the "target" function attribute. If this doesn't work correctly, we should fix it. It looks like it's currently broken; testcase: void g(); __attribute__((target("thumb,arch=cortex-a53"))) void f() { g(); }> > My personal preference is that we shouldn't pass an implicit CPU > derived from the triple, at least for Arm targets. If that isn't > possible I think that we should make sure we detect inappropriate > cpu/triple combinations and either warn or error. Does anyone have any > strong opinions on which way to go? I'm happy to work on a patch to > improve this. > > I know that this can be worked around by explicitly passing a cpu or > more specific triple to the link line, however this is often not done > in existing builds. > > Thanks in advance for any suggestionsIt doesn't really make sense to override the target specified in the bitcode; I agree we shouldn't do that by default. But I don't think changing that actually fixes the underlying bug. -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Peter Smith via llvm-dev
2018-Mar-16 10:38 UTC
[llvm-dev] [RFC] Stop giving a default CPU to the LTO plugin?
Thanks for the example, that is very useful in working out the overall scope of the problem, which is now wider than I thought it was. I've put some comments inline. On 15 March 2018 at 19:12, Friedman, Eli <efriedma at codeaurora.org> wrote:> On 3/15/2018 9:43 AM, Peter Smith via llvm-dev wrote: >> >> Hello everyone, this is most likely Arm specific, but could affect >> other targets where there is a somewhat complex relationship between >> the triple and mcpu option. >> >> At present when clang is used as a linker driver for the gold-plugin >> and when using and an explicit -mcpu is not given to clang, then clang >> will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the >> default CPU is based on the triple. >> >> I think that this is causing more problems than it is worth and I'd >> like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the >> -mcpu=<CPU> is explicitly on the clang command line. >> >> I'd like to know if I'm missing something important, and if there >> would be any support for a patch to not pass an implicit mcpu via >> plugin? There may also be a way of handling an inappropriate >> cpu/triple combination more gracefully. >> >> The full story of a problem with the Arm target is available at >> https://bugs.llvm.org/show_bug.cgi?id=36542 . A short description of >> the problem is: >> - Source files are compiled with -mthumb >> --target=arm-linux-androideabi -mcpu=cortex-a7, >> this records a triple of thumbv7a-linux-androideabi in the bitcode file. >> - The command line for the link step only passes --target>> arm-linux-androideabi. >> - The default cpu passed to LTO for a triple arm-linux-androideabi is >> the very old arm7tdmi. >> - When the ARMAsmBackend is created from the combination of triple >> thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target >> features only include Thumb1. Amongst other things this prevents MC >> from widening branch instructions to the Thumb2 wide branch. >> - Code generation prior to MC seems to prefer the attributes in the >> bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this >> selects a narrow branch instruction for a tail call with the >> expectation it will be widened by MC. End result is a relocation out >> of range error. >> >> In summary the combination of -mcpu=arm7tdmi and a triple from the >> bitcode file of thumbv7a-linux-androideabi does not make sense, but >> when passed to LTO seems to: >> - Escape all the existing warnings about nonsensical triples [*]. >> - On Arm targets at least; having a final link step without a cpu is >> likely to be quite common so I don't think that this will be a corner >> case. >> - The Build Attributes of the output ELF file get the triple ARMv4t >> from -mcpu=arm7tdmi which is lying to the linker as the file contains >> ARMv7a instructions. > > > Having ARMv7a instructions in an ARMv4t file shouldn't be a problem: a > function should be allowed to override the CPU attributes to generate code > for a newer target. This is generally done using the "target" function > attribute. If this doesn't work correctly, we should fix it. It looks like > it's currently broken; testcase: > > void g(); > __attribute__((target("thumb,arch=cortex-a53"))) > void f() { g(); } >Hmmm, allowing that makes life much more complicated. For example I can also write: void g(); __attribute__((target("thumb,arch=cortex-m0"))) void f() { g(); } void i(); __attribute__((target("arm,arch=cortex-a53"))) void h() { i(); } With -mcpu=cortex-m0 and get ARM code within an object claiming to be Thumb only with no errors or warnings, with no chance of a linker detecting a mismatch either. I think that part of this is the same problem that is observed in PR36542 the ARMAsmBackend that is responsible for widening the tail call to a Thumb2 branch is created with ARMv4T which doesn't support Thumb1. There has been a recent change that threads through the existing SubtargetInfo instead of recreating it from the triple alone. It is worth mentioning that the object level BuildAttributes do not include Thumbv7a which is misleading to a linker as it will be expecting no ARMv7A in the object. Has there already been a discussion about what per function code-generation with BuildAttributes higher than the base object should mean in the context of capabilities of the ARMAsmBackend and BuildAttributes? My thoughts right now are that if ARMAsmBackend is to operate at an object level, rather than a per-function level then it has to use the capabilities of the highest architecture in the file. This also means giving the object BuildAttributes of the highest architecture in the file, and giving an error if they contradict, for example mixing Thumb Cortex-M0 and ARM Cortex-A53. If the ARMAsmBackend could be made to work on a per-function level then there is a chance that we could only widen the tail call to g() in f(), but not elsewhere. To honestly describe this in the BuildAttributes we would need to use per Section or per Function attributes though. My suggestion to move forward here is: - Recreate the SubtargetInfo based on the merge of all the Targets and CPU information that we have seen, or warn/error if they are incompatible. - Ouput the Tag_CPU_arch BuildAttributes based on the merge of all the Targets and CPU information that we have seen. It is probably worth moving any discussion of this particular part to PR36542 since it is somewhat Arm specific. I'll add this comment to there.>> >> My personal preference is that we shouldn't pass an implicit CPU >> derived from the triple, at least for Arm targets. If that isn't >> possible I think that we should make sure we detect inappropriate >> cpu/triple combinations and either warn or error. Does anyone have any >> strong opinions on which way to go? I'm happy to work on a patch to >> improve this. >> >> I know that this can be worked around by explicitly passing a cpu or >> more specific triple to the link line, however this is often not done >> in existing builds. >> >> Thanks in advance for any suggestions > > It doesn't really make sense to override the target specified in the > bitcode; I agree we shouldn't do that by default. But I don't think > changing that actually fixes the underlying bug. >It won't fix the underlying bug, but depending on we deal with nonsensical combinations of triple and CPU we may end up with a warning or an error for a common use case.> -Eli > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux > Foundation Collaborative Project >