Today I updated to trunk the toolchain for my work developing on Cortex-M4F. I was super excited to see three commits by Tim Northover that actually attempt to improve the machine code generation for my target, or any ARM target for that matter (as opposed to other important work on compiler correctness or architectural elegance or formatting comment white-space, I mean). Is he alone or are there others working toward such improvements?

The subject of two of his commits dealt with substituting MOVW/MOVT pairs for an LDR and a lit-pool. Isn't this what MachineConstantPool and ARMConstantIslandPass were all about? I vaguely recall a while back that it was disabled by some Darwin snob who thought no useful target benefited from it. What about enabling it again? Perhaps you've noticed in the last two months that someone's been porting it to the MIPS target, suggesting to me that it's still a good starting point. Finally, I would really like to see this optimization be promoted from -Oz to -Os. Doesn't it satisfy the criteria for -Os over -Oz?

Tim's other commit was about stack adjustment folding. So, Tim, did you see the threads with Andrea Mucignat back in October? She asked for some help so that she could provide a patch to improve machine code generation for Thumb entry/exit points. No one with knowledge about the matter responded. This commit of yours looks to me like you do have some knowledge about it. She seems to have given up (and judging by the way she was treated, I don't blame her -- sad). But would you review what she was attempting, please?

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/
066451.html
066461.html, 066466.html, 066470.html, 066475.html
066641.html, 066650.html

Thanks,
Gary Fuehrer
On Dec 3, 2013, at 2:01 PM, Gary Fuehrer <gfuehrer at defiant-tech.com> wrote:
> [...] Isn't this what MachineConstantPool and ARMConstantIslandPass was all about? I vaguely recall a while back that it was disabled by some Darwin snob who thought no useful target benefited from it.

You recall incorrectly.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On 12/3/2013 4:22 PM, Jim Grosbach wrote:
> On Dec 3, 2013, at 2:01 PM, Gary Fuehrer <gfuehrer at defiant-tech.com> wrote:
>> ... ARMConstantIslandPass ... I vaguely recall a while back that it was disabled ...
>
> You recall incorrectly.

Oops. I apologize.

Islands are not being placed within range of a near LDR. They only appear between functions. (It seemed to me like ARMConstantIslandPass was not being used to make them.) Does anyone know whether this is expected?
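The "within range of a near LDR" constraint mentioned above can be made concrete with a small sketch. It assumes the Thumb encodings of LDR (literal): the 16-bit form reaches a word-aligned offset of 0 to 1020 bytes forward, the 32-bit form reaches plus or minus 4095 bytes, and both measure from the instruction address plus 4, rounded down to a multiple of 4. The function names are made up for illustration and are not taken from ARMConstantIslandPass.

```python
def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)


def thumb_literal_in_range(insn_addr: int, literal_addr: int, wide: bool) -> bool:
    """Check whether a Thumb LDR (literal) at insn_addr can reach literal_addr.

    Assumption: PC reads as the instruction address plus 4, and the offset
    is applied to that value aligned down to 4 bytes.
    """
    base = align_down(insn_addr + 4, 4)
    offset = literal_addr - base
    if wide:
        # 32-bit encoding: 12-bit byte offset in either direction.
        return -4095 <= offset <= 4095
    # 16-bit encoding: forward only, word-aligned, at most 1020 bytes.
    return 0 <= offset <= 1020 and offset % 4 == 0


if __name__ == "__main__":
    # A literal 8 KiB away is out of reach for both encodings, which is
    # roughly the situation when a pool only appears after a large function.
    print(thumb_literal_in_range(0x1000, 0x3000, wide=False))
    print(thumb_literal_in_range(0x1000, 0x3000, wide=True))
```

If every literal load in a function is farther from the end of the function than these limits allow, an island has to be placed inside the function body for the load to be encodable at all.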
Hi Gary,

On 3 December 2013 22:01, Gary Fuehrer <gfuehrer at defiant-tech.com> wrote:
> The subject of two of his commits dealt with substituting MOVW/MOVT pairs
> for an LDR and a lit-pool. Isn't this what MachineConstantPool and
> ARMConstantIslandPass was all about?

Both are essential components of using lit-pools: the MachineConstantPool is just LLVM's underlying machinery, and ARMConstantIslands fixes up out-of-range loads and so on so that they can actually be used.

My recent changes have been to fix Darwin CodeGen so that litpools are actually useful (previously we combined movw/movt pairs referring to the same global but not litpool ones, which meant that litpools actually took up more room), and to enable them in "-Oz" mode.

It sounds like you're on an ELF platform, in which case (fingers crossed) you already get the combining unless you compile with "-fPIC". The "-Oz" change *should* just apply directly and be useful. Please let me know if it doesn't; I'd like to get the same benefits on ELF if at all possible.

> I vaguely recall a while back that it was disabled [...] What about enabling it again?

The only matching situation I can think of there is that Jim suggested I use a different approach for constants on 64-bit AArch64. I don't know the performance numbers, but frankly I was glad to kill it. The ConstantIslands pass is complicated enough that it *really* needs to justify itself, and I don't think code size is a priority on AArch64.

It's still present, supported and enabled in 32-bit ARM, though until today only used for targets that didn't have movw/movt available (and perhaps some odd corner cases like floating constants).

> Islands are not being placed within range of a near LDR. They only appear
> between functions. (It seemed to me like ARMConstantIslandPass was not
> being used to make them.)

That's a very worrying bug, and shouldn't be happening at all. Do you have a .ll test-case you can show us?

> Finally, I would really like to see
> this optimization be promoted from -Oz to -Os. Doesn't it satisfy the
> criteria for -Os over -Oz?

Not generally, since in LLVM -Os means roughly "don't bloat code speculatively while looking for performance". On A-class cores, litpools are almost always slower, so they don't qualify. We *could* enable it at -Os on M-class CPUs separately, but not without benchmark evidence (and I suspect it would have a bad effect even there).

> Tim's other commit was about stack adjustment folding. So, Tim, did you see
> the threads with Andrea Mucignat back in October?

It rings some bells, but I wasn't paying much attention. I think she did get knowledgeable help; nothing in the thread jumps out as wrong. It looks like a reasonable goal, but as Renato said, it should be considered carefully. It's a nasty area of the compiler and has knock-on effects.

Cheers.

Tim.
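The size trade-off behind the movw/movt versus litpool discussion can be sketched with some rough byte counting. This is a simplified model for Thumb-2 only, assuming that MOVW and MOVT are 4 bytes each, that a literal load is 2 bytes (16-bit encoding) or 4 bytes (32-bit encoding), and that each pool entry is one 4-byte word. It ignores alignment padding and any branch needed to skip over an island, so it understates the litpool cost. The function names are illustrative only.

```python
def movw_movt_bytes(uses: int) -> int:
    """Bytes spent materializing a 32-bit constant with MOVW+MOVT per use."""
    return 8 * uses


def litpool_bytes(uses: int, shared_entry: bool, wide_ldr: bool = False) -> int:
    """Bytes spent loading a 32-bit constant from a literal pool.

    shared_entry: True if all uses refer to one combined pool entry,
    False if every use gets its own copy of the constant.
    wide_ldr: True to assume the 4-byte LDR encoding, False for the 2-byte one.
    """
    ldr_size = 4 if wide_ldr else 2
    entries = 1 if shared_entry else uses
    return ldr_size * uses + 4 * entries


if __name__ == "__main__":
    for uses in (1, 3, 10):
        print(
            uses,
            movw_movt_bytes(uses),
            litpool_bytes(uses, shared_entry=True),
            litpool_bytes(uses, shared_entry=False, wide_ldr=True),
        )
```

Under these assumptions the saving comes almost entirely from sharing one pool entry across several uses. Without that combining, and with the wide load encoding, the model shows no saving over movw/movt even before padding is counted.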
On 4 December 2013 07:42, Tim Northover <t.p.northover at gmail.com> wrote:
> We *could*
> enable it at -Os on M-class CPUs separately, but not without benchmark
> evidence (and I suspect it would have a bad effect even there).

Hi Tim, Gary,

I think this is an interesting proposition...

I don't like checks on CPU name/arch/class to guide low-level optimization decisions, and adding yet another size-optimization level would complicate matters. But adding special flags to control fine-grained behaviour would be possible, and even leaving it on by default in Clang if the arch is M-class.

Not without benchmark evidence, of course.

cheers,
--renato
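The policy being debated here can be pictured as a small decision function. This is only a sketch of the options raised in the thread, not LLVM's actual logic: the `FunctionInfo` fields and the `litpool_at_os_on_m_class` switch are hypothetical names invented for illustration. The one LLVM-specific assumption is that -Os corresponds to the `optsize` function attribute and -Oz to `optsize` plus `minsize`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionInfo:
    """Hypothetical summary of what a backend knows about a function."""
    has_movw_movt: bool  # target supports MOVW/MOVT at all
    opt_size: bool       # compiled at -Os or -Oz
    min_size: bool       # compiled at -Oz
    is_m_class: bool     # M-class core such as Cortex-M4F


def use_literal_pool(fn: FunctionInfo, litpool_at_os_on_m_class: bool = False) -> bool:
    """Decide whether a 32-bit constant should come from a literal pool."""
    if not fn.has_movw_movt:
        # No alternative: the constant has to be loaded from a pool.
        return True
    if fn.min_size:
        # -Oz: take the smaller encoding even if it is slower.
        return True
    if fn.opt_size and fn.is_m_class and litpool_at_os_on_m_class:
        # The opt-in behaviour floated above (hypothetical switch).
        return True
    # Otherwise prefer MOVW/MOVT.
    return False


if __name__ == "__main__":
    m4_os = FunctionInfo(has_movw_movt=True, opt_size=True, min_size=False, is_m_class=True)
    print(use_literal_pool(m4_os))
    print(use_literal_pool(m4_os, litpool_at_os_on_m_class=True))
```

Keeping the M-class case behind an explicit switch mirrors the caution in the thread: the default stays unchanged until benchmark evidence says otherwise.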
On Dec 3, 2013, at 11:42 PM, Tim Northover <t.p.northover at gmail.com> wrote:
> [...]
> It's still present, supported and enabled in 32-bit ARM, though until
> today only used for targets that didn't have movw/movt available (and
> perhaps some odd corner cases like floating constants).

It's still enabled and used for all 32-bit targets, actually. Just not as aggressively. Consider 64- and 128-bit floating point and vector constants, for example.
On 12/4/2013 12:42 AM, Tim Northover wrote:
> Hi Gary,
>
> On 3 December 2013 22:01, Gary Fuehrer <gfuehrer at defiant-tech.com> wrote:
>> Islands are not being placed within range of a near LDR. They only appear
>> between functions. (It seemed to me like ARMConstantIslandPass was not
>> being used to make them.)
>
> That's a very worrying bug, and shouldn't be happening at all. Do you
> have a .ll test-case you can show us?

I have a large number of instances in my firmware, so I'll work at producing a .ll test-case. But first I'll get smart about what you said concerning ELF -- I had been using -target arm-none-eabi. I just tried arm-elf-eabi and that had almost no effect (but an 'interesting' one nonetheless). But I don't know yet if that triple (or quad) supplies the necessary ELF-ness.

>> Tim's other commit was about stack adjustment folding. So, Tim, did you see
>> the threads with Andrea Mucignat back in October?
>
> It rings some bells, but I wasn't paying much attention. I think she
> did get knowledgeable help; nothing in the thread jumps out as wrong.
> It looks like a reasonable goal, but as Renato said, should be
> considered carefully. It's a nasty area of the compiler and has
> knock-on effects.

Thank you for double checking. And for the caution -- I was inclined to look into this one, but now I know that it's contraindicated for a newbie.

- Gary