Joan Lluch via llvm-dev
2019-Jun-06 14:28 UTC
[llvm-dev] Strange behaviour of post-legalising optimisations(?)
Hi Tim, Thank you for your reply. It actually helped a lot to narrow the issue, as previously I didn’t even know where to look. I have been following the code in the debugger, specially the LSRInstance::SolveRecurse function. This function traverses recursively all possible ‘Formulae’, and determines the best instruction combination for the loop generation, based on minimal cost. The SolveRecurse function uses Cost::RateFormula as the basis for the cost computation. I found that the RateFormula function does not return the what-would-be best values for my architecture. In particular, my understanding is that the whole implementation kind of assumes that all architectures will have post-increment addressing modes, even if they don’t. For example, consider the following formulas: reg({%dest,+,1}<nsw><%for.cond>) reg(%dest) + 1*reg({0,+,1}<nuw><nsw><%for.cond>) The RateFormula function returns ‘1 instruction’ for either of them, however, in my case, the first one will require an explicit add instruction (post increment is not supported), which should account for an instruction count of 2 instead of just 1. The count should be 2 because this additional add can not be folded with other instructions, as it’s exclusively to increment the pointer. So the problem is that in most situations, llvm choses the first form rather than the second one due to the register overhead in the second one, in spite that in my architecture it would be much better to chose the second (which would happen if the first one resulted in a 2 instruction count). At this point, I advanced significantly on the understanding of the problem, but I got stuck at a more concrete problem. It seems to me that there’s no way to instruct LLVM to adopt a more costly result for the first form, as it would be necessary for my architecture, unless I’m missing something (?). (As a side note: I implemented “isLegalAddressingMode”, but it turned to be virtually identical to the default one except for the smaller immediate sizes, so I think it doesn’t make much difference. Post/Pre increment addressing mode is not supported in my architecture so this is reflected, and I haven’t either added the setIndexedLoadAction calls to my TargetLowering constructor. I also created my own TargetTransformInfo implementation based on the ARM code, which works fine as the functions get diligently called, but I found that it’s not of much help because it doesn’t have the necessary hooks to solve this issue) So any additional help would be appreciated. Joan Lluch> On 5 Jun 2019, at 19:33, Tim Northover <t.p.northover at gmail.com> wrote: > > Hi Joan, > > On Wed, 5 Jun 2019 at 08:58, Joan Lluch via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Any ideas about why this happens?. Particularly, what could cause this change of behaviour just by adding an ‘if’ before the ‘for’? Note that the IR code for the for ‘for’ body is IDENTICAL in both cases, so this is definitely a LLVM thing. > > If you run "llc -print-after-all" you should be able to see which pass > first introduces the behaviour you don't like. In this case it seems > to be "Loop Strength Reduction", in > lib/Transforms/Scalar/LoopStrengthReduce.cpp. It makes use of a lot of > hooks in TargetTransformInfo (search for "TTI.") when making its > decisions and if you haven't implemented versions of those matching > your target its heuristics are going to get this kind of thing wrong. > > Beyond that I'm just guessing, but isLegalAddressingMode looks like it > could be a critical one for your case (it's the only thing you've > mentioned being able to do neatly). > > Cheers. > > Tim.
Tim Northover via llvm-dev
2019-Jun-06 14:42 UTC
[llvm-dev] Strange behaviour of post-legalising optimisations(?)
Hi, On Thu, 6 Jun 2019 at 07:28, Joan Lluch <joan.lluch at icloud.com> wrote:> So any additional help would be appreciated.I'm not terribly familiar with the code involved, but I did notice x86 (which doesn't have post-increment addressing either) managed to avoid the form you don't want. So perhaps seeing where it diverges from your path would be a good starting point. Cheers. Tim.
Joan Lluch via llvm-dev
2019-Jun-06 21:54 UTC
[llvm-dev] Strange behaviour of post-legalising optimisations(?)
Hi Tim, After studying the x86 backend I got it working ! What happens is that considering the following formulas: reg({%dest,+,1}<nsw><%for.cond>) reg(%dest) + 1*reg({0,+,1}<nuw><nsw><%for.cond>) the RateFormula function is designed to not increment the instruction counter when the second form appears more than once. This is what it gives it some cost advantage over the first form, as I wanted. Now, the x86 takes advantage of this by implementing an unique isLSRCostLess function on its TargetTransformInfo. The isLSRCostLess function compares costs for the Loop Strength Reduction algorithm and there’s a hook to custom implement it. The particular implementation of x86 is the only one (along with the SystemZ) that takes the number of instructions into account and puts that as main cost. All the other targets just rely on the default isLSRCostLess implementation that takes number of registers as priority. So that was it. So thanks for all your pointers! Joan Lluch> On 6 Jun 2019, at 16:42, Tim Northover <t.p.northover at gmail.com> wrote: > > Hi, > > On Thu, 6 Jun 2019 at 07:28, Joan Lluch <joan.lluch at icloud.com> wrote: >> So any additional help would be appreciated. > > I'm not terribly familiar with the code involved, but I did notice x86 > (which doesn't have post-increment addressing either) managed to avoid > the form you don't want. So perhaps seeing where it diverges from your > path would be a good starting point. > > Cheers. > > Tim.