via llvm-dev
2019-Aug-23 11:51 UTC
[llvm-dev] Using [GlobalISel] to provide peephole optimizations
Hi, GlobalISel is fantastic, but obviously lacks a lot of the transforms that makes SelectionDAG so good. Whilst it's plenty usable, you'll find yourself wanting/needing to add a lot of manual little transforms to clean things up. I know of the RFC for a new Combiner with its own syntax (https://reviews.llvm.org/D54286 is the latest I can find of it), but after manually adding my Nth manually coded pass for a niggling important transform, and then needing to add more cases to FoldImmediate/MemoryOperand/OptimizeLoad after it. I wondered how hard it would be to allow GlobalISel to reselect machine patterns, eg after they've been made available by other passes. What I was thinking is in addition to anything else that's coming, allowing Instructions to exist on the input side of Pat<>, and using the same InstructionSelector we already have to reselect. To my surprise... not many changes are required to seemingly make this work: // fold loads in to compare instructions def : Pat<(CPw_sr i32:$k, (MOVw_wf iPTR:$s)), (CPw_sf i32:$k, iPTR:$s)>; And it looks like SDNodeXForms will work off the bat, along with complex renderers. The main catches being with constants that require checking (due G_CONSTANT being handled differently to immediates), along with needing to add checks that the instruction has the same implicits, and that they're dead where appropriate. But it seems viable and is definitely easy to use so I'm just wondering... is this something that's being considered/is appealing to people? And/or is the restriction of not allowing Instructions on the LHS quite an intentional design decision? Because it seems that this would provide some value even for those not using GlobalISel as their primary selector, just as a way of quickly describing peephole optimizations and leveraging the very nifty little VM there to implement them. In theory, a lot of pattern fragments could even be added automatically, by comparing pattern fragments and the machine opcodes they represent - giving a free automatically generated "foldImmediate", among other things. A diff of the proof-of-concept can be found here: (https://paste.ee/p/aDHIg), note though it's really just a curiosity to get some conversation going, that other Selectors will reject these patterns (I haven't added their escape routes), and that you're likely to break it in many different ways. And of course that they should exist as their own table, and that Reselect ought return the newly selected instruction. But it's neat enough for me as is to get some useful patterns added already, with these caveats in mind. Any thoughts? Regards, Alex Davies PS Apologies if the diff doesn't work first go - had to wrestle with quite a few out-of-tree things back to get things in order. Note, includes fix for https://bugs.llvm.org/show_bug.cgi?id=42032 also. Oh, and if there's already a peephole pattern emitter that I've missed, please do let me know, I'll easily survive the shame. :)
Amara Emerson via llvm-dev
2019-Aug-26 20:23 UTC
[llvm-dev] Using [GlobalISel] to provide peephole optimizations
+Daniel who’s prototyping the new combiner framework.> On Aug 23, 2019, at 4:51 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi, > > GlobalISel is fantastic, but obviously lacks a lot of the transforms that > makes SelectionDAG so good. Whilst it's plenty usable, you'll find yourself > wanting/needing to add a lot of manual little transforms to clean things up. > > I know of the RFC for a new Combiner with its own syntax > (https://reviews.llvm.org/D54286 is the latest I can find of it), but after > manually adding my Nth manually coded pass for a niggling important > transform, and then needing to add more cases to > FoldImmediate/MemoryOperand/OptimizeLoad after it. I wondered how hard it > would be to allow GlobalISel to reselect machine patterns, eg after they've > been made available by other passes. > > What I was thinking is in addition to anything else that's coming, allowing > Instructions to exist on the input side of Pat<>, and using the same > InstructionSelector we already have to reselect. > > To my surprise... not many changes are required to seemingly make this work: > > // fold loads in to compare instructions > def : Pat<(CPw_sr i32:$k, (MOVw_wf iPTR:$s)), (CPw_sf i32:$k, iPTR:$s)>; > > And it looks like SDNodeXForms will work off the bat, along with complex > renderers. The main catches being with constants that require checking (due > G_CONSTANT being handled differently to immediates), along with needing to > add checks that the instruction has the same implicits, and that they're > dead where appropriate. But it seems viable and is definitely easy to use so > I'm just wondering... is this something that's being considered/is appealing > to people? And/or is the restriction of not allowing Instructions on the LHS > quite an intentional design decision? > > Because it seems that this would provide some value even for those not using > GlobalISel as their primary selector, just as a way of quickly describing > peephole optimizations and leveraging the very nifty little VM there to > implement them. In theory, a lot of pattern fragments could even be added > automatically, by comparing pattern fragments and the machine opcodes they > represent - giving a free automatically generated "foldImmediate", among > other things. > > A diff of the proof-of-concept can be found here: > (https://paste.ee/p/aDHIg), note though it's really just a curiosity to get > some conversation going, that other Selectors will reject these patterns (I > haven't added their escape routes), and that you're likely to break it in > many different ways. And of course that they should exist as their own > table, and that Reselect ought return the newly selected instruction. > > But it's neat enough for me as is to get some useful patterns added already, > with these caveats in mind. > > Any thoughts? > > Regards, > Alex Davies > > PS Apologies if the diff doesn't work first go - had to wrestle with quite a > few out-of-tree things back to get things in order. Note, includes fix for > https://bugs.llvm.org/show_bug.cgi?id=42032 also. Oh, and if there's already > a peephole pattern emitter that I've missed, please do let me know, I'll > easily survive the shame. :) > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Daniel Sanders via llvm-dev
2019-Aug-30 01:08 UTC
[llvm-dev] Using [GlobalISel] to provide peephole optimizations
Hi Alex, Sorry for the slow reply, I was on duty for merging from upstream and fixing issues this week and that took almost all my time.> On Aug 26, 2019, at 13:23, Amara Emerson <aemerson at apple.com> wrote: > > +Daniel who’s prototyping the new combiner framework. > >> On Aug 23, 2019, at 4:51 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi, >> >> GlobalISel is fantastic, but obviously lacks a lot of the transforms that >> makes SelectionDAG so good. Whilst it's plenty usable, you'll find yourself >> wanting/needing to add a lot of manual little transforms to clean things up. >> >> I know of the RFC for a new Combiner with its own syntax >> (https://reviews.llvm.org/D54286 is the latest I can find of it),I'm still working on this but there's been higher priority tasks throughout the year that have taken up a fair bit of my time. I'm hoping to have some news on this fairly soon>> but after >> manually adding my Nth manually coded pass for a niggling important >> transform, and then needing to add more cases to >> FoldImmediate/MemoryOperand/OptimizeLoad after it. I wondered how hard it >> would be to allow GlobalISel to reselect machine patterns, eg after they've >> been made available by other passes. >> >> What I was thinking is in addition to anything else that's coming, allowing >> Instructions to exist on the input side of Pat<>, and using the same >> InstructionSelector we already have to reselect. >> >> To my surprise... not many changes are required to seemingly make this work: >> >> // fold loads in to compare instructions >> def : Pat<(CPw_sr i32:$k, (MOVw_wf iPTR:$s)), (CPw_sf i32:$k, iPTR:$s)>; >> >> And it looks like SDNodeXForms will work off the bat, along with complex >> renderers. The main catches being with constants that require checking (due >> G_CONSTANT being handled differently to immediates), along with needing to >> add checks that the instruction has the same implicits, and that they're >> dead where appropriate. But it seems viable and is definitely easy to use soIt makes sense that a fair bit of this will work. Unlike DAGISel, the input and output are based around the same representation. Aside from the ones you mentioned, the main limitations you'll run into will stem from limitations on the patterns syntax (e.g. instructions are only allowed one result) and semantics (e.g. references to register classes like GPR32:$src specify a type constraint but not a register class constraint).>> I'm just wondering... is this something that's being considered/is appealing >> to people?Something along these lines should be doable with the Combiner work when it lands and I can see some benefits to a post-isel combiner. The same benefits would also apply to the peephole mechanism you're proposing.>> And/or is the restriction of not allowing Instructions on the LHS >> quite an intentional design decision?This restriction comes primarily from DAGISel where the LHS is matching SDNodes in the DAG whereas the RHS is emitting Machine SDNodes (albeit briefly) and then converting them to MachineInstrs. The importer that GlobalISel uses to import SelectionDAG rules into GlobalISel's instruction selector carried this restriction forwards. W.r.t the instruction selector (and ignoring the issues I mentioned above), it makes sense in principle for instructions to be allowed except for at the root of the pattern because GlobalISel allows passes before instruction selection to select instructions. The only reason it doesn't make sense at the root of the pattern is that the instruction selector skips instructions that are already target machine instructions or one of the permitted target independent machine instructions. The big issue would be: When are these new rules executed? The instruction selector is intentionally less general than a combiner since combiners are more expensive (re-applying rules until convergence is difficult and expensive) and the instruction selector is a mandatory pass that runs at all optimization levels. I wouldn't be in favour of making the instruction selector more like a combiner by re-processing instructions but I think that this peephole idea would make sense as a separate pass like a second (but optional) instruction selection pass, or an optional post-isel combiner. I think you may be proposing adding it to PeepholeOptimizer.cpp but I didn't see anything calling reselect() in your patch.>> Because it seems that this would provide some value even for those not using >> GlobalISel as their primary selector, just as a way of quickly describing >> peephole optimizations and leveraging the very nifty little VM there to >> implement them. In theory, a lot of pattern fragments could even be added >> automatically, by comparing pattern fragments and the machine opcodes they >> represent - giving a free automatically generated "foldImmediate", among >> other things.I'm not sure I understood this bit. Could you give an example?>> A diff of the proof-of-concept can be found here: >> (https://paste.ee/p/aDHIg), note though it's really just a curiosity to get >> some conversation going, that other Selectors will reject these patterns (I >> haven't added their escape routes), and that you're likely to break it in >> many different ways. And of course that they should exist as their own >> table, and that Reselect ought return the newly selected instruction. >> >> But it's neat enough for me as is to get some useful patterns added already, >> with these caveats in mind. >> >> Any thoughts? >> >> Regards, >> Alex Davies >> >> PS Apologies if the diff doesn't work first go - had to wrestle with quite a >> few out-of-tree things back to get things in order. Note, includes fix for >> https://bugs.llvm.org/show_bug.cgi?id=42032 also. Oh, and if there's already >> a peephole pattern emitter that I've missed, please do let me know, I'll >> easily survive the shame. :) >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >