paolo via llvm-dev
2019-Aug-15 09:41 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
Hi Roman,> That depends. > If there's LLVM intrinsic for it, then any normal optimization pass could do it. > In cttz's case it's mainly done in LoopIdiom pass.Oh yes. Thank you! Unfortunately several of the instructions of the bit manipulation extension don't seem to have an intrinsic already in LLVM. That will require to add some passes to the middle end.> Again, i'd say this is too broad of a question. > If there is LLVM IR intrinsic, then you only need to lower it, > and optionally ensure that middle-end passes form it from appropriate IR. > > If there isn't one, then yes, you'd want to match all the beautiful wilderness > of the possible patterns that combine into that instruction. > > While it's really tempting to just add IR intrinsic for everything, > please do note that a new intrinsic is completely opaque to the rest of LLVM. > It does not magically get peep-hole folds, so those would need to be added, > especially if you intend to form said intrinsic within the middle-end from IR. > > This may change some day when these peep-hole folds are auto-inferred, > but that is not so nowadays. Really looking forward to that. >It would be definitely interesting. Anyway adding such complex instructions to the middle end seems material for another patch. Unless things change in the meantime. For now we can provide a lower level optimization of smaller bit manipulation patterns. But I'll definitely look into adding those passes as they would provide much more optimization. Many thanks. Paolo -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 2456 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190815/8997a154/attachment.key>
Roman Lebedev via llvm-dev
2019-Aug-15 10:20 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
On Thu, Aug 15, 2019 at 12:41 PM paolo <paolo.savini at embecosm.com> wrote:> > Hi Roman, > > That depends. > > If there's LLVM intrinsic for it, then any normal optimization pass could do it. > > In cttz's case it's mainly done in LoopIdiom pass. > Oh yes. Thank you! > > Unfortunately several of the instructions of the bit manipulation > extension don't seem to have an intrinsic already in LLVM. > > That will require to add some passes to the middle end. > > > Again, i'd say this is too broad of a question. > > If there is LLVM IR intrinsic, then you only need to lower it, > > and optionally ensure that middle-end passes form it from appropriate IR. > > > > If there isn't one, then yes, you'd want to match all the beautiful wilderness > > of the possible patterns that combine into that instruction. > > > > While it's really tempting to just add IR intrinsic for everything, > > please do note that a new intrinsic is completely opaque to the rest of LLVM. > > It does not magically get peep-hole folds, so those would need to be added, > > especially if you intend to form said intrinsic within the middle-end from IR. > > > > This may change some day when these peep-hole folds are auto-inferred, > > but that is not so nowadays. Really looking forward to that. > > > It would be definitely interesting. > > Anyway adding such complex instructions to the middle end seems material > for another patch. Unless things change in the meantime. > > For now we can provide a lower level optimization of smaller bit > manipulation patterns. > > But I'll definitely look into adding those passes as they would provide > much more optimization.I'm not sure what you mean by "more passes" in the reply. If there is no matching instruction/intrinsic, then i'm not sure how a pass would help. *Please* do note my comment about adding new instructions/intrinsics. While it's not and immovable obstacle, it by no means should be treated lightly. If you want to add new LLVM IR instruction/intrinsic, with intention of actually producing it from other instructions in middle-end (as opposed to just lowering it from compiler front-end, or not producing it in middle-end), you must also consider how said new IR instruction/intrinsic will affect all other optimization passes, and *that* cost *is* high. E.g. if you add 'andn', you then need to find every fold that would look for and(not(y), x) or and(x, not(y)) and teach it about 'andn'. Things will be more fun with more complex patterns :)> Many thanks. > > PaoloRoman
paolo via llvm-dev
2019-Aug-28 17:08 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
Hi Roman, following from a similar discussion that started on Phabricator: https://reviews.llvm.org/D66479 I'd like to re-elaborate my answer and change a bit the scope of the question. Regardless of the user interface that the bit manipulation patch provides to the user (e.g. frontend intrinsics, C code...) the C or LLVM IR implementation of some of the instructions from the bit manipulation proposal cannot (as far as I know) be patternmatched directly with RISCVISD nodes because they span over multiple basic blocks. For this reason we need to implement idiom recognition in the middle end and, if the pattern matches, emit an LLVM intrinsic, like LLVM does for ctlz and cttz. Than we would be able to select such instruction. On 15/08/2019 11:20, Roman Lebedev wrote:> On Thu, Aug 15, 2019 at 12:41 PM paolo <paolo.savini at embecosm.com> wrote: >> Hi Roman, >>> That depends. >>> If there's LLVM intrinsic for it, then any normal optimization pass could do it. >>> In cttz's case it's mainly done in LoopIdiom pass. >> Oh yes. Thank you! >> >> Unfortunately several of the instructions of the bit manipulation >> extension don't seem to have an intrinsic already in LLVM. >> >> That will require to add some passes to the middle end. >> >>> Again, i'd say this is too broad of a question. >>> If there is LLVM IR intrinsic, then you only need to lower it, >>> and optionally ensure that middle-end passes form it from appropriate IR. >>> >>> If there isn't one, then yes, you'd want to match all the beautiful wilderness >>> of the possible patterns that combine into that instruction. >>> >>> While it's really tempting to just add IR intrinsic for everything, >>> please do note that a new intrinsic is completely opaque to the rest of LLVM. >>> It does not magically get peep-hole folds, so those would need to be added, >>> especially if you intend to form said intrinsic within the middle-end from IR. >>> >>> This may change some day when these peep-hole folds are auto-inferred, >>> but that is not so nowadays. Really looking forward to that. >>> >> It would be definitely interesting. >> >> Anyway adding such complex instructions to the middle end seems material >> for another patch. Unless things change in the meantime. >> >> For now we can provide a lower level optimization of smaller bit >> manipulation patterns. >> >> But I'll definitely look into adding those passes as they would provide >> much more optimization. > I'm not sure what you mean by "more passes" in the reply. > If there is no matching instruction/intrinsic, then i'm not sure how a > pass would help. > > *Please* do note my comment about adding new instructions/intrinsics. > While it's not and immovable obstacle, it by no means should be treated lightly. > If you want to add new LLVM IR instruction/intrinsic, with intention of actually > producing it from other instructions in middle-end (as opposed to just lowering > it from compiler front-end, or not producing it in middle-end), > you must also consider how said new IR instruction/intrinsic will affect > all other optimization passes, and *that* cost *is* high.I see. But we might need to do it anyway. With caution. As you already pointed out earlier if we add an LLVM intrinsic that is lowered directly form a front end intrinsic, that would be impenetrable by middle end optimizations. An advantage would be that it would be lowered "safely" (with no mutual interference from optimization passes) into the corresponding asm, but that also means that any optimization from LLVM that could provide even better code (even faster or smaller than the expected bit manipulation asm) wouldn't be possible. All that being said, in the case a user wants to be sure that some specific bit manipulation asm instructions are selected without interference (think about for instance critical programs like C implementations of block ciphers for which both performance and security are crucial), how would you see to provide inline asm behind the interface functions (e.g. _rv32_andn): uint32_t _rv32_andn(uint32_t a, uint32_t b) { uint32_t res; __asm__ ("andn %0, %1, %2" : "=r"(res) : "r"(a), "r"(b)); return res; } as opposed to provide a chain of front-end intrinsic that are lowered to LLVM intrinsics and then asm? uint32_t _rv32_andn(uint32_t a, uint32_t b) { return __builtin_andn(a, b); } Nothing would change form the user's perspective, but I guess that would imply a difference in LLVM for ... maintainability?> E.g. if you add 'andn', you then need to find every fold that would look for > and(not(y), x) or and(x, not(y)) and teach it about 'andn'. > Things will be more fun with more complex patterns :) > >> Many thanks. >> >> Paolo > RomanPaolo