paolo via llvm-dev
2019-Aug-14 12:12 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
Hi all, I'm currently working on the implementation for LLVM of the RISCV Bit Manipulation ISA extension described by Clifford Wolf in the following presentation: https://content.riscv.org/wp-content/uploads/2019/06/17.10-b_wolf.pdf and the following document: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.90.pdf The aim is to provide the intrinsic functions to the user in order to implement code that is more optimal for bit manipulations on RISCV targets, but also to provide automatic optimization by lowering simple code patterns into optimized bit manipulation assembly, %neg = xor i32 %1, -1 -----> andn t0, t0, t1 %and = and i32 %0, %neg just in case the user wants such optimization but is not aware of all the bits that can be optimized. I'm dealing with the fact that it is pretty hard to select some patterns of DAG nodes in order to replace them with an optimal machine equivalent machine instruction. Take for intsance the count leading zeros operation: uint32_t clz (uint32_t x) { for (int count = 0; count < 32; count++ ) { if ((x << count) < 0) return count; } return 32; } It needs a loop to be performed and that makes it difficult to be lowered because it goes through several basic blocks, and different optimizations can easily compromise the pattern recognition. What I'm wondering is, is there already any place in LLVM where complex patterns like this are already lowered into single instructions? (e.g.: clz, that is used quite often). Maybe at a higher level? Another point of view that I've been suggested and that I'd like to discuss is: does it make sense to implement such lowering for operations that normally a user wouldn't implement from scratch when an intrinsic function is already available for that? Many thanks. Paolo Savini -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 2456 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190814/858b3f8f/attachment.key>
Tim Northover via llvm-dev
2019-Aug-14 12:30 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
Hi Paolo, On Wed, 14 Aug 2019 at 13:13, paolo via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Take for intsance the count leading zeros operation:The example implementation has a couple of problems (no uint32_t will be negative, and any shift you'd think might turn a positive number into a negative one is undefined behaviour). But there is some code in lib/Transforms/Scalar/LoopIdiomRecognize.cpp designed to spot loops that really are calculating ctlz etc and replace them with the proper intrinsic call. The tests seem to give some examples of the kind of thing it can see: https://github.com/llvm/llvm-project/blob/master/llvm/test/Transforms/LoopIdiom/X86/ctlz.ll> Another point of view that I've been suggested and that I'd like to > discuss is: does it make sense to implement such lowering for operations > that normally a user wouldn't implement from scratch when an intrinsic > function is already available for that?Probably not during CodeGen since that mostly works at the basic block level, but in general yes (hence LoopIdiomRecognize) Cheers. Tim.
Roman Lebedev via llvm-dev
2019-Aug-14 12:31 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
On Wed, Aug 14, 2019 at 3:13 PM paolo via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Hi all,Hi.> I'm currently working on the implementation for LLVM of the RISCV Bit > Manipulation ISA extension described by Clifford Wolf in the following > presentation: > > https://content.riscv.org/wp-content/uploads/2019/06/17.10-b_wolf.pdf > > and the following document: > > https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.90.pdfNice!> The aim is to provide the intrinsic functions to the user in order to > implement code that is more optimal for bit manipulations on RISCV > targets, but also to provide automatic optimization by lowering simple > code patterns into optimized bit manipulation assembly, > > %neg = xor i32 %1, -1 -----> andn t0, t0, t1 > %and = and i32 %0, %neg > > just in case the user wants such optimization but is not aware of all > the bits that can be optimized. > > > I'm dealing with the fact that it is pretty hard to select some patterns > of DAG nodes in order to replace them with an optimal machine equivalent > machine instruction. > > Take for intsance the count leading zeros operation: > > > uint32_t clz (uint32_t x) > { > for (int count = 0; count < 32; count++ ) { > if ((x << count) < 0) > return count; > } > return 32; > } > > > It needs a loop to be performed and that makes it difficult to be > lowered because it goes through several basic blocks, and different > optimizations can easily compromise the pattern recognition.You only want to lower LLVM IR @llvm.cttz intrinsic in *that* case.> What I'm wondering is, is there already any place in LLVM where complex > patterns like this are already lowered into single instructions? (e.g.: > clz, that is used quite often). Maybe at a higher level?That depends. If there's LLVM intrinsic for it, then any normal optimization pass could do it. In cttz's case it's mainly done in LoopIdiom pass.> Another point of view that I've been suggested and that I'd like to > discuss is: does it make sense to implement such lowering for operations > that normally a user wouldn't implement from scratch when an intrinsic > function is already available for that?Again, i'd say this is too broad of a question. If there is LLVM IR intrinsic, then you only need to lower it, and optionally ensure that middle-end passes form it from appropriate IR. If there isn't one, then yes, you'd want to match all the beautiful wilderness of the possible patterns that combine into that instruction. While it's really tempting to just add IR intrinsic for everything, please do note that a new intrinsic is completely opaque to the rest of LLVM. It does not magically get peep-hole folds, so those would need to be added, especially if you intend to form said intrinsic within the middle-end from IR. This may change some day when these peep-hole folds are auto-inferred, but that is not so nowadays. Really looking forward to that.> Many thanks. > > Paolo SaviniRoman.> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
paolo via llvm-dev
2019-Aug-15 09:41 UTC
[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions
Hi Roman,> That depends. > If there's LLVM intrinsic for it, then any normal optimization pass could do it. > In cttz's case it's mainly done in LoopIdiom pass.Oh yes. Thank you! Unfortunately several of the instructions of the bit manipulation extension don't seem to have an intrinsic already in LLVM. That will require to add some passes to the middle end.> Again, i'd say this is too broad of a question. > If there is LLVM IR intrinsic, then you only need to lower it, > and optionally ensure that middle-end passes form it from appropriate IR. > > If there isn't one, then yes, you'd want to match all the beautiful wilderness > of the possible patterns that combine into that instruction. > > While it's really tempting to just add IR intrinsic for everything, > please do note that a new intrinsic is completely opaque to the rest of LLVM. > It does not magically get peep-hole folds, so those would need to be added, > especially if you intend to form said intrinsic within the middle-end from IR. > > This may change some day when these peep-hole folds are auto-inferred, > but that is not so nowadays. Really looking forward to that. >It would be definitely interesting. Anyway adding such complex instructions to the middle end seems material for another patch. Unless things change in the meantime. For now we can provide a lower level optimization of smaller bit manipulation patterns. But I'll definitely look into adding those passes as they would provide much more optimization. Many thanks. Paolo -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 2456 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190815/8997a154/attachment.key>