thr3ads.net - llvm dev - [llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions [Aug 2019]

paolo via llvm-dev

2019-Aug-14 12:12 UTC

[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions

Hi all,

I'm currently working on the implementation for LLVM of the RISCV Bit
Manipulation ISA extension described by Clifford Wolf in the following
presentation:

https://content.riscv.org/wp-content/uploads/2019/06/17.10-b_wolf.pdf

and the following document:

https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.90.pdf

The aim is to provide the intrinsic functions to the user in order to
implement code that is more optimal for bit manipulations on RISCV
targets, but also to provide automatic optimization by lowering simple
code patterns into optimized bit manipulation assembly,

    %neg = xor i32 %1, -1                    ----->      andn t0, t0, t1
    %and = and i32 %0, %neg

just in case the user wants such optimization but is not aware of all
the bits that can be optimized.
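
As a rough source-level illustration of that pattern: the C expression below is one way to arrive at the xor/and pair shown above. The function name and_not is made up for this sketch, and whether a single andn is actually selected depends on the target having the bit manipulation extension enabled.

```c
#include <stdint.h>

/* a & ~b: the C-level form of the xor/and pair quoted above.
 * Assumption: with the bit manipulation extension available, a backend
 * may select one andn here; without it, expect a separate not and and. */
uint32_t and_not(uint32_t a, uint32_t b)
{
    return a & ~b;
}
```

Here the first argument plays the role of %0 and the second the role of %1 in the IR snippet.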


I'm dealing with the fact that it is pretty hard to select some patterns
of DAG nodes in order to replace them with an optimal equivalent
machine instruction.

Take for instance the count leading zeros operation:


    uint32_t clz (uint32_t x)
    {
        for (int count = 0; count < 32; count++ ) {
            if ((x << count) < 0)
                return count;
        }
        return 32;
    }


It needs a loop to be performed and that makes it difficult to be
lowered because it goes through several basic blocks, and different
optimizations can easily compromise the pattern recognition.


What I'm wondering is, is there any place in LLVM where complex
patterns like this are already lowered into single instructions? (e.g.
clz, which is used quite often). Maybe at a higher level?


Another point that has been suggested to me and that I'd like to
discuss is: does it make sense to implement such lowering for operations
that normally a user wouldn't implement from scratch when an intrinsic
function is already available for that?


Many thanks.

Paolo Savini

Tim Northover via llvm-dev

2019-Aug-14 12:30 UTC

Hi Paolo,

On Wed, 14 Aug 2019 at 13:13, paolo via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Take for instance the count leading zeros operation:

The example implementation has a couple of problems (no uint32_t will
be negative, and any shift you'd think might turn a positive number
into a negative one is undefined behaviour).
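
A minimal sketch of how the loop could be written to avoid both problems, assuming a 32-bit unsigned int. The name clz_loop and the explicit top-bit mask are choices made for this sketch and are not taken from the original post.

```c
#include <stdint.h>

/* Loop-based count of leading zeros that stays within defined behaviour:
 * the value is kept unsigned and the top bit is tested with a mask
 * instead of comparing against zero. */
uint32_t clz_loop(uint32_t x)
{
    for (uint32_t count = 0; count < 32; count++) {
        /* count is always below 32, so the unsigned shift is well defined */
        if ((x << count) & UINT32_C(0x80000000))
            return count;
    }
    return 32; /* no bit set: all 32 bits are leading zeros */
}
```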

But there is some code in lib/Transforms/Scalar/LoopIdiomRecognize.cpp
designed to spot loops that really are calculating ctlz etc and
replace them with the proper intrinsic call. The tests seem to give
some examples of the kind of thing it can see:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Transforms/LoopIdiom/X86/ctlz.ll

> Another point that has been suggested to me and that I'd like to
> discuss is: does it make sense to implement such lowering for operations
> that normally a user wouldn't implement from scratch when an intrinsic
> function is already available for that?

Probably not during CodeGen, since that mostly works at the basic block
level, but in general yes (hence LoopIdiomRecognize).
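
For illustration, a loop of roughly the following shape (shift right until the value is zero, counting iterations) is the sort of idiom the linked ctlz tests exercise. This is a hand-written sketch, not copied from the test file. The name clz_shift is hypothetical, and whether a given loop is actually rewritten into the intrinsic depends on the LLVM version, the target and the surrounding code.

```c
#include <stdint.h>

/* Count the significant bits of x by shifting right until nothing is
 * left, then subtract from the width to get the leading-zero count. */
uint32_t clz_shift(uint32_t x)
{
    uint32_t bits = 0;
    while (x != 0) {
        x >>= 1;
        bits++;
    }
    return 32 - bits;
}
```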

Cheers.

Tim.

Roman Lebedev via llvm-dev

2019-Aug-14 12:31 UTC

On Wed, Aug 14, 2019 at 3:13 PM paolo via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hi all,

Hi.

> I'm currently working on the implementation for LLVM of the RISCV Bit
> Manipulation ISA extension described by Clifford Wolf in the following
> presentation:
>
> https://content.riscv.org/wp-content/uploads/2019/06/17.10-b_wolf.pdf
>
> and the following document:
>
> https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.90.pdf

Nice!

> The aim is to provide the intrinsic functions to the user in order to
> implement code that is more optimal for bit manipulations on RISCV
> targets, but also to provide automatic optimization by lowering simple
> code patterns into optimized bit manipulation assembly,
>
>     %neg = xor i32 %1, -1                    ----->      andn t0, t0, t1
>     %and = and i32 %0, %neg
>
> just in case the user wants such optimization but is not aware of all
> the bits that can be optimized.
>
>
> I'm dealing with the fact that it is pretty hard to select some patterns
> of DAG nodes in order to replace them with an optimal equivalent
> machine instruction.
>
> Take for instance the count leading zeros operation:
>
>
>     uint32_t clz (uint32_t x)
>     {
>         for (int count = 0; count < 32; count++ ) {
>             if ((x << count) < 0)
>                 return count;
>         }
>         return 32;
>     }
>
>
> It needs a loop to be performed and that makes it difficult to be
> lowered because it goes through several basic blocks, and different
> optimizations can easily compromise the pattern recognition.

You only want to lower the LLVM IR @llvm.cttz intrinsic in *that* case.

> What I'm wondering is, is there any place in LLVM where complex
> patterns like this are already lowered into single instructions? (e.g.
> clz, which is used quite often). Maybe at a higher level?

That depends.
If there's an LLVM intrinsic for it, then any normal optimization pass
could do it.
In cttz's case it's mainly done in the LoopIdiom pass.
> Another point that has been suggested to me and that I'd like to
> discuss is: does it make sense to implement such lowering for operations
> that normally a user wouldn't implement from scratch when an intrinsic
> function is already available for that?

Again, I'd say this is too broad of a question.
If there is an LLVM IR intrinsic, then you only need to lower it,
and optionally ensure that middle-end passes form it from appropriate IR.
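
When an intrinsic already exists, source code can also reach it directly. A small sketch, assuming Clang or GCC and a 32-bit unsigned int: __builtin_clz is the compiler builtin that Clang translates to the llvm.ctlz intrinsic, which leaves only the lowering to the backend. The wrapper name clz_builtin is made up here.

```c
#include <stdint.h>

/* Leading-zero count through the compiler builtin.
 * Assumption: unsigned int is 32 bits wide on the target. */
uint32_t clz_builtin(uint32_t x)
{
    /* The builtin's result is undefined for 0, so handle that case here */
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}
```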

If there isn't one, then yes, you'd want to match all the beautiful
wilderness of the possible patterns that combine into that instruction.

While it's really tempting to just add IR intrinsic for everything,
please do note that a new intrinsic is completely opaque to the rest of LLVM.
It does not magically get peep-hole folds, so those would need to be added,
especially if you intend to form said intrinsic within the middle-end from IR.

This may change some day when these peep-hole folds are auto-inferred,
but that is not so nowadays. Really looking forward to that.
> Many thanks.
>
> Paolo Savini

Roman.

paolo via llvm-dev

2019-Aug-15 09:41 UTC

Hi Roman,

> That depends.
> If there's an LLVM intrinsic for it, then any normal optimization pass
> could do it.
> In cttz's case it's mainly done in the LoopIdiom pass.

Oh yes. Thank you!

Unfortunately several of the instructions of the bit manipulation
extension don't seem to have an intrinsic already in LLVM.

That will require adding some passes to the middle end.

> Again, I'd say this is too broad of a question.
> If there is an LLVM IR intrinsic, then you only need to lower it,
> and optionally ensure that middle-end passes form it from appropriate IR.
>
> If there isn't one, then yes, you'd want to match all the beautiful
> wilderness of the possible patterns that combine into that instruction.
>
> While it's really tempting to just add IR intrinsic for everything,
> please do note that a new intrinsic is completely opaque to the rest of
> LLVM.
> It does not magically get peep-hole folds, so those would need to be added,
> especially if you intend to form said intrinsic within the middle-end from
> IR.
>
> This may change some day when these peep-hole folds are auto-inferred,
> but that is not so nowadays. Really looking forward to that.

It would definitely be interesting.

Anyway adding such complex instructions to the middle end seems material
for another patch. Unless things change in the meantime.

For now we can provide a lower level optimization of smaller bit
manipulation patterns.

But I'll definitely look into adding those passes as they would provide
much more optimization.


Many thanks.

Paolo

