thr3ads.net - llvm dev - [llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions [Aug 2019]


paolo via llvm-dev

2019-Aug-15 09:41 UTC

[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions

Hi Roman,

> That depends.
> If there's LLVM intrinsic for it, then any normal optimization pass
> could do it.
> In cttz's case it's mainly done in LoopIdiom pass.

Oh yes. Thank you!
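
For context, the kind of loop the LoopIdiom remark refers to can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and whether a given LLVM version rewrites this exact loop into a cttz call is an assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: counts the left shifts needed to clear x.
   For nonzero x the result equals 32 minus the number of trailing zeros. */
static unsigned shifts_until_zero(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* The same value in closed form, using the GCC/Clang builtin.
   __builtin_ctz is undefined for 0, hence the guard. */
static unsigned shifts_until_zero_ctz(uint32_t x) {
    return x == 0 ? 0u : 32u - (unsigned)__builtin_ctz(x);
}

/* Asserts that the loop and the closed form agree for one input. */
static void check_equivalence(uint32_t x) {
    assert(shifts_until_zero(x) == shifts_until_zero_ctz(x));
}
```

The loop form spans several basic blocks, while the closed form is a single intrinsic call that a backend can lower to one instruction when the target has it.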

Unfortunately several of the instructions of the bit manipulation
extension don't seem to have an intrinsic already in LLVM.

That will require adding some passes to the middle end.

> Again, i'd say this is too broad of a question.
> If there is LLVM IR intrinsic, then you only need to lower it,
> and optionally ensure that middle-end passes form it from appropriate IR.
>
> If there isn't one, then yes, you'd want to match all the beautiful
> wilderness of the possible patterns that combine into that instruction.
>
> While it's really tempting to just add IR intrinsic for everything,
> please do note that a new intrinsic is completely opaque to the rest of
> LLVM.
> It does not magically get peep-hole folds, so those would need to be added,
> especially if you intend to form said intrinsic within the middle-end from
> IR.
>
> This may change some day when these peep-hole folds are auto-inferred,
> but that is not so nowadays. Really looking forward to that.

It would be definitely interesting.

Anyway adding such complex instructions to the middle end seems material
for another patch. Unless things change in the meantime.

For now we can provide a lower level optimization of smaller bit
manipulation patterns.

But I'll definitely look into adding those passes as they would provide
much more optimization.


Many thanks.

Paolo



Roman Lebedev via llvm-dev

2019-Aug-15 10:20 UTC



On Thu, Aug 15, 2019 at 12:41 PM paolo <paolo.savini at embecosm.com> wrote:
>
> Hi Roman,
> > That depends.
> > If there's LLVM intrinsic for it, then any normal optimization pass
> > could do it.
> > In cttz's case it's mainly done in LoopIdiom pass.
> Oh yes. Thank you!
>
> Unfortunately several of the instructions of the bit manipulation
> extension don't seem to have an intrinsic already in LLVM.
>
> That will require to add some passes to the middle end.
>
> > Again, i'd say this is too broad of a question.
> > If there is LLVM IR intrinsic, then you only need to lower it,
> > and optionally ensure that middle-end passes form it from appropriate IR.
> >
> > If there isn't one, then yes, you'd want to match all the beautiful
> > wilderness of the possible patterns that combine into that instruction.
> >
> > While it's really tempting to just add IR intrinsic for everything,
> > please do note that a new intrinsic is completely opaque to the rest
> > of LLVM.
> > It does not magically get peep-hole folds, so those would need to be
> > added, especially if you intend to form said intrinsic within the
> > middle-end from IR.
> >
> > This may change some day when these peep-hole folds are auto-inferred,
> > but that is not so nowadays. Really looking forward to that.
> >
> It would be definitely interesting.
>
> Anyway adding such complex instructions to the middle end seems material
> for another patch. Unless things change in the meantime.
>
> For now we can provide a lower level optimization of smaller bit
> manipulation patterns.
>
> But I'll definitely look into adding those passes as they would provide
> much more optimization.

I'm not sure what you mean by "more passes" in the reply.
If there is no matching instruction/intrinsic, then i'm not sure how a
pass would help.

*Please* do note my comment about adding new instructions/intrinsics.
While it's not an immovable obstacle, it by no means should be treated
lightly.
If you want to add a new LLVM IR instruction/intrinsic, with the intention of
actually producing it from other instructions in the middle-end (as opposed to
just lowering it from the compiler front-end, or not producing it in the
middle-end), you must also consider how said new IR instruction/intrinsic will
affect all other optimization passes, and *that* cost *is* high.

E.g. if you add 'andn', you then need to find every fold that would look for
and(not(y), x) or and(x, not(y)) and teach it about 'andn'.
Things will be more fun with more complex patterns :)
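
To make that cost concrete, here is a small C sketch of the two operand orders mentioned above, plus one identity that folds over plain and/not can exploit. The function names are hypothetical and chosen only for this illustration; an opaque 'andn' operation would hide this structure unless each fold were taught about it.

```c
#include <assert.h>
#include <stdint.h>

/* and(not(y), x): the first spelling a matcher has to recognise. */
static uint32_t andn_not_first(uint32_t x, uint32_t y) {
    return ~y & x;
}

/* and(x, not(y)): the commuted spelling of the same operation. */
static uint32_t andn_not_second(uint32_t x, uint32_t y) {
    return x & ~y;
}

/* (x & ~y) & y is always 0. With plain and/not the expression can be
   simplified from its structure alone. */
static uint32_t andn_then_and(uint32_t x, uint32_t y) {
    return andn_not_second(x, y) & y;
}

/* Asserts that both spellings agree and that the identity holds. */
static void check_andn(uint32_t x, uint32_t y) {
    assert(andn_not_first(x, y) == andn_not_second(x, y));
    assert(andn_then_and(x, y) == 0u);
}
```
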
> Many thanks.
>
> Paolo

Roman

paolo via llvm-dev

2019-Aug-28 17:08 UTC



Hi Roman,

following from a similar discussion that started on Phabricator:

https://reviews.llvm.org/D66479

I'd like to rework my answer and slightly change the scope of the
question.

Regardless of the interface that the bit manipulation patch provides to
the user (e.g. frontend intrinsics, C code...), the C or LLVM IR
implementation of some of the instructions from the bit manipulation
proposal cannot (as far as I know) be pattern-matched directly with
RISCVISD nodes because they span multiple basic blocks.

For this reason we need to implement idiom recognition in the middle end
and, if the pattern matches, emit an LLVM intrinsic, like LLVM does for
ctlz and cttz. Then we would be able to select such an instruction.
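
As an illustration of code that spans multiple basic blocks, here is a loop-based reference implementation of a bit-extract style operation in C. The function names are hypothetical and the choice of operation is an assumption made for this sketch; the thread does not say which instructions are meant. The loop and the nested conditionals each introduce extra basic blocks, which is what would call for idiom recognition before instruction selection.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: gathers the bits of value selected by mask
   and packs them into the low bits of the result. */
static uint32_t bit_extract_ref(uint32_t value, uint32_t mask) {
    uint32_t result = 0;
    int out = 0; /* next free bit position in result */
    for (int i = 0; i < 32; i++) {
        if ((mask >> i) & 1u) {
            if ((value >> i) & 1u)
                result |= (uint32_t)1 << out;
            out++;
        }
    }
    return result;
}

/* With an all-ones mask the operation is the identity. */
static void check_identity(uint32_t value) {
    assert(bit_extract_ref(value, 0xFFFFFFFFu) == value);
}
```
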

On 15/08/2019 11:20, Roman Lebedev wrote:
> On Thu, Aug 15, 2019 at 12:41 PM paolo <paolo.savini at embecosm.com> wrote:
>> Hi Roman,
>>> That depends.
>>> If there's LLVM intrinsic for it, then any normal optimization pass
>>> could do it.
>>> In cttz's case it's mainly done in LoopIdiom pass.
>> Oh yes. Thank you!
>>
>> Unfortunately several of the instructions of the bit manipulation
>> extension don't seem to have an intrinsic already in LLVM.
>>
>> That will require to add some passes to the middle end.
>>
>>> Again, i'd say this is too broad of a question.
>>> If there is LLVM IR intrinsic, then you only need to lower it,
>>> and optionally ensure that middle-end passes form it from
>>> appropriate IR.
>>>
>>> If there isn't one, then yes, you'd want to match all the beautiful
>>> wilderness of the possible patterns that combine into that instruction.
>>>
>>> While it's really tempting to just add IR intrinsic for everything,
>>> please do note that a new intrinsic is completely opaque to the
>>> rest of LLVM.
>>> It does not magically get peep-hole folds, so those would need to
>>> be added, especially if you intend to form said intrinsic within the
>>> middle-end from IR.
>>>
>>> This may change some day when these peep-hole folds are auto-inferred,
>>> but that is not so nowadays. Really looking forward to that.
>>>
>> It would be definitely interesting.
>>
>> Anyway adding such complex instructions to the middle end seems material
>> for another patch. Unless things change in the meantime.
>>
>> For now we can provide a lower level optimization of smaller bit
>> manipulation patterns.
>>
>> But I'll definitely look into adding those passes as they would provide
>> much more optimization.
> I'm not sure what you mean by "more passes" in the reply.
> If there is no matching instruction/intrinsic, then i'm not sure how a
> pass would help.
>
> *Please* do note my comment about adding new instructions/intrinsics.
> While it's not an immovable obstacle, it by no means should be treated
> lightly.
> If you want to add new LLVM IR instruction/intrinsic, with intention of
> actually producing it from other instructions in middle-end (as opposed to
> just lowering it from compiler front-end, or not producing it in middle-end),
> you must also consider how said new IR instruction/intrinsic will affect
> all other optimization passes, and *that* cost *is* high.

I see. But we might need to do it anyway, with caution. As you already
pointed out earlier, if we add an LLVM intrinsic that is lowered directly
from a front-end intrinsic, it would be impenetrable to middle-end
optimizations. An advantage is that it would be lowered "safely"
(with no interference from optimization passes) into the corresponding
asm, but that also means that any LLVM optimization that could produce
even better code (faster or smaller than the expected bit manipulation
asm) wouldn't be possible.

All that being said, if a user wants to be sure that specific bit
manipulation asm instructions are selected without interference (think,
for instance, of critical programs such as C implementations of block
ciphers, for which both performance and security are crucial), what would
you think of providing inline asm behind the interface functions
(e.g. _rv32_andn):

#include <stdint.h>

uint32_t _rv32_andn(uint32_t a, uint32_t b) {
    uint32_t res;
    /* andn rd, rs1, rs2 computes rs1 & ~rs2 */
    __asm__ ("andn %0, %1, %2" : "=r"(res) : "r"(a), "r"(b));
    return res;
}

as opposed to providing a chain of front-end intrinsics that are lowered
to LLVM intrinsics and then to asm?

uint32_t _rv32_andn(uint32_t a, uint32_t b) {
    return __builtin_andn(a, b);
}

Nothing would change from the user's perspective, but I guess it would
make a difference within LLVM for ... maintainability?
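
One way to compare the two options is a wrapper that selects between them at compile time. The sketch below is only an assumption about how such a header could look: the macro USE_ANDN_INLINE_ASM and the function names are hypothetical, and the asm path only assembles on a target whose assembler accepts andn. The portable path keeps the operation visible to the optimizer, while the asm path pins the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: returns a & ~b. */
static inline uint32_t rv32_andn_sketch(uint32_t a, uint32_t b) {
#if defined(USE_ANDN_INLINE_ASM) && defined(__riscv)
    /* Pinned instruction: opaque to the optimizer. */
    uint32_t res;
    __asm__ ("andn %0, %1, %2" : "=r"(res) : "r"(a), "r"(b));
    return res;
#else
    /* Portable form: ordinary and/not that the compiler can fold. */
    return a & ~b;
#endif
}

/* Quick self-check of the wrapper's semantics. */
static void check_rv32_andn_sketch(void) {
    assert(rv32_andn_sketch(0xFFu, 0x0Fu) == 0xF0u);
}
```
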
> E.g. if you add 'andn', you then need to find every fold that would look for
> and(not(y), x) or and(x, not(y)) and teach it about 'andn'.
> Things will be more fun with more complex patterns :)
>
>> Many thanks.
>>
>> Paolo
> Roman

Paolo
