thr3ads.net - llvm dev - [llvm-dev] Correct modelling of instructions with types smaller than the register class [Mar 2020]

If this information is useful, please help other people find it:
Share via:

Dominik Montada via llvm-dev

2020-Mar-02 08:44 UTC

[llvm-dev] Correct modelling of instructions with types smaller than the register class

Hi Quentin,

thank you for the reply! This clears up a lot of the questions I was 
having. It seems like we should definitely invest some time in rewriting 
some of our legalization rules then! I also posted some questions 
further down below. I would appreciate getting your opinion on them.
> Hi Dominik,
>
> I’ll do a brief reply here and if you want more information we can talk
further :).
>
>> On Feb 27, 2020, at 5:40 AM, Dominik Montada <dominik.montada at
hightec-rt.com> wrote:
>>
>> Hi Quentin, Hi Amara,
>>
>> I was following your discussion on D75086 regarding declaring types as
legal even if they are smaller than the actual register class (e.g. s16 and
gpr32). We are working on a backend which only has 32 and 64-bit registers and
we recently had a problem regarding exactly this where we had to declare
G_UNMERGE_VALUES and G_MERGE_VALUES with a smaller type of <s32 as legal,
even though we don't have a register class that matches and instruction
selection implicitly sign extends them to s32. The reasoning was that for
example in the case of G_UNMERGE_VALUES, the legalizer widens the source type
when widening the destination type, which we don't want (as the big type
then doesn't fit our registers anymore). So we decided to only make sure
that the bigger type matches a register class and simply allowed smaller types
<s32.
>>
>> What I am gathering now from this discussion is that this is basically
an incorrect modelling and only works around another problem. Could you expand
on this in more detail?
> A few years ago we discussed the possibility and/or desire of allowing
registers to be larger than the actual legalized type or whether they should be
in sync with the legalized type.
>
> For instance, consider
> s8 = G_AND s8, s8
>
> If your target supports `s32 = G_AND` then allowing registers to be larger
than legalized type would yield that `s8 = G_AND s8, s8` is legal (and really
any type smaller than or equal to s32 would be legal for that instruction).
>
> Then the question was how do we represent that: the relevant bits are the
low 8 bits, but the container is 32-bit.
> E.g.,
> s8(32) = G_AND s8(32), s8(32)
>
> We explored that a little bit and found that it was dangerous to carry
undefined bits around (the upper 24 bits in that case) and that there was no
real use case for this. Tim (Northover) might remember more details (CC’ed).
>
> Therefore, we decided that the legal types should exactly match what the
container (the register) will be. So in theory, if you’re not following that
model, you’re in violation of that and that’s where problems arise (and thus
D75086 to workaround them).
This leads me to another question. What would the correct modelling look 
like for special registers that cannot be read/written, like a carry-bit 
register? The G_UADDO/UADDE instructions have an s1 carry and our target 
only supports reading this special register, but not writing to it (at 
least not directly and without doing some hacks). We found this 
especially hard to model, since GlobalISel doesn't really have a notion 
of special registers in generic instructions. In the end we went with an 
approach of defining a pseudo-register-class, which is not allocatable 
and copyable and emit a COPY to/from our physical carry bit register. 
This generally works the way we want and the COPY is eliminated. However 
it is not eliminated when using O0.

I guess this probably has something to do with not defining an extra 
register bank and teaching our regbank info that such a carry-bit vreg 
should be assigned to this regbank? Right now it is simply assigned to 
the GPR bank, which seems to be the problem for O0.

Nevertheless I still dislike this approach as I don't want to rely on 
the COPY being eliminated. I would much rather not emit it at all, but 
this obviously means that I would throw away the carry vreg of 
G_UADDO/UADDE, potentially changing the semantics of the instruction.
>
>
>> I was thinking about implementing some custom legalization which uses
G_EXTRACT or shifts to get what I want, but G_EXTRACT extracts as many bits as
the destination type, so I would end up with the same problem again. Shifts are
also not optimal, since we have a target instruction available to extract n bits
from a given offset. Is it valid to emit target instructions during
legalization,
> Yes, that’s completely legal.
>
> The whole idea of GISel is that the backends should be able to insert
target specific instructions whenever they want. For instance, we use that
extensively in the lowering of calls directly in the IRTranslator, i.e., at the
very beginning of GISel.
That is very good to know! Thanks for the clarification.>
>> i.e. before instruction selection, or would the best modelling be to
just use the shifts and then use some pre-instruction-selection combiner to
merge them to the desired target instruction?
> That’s a question I cannot answer directly, it really depends on what are
the implementation trade-offs on your end.
>
> I have a personal preference for generic shifts as it is possible that
other generic optimizations can get rid of them. Now, if those shifts are too
difficult to combine away or if you know that they are not optimizable anyway,
you may want to go directly with the target specific instruction.
>
> Just keep in mind that whenever you use a target specific instruction, this
is just basically an opaque object that the generic optimizations have to deal
with.
Hm, good point. It should be a straightforward combine rule, so going 
with the shifts and the combiner seems to be a better approach at first 
glance. Thanks!


Best regards,

Dominik
>
> Cheers,
> -Quentin
>
>> Best regards,
>>
>> Dominik
>>
>> -- 
>> ----------------------------------------------------------------------
>> Dominik Montada                   Email: dominik.montada at
hightec-rt.com
>> HighTec EDV-Systeme GmbH          Phone: +49 681 92613 19
>> Europaallee 19                    Fax:   +49-681-92613-26
>> D-66113 Saarbrücken               WWW: http://www.hightec-rt.com
>>
>> Managing Director: Vera Strothmann
>> Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222
>>
>> This e-mail may contain confidential and/or privileged information. If
>> you are not the intended recipient please notify the sender immediately
>> and destroy this e-mail. Any unauthorised copying, disclosure or
>> distribution of the material in this e-mail is strictly forbidden.
>> ---
>>
>-- 
----------------------------------------------------------------------
Dominik Montada                   Email: dominik.montada at hightec-rt.com
HighTec EDV-Systeme GmbH          Phone: +49 681 92613 19
Europaallee 19                    Fax:   +49-681-92613-26
D-66113 Saarbrücken               WWW: http://www.hightec-rt.com

Managing Director: Vera Strothmann
Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient please notify the sender immediately
and destroy this e-mail. Any unauthorised copying, disclosure or
distribution of the material in this e-mail is strictly forbidden.
---

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5409 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200302/38fa8991/attachment.bin>

Quentin Colombet via llvm-dev

2020-Mar-09 22:31 UTC

head link

[llvm-dev] Correct modelling of instructions with types smaller than the register class

Hi Dominik,
> On Mar 2, 2020, at 12:44 AM, Dominik Montada <dominik.montada at
hightec-rt.com> wrote:
> 
> Hi Quentin,
> 
> thank you for the reply! This clears up a lot of the questions I was
having. It seems like we should definitely invest some time in rewriting some of
our legalization rules then! I also posted some questions further down below. I
would appreciate getting your opinion on them.
> 
>> Hi Dominik,
>> 
>> I’ll do a brief reply here and if you want more information we can talk
further :).
>> 
>>> On Feb 27, 2020, at 5:40 AM, Dominik Montada <dominik.montada at
hightec-rt.com> wrote:
>>> 
>>> Hi Quentin, Hi Amara,
>>> 
>>> I was following your discussion on D75086 regarding declaring types
as legal even if they are smaller than the actual register class (e.g. s16 and
gpr32). We are working on a backend which only has 32 and 64-bit registers and
we recently had a problem regarding exactly this where we had to declare
G_UNMERGE_VALUES and G_MERGE_VALUES with a smaller type of <s32 as legal,
even though we don't have a register class that matches and instruction
selection implicitly sign extends them to s32. The reasoning was that for
example in the case of G_UNMERGE_VALUES, the legalizer widens the source type
when widening the destination type, which we don't want (as the big type
then doesn't fit our registers anymore). So we decided to only make sure
that the bigger type matches a register class and simply allowed smaller types
<s32.
>>> 
>>> What I am gathering now from this discussion is that this is
basically an incorrect modelling and only works around another problem. Could
you expand on this in more detail?
>> A few years ago we discussed the possibility and/or desire of allowing
registers to be larger than the actual legalized type or whether they should be
in sync with the legalized type.
>> 
>> For instance, consider
>> s8 = G_AND s8, s8
>> 
>> If your target supports `s32 = G_AND` then allowing registers to be
larger than legalized type would yield that `s8 = G_AND s8, s8` is legal (and
really any type smaller than or equal to s32 would be legal for that
instruction).
>> 
>> Then the question was how do we represent that: the relevant bits are
the low 8 bits, but the container is 32-bit.
>> E.g.,
>> s8(32) = G_AND s8(32), s8(32)
>> 
>> We explored that a little bit and found that it was dangerous to carry
undefined bits around (the upper 24 bits in that case) and that there was no
real use case for this. Tim (Northover) might remember more details (CC’ed).
>> 
>> Therefore, we decided that the legal types should exactly match what
the container (the register) will be. So in theory, if you’re not following that
model, you’re in violation of that and that’s where problems arise (and thus
D75086 to workaround them).
> 
> This leads me to another question. What would the correct modelling look
like for special registers that cannot be read/written, like a carry-bit
register? The G_UADDO/UADDE instructions have an s1 carry and our target only
supports reading this special register, but not writing to it (at least not
directly and without doing some hacks).
I would expect that you could create a register class and register bank for the
special register. That way you have something to map to when you do register
bank select.
The register class could be "not allocatable” (see CCR in AArch64 for
instance).
Note, that if that class is unallocable, you’ll have to manage manually any
overlapping that could arise.
E.g.,
s32 ..., s1 carry1 = G_UADDO
s32 ..., s1 carry2 = G_UADDO
s32 extended_carry1 = zext s1 carry1 <— carry1 will be destroyed by the
second G_UADDO, you’ll have to fix the schedule or copy the value manually.
s32 extended_carry2 = zext s1 carry2
icmp s32 extended_carry1, extended_carry2

Then when you do instruction selection, just map the register to the physical
register. (You may want to do the fix-ups I mentioned here.)
>  We found this especially hard to model, since GlobalISel doesn't
really have a notion of special registers in generic instructions. In the end we
went with an approach of defining a pseudo-register-class, which is not
allocatable and copyable and emit a COPY to/from our physical carry bit
register. This generally works the way we want and the COPY is eliminated.
However it is not eliminated when using O0.
> 
> I guess this probably has something to do with not defining an extra
register bank and teaching our regbank info that such a carry-bit vreg should be
assigned to this regbank? Right now it is simply assigned to the GPR bank, which
seems to be the problem for O0.
Yeah, I think you’d want a different reg bank here. Like you said this is a
problem if the copies don't get optimized away.
> 
> Nevertheless I still dislike this approach as I don't want to rely on
the COPY being eliminated. I would much rather not emit it at all, but this
obviously means that I would throw away the carry vreg of G_UADDO/UADDE,
potentially changing the semantics of the instruction.
Generally speaking, I think you would need a copy (or zext/sext) to move the
value in general purpose land (like in my example with zext). If you don’t have
such use, the copy wouldn’t be emitted and there is nothing to do.

To be honest, I think the situation is complicated today because GISel lack the
fix-up that I mentioned. I am actually surprised that didn’t come up as an issue
yet (x86 is full of these flags). Going forward, I think it should just come
with the framework: you assign a physreg selection time, GISel copies it
around/reschedule appropriately to preserve the value. There’s a lot of wishful
thinking here x).

Cheers,
-Quentin
> 
>> 
>> 
>>> I was thinking about implementing some custom legalization which
uses G_EXTRACT or shifts to get what I want, but G_EXTRACT extracts as many bits
as the destination type, so I would end up with the same problem again. Shifts
are also not optimal, since we have a target instruction available to extract n
bits from a given offset. Is it valid to emit target instructions during
legalization,
>> Yes, that’s completely legal.
>> 
>> The whole idea of GISel is that the backends should be able to insert
target specific instructions whenever they want. For instance, we use that
extensively in the lowering of calls directly in the IRTranslator, i.e., at the
very beginning of GISel.
> That is very good to know! Thanks for the clarification.
>> 
>>> i.e. before instruction selection, or would the best modelling be
to just use the shifts and then use some pre-instruction-selection combiner to
merge them to the desired target instruction?
>> That’s a question I cannot answer directly, it really depends on what
are the implementation trade-offs on your end.
>> 
>> I have a personal preference for generic shifts as it is possible that
other generic optimizations can get rid of them. Now, if those shifts are too
difficult to combine away or if you know that they are not optimizable anyway,
you may want to go directly with the target specific instruction.
>> 
>> Just keep in mind that whenever you use a target specific instruction,
this is just basically an opaque object that the generic optimizations have to
deal with.
> 
> Hm, good point. It should be a straightforward combine rule, so going with
the shifts and the combiner seems to be a better approach at first glance.
Thanks!
> 
> 
> Best regards,
> 
> Dominik
> 
>> 
>> Cheers,
>> -Quentin
>> 
>>> Best regards,
>>> 
>>> Dominik
>>> 
>>> -- 
>>>
----------------------------------------------------------------------
>>> Dominik Montada                   Email: dominik.montada at
hightec-rt.com
>>> HighTec EDV-Systeme GmbH          Phone: +49 681 92613 19
>>> Europaallee 19                    Fax:   +49-681-92613-26
>>> D-66113 Saarbrücken               WWW: http://www.hightec-rt.com
>>> 
>>> Managing Director: Vera Strothmann
>>> Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222
>>> 
>>> This e-mail may contain confidential and/or privileged information.
If
>>> you are not the intended recipient please notify the sender
immediately
>>> and destroy this e-mail. Any unauthorised copying, disclosure or
>>> distribution of the material in this e-mail is strictly forbidden.
>>> ---
>>> 
>> 
> -- 
> ----------------------------------------------------------------------
> Dominik Montada                   Email: dominik.montada at hightec-rt.com
> HighTec EDV-Systeme GmbH          Phone: +49 681 92613 19
> Europaallee 19                    Fax:   +49-681-92613-26
> D-66113 Saarbrücken               WWW: http://www.hightec-rt.com
<http://www.hightec-rt.com/>
> 
> Managing Director: Vera Strothmann
> Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222
> 
> This e-mail may contain confidential and/or privileged information. If
> you are not the intended recipient please notify the sender immediately
> and destroy this e-mail. Any unauthorised copying, disclosure or
> distribution of the material in this e-mail is strictly forbidden.
> ---
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200309/2b282ce7/attachment.html>

Dominik Montada via llvm-dev

2020-Mar-12 16:46 UTC

head link

[llvm-dev] Correct modelling of instructions with types smaller than the register class

Hi Quentin,

thank you for the reply! I have a couple more questions that came up 
when I tried to implement this today. I hope you can help me out with 
this again!

Am 09.03.20 um 23:31 schrieb Quentin Colombet:> I would expect that you could create a register class and register 
> bank for the special register. That way you have something to map to 
> when you do register bank select.
> The register class could be "not allocatable” (see CCR in AArch64 for 
> instance).
> Note, that if that class is unallocable, you’ll have to manage 
> manually any overlapping that could arise.
> E.g.,
> s32 ..., s1 carry1 = G_UADDO
> s32 ..., s1 carry2 = G_UADDO
> s32 extended_carry1 = zext s1 carry1 <— carry1 will be destroyed by 
> the second G_UADDO, you’ll have to fix the schedule or copy the value 
> manually.
> s32 extended_carry2 = zext s1 carry2
> icmp s32 extended_carry1, extended_carry2So before instruction selection we would need to check if there are any 
overlaps and if so try to fix them if possible or abort compilation if 
that is not possible?>
> Then when you do instruction selection, just map the register to the 
> physical register. (You may want to do the fix-ups I mentioned here.)
What exactly do you mean by "map the register to the physical
register"?
Do you mean that I should just emit a COPY from/to this physical 
register and constrain the virtual register to the non-allocatable 
register class? Because this actually gives me an assertion that the 
register class must be allocatable. Or do you mean to use the physical 
register in the target instruction instead of the virtual one (i.e. drop 
the usage of the virtual register and replace it with the physical 
register)? The target instruction uses the physical register implicitly 
by the way.

I also tried to take a look at how AArch64 implements G_UADDO/G_UADDE 
but it doesn't seem to fully implement selection for those? Maybe I just 
didn't read the code properly...
>
> I would expect that you could create a register class and register 
> bank for the special register. That way you have something to map to 
> when you do register bank select.
>I looked at the CC regbank of AArch64, however I didn't see any mappings 
using it in RegBankInfo. Is that something that AArch64 simply hasn't 
implemented or is there something going on that I'm not seeing. I mean, 
what is the point to have a register bank when it is not used during 
regbankselect? I would at least expect that for example G_UADDO would 
use this register bank for the s1 carry out.>
> Generally speaking, I think you would need a copy (or zext/sext) to 
> move the value in general purpose land (like in my example with zext). 
> If you don’t have such use, the copy wouldn’t be emitted and there is 
> nothing to do.
>
> To be honest, I think the situation is complicated today because GISel 
> lack the fix-up that I mentioned. I am actually surprised that didn’t 
> come up as an issue yet (x86 is full of these flags). Going forward, I 
> think it should just come with the framework: you assign a physreg 
> selection time, GISel copies it around/reschedule appropriately to 
> preserve the value. There’s a lot of wishful thinking here x).
>
> Cheers,
> -Quentin
>Best regards,

Dominik

-- 
----------------------------------------------------------------------
Dominik Montada                   Email: dominik.montada at hightec-rt.com
HighTec EDV-Systeme GmbH          Phone: +49 681 92613 19
Europaallee 19                    Fax:   +49-681-92613-26
D-66113 Saarbrücken               WWW: http://www.hightec-rt.com

Managing Director: Vera Strothmann
Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient please notify the sender immediately
and destroy this e-mail. Any unauthorised copying, disclosure or
distribution of the material in this e-mail is strictly forbidden.
---

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5409 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200312/60c2d4ac/attachment-0001.bin>

llvm dev - Mar 2020 - Correct modelling of instructions with types smaller than the register class

[llvm-dev] Correct modelling of instructions with types smaller than the register class

[llvm-dev] Correct modelling of instructions with types smaller than the register class

[llvm-dev] Correct modelling of instructions with types smaller than the register class