Dominik Montada via llvm-dev
2020-Mar-02 08:44 UTC
[llvm-dev] Correct modelling of instructions with types smaller than the register class
Hi Quentin, thank you for the reply! This clears up a lot of the questions I was having. It seems like we should definitely invest some time in rewriting some of our legalization rules then! I also posted some questions further down below. I would appreciate getting your opinion on them.> Hi Dominik, > > I’ll do a brief reply here and if you want more information we can talk further :). > >> On Feb 27, 2020, at 5:40 AM, Dominik Montada <dominik.montada at hightec-rt.com> wrote: >> >> Hi Quentin, Hi Amara, >> >> I was following your discussion on D75086 regarding declaring types as legal even if they are smaller than the actual register class (e.g. s16 and gpr32). We are working on a backend which only has 32 and 64-bit registers and we recently had a problem regarding exactly this where we had to declare G_UNMERGE_VALUES and G_MERGE_VALUES with a smaller type of <s32 as legal, even though we don't have a register class that matches and instruction selection implicitly sign extends them to s32. The reasoning was that for example in the case of G_UNMERGE_VALUES, the legalizer widens the source type when widening the destination type, which we don't want (as the big type then doesn't fit our registers anymore). So we decided to only make sure that the bigger type matches a register class and simply allowed smaller types <s32. >> >> What I am gathering now from this discussion is that this is basically an incorrect modelling and only works around another problem. Could you expand on this in more detail? > A few years ago we discussed the possibility and/or desire of allowing registers to be larger than the actual legalized type or whether they should be in sync with the legalized type. > > For instance, consider > s8 = G_AND s8, s8 > > If your target supports `s32 = G_AND` then allowing registers to be larger than legalized type would yield that `s8 = G_AND s8, s8` is legal (and really any type smaller than or equal to s32 would be legal for that instruction). > > Then the question was how do we represent that: the relevant bits are the low 8 bits, but the container is 32-bit. > E.g., > s8(32) = G_AND s8(32), s8(32) > > We explored that a little bit and found that it was dangerous to carry undefined bits around (the upper 24 bits in that case) and that there was no real use case for this. Tim (Northover) might remember more details (CC’ed). > > Therefore, we decided that the legal types should exactly match what the container (the register) will be. So in theory, if you’re not following that model, you’re in violation of that and that’s where problems arise (and thus D75086 to workaround them).This leads me to another question. What would the correct modelling look like for special registers that cannot be read/written, like a carry-bit register? The G_UADDO/UADDE instructions have an s1 carry and our target only supports reading this special register, but not writing to it (at least not directly and without doing some hacks). We found this especially hard to model, since GlobalISel doesn't really have a notion of special registers in generic instructions. In the end we went with an approach of defining a pseudo-register-class, which is not allocatable and copyable and emit a COPY to/from our physical carry bit register. This generally works the way we want and the COPY is eliminated. However it is not eliminated when using O0. I guess this probably has something to do with not defining an extra register bank and teaching our regbank info that such a carry-bit vreg should be assigned to this regbank? Right now it is simply assigned to the GPR bank, which seems to be the problem for O0. Nevertheless I still dislike this approach as I don't want to rely on the COPY being eliminated. I would much rather not emit it at all, but this obviously means that I would throw away the carry vreg of G_UADDO/UADDE, potentially changing the semantics of the instruction.> > >> I was thinking about implementing some custom legalization which uses G_EXTRACT or shifts to get what I want, but G_EXTRACT extracts as many bits as the destination type, so I would end up with the same problem again. Shifts are also not optimal, since we have a target instruction available to extract n bits from a given offset. Is it valid to emit target instructions during legalization, > Yes, that’s completely legal. > > The whole idea of GISel is that the backends should be able to insert target specific instructions whenever they want. For instance, we use that extensively in the lowering of calls directly in the IRTranslator, i.e., at the very beginning of GISel.That is very good to know! Thanks for the clarification.> >> i.e. before instruction selection, or would the best modelling be to just use the shifts and then use some pre-instruction-selection combiner to merge them to the desired target instruction? > That’s a question I cannot answer directly, it really depends on what are the implementation trade-offs on your end. > > I have a personal preference for generic shifts as it is possible that other generic optimizations can get rid of them. Now, if those shifts are too difficult to combine away or if you know that they are not optimizable anyway, you may want to go directly with the target specific instruction. > > Just keep in mind that whenever you use a target specific instruction, this is just basically an opaque object that the generic optimizations have to deal with.Hm, good point. It should be a straightforward combine rule, so going with the shifts and the combiner seems to be a better approach at first glance. Thanks! Best regards, Dominik> > Cheers, > -Quentin > >> Best regards, >> >> Dominik >> >> -- >> ---------------------------------------------------------------------- >> Dominik Montada Email: dominik.montada at hightec-rt.com >> HighTec EDV-Systeme GmbH Phone: +49 681 92613 19 >> Europaallee 19 Fax: +49-681-92613-26 >> D-66113 Saarbrücken WWW: http://www.hightec-rt.com >> >> Managing Director: Vera Strothmann >> Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222 >> >> This e-mail may contain confidential and/or privileged information. If >> you are not the intended recipient please notify the sender immediately >> and destroy this e-mail. Any unauthorised copying, disclosure or >> distribution of the material in this e-mail is strictly forbidden. >> --- >> >-- ---------------------------------------------------------------------- Dominik Montada Email: dominik.montada at hightec-rt.com HighTec EDV-Systeme GmbH Phone: +49 681 92613 19 Europaallee 19 Fax: +49-681-92613-26 D-66113 Saarbrücken WWW: http://www.hightec-rt.com Managing Director: Vera Strothmann Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient please notify the sender immediately and destroy this e-mail. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden. --- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5409 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200302/38fa8991/attachment.bin>
Quentin Colombet via llvm-dev
2020-Mar-09 22:31 UTC
[llvm-dev] Correct modelling of instructions with types smaller than the register class
Hi Dominik,> On Mar 2, 2020, at 12:44 AM, Dominik Montada <dominik.montada at hightec-rt.com> wrote: > > Hi Quentin, > > thank you for the reply! This clears up a lot of the questions I was having. It seems like we should definitely invest some time in rewriting some of our legalization rules then! I also posted some questions further down below. I would appreciate getting your opinion on them. > >> Hi Dominik, >> >> I’ll do a brief reply here and if you want more information we can talk further :). >> >>> On Feb 27, 2020, at 5:40 AM, Dominik Montada <dominik.montada at hightec-rt.com> wrote: >>> >>> Hi Quentin, Hi Amara, >>> >>> I was following your discussion on D75086 regarding declaring types as legal even if they are smaller than the actual register class (e.g. s16 and gpr32). We are working on a backend which only has 32 and 64-bit registers and we recently had a problem regarding exactly this where we had to declare G_UNMERGE_VALUES and G_MERGE_VALUES with a smaller type of <s32 as legal, even though we don't have a register class that matches and instruction selection implicitly sign extends them to s32. The reasoning was that for example in the case of G_UNMERGE_VALUES, the legalizer widens the source type when widening the destination type, which we don't want (as the big type then doesn't fit our registers anymore). So we decided to only make sure that the bigger type matches a register class and simply allowed smaller types <s32. >>> >>> What I am gathering now from this discussion is that this is basically an incorrect modelling and only works around another problem. Could you expand on this in more detail? >> A few years ago we discussed the possibility and/or desire of allowing registers to be larger than the actual legalized type or whether they should be in sync with the legalized type. >> >> For instance, consider >> s8 = G_AND s8, s8 >> >> If your target supports `s32 = G_AND` then allowing registers to be larger than legalized type would yield that `s8 = G_AND s8, s8` is legal (and really any type smaller than or equal to s32 would be legal for that instruction). >> >> Then the question was how do we represent that: the relevant bits are the low 8 bits, but the container is 32-bit. >> E.g., >> s8(32) = G_AND s8(32), s8(32) >> >> We explored that a little bit and found that it was dangerous to carry undefined bits around (the upper 24 bits in that case) and that there was no real use case for this. Tim (Northover) might remember more details (CC’ed). >> >> Therefore, we decided that the legal types should exactly match what the container (the register) will be. So in theory, if you’re not following that model, you’re in violation of that and that’s where problems arise (and thus D75086 to workaround them). > > This leads me to another question. What would the correct modelling look like for special registers that cannot be read/written, like a carry-bit register? The G_UADDO/UADDE instructions have an s1 carry and our target only supports reading this special register, but not writing to it (at least not directly and without doing some hacks).I would expect that you could create a register class and register bank for the special register. That way you have something to map to when you do register bank select. The register class could be "not allocatable” (see CCR in AArch64 for instance). Note, that if that class is unallocable, you’ll have to manage manually any overlapping that could arise. E.g., s32 ..., s1 carry1 = G_UADDO s32 ..., s1 carry2 = G_UADDO s32 extended_carry1 = zext s1 carry1 <— carry1 will be destroyed by the second G_UADDO, you’ll have to fix the schedule or copy the value manually. s32 extended_carry2 = zext s1 carry2 icmp s32 extended_carry1, extended_carry2 Then when you do instruction selection, just map the register to the physical register. (You may want to do the fix-ups I mentioned here.)> We found this especially hard to model, since GlobalISel doesn't really have a notion of special registers in generic instructions. In the end we went with an approach of defining a pseudo-register-class, which is not allocatable and copyable and emit a COPY to/from our physical carry bit register. This generally works the way we want and the COPY is eliminated. However it is not eliminated when using O0. > > I guess this probably has something to do with not defining an extra register bank and teaching our regbank info that such a carry-bit vreg should be assigned to this regbank? Right now it is simply assigned to the GPR bank, which seems to be the problem for O0.Yeah, I think you’d want a different reg bank here. Like you said this is a problem if the copies don't get optimized away.> > Nevertheless I still dislike this approach as I don't want to rely on the COPY being eliminated. I would much rather not emit it at all, but this obviously means that I would throw away the carry vreg of G_UADDO/UADDE, potentially changing the semantics of the instruction.Generally speaking, I think you would need a copy (or zext/sext) to move the value in general purpose land (like in my example with zext). If you don’t have such use, the copy wouldn’t be emitted and there is nothing to do. To be honest, I think the situation is complicated today because GISel lack the fix-up that I mentioned. I am actually surprised that didn’t come up as an issue yet (x86 is full of these flags). Going forward, I think it should just come with the framework: you assign a physreg selection time, GISel copies it around/reschedule appropriately to preserve the value. There’s a lot of wishful thinking here x). Cheers, -Quentin> >> >> >>> I was thinking about implementing some custom legalization which uses G_EXTRACT or shifts to get what I want, but G_EXTRACT extracts as many bits as the destination type, so I would end up with the same problem again. Shifts are also not optimal, since we have a target instruction available to extract n bits from a given offset. Is it valid to emit target instructions during legalization, >> Yes, that’s completely legal. >> >> The whole idea of GISel is that the backends should be able to insert target specific instructions whenever they want. For instance, we use that extensively in the lowering of calls directly in the IRTranslator, i.e., at the very beginning of GISel. > That is very good to know! Thanks for the clarification. >> >>> i.e. before instruction selection, or would the best modelling be to just use the shifts and then use some pre-instruction-selection combiner to merge them to the desired target instruction? >> That’s a question I cannot answer directly, it really depends on what are the implementation trade-offs on your end. >> >> I have a personal preference for generic shifts as it is possible that other generic optimizations can get rid of them. Now, if those shifts are too difficult to combine away or if you know that they are not optimizable anyway, you may want to go directly with the target specific instruction. >> >> Just keep in mind that whenever you use a target specific instruction, this is just basically an opaque object that the generic optimizations have to deal with. > > Hm, good point. It should be a straightforward combine rule, so going with the shifts and the combiner seems to be a better approach at first glance. Thanks! > > > Best regards, > > Dominik > >> >> Cheers, >> -Quentin >> >>> Best regards, >>> >>> Dominik >>> >>> -- >>> ---------------------------------------------------------------------- >>> Dominik Montada Email: dominik.montada at hightec-rt.com >>> HighTec EDV-Systeme GmbH Phone: +49 681 92613 19 >>> Europaallee 19 Fax: +49-681-92613-26 >>> D-66113 Saarbrücken WWW: http://www.hightec-rt.com >>> >>> Managing Director: Vera Strothmann >>> Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222 >>> >>> This e-mail may contain confidential and/or privileged information. If >>> you are not the intended recipient please notify the sender immediately >>> and destroy this e-mail. Any unauthorised copying, disclosure or >>> distribution of the material in this e-mail is strictly forbidden. >>> --- >>> >> > -- > ---------------------------------------------------------------------- > Dominik Montada Email: dominik.montada at hightec-rt.com > HighTec EDV-Systeme GmbH Phone: +49 681 92613 19 > Europaallee 19 Fax: +49-681-92613-26 > D-66113 Saarbrücken WWW: http://www.hightec-rt.com <http://www.hightec-rt.com/> > > Managing Director: Vera Strothmann > Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222 > > This e-mail may contain confidential and/or privileged information. If > you are not the intended recipient please notify the sender immediately > and destroy this e-mail. Any unauthorised copying, disclosure or > distribution of the material in this e-mail is strictly forbidden. > ----------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200309/2b282ce7/attachment.html>
Dominik Montada via llvm-dev
2020-Mar-12 16:46 UTC
[llvm-dev] Correct modelling of instructions with types smaller than the register class
Hi Quentin, thank you for the reply! I have a couple more questions that came up when I tried to implement this today. I hope you can help me out with this again! Am 09.03.20 um 23:31 schrieb Quentin Colombet:> I would expect that you could create a register class and register > bank for the special register. That way you have something to map to > when you do register bank select. > The register class could be "not allocatable” (see CCR in AArch64 for > instance). > Note, that if that class is unallocable, you’ll have to manage > manually any overlapping that could arise. > E.g., > s32 ..., s1 carry1 = G_UADDO > s32 ..., s1 carry2 = G_UADDO > s32 extended_carry1 = zext s1 carry1 <— carry1 will be destroyed by > the second G_UADDO, you’ll have to fix the schedule or copy the value > manually. > s32 extended_carry2 = zext s1 carry2 > icmp s32 extended_carry1, extended_carry2So before instruction selection we would need to check if there are any overlaps and if so try to fix them if possible or abort compilation if that is not possible?> > Then when you do instruction selection, just map the register to the > physical register. (You may want to do the fix-ups I mentioned here.)What exactly do you mean by "map the register to the physical register"? Do you mean that I should just emit a COPY from/to this physical register and constrain the virtual register to the non-allocatable register class? Because this actually gives me an assertion that the register class must be allocatable. Or do you mean to use the physical register in the target instruction instead of the virtual one (i.e. drop the usage of the virtual register and replace it with the physical register)? The target instruction uses the physical register implicitly by the way. I also tried to take a look at how AArch64 implements G_UADDO/G_UADDE but it doesn't seem to fully implement selection for those? Maybe I just didn't read the code properly...> > I would expect that you could create a register class and register > bank for the special register. That way you have something to map to > when you do register bank select. >I looked at the CC regbank of AArch64, however I didn't see any mappings using it in RegBankInfo. Is that something that AArch64 simply hasn't implemented or is there something going on that I'm not seeing. I mean, what is the point to have a register bank when it is not used during regbankselect? I would at least expect that for example G_UADDO would use this register bank for the s1 carry out.> > Generally speaking, I think you would need a copy (or zext/sext) to > move the value in general purpose land (like in my example with zext). > If you don’t have such use, the copy wouldn’t be emitted and there is > nothing to do. > > To be honest, I think the situation is complicated today because GISel > lack the fix-up that I mentioned. I am actually surprised that didn’t > come up as an issue yet (x86 is full of these flags). Going forward, I > think it should just come with the framework: you assign a physreg > selection time, GISel copies it around/reschedule appropriately to > preserve the value. There’s a lot of wishful thinking here x). > > Cheers, > -Quentin >Best regards, Dominik -- ---------------------------------------------------------------------- Dominik Montada Email: dominik.montada at hightec-rt.com HighTec EDV-Systeme GmbH Phone: +49 681 92613 19 Europaallee 19 Fax: +49-681-92613-26 D-66113 Saarbrücken WWW: http://www.hightec-rt.com Managing Director: Vera Strothmann Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient please notify the sender immediately and destroy this e-mail. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden. --- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5409 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200312/60c2d4ac/attachment-0001.bin>