On 31/08/20 at 14:10, David Greene wrote:
> Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> Furthermore, after register allocation there is a non-negligible effort
>> to properly annotate instructions which share the same output register...
>>
>> Concerning the usage of the live ranges to tie annotated instruction and
>> intrinsic, I have some doubts:
>>
>> 1. After register allocation, since metadata intrinsics are skipped
>>    (otherwise, they would be involved in the register allocation
>>    process, increasing the register pressure), the instruction stream
>>    would present both virtual and physical registers, which I am not
>>    sure is totally ok.
>
> They would have to participate in register allocation.

Should they? I mean: register allocation "simply" creates a map
(VirtReg -> PhysReg), and the actual register rewriting takes place in a
subsequent machine pass. So we could avoid their participation in
register allocation, reducing register pressure and spill/reload work.
As a downside, we would have intrinsics with virtual registers as
outputs, but that is not a problem, since they do not perform any real
computation.

> I think the only
> downside would be an intrinsic that artificially extends the live range
> of a value by using it past its true dead point, either because the use
> really is the "last" one or because it fills a "hole" in the live range
> that otherwise would exist (for example a use in one of the if-then-else
> branches that would otherwise not exist).
>
> If the intrinsics really shadow "real" instructions then it should be
> possible to place them such that this is not an issue; for example, you
> could place them immediately before the "real" instruction.

I do not think this would be possible: before register allocation, the
code is in SSA form, so the annotated instruction *must* precede the
intrinsic annotating it.
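The point that register allocation only builds a VirtReg -> PhysReg map, with rewriting done afterwards, can be illustrated with a toy model. This is a minimal sketch, not LLVM's allocator or its rewriting pass: it assumes straight-line code, instructions modelled as (opcode, defs, uses) tuples, a naive first-free assignment with no spilling, and a hypothetical `META` opcode standing in for the annotation intrinsic. Because the allocator never assigns the intrinsic's result, that result is still virtual after rewriting, which is the mixed virtual/physical state discussed above.

```python
# Toy model: an instruction is (opcode, defs, uses); names starting with "%" are
# virtual registers. "META" is a hypothetical stand-in for the annotation intrinsic.

def allocate(instrs, phys_regs):
    """Build a VirtReg -> PhysReg map, never assigning META results."""
    # Last *real* use of each vreg: annotation uses do not keep a register busy.
    last_use = {}
    for idx, (op, _defs, uses) in enumerate(instrs):
        if op == "META":
            continue
        for u in uses:
            last_use[u] = idx

    free = list(phys_regs)
    vreg_map = {}
    for idx, (op, defs, uses) in enumerate(instrs):
        if op == "META":
            continue  # no physical register for the intrinsic's unused output
        # Release registers whose vreg dies here, so this instruction's def may reuse them.
        for u in dict.fromkeys(uses):
            if last_use.get(u) == idx and u in vreg_map:
                free.append(vreg_map[u])
        for d in defs:
            if not free:
                raise RuntimeError("out of registers: spilling is not modelled")
            vreg_map[d] = free.pop(0)
    return vreg_map


def rewrite(instrs, vreg_map):
    """Separate rewriting step: mapped vregs become physical, unmapped ones stay virtual."""
    return [
        (op, [vreg_map.get(r, r) for r in defs], [vreg_map.get(r, r) for r in uses])
        for op, defs, uses in instrs
    ]


prog = [
    ("MOVi", ["%a"], []),
    ("MOVi", ["%b"], []),
    ("AND", ["%res"], ["%a", "%b"]),
    ("META", ["%null"], ["%res"]),  # annotates the AND
    ("STORE", [], ["%res"]),        # last real use of %res
]
vreg_map = allocate(prog, ["r0", "r1"])
rewritten = rewrite(prog, vreg_map)
```

In this run the intrinsic's use of `%res` is rewritten to the same physical register as the real instructions, while its def `%null` stays virtual. Rewriting the use is only meaningful because the intrinsic sits inside the live range of `%res`.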
An alternative is to place the annotating intrinsic before the
instruction that ends the specific live range (it does not necessarily
have to be an immediate predecessor). Just to point out a problem to
cope with: instruction scheduling must be aware of this particular
positioning of annotation intrinsics.

> It's possible they could introduce extra spills and reloads, in that if
> a value is spilled it would be reloaded before the intrinsic. If the
> intrinsic were placed immediately before the "real" instruction then the
> reload would very likely be re-used for the "real" instruction so this
> is probably not an issue in practice.

Yes, I agree.

Kind regards,
-- 
Lorenzo
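The reload argument can be made concrete with a toy count. This sketch rests on a deliberately pessimistic, hypothetical assumption that does not describe LLVM's spiller: every instruction that neither defines nor uses the spilled value clobbers the register holding its copy, while consecutive users can share one reload. Under that assumption an annotation placed immediately before the real use costs no extra reload, while one separated from it by an unrelated instruction costs one.

```python
def count_reloads(instrs, spilled):
    """Count reloads of `spilled` in straight-line code of (opcode, defs, uses) tuples.

    Toy assumption: any instruction that neither defines nor uses `spilled`
    clobbers the register holding its value; consecutive users share a reload.
    """
    reloads = 0
    in_register = False
    for _op, defs, uses in instrs:
        if spilled in uses:
            if not in_register:
                reloads += 1  # reload from the stack slot before this use
            in_register = True
        elif spilled in defs:
            in_register = True  # value is still in a register right after its def
        else:
            in_register = False  # assumed clobbered
    return reloads


# DEF, two unrelated instructions, then the real USE. "META" is a hypothetical
# stand-in for the annotation intrinsic.
BODY = [
    ("DEF", ["%v"], []),
    ("OTHER", ["%t1"], []),
    ("OTHER", ["%t2"], []),
    ("USE", [], ["%v"]),
]


def with_meta(at):
    """BODY with an annotation of %v inserted at index `at`."""
    return BODY[:at] + [("META", ["%null"], ["%v"])] + BODY[at:]
```

With these definitions, `BODY` alone needs one reload, `with_meta(3)` (annotation right before the use) still needs one, and `with_meta(2)` needs two. `with_meta(1)`, the annotation right after the def, also needs only one.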
Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:

> On 31/08/20 at 14:10, David Greene wrote:
>> Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>
>>> Furthermore, after register allocation there is a non-negligible effort
>>> to properly annotate instructions which share the same output register...
>>>
>>> Concerning the usage of the live ranges to tie annotated instruction and
>>> intrinsic, I have some doubts:
>>>
>>> 1. After register allocation, since metadata intrinsics are skipped
>>>    (otherwise, they would be involved in the register allocation
>>>    process, increasing the register pressure), the instruction stream
>>>    would present both virtual and physical registers, which I am not
>>>    sure is totally ok.
>> They would have to participate in register allocation.
> Should they? I mean: the register allocation "simply" creates a map
> (VirtReg -> PhysReg), and actual register re-writing takes place in a
> subsequent machine pass.

Maybe they could be skipped? I don't know if there's any precedent for
that.

> So, we could avoid their participation in register allocation,
> reducing register pressure and spill/reload work. As a downside, we
> would have intrinsics with virtual registers as outputs, but it is not
> a problem, since they do not perform any real computation.

If we can get that to work, yes, I guess having no-op intrinsics with
virtual registers would be ok. I don't know how the backend post-RA
would cope with that, though. There might be lots of asserts that assume
physical registers.

>> If the intrinsics really shadow "real" instructions then it should be
>> possible to place them such that this is not an issue; for example, you
>> could place them immediately before the "real" instruction.
>
> I do not think this would be possible: before register allocation, code is
> SSA form, thus the annotated instruction *must* precede the intrinsic
> annotating it.

Oh yes of course. Duh. :)

> An alternative is to place the annotating intrinsic before the
> instruction that ends the specific live range (not necessarily an
> immediate predecessor).

I'm not sure exactly what you mean, but it strikes me just now that if
the intrinsic is connected to the target instruction via the target
instruction's output value, then putting the intrinsic right after the
target instruction should not have any live range issues, unless the
target instruction were truly dead, in which case the intrinsic would
keep it alive. But since the intrinsic would eventually go away, I
assume we could eliminate the target instruction at the same time.

If the target instruction's output is used *somewhere*, it has a live
range, and adding another use just after the def should not affect
register allocation appreciably. It could of course affect spill choice
heuristics like number of uses of a value, but that's probably in the
noise.

It could, however, affect folding (e.g. mem operands) because a single
use of a load would turn into two uses, preventing folding. It's not
clear to me whether you would *want* folding in your use-case since you
apparently need to do something special with the load anyway.

> Just to point out a problem to cope with: instruction scheduling must be
> aware of this particular positioning of annotation intrinsics.

Probably true. This is a difficult problem, one I have dealt with. If
you want to keep two instructions "close" during scheduling it is a real
pain. ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
and difficult to maintain in the presence of upstream churn. My initial
attempt to avoid the need for codegen metadata took this approach and it
was quite infeasible. My second approach to hack in the information in
other ways wasn't much more successful. :(

I think we've uncovered a number of tricky issues when trying to encode
metadata via intrinsics. To me, at least, they clearly point to the need
for a first-class solution, and I think you agree with that too. Chris
also seemed to at least give tentative support to the idea.

I wonder if we're at the point of drafting an initial RFC for review.

                            -David
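Both the "truly dead" case and the folding concern come down to how uses are counted. The sketch below is a toy model under assumed names and rules (a hypothetical `META` opcode for the annotation intrinsic, a "fold only a single-use load" rule, and side-effect-free defining instructions); it is not LLVM's folding or dead-code logic. Counting the annotation as a use blocks folding of the load, a count that ignores annotations allows it, and the same filtered count lets a pass delete a definition whose only user is an annotation, together with that annotation.

```python
def real_uses(instrs, vreg):
    """Uses of `vreg` by instructions other than annotations."""
    return sum(uses.count(vreg) for op, _defs, uses in instrs if op != "META")


def all_uses(instrs, vreg):
    """Uses of `vreg` by any instruction, annotations included."""
    return sum(uses.count(vreg) for _op, _defs, uses in instrs)


def can_fold_load(instrs, vreg, count_uses=all_uses):
    """Toy folding rule: fold a load into its user only if it has exactly one use."""
    return count_uses(instrs, vreg) == 1


def eliminate_dead(instrs):
    """Single pass: drop defs whose only users are annotations, plus those annotations.

    Assumes every non-META instruction that has a def is free of side effects.
    """
    dead = {
        d
        for op, defs, _uses in instrs
        if op != "META"
        for d in defs
        if real_uses(instrs, d) == 0
    }
    kept = []
    for op, defs, uses in instrs:
        if op == "META" and any(u in dead for u in uses):
            continue  # the annotation disappears together with its target
        if op != "META" and defs and all(d in dead for d in defs):
            continue  # target whose result is never really used
        kept.append((op, defs, uses))
    return kept


prog = [
    ("LOAD", ["%x"], ["%p"]),
    ("META", ["%n0"], ["%x"]),         # annotates the load
    ("ADD", ["%y"], ["%x", "%z"]),     # the only real user of %x
    ("MUL", ["%dead"], ["%y", "%y"]),
    ("META", ["%n1"], ["%dead"]),      # the only user of %dead
    ("STORE", [], ["%y", "%p"]),
]
```

Here `can_fold_load(prog, "%x")` is false because the annotation counts as a second use, while `can_fold_load(prog, "%x", count_uses=real_uses)` is true; `eliminate_dead(prog)` removes the `MUL` and its annotation.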
On 08/09/20 at 17:57, David Greene wrote:
> Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:
>
>> On 31/08/20 at 14:10, David Greene wrote:
>>> Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>>
>>>> Furthermore, after register allocation there is a non-negligible effort
>>>> to properly annotate instructions which share the same output register...
>>>>
>>>> Concerning the usage of the live ranges to tie annotated instruction and
>>>> intrinsic, I have some doubts:
>>>>
>>>> 1. After register allocation, since metadata intrinsics are skipped
>>>>    (otherwise, they would be involved in the register allocation
>>>>    process, increasing the register pressure), the instruction stream
>>>>    would present both virtual and physical registers, which I am not
>>>>    sure is totally ok.
>>> They would have to participate in register allocation.
>> Should they? I mean: the register allocation "simply" creates a map
>> (VirtReg -> PhysReg), and actual register re-writing takes place in a
>> subsequent machine pass.
> Maybe they could be skipped? I don't know if there's any precedent for
> that.

I think they could be ignored, since they just carry information;
there's no point in allocating physical registers for their unused
output.

>> So, we could avoid their participation in register allocation,
>> reducing register pressure and spill/reload work. As a downside, we
>> would have intrinsics with virtual registers as outputs, but it is not
>> a problem, since they do not perform any real computation.
> If we can get that to work, yes I guess having no-op intrinsics with
> virtual registers would be ok. I don't know how the backend post-RA
> would cope with that though. There might be lots of asserts that assume
> physical registers.

Yes, if I recall correctly, there are a lot of checks on the register
type.

>>> If the intrinsics really shadow "real" instructions then it should be
>>> possible to place them such that this is not an issue; for example, you
>>> could place them immediately before the "real" instruction.
>> I do not think this would be possible: before register allocation, code is
>> SSA form, thus the annotated instruction *must* precede the intrinsic
>> annotating it.
> Oh yes of course. Duh. :)
>
>> An alternative is to place the annotating intrinsic before the
>> instruction that ends the specific live range (not necessarily an
>> immediate predecessor).
> I'm not sure exactly what you mean

I mean: to avoid artificially extending the live range, place the
annotating intrinsic (I) before the instruction (K) that kills the live
range, although (I) does not have to be an *immediate* predecessor of
(K) in the instruction stream. For instance, assume the following SSA
stream (I am using ARM Thumb2 MIR, since I've been working mainly on
that backend):

    #i  %res = t2ANDrr %src_1_i, %src_2_i
        ...
    #j  %null = llvm.metadata %res, (some metadata)
        ...
    #l  t2STRi12 %res, %stack_slot_res

Here instruction #l kills the live range of %res, and instruction #j is
covered by that live range, which spans from #i to #l. Given a total
ordering of the instruction stream, #i <= #j <= #l. As you can infer,
the intrinsic at #j does not have to be an immediate predecessor of #l
(that is, there can be an instruction #k such that #j < #k < #l).
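The placement rule in this example can be checked mechanically. A minimal sketch, assuming straight-line SSA code as (opcode, defs, uses) tuples, a hypothetical `META` opcode for `llvm.metadata`, and simplified `t2STRi12` operands: a value's live interval runs from its definition to its last use, and an annotation is harmless if counting its use does not move the end of that interval.

```python
def live_interval(instrs, vreg, include_meta=True):
    """(def index, last-use index) of `vreg` in straight-line SSA code."""
    start = next(i for i, (_op, defs, _uses) in enumerate(instrs) if vreg in defs)
    end = max(
        (i for i, (op, _defs, uses) in enumerate(instrs)
         if vreg in uses and (include_meta or op != "META")),
        default=start,  # no use at all: the interval is just the def
    )
    return start, end


def annotation_extends(instrs, vreg):
    """True if annotation uses push the end of the live interval further."""
    return live_interval(instrs, vreg) != live_interval(instrs, vreg, include_meta=False)


stream = [
    ("t2ANDrr", ["%res"], ["%src_1_i", "%src_2_i"]),  # #i
    ("OTHER", ["%t"], []),
    ("META", ["%null"], ["%res"]),                     # #j
    ("OTHER", ["%u"], []),                             # #k
    ("t2STRi12", [], ["%res", "%stack_slot_res"]),     # #l: last use, kills %res
]
# The same instructions, but with the annotation moved after #l.
late = stream[:2] + stream[3:] + [stream[2]]
```

For `stream` the interval of `%res` is (0, 4) with or without the annotation, so `annotation_extends` is false; for `late` it grows from (0, 3) to (0, 4). If the definition had no real use at all, the interval without the annotation would collapse to the def itself, so any annotation would extend it: the "truly dead" case.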
In this way, the live range won't be extended (at least in this trivial
case...).

> but it strikes me just now that if
> the intrinsic is connected to the target instruction via the target
> instruction's output value, then putting the intrinsic right after the
> target instruction should not have any live range issues, unless the
> target instruction were truly dead, in which case the intrinsic would
> keep it alive. But since the intrinsic would eventually go away, I
> assume we could eliminate the target instruction at the same time.
>
> If the target instruction output is used *somewhere* it has a live range
> and adding another use just after the def should not affect register
> allocation appreciably.

Yes! :D

> It could of course affect spill choice
> heuristics like number of uses of a value but that's probably in the
> noise.
>
> It could, however, affect folding (e.g. mem operands) because a single
> use of a load would turn into two uses, preventing folding. It's not
> clear to me whether you would *want* folding in your use-case since you
> apparently need to do something special with the load anyway.

Uhm... yes, folding requires particular attention; but in my project I
avoided the problem by "disabling" folding, so I didn't really care much
about that aspect.

>> Just to point out a problem to cope with: instruction scheduling must be
>> aware of this particular positioning of annotation intrinsics.
> Probably true. This is a difficult problem, one I have dealt with. If
> you want to keep two instructions "close" during scheduling it is a real
> pain. ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
> and difficult to maintain in the presence of upstream churn. My initial
> attempt to avoid the need for codegen metadata took this approach and it
> was quite infeasible. My second approach to hack in the information in
> other ways wasn't much more successful. :(

It is just an idea, but could MI bundles be profitably employed?

> I think we've uncovered a number of tricky issues when trying to encode
> metadata via intrinsics. To me, at least, they clearly point to the
> need for a first-class solution and I think you agree with that too.
> Chris also seemed to at least give tentative support to the idea.

Yep!

> I wonder if we're at the point of drafting an initial RFC for review.

Uh, that is a good question. To be honest, it would be the first time
for me. For sure, we could start by pinpointing the main problems and
challenges we have identified that the use of intrinsics would face.

> -David
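On the MI bundle idea: the sketch below only models the property that makes bundles attractive here, namely that a scheduler which treats a bundle as one indivisible unit cannot separate an annotation from its target. It is a toy list scheduler with made-up unit names and priorities, not LLVM's MachineScheduler; whether bundling before register allocation would interact well with LLVM's scheduler and allocator is left open, as in the thread.

```python
def list_schedule(units, deps, priority):
    """Toy list scheduler that never splits a unit.

    units:    name -> tuple of instruction strings (a bundle has more than one)
    deps:     name -> set of unit names that must be scheduled first
    priority: name -> int; among ready units, the lowest value goes first
    """
    done, order = set(), []
    while len(done) < len(units):
        ready = [u for u in units if u not in done and deps.get(u, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependences")
        pick = min(ready, key=lambda u: (priority[u], u))
        done.add(pick)
        order.extend(units[pick])  # members of a unit are emitted contiguously
    return order


# Unbundled: the scheduler is free to put the MOV between the AND and its annotation.
unbundled = list_schedule(
    units={"and": ("%res = t2ANDrr",), "meta": ("llvm.metadata %res",),
           "mov": ("%t = t2MOVi",), "str": ("t2STRi12 %res",)},
    deps={"meta": {"and"}, "str": {"and"}},
    priority={"and": 0, "mov": 1, "meta": 2, "str": 3},
)

# Bundled: the AND and its annotation form one unit, so they stay adjacent.
bundled = list_schedule(
    units={"and+meta": ("%res = t2ANDrr", "llvm.metadata %res"),
           "mov": ("%t = t2MOVi",), "str": ("t2STRi12 %res",)},
    deps={"str": {"and+meta"}},
    priority={"and+meta": 0, "mov": 1, "str": 3},
)
```

With these priorities the unbundled schedule places `%t = t2MOVi` between the AND and its annotation, while the bundled one keeps them back to back.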