thr3ads.net - llvm dev - [llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals [Nov 2019]

If this information is useful, please help other people find it:
Share via:

Wei Mi via llvm-dev

2019-Nov-05 01:06 UTC

[llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals

On Mon, Nov 4, 2019 at 12:18 PM Quentin Colombet <qcolombet at apple.com>
wrote:
> Hi Alex,
>
> Thanks for reporting this.
> Wei worked on the hoisting optimization.
>
> @Wei, could you work with Alex to see what is the problem.
>
> Cheers,
> -Quentin
>
> > On Nov 3, 2019, at 5:20 AM, via llvm-dev <llvm-dev at
lists.llvm.org>
> wrote:
> >
> >       /// Optimizations after all the reg selections and spills are
done.
> >       void InlineSpiller::postOptimization() {
HSpiller.hoistAllSpills();
> > }
> >
> > Seems a problematic function to me, as hoistAllSpills() uses
> > TII.storeRegToStackSlot() to insert new spills.
> >
> > The problem is, TII.storeRegToStackSlot is allowed to create new
virtual
> > registers, which can not be allocated a range as this whole thing is
> called
> > _after_ all reg selection is complete.
> >
> > If I'm right in this, I do not see how the in-tree target
AMDGPU::SI has
> not
> > been affected, as it creates virtual registers in both load and store
> stack
> > operations in SIInstrInfo.cpp - which is where I confirmed to myself
> that it
> > was okay to do so. When compilation broke,
> >
>
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130812/184331.htm
> > l further suggested that the intention is that you can... but I do not
> see
> > how a hoist can ever pass verification/compile correctly. Am I doing
> > something incorrectly here? Or are hoistable spills just that rare on
GPU
> > code that it's never come up?
>
Hi Alex,

I havn't noticed before that TII.storeRegToStackSlot is allowed to create
new virtual registers. I am surprised to know it is allowed to do that. I
am mostly working on x86 and TII.storeRegToStackSlot for x86 doesn't create
new register. As you said, if TII.storeRegToStackSlot is allowed to create
new virtual registers, that could be a problem. It is not exposed maybe
because of some later pass cover the problem -- maybe RegScavenger?

Are you aware of any other architecture other than AMDGPU on
which TII.storeRegToStackSlot creates new virtual registers? Could you
explain in which case new virtual registers will be created?

Thanks,
Wei.

> >
> > As a side note, compiler option "-disable-spill-hoist"
(DisableHoisting)
> is
> > referenced in exactly zero places, so if anyone in a similar situation
> finds
> > this post, maybe don't bother testing with that flag just yet. :)
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191104/58990152/attachment.html>

via llvm-dev

2019-Nov-05 02:43 UTC

head link

[llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals

From: Wei Mi <wmi at google.com> > I havn't noticed before that TII.storeRegToStackSlot is allowed to
create new virtual registers. I am surprised to know it is allowed to do that. I
am mostly working on x86 and TII.storeRegToStackSlot for x86 doesn't create
new register. As you said, if TII.storeRegToStackSlot is allowed to create new
virtual registers, that could be a problem. It is not exposed maybe because of
some later pass cover the problem -- maybe RegScavenger?
> Are you aware of any other architecture other than AMDGPU on which
TII.storeRegToStackSlot creates new virtual registers? Could you explain in
which case new virtual registers will be created?
It is quite possible it's a largely forgotten feature from before
RegScavenger was fully featured, however being able to storeRegToStackSlot did
solve a major problem for me.

On the architecture I'm on (and many DSPs like it, I suspect), there's
many special function registers that control various aspects of the CPU. These
registers can be pushed, popped, and moved to/from GPRs. The issue is that the
architecture has no "move from SFR to stack + offset" command, which
LLVM requires for stack-frame store/loads. They must go through a GPR
intermediate. Even solving the frame address would not avoid this issue, as
there is no "move from SFR to [address]" either, they must either be
directly pushed/popped from stack, or go through an intermediate.

Implementing "getLargestLegalSuperClass" to return a RegisterClass of
(GPRs + SFRs) goes a long way towards avoiding these difficult spills, as
they'll often be assigned a GPR over call-clobbering etc, but unfortunately
AFAIK there's no way to tell the register allocator that spills must be from
the GPR side of that union. Sometimes, it will request that the SFR gets
spilled, and so in those instances you ultimately need a temporary in
storeRegToStackSlot. One option would be a pseudo, followed by a pass using
RegScavenger to provide the intermediate - but I would expect the codegen to be
worse than generating the intermediate in storeRegToStackSlot, if supported.

Really though, this is all just a workaround to a problem. What's really
desired/wanted, is just a way to say that SFRs must be spilled to GPRs, a
"getSpillRegClass" if you will. This seems a missing piece/problem in
general, whilst trying to find out how to address the problem I came across at
least one question to llvm-dev on how to do it, unanswered iirc.

AMDGPU also seemingly has an extensive/complex workaround in
"EnableSpillVGPRToAGPR", implemented in SILowerSGPRSpills, which if I
understand it correctly, attempts to simplify similar two-stage spills
post-regalloc by eliminating unnecessary frame indices. I'm not familiar
with the architecture and may be wrong on the constraints they're working
around, but on glancing it over, it seems to me that if a hypothetical
"spillRegClass()" could be expressed, AMDGPU too would not need a
temporary to be allocated in storeRegToStackSlot, and could also do away with
much of SILowerSGPRSpills as well.

	/// Special case of eliminateFrameIndex. Returns true if the SGPR was spilled
to
	/// a VGPR and the stack slot can be safely eliminated when all other users are
	/// handled.
	bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex

Please do let me know if I've missed a better way at addressing this,
perhaps there is just the function I'm looking for already.

Thank you,
Alex

Wei Mi via llvm-dev

2019-Nov-05 19:41 UTC

head link

[llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals

On Mon, Nov 4, 2019 at 6:41 PM <alex.davies at iinet.net.au> wrote:
> From: Wei Mi <wmi at google.com>
> > I havn't noticed before that TII.storeRegToStackSlot is allowed to
> create new virtual registers. I am surprised to know it is allowed to do
> that. I am mostly working on x86 and TII.storeRegToStackSlot for x86
> doesn't create new register. As you said, if TII.storeRegToStackSlot is
> allowed to create new virtual registers, that could be a problem. It is not
> exposed maybe because of some later pass cover the problem -- maybe
> RegScavenger?
>
> > Are you aware of any other architecture other than AMDGPU on which
> TII.storeRegToStackSlot creates new virtual registers? Could you explain in
> which case new virtual registers will be created?
>
> It is quite possible it's a largely forgotten feature from before
> RegScavenger was fully featured, however being able to storeRegToStackSlot
> did solve a major problem for me.
>
>I see. You expect that TII.storeRegToStackSlot could allow new temporary to
be generated to handle the irregular spill easily and the patch
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130812/184331.html
said
it did some preparation for that, but looks like in reality no one actually
let TII.storeRegToStackSlot generate new temporary yet. I find RegScavenger
calls TII.storeRegToStackSlot as well when spill is needed to scavenge a
register. If TII.storeRegToStackSlot allows to generate new temporary, that
will be a problem for RegScavenger as well (Looks like RegScavenger doesn't
work in an iterative way)

So I guess currently most implementation in llvm assumes
TII.storeRegToStackSlot doesn't generate new virtual register usage?

> On the architecture I'm on (and many DSPs like it, I suspect),
there's
> many special function registers that control various aspects of the CPU.
> These registers can be pushed, popped, and moved to/from GPRs. The issue is
> that the architecture has no "move from SFR to stack + offset"
command,
> which LLVM requires for stack-frame store/loads. They must go through a GPR
> intermediate. Even solving the frame address would not avoid this issue, as
> there is no "move from SFR to [address]" either, they must either
be
> directly pushed/popped from stack, or go through an intermediate.
>
> Implementing "getLargestLegalSuperClass" to return a
RegisterClass of
> (GPRs + SFRs) goes a long way towards avoiding these difficult spills, as
> they'll often be assigned a GPR over call-clobbering etc, but
unfortunately
> AFAIK there's no way to tell the register allocator that spills must be
> from the GPR side of that union. Sometimes, it will request that the SFR
> gets spilled, and so in those instances you ultimately need a temporary in
> storeRegToStackSlot. One option would be a pseudo, followed by a pass using
> RegScavenger to provide the intermediate - but I would expect the codegen
> to be worse than generating the intermediate in storeRegToStackSlot, if
> supported.
>
> Really though, this is all just a workaround to a problem. What's
really
> desired/wanted, is just a way to say that SFRs must be spilled to GPRs, a
> "getSpillRegClass" if you will. This seems a missing
piece/problem in
> general, whilst trying to find out how to address the problem I came across
> at least one question to llvm-dev on how to do it, unanswered iirc.
>
> AMDGPU also seemingly has an extensive/complex workaround in
> "EnableSpillVGPRToAGPR", implemented in SILowerSGPRSpills, which
if I
> understand it correctly, attempts to simplify similar two-stage spills
> post-regalloc by eliminating unnecessary frame indices. I'm not
familiar
> with the architecture and may be wrong on the constraints they're
working
> around, but on glancing it over, it seems to me that if a hypothetical
> "spillRegClass()" could be expressed, AMDGPU too would not need a
temporary
> to be allocated in storeRegToStackSlot, and could also do away with much of
> SILowerSGPRSpills as well.
>
>         /// Special case of eliminateFrameIndex. Returns true if the SGPR
> was spilled to
>         /// a VGPR and the stack slot can be safely eliminated when all
> other users are
>         /// handled.
>         bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex
>
> Please do let me know if I've missed a better way at addressing this,
> perhaps there is just the function I'm looking for already.
>
Yes, EnableSpillVGPRToAGPR seems to work from your description. But I am
not familiar with GPU register allocation and have no experience about how
this kind of irregular spill is handled in llvm now generally.

Quentin, do you have suggestion about how such irregular spill should be
handled?

I added Mattias and Matthew to the thread as well because they are very
experienced on GPU register allocation.

Thanks,
Wei.

>
> Thank you,
> Alex
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191105/0bfbd037/attachment-0001.html>

llvm dev - Nov 2019 - InlineSpiller - hoists leave virtual registers without live intervals

[llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals

[llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals

[llvm-dev] InlineSpiller - hoists leave virtual registers without live intervals