thr3ads.net - llvm dev - [llvm-dev] globalisel: cross-bank constant propagation? [Mar 2021]

Nicolai Hähnle via llvm-dev

2021-Mar-29 18:51 UTC

[llvm-dev] globalisel: cross-bank constant propagation?

On Mon, Mar 29, 2021 at 3:34 PM Jay Foad <jay.foad at gmail.com> wrote:
> On Mon, 29 Mar 2021 at 14:04, Matt Arsenault <arsenm2 at gmail.com> wrote:
> > > On Mar 27, 2021, at 04:56, Jay Foad via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > >
> > > Hi Nicolai!
> > >
> > > For simplicity our regbankselect says that all operands of VALU
> > > instructions have to go in vgprs. Moving some of them into sgprs is
> > > left as an optimisation for a later pass. As you know there are limits
> > > on how many operands of a VALU instruction can be sgprs or
> > > constants, which are not simple to express in terms of alternative
> > > operand mappings.
> > >
> > > Thanks,
> > > Jay.
> >
> >
> > There are 2 issues:
> > 1. Current RegBankSelect does not consider the uses when selecting the
> > bank. This is a general missing optimization.
> > 2. For the AMDGPU case, I think we should have a post-regbankselect
> > combiner for this. It’s often better to materialize constants for each bank.
>
This sounds desirable, but how would this work on GMIR without interfering
in strange ways with pattern matching?

The challenge I see is: due to the constant bus limitation, there are
situations in which such a combiner has to make a choice as to which
operand to combine / eliminate the COPY. If the choice is arbitrary, it may
sometimes inhibit later pattern matching.
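
To make the choice concrete, here is a minimal sketch of a constant bus check. It is a toy model, not LLVM code: `Bank`, `Operand`, `constantBusUses` and `fitsConstantBus` are hypothetical names, literal constants are left out, and the limit is a parameter because it depends on the target. The sketch assumes that the same SGPR read twice occupies the bus only once.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of a register operand (hypothetical, not LLVM's MachineOperand).
enum class Bank { SGPR, VGPR };

struct Operand {
  std::string Reg; // register name, e.g. "s4" or "v0"
  Bank RegBank;    // bank the register lives in
};

// Counts the distinct SGPRs read by one VALU instruction.
// Assumption: an SGPR that appears more than once is counted once.
std::size_t constantBusUses(const std::vector<Operand> &Ops) {
  std::set<std::string> Seen;
  for (const Operand &Op : Ops)
    if (Op.RegBank == Bank::SGPR)
      Seen.insert(Op.Reg);
  return Seen.size();
}

// True if the operand list respects the given constant bus limit.
bool fitsConstantBus(const std::vector<Operand> &Ops, std::size_t Limit) {
  return constantBusUses(Ops) <= Limit;
}
```

With a limit of 1 and two operands that are both copies of different SGPRs, folding either copy on its own passes this check, but folding both does not. That is the point where the combiner has to pick one, and nothing in the check says which choice helps later pattern matching.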

Related: are we clear on what is legal for GMIR? If G_ADD of regbank sgpr
into regbank vgpr is supposed to be legal, does it then have to satisfy the
constant bus limitation? If so, why are we still calling it G_ADD instead
of V_ADD?


> >
> > I don’t think we actually want to have to look through copies, and the
> > places we do are just working around the status quo.
> >
> > The folding of SGPRs/constants into instructions should be a new and
> > improved version of SIFoldOperands. I think optimizing this is beyond the
> > scope of what RegBankSelect and selection patterns should handle. Far too
> > much code would need to be taught to respect and preserve the constant bus
> > limitation otherwise, so that’s why everything uses VGPRs.
>
> I can understand leaving it to a later pass to fold sgprs or
> constants into an instruction. What I can't understand is how you do
> the same kind of thing for more complex selection patterns like:
>
>   t:sgpr = G_ADD y:sgpr, z:sgpr
>   t':vgpr = COPY t:sgpr
>   r:vgpr = G_ADD x:vgpr, t':vgpr
>
> How can we select v_add3_u32 from this? I can only think of two options:
>
> 1. Select s_add and v_add and leave it to a later pass to combine
> them. This seems to be giving up on doing decent pattern-based
> instruction selection.
> 2. Match it in the instruction selector, using a pattern that
> (explicitly or implicitly) looks through the cross-bank copy. But then
> you're back to the problem that two of the inputs are sgprs, which may
> or may not be valid according to complex operand restrictions.
>
The SelectionDAG pattern for v_add3 and friends already checks this using a
C++ code fragment, doesn't it?

Cheers,
Nicolai

>
> Thanks,
> Jay.
>

-- 
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Jay Foad via llvm-dev

2021-Mar-30 10:32 UTC

[llvm-dev] globalisel: cross-bank constant propagation?

On Mon, 29 Mar 2021 at 19:51, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On Mon, Mar 29, 2021 at 3:34 PM Jay Foad <jay.foad at gmail.com> wrote:
>> On Mon, 29 Mar 2021 at 14:04, Matt Arsenault <arsenm2 at gmail.com> wrote:
>> > I don’t think we actually want to have to look through copies, and
>> > the places we do are just working around the status quo.
>> >
>> > The folding of SGPRs/constants into instructions should be a new and
>> > improved version of SIFoldOperands. I think optimizing this is beyond
>> > the scope of what RegBankSelect and selection patterns should handle.
>> > Far too much code would need to be taught to respect and preserve the
>> > constant bus limitation otherwise, so that’s why everything uses VGPRs.
>>
>> I can understand leaving it to a later pass to fold sgprs or
>> constants into an instruction. What I can't understand is how you do
>> the same kind of thing for more complex selection patterns like:
>>
>>   t:sgpr = G_ADD y:sgpr, z:sgpr
>>   t':vgpr = COPY t:sgpr
>>   r:vgpr = G_ADD x:vgpr, t':vgpr
>>
>> How can we select v_add3_u32 from this? I can only think of two options:
>>
>> 1. Select s_add and v_add and leave it to a later pass to combine
>> them. This seems to be giving up on doing decent pattern-based
>> instruction selection.
>> 2. Match it in the instruction selector, using a pattern that
>> (explicitly or implicitly) looks through the cross-bank copy. But then
>> you're back to the problem that two of the inputs are sgprs, which may
>> or may not be valid according to complex operand restrictions.
>
> The SelectionDAG pattern for v_add3 and friends already checks this using a
> C++ code fragment, doesn't it?

Yes, but I had always assumed that was just a heuristic. Are we saying
it's required for correctness? Actually I'm confused about the whole
concept of checking for constant bus violations at this stage.

In a normal compiler, the instruction selector would be allowed to
select any instruction that works, regardless of register classes, and
it would be the register allocator's job to copy the input values into
suitable input registers. So given this GMIR:

  t:sgpr = G_ADD y:sgpr, z:sgpr
  t':vgpr = COPY t:sgpr
  r:vgpr = G_ADD x:vgpr, t':vgpr

I would naively hope that the instruction selector could select this
without worrying about register banks or constant bus restrictions:

  %4:vgpr_32 = V_ADD3_U32 %0:vgpr_32, %1:vgpr_32, %2:vgpr_32

Then optionally SIFoldOperands (which does know about constant bus
restrictions) could modify some of the input register classes to take
sgprs instead.

Then the register allocator would do its job, inserting sgpr-to-vgpr
copies as and where necessary.
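
A rough sketch of the folding step in that pipeline, under stated assumptions: this is a toy model and not what SIFoldOperands actually does. `Inst`, `isSGPR`, `foldSGPRCopies` and the `CopySource` map are hypothetical, registers are plain names where a leading 's' means sgpr, and the constant bus limit is passed in because it depends on the target.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy instruction: an opcode plus the names of the registers it reads.
struct Inst {
  std::string Opcode;
  std::vector<std::string> Uses;
};

// Assumption of this sketch: register names starting with 's' are sgprs.
bool isSGPR(const std::string &Reg) { return !Reg.empty() && Reg[0] == 's'; }

// CopySource maps a vgpr to the sgpr it was copied from.
// Walks the uses left to right and replaces a copied vgpr with its sgpr
// source while the number of distinct sgprs stays within Limit.
// Returns the number of operands that were folded.
std::size_t foldSGPRCopies(Inst &I,
                           const std::map<std::string, std::string> &CopySource,
                           std::size_t Limit) {
  std::set<std::string> BusRegs;
  for (const std::string &U : I.Uses)
    if (isSGPR(U))
      BusRegs.insert(U);

  std::size_t Folded = 0;
  for (std::string &U : I.Uses) {
    auto It = CopySource.find(U);
    if (It == CopySource.end())
      continue; // not the result of a cross-bank copy
    const std::string &Src = It->second;
    bool AlreadyOnBus = BusRegs.count(Src) != 0;
    if (!AlreadyOnBus && BusRegs.size() >= Limit)
      continue; // folding would exceed the limit, keep the vgpr
    BusRegs.insert(Src);
    U = Src;
    ++Folded;
  }
  return Folded;
}
```

For a V_ADD3_U32 whose second and third inputs are copies of two different sgprs, a limit of 1 folds only the first copy it meets and leaves the other operand in a vgpr. The left-to-right order is an arbitrary choice in this sketch.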

Jay.
