thr3ads.net - llvm dev - [LLVMdev] Long-Term ISel Design [Mar 2011]

If this information is useful, please help other people find it:
Share via:

David A. Greene

2011-Mar-17 16:32 UTC

[LLVMdev] Long-Term ISel Design

Chris Lattner <clattner at apple.com> writes:
>> 1. We have special target-specific operators for certain shuffles in
X86,
>>   such as X86unpckl.
> It also eliminates a lot of fragility.  Before doing this, X86
> legalize would have to be very careful to specifically form shuffles
> that it knew isel would turn into (e.g.) unpck operations.  Now
> instead of forming specific carefully constructed shuffle masks (not
> making sure other code doesn't violate them) it can just directly form
> the X86ISD node.
Right.  What I've presented would reverse this.  Rather than making
Legalize have to know about what table-driven isel can and cannot do,
have table-driven isel run first, see what it can do and then leave
the rest for manual selection.

We would still keep the existing pre-table-driven-isel passes so we'd
still have a chance to do some cleanup before the main table-driven
isel.

Obviously a lot of details have to be worked out.
>> 2. Sometimes DAGs are legal in some contexts but not others and it is a
>>   pain to deal with.  A good example is VBROADCAST, where a
<0,0,0,0>
>>   shuffle is natively supported if the source vector is in memory.
>>   Otherwise it's not legal and manual lowering is required.  In
this
>>   case the legality check is doing the DAG match by hand, replicating
>>   what TableGen-produced code already does.
> Yes, this isn't good.  Instead, the shuffle should be legalized to
> something that takes a pointer (memory operand).  That means that X86
> isel would form the *fully legal* X86ISD node, and nothing would be
> able to break it and it could never fail to match.
Well, it dopesn't _have_to_ form an X86ISD node.  I don't do that now.
But it's fragile in the sense that no one else should mess with that
piece of the DAG.

But the real point is that in forming the X86ISD node currently, I'm
doing exaclty what the tblgen-generated code already does.  If the
shuffle doesn't take a memory operand, then I have to lower it to
something else.  Where I do that (before or after table-driven isel)
doesn't matter.  I do the same work either way.  But by doing it after I
avoid writing duplicate DAG matching code in the case where the operand
is in memory.
>> These two examples are related: we're duplicating functionality
manually
>> that's already available automatically.
>
> Not sure what you mean by this.
I mean that in legalize/lowering we're massaging the DAG to get it into
a state where tabel-driven isel can match it.  There is a lot of code
like this:

if (shuffle_is_MOVL)
  do_nothing_and_return

It's duplicating exactly the checks that the table-driven isel does
later.  In the VBROADCASTSS/D case, it's doing an entire DAG match
to check whether it's implementable with VBROADCASTSS/D.

Why not just let table-driven isel run first and take care of these
checks just once?  If something doesn't match, we then know it needs
manual lowering and selection.
>>                  legalize
>>                     |
>>                     V
>>        manual lowering (X86ISelLowering)
>>                     |
>>                     V
>>         manual isel (X86ISelDAGToDAG)
>>                     |
>>                     V
>>    table-driven isel (.td files/X86GenDAGISel)
>>                     |
>>                     V
>>      manual isel (some to-be-design piece)
>
> I'm not sure what you mean here.  Are you suggesting that these be
> completely separate passes over the dag?  Why do "manual isel"
and
> "table driven isel" as separate passes?  If they are interlaced,
then
> how is this different than what we already have?
No, not as separate passes.  Right now we have code like this in
X86ISelDAGToDAG:

X86DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Opcode) {
    ...do a bunch of manual selection...
  }
  // If we get here we didn't select manually.
  SelectCode();  // Select via table-driven isel, abort if no match.
}

What I'm proposing is that we do this:

X86DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Opcode) {
    ...do a bunch of manual selection, less than before...
  }

  // If we get here we didn't select manually.

  result = SelectCode();  // Select via table-driven isel.

  if (result_is_good()) {
    return;
  }

  switch (Opcode) {
    ...do a bunch of manual selection, some that used to be above and in
       legalize/lowering...
  }

  cannot_select_abort();
}
> You're saying that we do this on a node-by-node basis?
You mean on an SDNode basis?  I don't think so.  As I said, details have
to be worked out but I imagine we'd send the selection DAG to the
tblgen-generated code as we do now and any "leftover" bits would have
to be processed afterward by the manual selector.

Now, there is a phase ordering issue in that some of the
legalize/lowering code massages the tree so the tblgen stuff can make a
"good" match.  We probably still want to do that in some cases so we
keep that code where it is.  What I'm aiming at getting rid of is all of
the code that does:

if (this_is_already_matchable()) {
  return;
}
> The reason that codegen aborts on unselectable operations is that they
> are invalid and should not be formed.  Your example of vbroadcast is a
> great one: the X86ISD node for it *should not take a vector register
> input*.  If it does, then the X86ISD node is incorrectly defined.
What I'm saying is that there would be no X86ISD node.  If the pattern
is there, tblgen-produced code will match it.  If not, we have to lower
it manually and would just do what we'd have done in legalize/lowering
before, the difference being we'd now do it after table-driven isel
rather than before.

                                 -Dave

Chris Lattner

2011-Mar-18 20:19 UTC

head link

[LLVMdev] Long-Term ISel Design

On Mar 17, 2011, at 9:32 AM, David A. Greene wrote:> Chris Lattner <clattner at apple.com> writes:
>>> 1. We have special target-specific operators for certain shuffles
in X86,
>>>  such as X86unpckl.
> 
>> It also eliminates a lot of fragility.  Before doing this, X86
>> legalize would have to be very careful to specifically form shuffles
>> that it knew isel would turn into (e.g.) unpck operations.  Now
>> instead of forming specific carefully constructed shuffle masks (not
>> making sure other code doesn't violate them) it can just directly
form
>> the X86ISD node.
> 
> Right.  What I've presented would reverse this.  Rather than making
> Legalize have to know about what table-driven isel can and cannot do,
> have table-driven isel run first, see what it can do and then leave
> the rest for manual selection.
> 
> We would still keep the existing pre-table-driven-isel passes so we'd
> still have a chance to do some cleanup before the main table-driven
> isel.
> 
> Obviously a lot of details have to be worked out.
I'm not seeing how this is useful for shuffles.  Since tblgen doesn't
generate table based matching for *any* shuffles, all of the matching code would
end up as C++ code in X86ISelDagToDag, which would give us all of the problems
we had before by moving to X86ISD nodes.
>>> 2. Sometimes DAGs are legal in some contexts but not others and it
is a
>>>  pain to deal with.  A good example is VBROADCAST, where a
<0,0,0,0>
>>>  shuffle is natively supported if the source vector is in memory.
>>>  Otherwise it's not legal and manual lowering is required.  In
this
>>>  case the legality check is doing the DAG match by hand,
replicating
>>>  what TableGen-produced code already does.
> 
>> Yes, this isn't good.  Instead, the shuffle should be legalized to
>> something that takes a pointer (memory operand).  That means that X86
>> isel would form the *fully legal* X86ISD node, and nothing would be
>> able to break it and it could never fail to match.
> 
> Well, it dopesn't _have_to_ form an X86ISD node.  I don't do that
now.
> But it's fragile in the sense that no one else should mess with that
> piece of the DAG.
I don't consider that an acceptable approach.  There is no way to prevent
something from CSE'ing a load away and "breaking" the dag, or
moving an add between the nodes etc.  You're violating a design principle of
selection dags.
> But the real point is that in forming the X86ISD node currently, I'm
> doing exaclty what the tblgen-generated code already does.  If the
> shuffle doesn't take a memory operand, then I have to lower it to
> something else.  Where I do that (before or after table-driven isel)
> doesn't matter.  I do the same work either way.  But by doing it after
I
> avoid writing duplicate DAG matching code in the case where the operand
> is in memory.
I don't agree, and I don't see why this is as bad as you're saying. 
The code creating these nodes is already target specific.  You seem to be
objecting to this because it is easier to write .td files (which turn into
generated isel code) than it is to write legalize code in C++.  If that is the
problem you've identified, then why not make it possible to write legalize
code in .td files?
>>> These two examples are related: we're duplicating functionality
manually
>>> that's already available automatically.
>> 
>> Not sure what you mean by this.
> 
> I mean that in legalize/lowering we're massaging the DAG to get it into
> a state where tabel-driven isel can match it.  There is a lot of code
> like this:
> 
> if (shuffle_is_MOVL)
>  do_nothing_and_return
> 
> It's duplicating exactly the checks that the table-driven isel does
> later.  In the VBROADCASTSS/D case, it's doing an entire DAG match
> to check whether it's implementable with VBROADCASTSS/D.
> 
> Why not just let table-driven isel run first and take care of these
> checks just once?  If something doesn't match, we then know it needs
> manual lowering and selection.
I think that this is just because the current code is in a half converted state.
Bruno can say more, but the ultimate goal is to make ISD::SHUFFLE completely
illegal on X86, so you'd never have this sort of thing.
> 
>>>                 legalize
>>>                    |
>>>                    V
>>>       manual lowering (X86ISelLowering)
>>>                    |
>>>                    V
>>>        manual isel (X86ISelDAGToDAG)
>>>                    |
>>>                    V
>>>   table-driven isel (.td files/X86GenDAGISel)
>>>                    |
>>>                    V
>>>     manual isel (some to-be-design piece)
>> 
> 
>> I'm not sure what you mean here.  Are you suggesting that these be
>> completely separate passes over the dag?  Why do "manual
isel" and
>> "table driven isel" as separate passes?  If they are
interlaced, then
>> how is this different than what we already have?
> 
> No, not as separate passes.  Right now we have code like this in
> X86ISelDAGToDAG:
> 
> X86DAGToDAGISel::Select(SDNode *Node) {
>  ...
>  switch (Opcode) {
>    ...do a bunch of manual selection...
>  }
>  // If we get here we didn't select manually.
>  SelectCode();  // Select via table-driven isel, abort if no match.
> }
> 
> What I'm proposing is that we do this:
> 
> X86DAGToDAGISel::Select(SDNode *Node) {
>  ...
>  switch (Opcode) {
>    ...do a bunch of manual selection, less than before...
>  }
> 
>  // If we get here we didn't select manually.
> 
>  result = SelectCode();  // Select via table-driven isel.
> 
>  if (result_is_good()) {
>    return;
>  }
> 
>  switch (Opcode) {
>    ...do a bunch of manual selection, some that used to be above and in
>       legalize/lowering...
>  }
> 
>  cannot_select_abort();
> }
Ok, much better than separate passes, thanks for the clarification!

-Chris

David A. Greene

2011-Mar-27 20:16 UTC

head link

[LLVMdev] Long-Term ISel Design

Chris Lattner <clattner at apple.com> writes:
>> We would still keep the existing pre-table-driven-isel passes so
we'd
>> still have a chance to do some cleanup before the main table-driven
>> isel.
>> 
>> Obviously a lot of details have to be worked out.
>
> I'm not seeing how this is useful for shuffles.  Since tblgen
doesn't
> generate table based matching for *any* shuffles, all of the matching
> code would end up as C++ code in X86ISelDagToDag, which would give us
> all of the problems we had before by moving to X86ISD nodes.
Actually, it would be matching code in X86ISelLowering, in the form
of is.*Mask.  For example:

In X86ISelLowering.cpp:

bool X86ISelLowering::isSHUFPMask(...) {
  ...
}
unsigned X86::getShuffleSHUFImmediate(SDNode *N) {
  ...
}

In X86InstrFragmentsSIMD.td:

def SHUFFLE_get_shuf_imm : SDNodeXForm<vector_shuffle, [{
  return getI8Imm(X86::getShuffleSHUFImmediate(N));
}]>;
def shufp : PatFrag<(ops node:$lhs, node:$rhs),
                    (vector_shuffle node:$lhs, node:$rhs), [{
  return X86::isSHUFPMask(cast<ShuffleVectorSDNode>(N));
}], SHUFFLE_get_shuf_imm>;

Ok, so far this is exactly the same as today.

What we eliminate is this:

void X86TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) {
  ...
  isShuffleMaskLegal(...)
}

bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int>
&M,
                                           EVT VT) const {
  // FIXME: pshufb, blends, shifts.
  return (VT.getVectorNumElements() == 2 ||
          ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
          isMOVLMask(M, VT) ||
          isSHUFPMask(M, VT) ||
          ...
}

We git rid of this call to isSHUFPMask, which currently happens during
legalize.  Instead of trying to see if shuffles are already legal, just
run the isntruction selector and see what it can find.  For everything
else, we know by definition it's not legal and we have to manually
lower it somehow, just like today.  The only difference is that we do
it after (or in conjunction with) isel instead of before it.  This will
eliminate a lot of confusing and redundant code in X86ISelLowering.
>> Well, it dopesn't _have_to_ form an X86ISD node.  I don't do
that now.
>> But it's fragile in the sense that no one else should mess with
that
>> piece of the DAG.
> I don't consider that an acceptable approach.  There is no way to
> prevent something from CSE'ing a load away and "breaking" the
dag, or
> moving an add between the nodes etc.  You're violating a design
> principle of selection dags.
Fair enough.  The current proposal eliminates any need to do any of
this.  I did end up creating a special X86 node for VBROADCAST because
you're right, its too brittle to rely on something else not breaking it.
>> But the real point is that in forming the X86ISD node currently,
I'm
>> doing exaclty what the tblgen-generated code already does.  If the
>> shuffle doesn't take a memory operand, then I have to lower it to
>> something else.  Where I do that (before or after table-driven isel)
>> doesn't matter.  I do the same work either way.  But by doing it
after I
>> avoid writing duplicate DAG matching code in the case where the operand
>> is in memory.
> I don't agree, and I don't see why this is as bad as you're
saying.
> The code creating these nodes is already target specific.  You seem to
> be objecting to this because it is easier to write .td files (which
> turn into generated isel code) than it is to write legalize code in
> C++.  If that is the problem you've identified, then why not make it
> possible to write legalize code in .td files?
I _am_ saying writing .td files is much easier, and I'm saying that half
the legalize code is already there, the part that checks whether stuff
is already legal.  I'm not trying to make the manual lowering case
easier.  I'm trying to avoid having to write manual code to discover
patterns that isel is already perfectly capable of discovering on its
own.

Take the broadcast case.  The .td is something like this (theoretical,
because I did end up creating an X86broadcast node) :

(set VR128:$dst, (v8f32 (splat_lo (v4f32 scalar_to_vector (f32 load addr:$src)),
                                  (undef))))

It's fairly straightforward.

The way things are now, in legalize I have to look for broadcast-like
operations and make sure they come from memory, something like:

if (...we have avx, this is a splat from element 0, etc....) {
  if (...operand 2 is undef...)
    if (...operand 1 is a scalar_to_vector...)
      if (...scalar_to_vector operand is a load...)
         if (...the load only has one use and is foldable...)
           ...transform this node to an X86braodcast node...
}

then write the .td pattern as:

(set VR128:$dst, (v8f32 (X86vbroadcast (f32 load addr:$src))))

In addition, I have to write the code to lower a splat_lo-type
thing that comes from a register (can't use VBROADCAST) but I
have to do that no matter what since it's not a legal DAG.

I had to write a whole bunch of manual matching code.  This is
error-prone is completely automatable by tblegen.  The current
difficulty is that the tblegen-generated matcher runs too late, after we
have a legal DAG guarantee.  If it ran earlier and instead of aborting
simply returned the DAGs (or fragments) it could not match, I could
avoid writing all this nasty code.

Using TableGen to generate legalize code (the other half that really has
to morph nodes) is an idea I've tossed around from time to time.  I ran
into this need very early on.  It would be nice to write patterns like:

def : Pat<(...some dag...),
          (...some other, non-target-specific dag...)>;

I think there's a baked-in assumption in TableGen that the pattern
matched to is a machine instruction dag.  The memory is fuzzy but I
remember trying to do something like this and failing utterly.  :)
> I think that this is just because the current code is in a half
> converted state.  Bruno can say more, but the ultimate goal is to make
> ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
> of thing.
Erm...how would that work exactly?  Manual matching and lowering for all
shuffles?  That sounds like a huge step backward.  Shuffle is a key
operation on X86.  Anything that can automate its selection is a huge
win in terms of correctness.  After writing all of the shuffle matching
code for AVX (on its way into trunk), I don't ever, ever, ever want to
have to do anything like that ever again, ever.  :)

                                  -Dave

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Mar 2011 - [LLVMdev] Long-Term ISel Design

[LLVMdev] Long-Term ISel Design

[LLVMdev] Long-Term ISel Design

[LLVMdev] Long-Term ISel Design

Maybe Matching Threads