thr3ads.net - llvm dev - [LLVMdev] Long-Term ISel Design [Apr 2011]

If this information is useful, please help other people find it:
Share via:

David A. Greene

2011-Mar-27 20:16 UTC

[LLVMdev] Long-Term ISel Design

Chris Lattner <clattner at apple.com> writes:
>> We would still keep the existing pre-table-driven-isel passes so
we'd
>> still have a chance to do some cleanup before the main table-driven
>> isel.
>> 
>> Obviously a lot of details have to be worked out.
>
> I'm not seeing how this is useful for shuffles.  Since tblgen
doesn't
> generate table based matching for *any* shuffles, all of the matching
> code would end up as C++ code in X86ISelDagToDag, which would give us
> all of the problems we had before by moving to X86ISD nodes.
Actually, it would be matching code in X86ISelLowering, in the form
of is.*Mask.  For example:

In X86ISelLowering.cpp:

bool X86ISelLowering::isSHUFPMask(...) {
  ...
}
unsigned X86::getShuffleSHUFImmediate(SDNode *N) {
  ...
}

In X86InstrFragmentsSIMD.td:

def SHUFFLE_get_shuf_imm : SDNodeXForm<vector_shuffle, [{
  return getI8Imm(X86::getShuffleSHUFImmediate(N));
}]>;
def shufp : PatFrag<(ops node:$lhs, node:$rhs),
                    (vector_shuffle node:$lhs, node:$rhs), [{
  return X86::isSHUFPMask(cast<ShuffleVectorSDNode>(N));
}], SHUFFLE_get_shuf_imm>;

Ok, so far this is exactly the same as today.

What we eliminate is this:

void X86TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) {
  ...
  isShuffleMaskLegal(...)
}

bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int>
&M,
                                           EVT VT) const {
  // FIXME: pshufb, blends, shifts.
  return (VT.getVectorNumElements() == 2 ||
          ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
          isMOVLMask(M, VT) ||
          isSHUFPMask(M, VT) ||
          ...
}

We git rid of this call to isSHUFPMask, which currently happens during
legalize.  Instead of trying to see if shuffles are already legal, just
run the isntruction selector and see what it can find.  For everything
else, we know by definition it's not legal and we have to manually
lower it somehow, just like today.  The only difference is that we do
it after (or in conjunction with) isel instead of before it.  This will
eliminate a lot of confusing and redundant code in X86ISelLowering.
>> Well, it dopesn't _have_to_ form an X86ISD node.  I don't do
that now.
>> But it's fragile in the sense that no one else should mess with
that
>> piece of the DAG.
> I don't consider that an acceptable approach.  There is no way to
> prevent something from CSE'ing a load away and "breaking" the
dag, or
> moving an add between the nodes etc.  You're violating a design
> principle of selection dags.
Fair enough.  The current proposal eliminates any need to do any of
this.  I did end up creating a special X86 node for VBROADCAST because
you're right, its too brittle to rely on something else not breaking it.
>> But the real point is that in forming the X86ISD node currently,
I'm
>> doing exaclty what the tblgen-generated code already does.  If the
>> shuffle doesn't take a memory operand, then I have to lower it to
>> something else.  Where I do that (before or after table-driven isel)
>> doesn't matter.  I do the same work either way.  But by doing it
after I
>> avoid writing duplicate DAG matching code in the case where the operand
>> is in memory.
> I don't agree, and I don't see why this is as bad as you're
saying.
> The code creating these nodes is already target specific.  You seem to
> be objecting to this because it is easier to write .td files (which
> turn into generated isel code) than it is to write legalize code in
> C++.  If that is the problem you've identified, then why not make it
> possible to write legalize code in .td files?
I _am_ saying writing .td files is much easier, and I'm saying that half
the legalize code is already there, the part that checks whether stuff
is already legal.  I'm not trying to make the manual lowering case
easier.  I'm trying to avoid having to write manual code to discover
patterns that isel is already perfectly capable of discovering on its
own.

Take the broadcast case.  The .td is something like this (theoretical,
because I did end up creating an X86broadcast node) :

(set VR128:$dst, (v8f32 (splat_lo (v4f32 scalar_to_vector (f32 load addr:$src)),
                                  (undef))))

It's fairly straightforward.

The way things are now, in legalize I have to look for broadcast-like
operations and make sure they come from memory, something like:

if (...we have avx, this is a splat from element 0, etc....) {
  if (...operand 2 is undef...)
    if (...operand 1 is a scalar_to_vector...)
      if (...scalar_to_vector operand is a load...)
         if (...the load only has one use and is foldable...)
           ...transform this node to an X86braodcast node...
}

then write the .td pattern as:

(set VR128:$dst, (v8f32 (X86vbroadcast (f32 load addr:$src))))

In addition, I have to write the code to lower a splat_lo-type
thing that comes from a register (can't use VBROADCAST) but I
have to do that no matter what since it's not a legal DAG.

I had to write a whole bunch of manual matching code.  This is
error-prone is completely automatable by tblegen.  The current
difficulty is that the tblegen-generated matcher runs too late, after we
have a legal DAG guarantee.  If it ran earlier and instead of aborting
simply returned the DAGs (or fragments) it could not match, I could
avoid writing all this nasty code.

Using TableGen to generate legalize code (the other half that really has
to morph nodes) is an idea I've tossed around from time to time.  I ran
into this need very early on.  It would be nice to write patterns like:

def : Pat<(...some dag...),
          (...some other, non-target-specific dag...)>;

I think there's a baked-in assumption in TableGen that the pattern
matched to is a machine instruction dag.  The memory is fuzzy but I
remember trying to do something like this and failing utterly.  :)
> I think that this is just because the current code is in a half
> converted state.  Bruno can say more, but the ultimate goal is to make
> ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
> of thing.
Erm...how would that work exactly?  Manual matching and lowering for all
shuffles?  That sounds like a huge step backward.  Shuffle is a key
operation on X86.  Anything that can automate its selection is a huge
win in terms of correctness.  After writing all of the shuffle matching
code for AVX (on its way into trunk), I don't ever, ever, ever want to
have to do anything like that ever again, ever.  :)

                                  -Dave

Chris Lattner

2011-Apr-09 05:32 UTC

head link

[LLVMdev] Long-Term ISel Design

On Mar 27, 2011, at 1:16 PM, David A. Greene wrote:
> Chris Lattner <clattner at apple.com> writes:
> 
>>> We would still keep the existing pre-table-driven-isel passes so
we'd
>>> still have a chance to do some cleanup before the main table-driven
>>> isel.
>>> 
>>> Obviously a lot of details have to be worked out.
>> 
>> I'm not seeing how this is useful for shuffles.  Since tblgen
doesn't
>> generate table based matching for *any* shuffles, all of the matching
>> code would end up as C++ code in X86ISelDagToDag, which would give us
>> all of the problems we had before by moving to X86ISD nodes.
> 
> bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int>
&M,
>                                           EVT VT) const {
>  // FIXME: pshufb, blends, shifts.
>  return (VT.getVectorNumElements() == 2 ||
>          ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
>          isMOVLMask(M, VT) ||
>          isSHUFPMask(M, VT) ||
>          ...
> }
> 
> We git rid of this call to isSHUFPMask, which currently happens during
> legalize.  Instead of trying to see if shuffles are already legal, just
> run the isntruction selector and see what it can find.  For everything
> else, we know by definition it's not legal and we have to manually
> lower it somehow, just like today.  The only difference is that we do
> it after (or in conjunction with) isel instead of before it.  This will
> eliminate a lot of confusing and redundant code in X86ISelLowering.
I don't really see where you're going with this.  I agree that there is
confusing and fragile code for shuffle lowering, but this is what Bruno is
working on fixing.  To me, the problem is that legalize of SHUFFLE_VECTOR
eliminates shuffles that are not directly matchable into a single machine
instruction, but preserves ones that do match.  This means that there is
duplicated code in both legalize and isel that has to know that a shuffle is an
"unpacklps" or whatever.  Other parts of the code that generate
shuffles generally know exactly what shuffle they want, and have to synthesize a
shuffle node with the right operands to get it to select into the right
operation.

To me at least, the right solution to this problem is to make one X86ISD node
for each legal shuffle.  This has several desirable properties:

1. Legalize is now responsible for eliminating ALL vector_shuffle nodes, and it
is really obvious what shuffles it is generating when reading the dag dumps.
2. Other code that generates shuffles can just generate their exact node.
3. The duplicated isel code is gone, because there is a simple mapping between
X86ISD nodes and machine instrs.
4. We push fewer shuffle masks through the compiler which is a marginal speedup.

To me, there are two missing pieces here:
1. The x86 backend is half way converted over to the new model.
2. We *really really* want a way to express shuffle masks in .td files, and
tblgen should generate the Legalize code.  The "def X86Punpcklbw"
should include a (per cpu) cost for the shuffle as well as the mask that it
matches, or a predicate to run if it the mask isn't constant.
>> I think that this is just because the current code is in a half
>> converted state.  Bruno can say more, but the ultimate goal is to make
>> ISD::SHUFFLE completely illegal on X86, so you'd never have this
sort
>> of thing.
> 
> Erm...how would that work exactly?  Manual matching and lowering for all
> shuffles?  That sounds like a huge step backward.
I'm not sure how it is a step backward.  Legalize already does matching to
"know and ignore" a shuffle that will match a legal instruction.  It
isn't hard to change that code to "know and transform" it to the
instruction.  The net result of this is that the duplicate code in isel goes
away.
> Shuffle is a key
> operation on X86.  Anything that can automate its selection is a huge
> win in terms of correctness.  After writing all of the shuffle matching
> code for AVX (on its way into trunk), I don't ever, ever, ever want to
> have to do anything like that ever again, ever.  :)
Yes, I certainly agree that tblgen should generate the shuffle matching code, I
just think that the generated code should be executed in the LegalizeOp(shuffle
vector) code.

-Chris

David A. Greene

2011-Apr-11 20:16 UTC

head link

[LLVMdev] Long-Term ISel Design

Chris Lattner <clattner at apple.com> writes:
> To me at least, the right solution to this problem is to make one
> X86ISD node for each legal shuffle.  This has several desirable
> properties:
>
> 1. Legalize is now responsible for eliminating ALL vector_shuffle
>    nodes, and it is really obvious what shuffles it is generating when
>    reading the dag dumps.
> 2. Other code that generates shuffles can just generate their exact
>    node.
> 3. The duplicated isel code is gone, because there is a simple mapping
>    between X86ISD nodes and machine instrs.
> 4. We push fewer shuffle masks through the compiler which is a marginal
speedup.
It also has some not-so-desirable properties:

1. We need special isel DAG nodes for each shuffle.  That's an extra
   maintenance cost for something that should be automatable.  More
   shuffle instructions are coming, I'm sure.  I'm worried this
won't
   scale.

2. It makes shuffle nodes special relative to other kinds of nodes.  Why
   not have special DAG operators for _every_ kind of machine
   instruction?  I know some have them and that's always been a source
   of confusion for me.
> 2. We *really really* want a way to express shuffle masks in .td
> files, and tblgen should generate the Legalize code.  The "def
> X86Punpcklbw" should include a (per cpu) cost for the shuffle as well
> as the mask that it matches, or a predicate to run if it the mask
> isn't constant.
I agree this is highly desireable.  What's the difference between
generated legalize code and a generated instruction selection matcher?

Can we somehow work this so we don't need special target machine DAG
nodes?  If we're generating legalize code to lower to special target
machine DAG nodes and then matching really simple patterns using those
nodes, it seems to me just as easy to generate code to lower to machine
instructions directly.

Using target-specific DAG nodes as an intermediate step is fine, but I
don't think we want to stop there.
>>> I think that this is just because the current code is in a half
>>> converted state.  Bruno can say more, but the ultimate goal is to
make
>>> ISD::SHUFFLE completely illegal on X86, so you'd never have
this sort
>>> of thing.
>> 
>> Erm...how would that work exactly?  Manual matching and lowering for
all
>> shuffles?  That sounds like a huge step backward.
>
> I'm not sure how it is a step backward.  Legalize already does
> matching to "know and ignore" a shuffle that will match a legal
> instruction.  It isn't hard to change that code to "know and
> transform" it to the instruction.  The net result of this is that the
> duplicate code in isel goes away.
If we can support masks in TableGen and auto-generate this it is
goodness.
> Yes, I certainly agree that tblgen should generate the shuffle
> matching code, I just think that the generated code should be executed
> in the LegalizeOp(shuffle vector) code.
I guess as long as it's automatic I don't particularly care where it
gets done.  :)

Is this plan written up anywhere?  I was really confused by the changes
in x86 isel and it would have been nice to have something to look at.

                                 -Dave

Maybe Matching Threads

Search for more reasonably related threads

llvm dev - Apr 2011 - [LLVMdev] Long-Term ISel Design

[LLVMdev] Long-Term ISel Design

[LLVMdev] Long-Term ISel Design

[LLVMdev] Long-Term ISel Design

Maybe Matching Threads