Chris Lattner <clattner at apple.com> writes:>> 1. We have special target-specific operators for certain shuffles in X86, >> such as X86unpckl.> It also eliminates a lot of fragility. Before doing this, X86 > legalize would have to be very careful to specifically form shuffles > that it knew isel would turn into (e.g.) unpck operations. Now > instead of forming specific carefully constructed shuffle masks (not > making sure other code doesn't violate them) it can just directly form > the X86ISD node.Right. What I've presented would reverse this. Rather than making Legalize have to know about what table-driven isel can and cannot do, have table-driven isel run first, see what it can do and then leave the rest for manual selection. We would still keep the existing pre-table-driven-isel passes so we'd still have a chance to do some cleanup before the main table-driven isel. Obviously a lot of details have to be worked out.>> 2. Sometimes DAGs are legal in some contexts but not others and it is a >> pain to deal with. A good example is VBROADCAST, where a <0,0,0,0> >> shuffle is natively supported if the source vector is in memory. >> Otherwise it's not legal and manual lowering is required. In this >> case the legality check is doing the DAG match by hand, replicating >> what TableGen-produced code already does.> Yes, this isn't good. Instead, the shuffle should be legalized to > something that takes a pointer (memory operand). That means that X86 > isel would form the *fully legal* X86ISD node, and nothing would be > able to break it and it could never fail to match.Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now. But it's fragile in the sense that no one else should mess with that piece of the DAG. But the real point is that in forming the X86ISD node currently, I'm doing exaclty what the tblgen-generated code already does. If the shuffle doesn't take a memory operand, then I have to lower it to something else. Where I do that (before or after table-driven isel) doesn't matter. I do the same work either way. But by doing it after I avoid writing duplicate DAG matching code in the case where the operand is in memory.>> These two examples are related: we're duplicating functionality manually >> that's already available automatically. > > Not sure what you mean by this.I mean that in legalize/lowering we're massaging the DAG to get it into a state where tabel-driven isel can match it. There is a lot of code like this: if (shuffle_is_MOVL) do_nothing_and_return It's duplicating exactly the checks that the table-driven isel does later. In the VBROADCASTSS/D case, it's doing an entire DAG match to check whether it's implementable with VBROADCASTSS/D. Why not just let table-driven isel run first and take care of these checks just once? If something doesn't match, we then know it needs manual lowering and selection.>> legalize >> | >> V >> manual lowering (X86ISelLowering) >> | >> V >> manual isel (X86ISelDAGToDAG) >> | >> V >> table-driven isel (.td files/X86GenDAGISel) >> | >> V >> manual isel (some to-be-design piece) >> I'm not sure what you mean here. Are you suggesting that these be > completely separate passes over the dag? Why do "manual isel" and > "table driven isel" as separate passes? If they are interlaced, then > how is this different than what we already have?No, not as separate passes. Right now we have code like this in X86ISelDAGToDAG: X86DAGToDAGISel::Select(SDNode *Node) { ... switch (Opcode) { ...do a bunch of manual selection... } // If we get here we didn't select manually. SelectCode(); // Select via table-driven isel, abort if no match. } What I'm proposing is that we do this: X86DAGToDAGISel::Select(SDNode *Node) { ... switch (Opcode) { ...do a bunch of manual selection, less than before... } // If we get here we didn't select manually. result = SelectCode(); // Select via table-driven isel. if (result_is_good()) { return; } switch (Opcode) { ...do a bunch of manual selection, some that used to be above and in legalize/lowering... } cannot_select_abort(); }> You're saying that we do this on a node-by-node basis?You mean on an SDNode basis? I don't think so. As I said, details have to be worked out but I imagine we'd send the selection DAG to the tblgen-generated code as we do now and any "leftover" bits would have to be processed afterward by the manual selector. Now, there is a phase ordering issue in that some of the legalize/lowering code massages the tree so the tblgen stuff can make a "good" match. We probably still want to do that in some cases so we keep that code where it is. What I'm aiming at getting rid of is all of the code that does: if (this_is_already_matchable()) { return; }> The reason that codegen aborts on unselectable operations is that they > are invalid and should not be formed. Your example of vbroadcast is a > great one: the X86ISD node for it *should not take a vector register > input*. If it does, then the X86ISD node is incorrectly defined.What I'm saying is that there would be no X86ISD node. If the pattern is there, tblgen-produced code will match it. If not, we have to lower it manually and would just do what we'd have done in legalize/lowering before, the difference being we'd now do it after table-driven isel rather than before. -Dave
On Mar 17, 2011, at 9:32 AM, David A. Greene wrote:> Chris Lattner <clattner at apple.com> writes: >>> 1. We have special target-specific operators for certain shuffles in X86, >>> such as X86unpckl. > >> It also eliminates a lot of fragility. Before doing this, X86 >> legalize would have to be very careful to specifically form shuffles >> that it knew isel would turn into (e.g.) unpck operations. Now >> instead of forming specific carefully constructed shuffle masks (not >> making sure other code doesn't violate them) it can just directly form >> the X86ISD node. > > Right. What I've presented would reverse this. Rather than making > Legalize have to know about what table-driven isel can and cannot do, > have table-driven isel run first, see what it can do and then leave > the rest for manual selection. > > We would still keep the existing pre-table-driven-isel passes so we'd > still have a chance to do some cleanup before the main table-driven > isel. > > Obviously a lot of details have to be worked out.I'm not seeing how this is useful for shuffles. Since tblgen doesn't generate table based matching for *any* shuffles, all of the matching code would end up as C++ code in X86ISelDagToDag, which would give us all of the problems we had before by moving to X86ISD nodes.>>> 2. Sometimes DAGs are legal in some contexts but not others and it is a >>> pain to deal with. A good example is VBROADCAST, where a <0,0,0,0> >>> shuffle is natively supported if the source vector is in memory. >>> Otherwise it's not legal and manual lowering is required. In this >>> case the legality check is doing the DAG match by hand, replicating >>> what TableGen-produced code already does. > >> Yes, this isn't good. Instead, the shuffle should be legalized to >> something that takes a pointer (memory operand). That means that X86 >> isel would form the *fully legal* X86ISD node, and nothing would be >> able to break it and it could never fail to match. > > Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now. > But it's fragile in the sense that no one else should mess with that > piece of the DAG.I don't consider that an acceptable approach. There is no way to prevent something from CSE'ing a load away and "breaking" the dag, or moving an add between the nodes etc. You're violating a design principle of selection dags.> But the real point is that in forming the X86ISD node currently, I'm > doing exaclty what the tblgen-generated code already does. If the > shuffle doesn't take a memory operand, then I have to lower it to > something else. Where I do that (before or after table-driven isel) > doesn't matter. I do the same work either way. But by doing it after I > avoid writing duplicate DAG matching code in the case where the operand > is in memory.I don't agree, and I don't see why this is as bad as you're saying. The code creating these nodes is already target specific. You seem to be objecting to this because it is easier to write .td files (which turn into generated isel code) than it is to write legalize code in C++. If that is the problem you've identified, then why not make it possible to write legalize code in .td files?>>> These two examples are related: we're duplicating functionality manually >>> that's already available automatically. >> >> Not sure what you mean by this. > > I mean that in legalize/lowering we're massaging the DAG to get it into > a state where tabel-driven isel can match it. There is a lot of code > like this: > > if (shuffle_is_MOVL) > do_nothing_and_return > > It's duplicating exactly the checks that the table-driven isel does > later. In the VBROADCASTSS/D case, it's doing an entire DAG match > to check whether it's implementable with VBROADCASTSS/D. > > Why not just let table-driven isel run first and take care of these > checks just once? If something doesn't match, we then know it needs > manual lowering and selection.I think that this is just because the current code is in a half converted state. Bruno can say more, but the ultimate goal is to make ISD::SHUFFLE completely illegal on X86, so you'd never have this sort of thing.> >>> legalize >>> | >>> V >>> manual lowering (X86ISelLowering) >>> | >>> V >>> manual isel (X86ISelDAGToDAG) >>> | >>> V >>> table-driven isel (.td files/X86GenDAGISel) >>> | >>> V >>> manual isel (some to-be-design piece) >> > >> I'm not sure what you mean here. Are you suggesting that these be >> completely separate passes over the dag? Why do "manual isel" and >> "table driven isel" as separate passes? If they are interlaced, then >> how is this different than what we already have? > > No, not as separate passes. Right now we have code like this in > X86ISelDAGToDAG: > > X86DAGToDAGISel::Select(SDNode *Node) { > ... > switch (Opcode) { > ...do a bunch of manual selection... > } > // If we get here we didn't select manually. > SelectCode(); // Select via table-driven isel, abort if no match. > } > > What I'm proposing is that we do this: > > X86DAGToDAGISel::Select(SDNode *Node) { > ... > switch (Opcode) { > ...do a bunch of manual selection, less than before... > } > > // If we get here we didn't select manually. > > result = SelectCode(); // Select via table-driven isel. > > if (result_is_good()) { > return; > } > > switch (Opcode) { > ...do a bunch of manual selection, some that used to be above and in > legalize/lowering... > } > > cannot_select_abort(); > }Ok, much better than separate passes, thanks for the clarification! -Chris
Chris Lattner <clattner at apple.com> writes:>> We would still keep the existing pre-table-driven-isel passes so we'd >> still have a chance to do some cleanup before the main table-driven >> isel. >> >> Obviously a lot of details have to be worked out. > > I'm not seeing how this is useful for shuffles. Since tblgen doesn't > generate table based matching for *any* shuffles, all of the matching > code would end up as C++ code in X86ISelDagToDag, which would give us > all of the problems we had before by moving to X86ISD nodes.Actually, it would be matching code in X86ISelLowering, in the form of is.*Mask. For example: In X86ISelLowering.cpp: bool X86ISelLowering::isSHUFPMask(...) { ... } unsigned X86::getShuffleSHUFImmediate(SDNode *N) { ... } In X86InstrFragmentsSIMD.td: def SHUFFLE_get_shuf_imm : SDNodeXForm<vector_shuffle, [{ return getI8Imm(X86::getShuffleSHUFImmediate(N)); }]>; def shufp : PatFrag<(ops node:$lhs, node:$rhs), (vector_shuffle node:$lhs, node:$rhs), [{ return X86::isSHUFPMask(cast<ShuffleVectorSDNode>(N)); }], SHUFFLE_get_shuf_imm>; Ok, so far this is exactly the same as today. What we eliminate is this: void X86TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) { ... isShuffleMaskLegal(...) } bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M, EVT VT) const { // FIXME: pshufb, blends, shifts. return (VT.getVectorNumElements() == 2 || ShuffleVectorSDNode::isSplatMask(&M[0], VT) || isMOVLMask(M, VT) || isSHUFPMask(M, VT) || ... } We git rid of this call to isSHUFPMask, which currently happens during legalize. Instead of trying to see if shuffles are already legal, just run the isntruction selector and see what it can find. For everything else, we know by definition it's not legal and we have to manually lower it somehow, just like today. The only difference is that we do it after (or in conjunction with) isel instead of before it. This will eliminate a lot of confusing and redundant code in X86ISelLowering.>> Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now. >> But it's fragile in the sense that no one else should mess with that >> piece of the DAG.> I don't consider that an acceptable approach. There is no way to > prevent something from CSE'ing a load away and "breaking" the dag, or > moving an add between the nodes etc. You're violating a design > principle of selection dags.Fair enough. The current proposal eliminates any need to do any of this. I did end up creating a special X86 node for VBROADCAST because you're right, its too brittle to rely on something else not breaking it.>> But the real point is that in forming the X86ISD node currently, I'm >> doing exaclty what the tblgen-generated code already does. If the >> shuffle doesn't take a memory operand, then I have to lower it to >> something else. Where I do that (before or after table-driven isel) >> doesn't matter. I do the same work either way. But by doing it after I >> avoid writing duplicate DAG matching code in the case where the operand >> is in memory.> I don't agree, and I don't see why this is as bad as you're saying. > The code creating these nodes is already target specific. You seem to > be objecting to this because it is easier to write .td files (which > turn into generated isel code) than it is to write legalize code in > C++. If that is the problem you've identified, then why not make it > possible to write legalize code in .td files?I _am_ saying writing .td files is much easier, and I'm saying that half the legalize code is already there, the part that checks whether stuff is already legal. I'm not trying to make the manual lowering case easier. I'm trying to avoid having to write manual code to discover patterns that isel is already perfectly capable of discovering on its own. Take the broadcast case. The .td is something like this (theoretical, because I did end up creating an X86broadcast node) : (set VR128:$dst, (v8f32 (splat_lo (v4f32 scalar_to_vector (f32 load addr:$src)), (undef)))) It's fairly straightforward. The way things are now, in legalize I have to look for broadcast-like operations and make sure they come from memory, something like: if (...we have avx, this is a splat from element 0, etc....) { if (...operand 2 is undef...) if (...operand 1 is a scalar_to_vector...) if (...scalar_to_vector operand is a load...) if (...the load only has one use and is foldable...) ...transform this node to an X86braodcast node... } then write the .td pattern as: (set VR128:$dst, (v8f32 (X86vbroadcast (f32 load addr:$src)))) In addition, I have to write the code to lower a splat_lo-type thing that comes from a register (can't use VBROADCAST) but I have to do that no matter what since it's not a legal DAG. I had to write a whole bunch of manual matching code. This is error-prone is completely automatable by tblegen. The current difficulty is that the tblegen-generated matcher runs too late, after we have a legal DAG guarantee. If it ran earlier and instead of aborting simply returned the DAGs (or fragments) it could not match, I could avoid writing all this nasty code. Using TableGen to generate legalize code (the other half that really has to morph nodes) is an idea I've tossed around from time to time. I ran into this need very early on. It would be nice to write patterns like: def : Pat<(...some dag...), (...some other, non-target-specific dag...)>; I think there's a baked-in assumption in TableGen that the pattern matched to is a machine instruction dag. The memory is fuzzy but I remember trying to do something like this and failing utterly. :)> I think that this is just because the current code is in a half > converted state. Bruno can say more, but the ultimate goal is to make > ISD::SHUFFLE completely illegal on X86, so you'd never have this sort > of thing.Erm...how would that work exactly? Manual matching and lowering for all shuffles? That sounds like a huge step backward. Shuffle is a key operation on X86. Anything that can automate its selection is a huge win in terms of correctness. After writing all of the shuffle matching code for AVX (on its way into trunk), I don't ever, ever, ever want to have to do anything like that ever again, ever. :) -Dave