Chris Lattner <clattner at apple.com> writes:>> We would still keep the existing pre-table-driven-isel passes so we'd >> still have a chance to do some cleanup before the main table-driven >> isel. >> >> Obviously a lot of details have to be worked out. > > I'm not seeing how this is useful for shuffles. Since tblgen doesn't > generate table based matching for *any* shuffles, all of the matching > code would end up as C++ code in X86ISelDagToDag, which would give us > all of the problems we had before by moving to X86ISD nodes.Actually, it would be matching code in X86ISelLowering, in the form of is.*Mask. For example: In X86ISelLowering.cpp: bool X86ISelLowering::isSHUFPMask(...) { ... } unsigned X86::getShuffleSHUFImmediate(SDNode *N) { ... } In X86InstrFragmentsSIMD.td: def SHUFFLE_get_shuf_imm : SDNodeXForm<vector_shuffle, [{ return getI8Imm(X86::getShuffleSHUFImmediate(N)); }]>; def shufp : PatFrag<(ops node:$lhs, node:$rhs), (vector_shuffle node:$lhs, node:$rhs), [{ return X86::isSHUFPMask(cast<ShuffleVectorSDNode>(N)); }], SHUFFLE_get_shuf_imm>; Ok, so far this is exactly the same as today. What we eliminate is this: void X86TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) { ... isShuffleMaskLegal(...) } bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M, EVT VT) const { // FIXME: pshufb, blends, shifts. return (VT.getVectorNumElements() == 2 || ShuffleVectorSDNode::isSplatMask(&M[0], VT) || isMOVLMask(M, VT) || isSHUFPMask(M, VT) || ... } We git rid of this call to isSHUFPMask, which currently happens during legalize. Instead of trying to see if shuffles are already legal, just run the isntruction selector and see what it can find. For everything else, we know by definition it's not legal and we have to manually lower it somehow, just like today. The only difference is that we do it after (or in conjunction with) isel instead of before it. This will eliminate a lot of confusing and redundant code in X86ISelLowering.>> Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now. >> But it's fragile in the sense that no one else should mess with that >> piece of the DAG.> I don't consider that an acceptable approach. There is no way to > prevent something from CSE'ing a load away and "breaking" the dag, or > moving an add between the nodes etc. You're violating a design > principle of selection dags.Fair enough. The current proposal eliminates any need to do any of this. I did end up creating a special X86 node for VBROADCAST because you're right, its too brittle to rely on something else not breaking it.>> But the real point is that in forming the X86ISD node currently, I'm >> doing exaclty what the tblgen-generated code already does. If the >> shuffle doesn't take a memory operand, then I have to lower it to >> something else. Where I do that (before or after table-driven isel) >> doesn't matter. I do the same work either way. But by doing it after I >> avoid writing duplicate DAG matching code in the case where the operand >> is in memory.> I don't agree, and I don't see why this is as bad as you're saying. > The code creating these nodes is already target specific. You seem to > be objecting to this because it is easier to write .td files (which > turn into generated isel code) than it is to write legalize code in > C++. If that is the problem you've identified, then why not make it > possible to write legalize code in .td files?I _am_ saying writing .td files is much easier, and I'm saying that half the legalize code is already there, the part that checks whether stuff is already legal. I'm not trying to make the manual lowering case easier. I'm trying to avoid having to write manual code to discover patterns that isel is already perfectly capable of discovering on its own. Take the broadcast case. The .td is something like this (theoretical, because I did end up creating an X86broadcast node) : (set VR128:$dst, (v8f32 (splat_lo (v4f32 scalar_to_vector (f32 load addr:$src)), (undef)))) It's fairly straightforward. The way things are now, in legalize I have to look for broadcast-like operations and make sure they come from memory, something like: if (...we have avx, this is a splat from element 0, etc....) { if (...operand 2 is undef...) if (...operand 1 is a scalar_to_vector...) if (...scalar_to_vector operand is a load...) if (...the load only has one use and is foldable...) ...transform this node to an X86braodcast node... } then write the .td pattern as: (set VR128:$dst, (v8f32 (X86vbroadcast (f32 load addr:$src)))) In addition, I have to write the code to lower a splat_lo-type thing that comes from a register (can't use VBROADCAST) but I have to do that no matter what since it's not a legal DAG. I had to write a whole bunch of manual matching code. This is error-prone is completely automatable by tblegen. The current difficulty is that the tblegen-generated matcher runs too late, after we have a legal DAG guarantee. If it ran earlier and instead of aborting simply returned the DAGs (or fragments) it could not match, I could avoid writing all this nasty code. Using TableGen to generate legalize code (the other half that really has to morph nodes) is an idea I've tossed around from time to time. I ran into this need very early on. It would be nice to write patterns like: def : Pat<(...some dag...), (...some other, non-target-specific dag...)>; I think there's a baked-in assumption in TableGen that the pattern matched to is a machine instruction dag. The memory is fuzzy but I remember trying to do something like this and failing utterly. :)> I think that this is just because the current code is in a half > converted state. Bruno can say more, but the ultimate goal is to make > ISD::SHUFFLE completely illegal on X86, so you'd never have this sort > of thing.Erm...how would that work exactly? Manual matching and lowering for all shuffles? That sounds like a huge step backward. Shuffle is a key operation on X86. Anything that can automate its selection is a huge win in terms of correctness. After writing all of the shuffle matching code for AVX (on its way into trunk), I don't ever, ever, ever want to have to do anything like that ever again, ever. :) -Dave
On Mar 27, 2011, at 1:16 PM, David A. Greene wrote:> Chris Lattner <clattner at apple.com> writes: > >>> We would still keep the existing pre-table-driven-isel passes so we'd >>> still have a chance to do some cleanup before the main table-driven >>> isel. >>> >>> Obviously a lot of details have to be worked out. >> >> I'm not seeing how this is useful for shuffles. Since tblgen doesn't >> generate table based matching for *any* shuffles, all of the matching >> code would end up as C++ code in X86ISelDagToDag, which would give us >> all of the problems we had before by moving to X86ISD nodes. > > bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M, > EVT VT) const { > // FIXME: pshufb, blends, shifts. > return (VT.getVectorNumElements() == 2 || > ShuffleVectorSDNode::isSplatMask(&M[0], VT) || > isMOVLMask(M, VT) || > isSHUFPMask(M, VT) || > ... > } > > We git rid of this call to isSHUFPMask, which currently happens during > legalize. Instead of trying to see if shuffles are already legal, just > run the isntruction selector and see what it can find. For everything > else, we know by definition it's not legal and we have to manually > lower it somehow, just like today. The only difference is that we do > it after (or in conjunction with) isel instead of before it. This will > eliminate a lot of confusing and redundant code in X86ISelLowering.I don't really see where you're going with this. I agree that there is confusing and fragile code for shuffle lowering, but this is what Bruno is working on fixing. To me, the problem is that legalize of SHUFFLE_VECTOR eliminates shuffles that are not directly matchable into a single machine instruction, but preserves ones that do match. This means that there is duplicated code in both legalize and isel that has to know that a shuffle is an "unpacklps" or whatever. Other parts of the code that generate shuffles generally know exactly what shuffle they want, and have to synthesize a shuffle node with the right operands to get it to select into the right operation. To me at least, the right solution to this problem is to make one X86ISD node for each legal shuffle. This has several desirable properties: 1. Legalize is now responsible for eliminating ALL vector_shuffle nodes, and it is really obvious what shuffles it is generating when reading the dag dumps. 2. Other code that generates shuffles can just generate their exact node. 3. The duplicated isel code is gone, because there is a simple mapping between X86ISD nodes and machine instrs. 4. We push fewer shuffle masks through the compiler which is a marginal speedup. To me, there are two missing pieces here: 1. The x86 backend is half way converted over to the new model. 2. We *really really* want a way to express shuffle masks in .td files, and tblgen should generate the Legalize code. The "def X86Punpcklbw" should include a (per cpu) cost for the shuffle as well as the mask that it matches, or a predicate to run if it the mask isn't constant.>> I think that this is just because the current code is in a half >> converted state. Bruno can say more, but the ultimate goal is to make >> ISD::SHUFFLE completely illegal on X86, so you'd never have this sort >> of thing. > > Erm...how would that work exactly? Manual matching and lowering for all > shuffles? That sounds like a huge step backward.I'm not sure how it is a step backward. Legalize already does matching to "know and ignore" a shuffle that will match a legal instruction. It isn't hard to change that code to "know and transform" it to the instruction. The net result of this is that the duplicate code in isel goes away.> Shuffle is a key > operation on X86. Anything that can automate its selection is a huge > win in terms of correctness. After writing all of the shuffle matching > code for AVX (on its way into trunk), I don't ever, ever, ever want to > have to do anything like that ever again, ever. :)Yes, I certainly agree that tblgen should generate the shuffle matching code, I just think that the generated code should be executed in the LegalizeOp(shuffle vector) code. -Chris
Chris Lattner <clattner at apple.com> writes:> To me at least, the right solution to this problem is to make one > X86ISD node for each legal shuffle. This has several desirable > properties: > > 1. Legalize is now responsible for eliminating ALL vector_shuffle > nodes, and it is really obvious what shuffles it is generating when > reading the dag dumps. > 2. Other code that generates shuffles can just generate their exact > node. > 3. The duplicated isel code is gone, because there is a simple mapping > between X86ISD nodes and machine instrs. > 4. We push fewer shuffle masks through the compiler which is a marginal speedup.It also has some not-so-desirable properties: 1. We need special isel DAG nodes for each shuffle. That's an extra maintenance cost for something that should be automatable. More shuffle instructions are coming, I'm sure. I'm worried this won't scale. 2. It makes shuffle nodes special relative to other kinds of nodes. Why not have special DAG operators for _every_ kind of machine instruction? I know some have them and that's always been a source of confusion for me.> 2. We *really really* want a way to express shuffle masks in .td > files, and tblgen should generate the Legalize code. The "def > X86Punpcklbw" should include a (per cpu) cost for the shuffle as well > as the mask that it matches, or a predicate to run if it the mask > isn't constant.I agree this is highly desireable. What's the difference between generated legalize code and a generated instruction selection matcher? Can we somehow work this so we don't need special target machine DAG nodes? If we're generating legalize code to lower to special target machine DAG nodes and then matching really simple patterns using those nodes, it seems to me just as easy to generate code to lower to machine instructions directly. Using target-specific DAG nodes as an intermediate step is fine, but I don't think we want to stop there.>>> I think that this is just because the current code is in a half >>> converted state. Bruno can say more, but the ultimate goal is to make >>> ISD::SHUFFLE completely illegal on X86, so you'd never have this sort >>> of thing. >> >> Erm...how would that work exactly? Manual matching and lowering for all >> shuffles? That sounds like a huge step backward. >> I'm not sure how it is a step backward. Legalize already does > matching to "know and ignore" a shuffle that will match a legal > instruction. It isn't hard to change that code to "know and > transform" it to the instruction. The net result of this is that the > duplicate code in isel goes away.If we can support masks in TableGen and auto-generate this it is goodness.> Yes, I certainly agree that tblgen should generate the shuffle > matching code, I just think that the generated code should be executed > in the LegalizeOp(shuffle vector) code.I guess as long as it's automatic I don't particularly care where it gets done. :) Is this plan written up anywhere? I was really confused by the changes in x86 isel and it would have been nice to have something to look at. -Dave