All, As I've done more integrating of AVX work upstream and more tuning here, I've run across several things which are clunky in the current isel design. A couple examples I can remember offhand: 1. We have special target-specific operators for certain shuffles in X86, such as X86unpckl. I don't completely understand why but Bruno indicated it was to address inefficiecies. One of those is the need to check masks multiple times (once at legalize and again at isel). 2. Sometimes DAGs are legal in some contexts but not others and it is a pain to deal with. A good example is VBROADCAST, where a <0,0,0,0> shuffle is natively supported if the source vector is in memory. Otherwise it's not legal and manual lowering is required. In this case the legality check is doing the DAG match by hand, replicating what TableGen-produced code already does. These two examples are related: we're duplicating functionality manually that's already available automatically. As I've been thinking about this, it strikes me that we could get rid of the target-specific operators and a lot of other manual checks if we just had another isel phase. Let's say we structured things this way: legalize | V manual lowering (X86ISelLowering) | V manual isel (X86ISelDAGToDAG) | V table-driven isel (.td files/X86GenDAGISel) | V manual isel (some to-be-design piece) The idea is that we keep the existing manual pieces where they are to clean things up for TableGen-based isel and/or handle special cases. Maybe we consider getting rid of some in the future but that's a separate questions. The way things are now, if table-driven isel fails the codegen aborts. In the above scheme we get one last chance to do manual lowering before we give up. This helps the shuffle mask case by turning this: legalize | V check shuffle mask legality (X86ISelLowering) | V check shuffle mask legality (table-driven isel predicates) To this: legalize | V X86ISelLowering (no mask legality checks) | V check shuffle mask legality (table-driven isel predicates) | V lower remaining shuffles manually The advantage is that in the final stage we already know the shuffle mask isn't implementable manually so there's no need to check for legality. We simply need to implement whatever X86ISelLowering would have done in those case previously. This also helps example 2. In the memory-operand case we will match to a VBROADCASTSS/D. If we don't match we'll fall through to manual lowering and we'll implement the reg-reg broadcast via some other combination of shuffles. So we more gracefully handle situations where sometimes things are legal and sometimes they aren't depending on the context. Perhaps I'm repeating something that's already been discussed. Thoughts? -Dave
David Greene <dag at cray.com> writes:> The advantage is that in the final stage we already know the shuffle > mask isn't implementable manually so there's no need to check for > legality.s/manually/natively/ -Dave
On Mar 16, 2011, at 1:44 PM, David Greene wrote:> All, > > As I've done more integrating of AVX work upstream and more tuning here, > I've run across several things which are clunky in the current isel > design. A couple examples I can remember offhand: > > 1. We have special target-specific operators for certain shuffles in X86, > such as X86unpckl. I don't completely understand why but Bruno > indicated it was to address inefficiecies. One of those is the need > to check masks multiple times (once at legalize and again at isel).It also eliminates a lot of fragility. Before doing this, X86 legalize would have to be very careful to specifically form shuffles that it knew isel would turn into (e.g.) unpck operations. Now instead of forming specific carefully constructed shuffle masks (not making sure other code doesn't violate them) it can just directly form the X86ISD node.> 2. Sometimes DAGs are legal in some contexts but not others and it is a > pain to deal with. A good example is VBROADCAST, where a <0,0,0,0> > shuffle is natively supported if the source vector is in memory. > Otherwise it's not legal and manual lowering is required. In this > case the legality check is doing the DAG match by hand, replicating > what TableGen-produced code already does.Yes, this isn't good. Instead, the shuffle should be legalized to something that takes a pointer (memory operand). That means that X86 isel would form the *fully legal* X86ISD node, and nothing would be able to break it and it could never fail to match.> These two examples are related: we're duplicating functionality manually > that's already available automatically.Not sure what you mean by this.> As I've been thinking about this, it strikes me that we could get rid of > the target-specific operators and a lot of other manual checks if we > just had another isel phase. Let's say we structured things this way: > > legalize > | > V > manual lowering (X86ISelLowering) > | > V > manual isel (X86ISelDAGToDAG) > | > V > table-driven isel (.td files/X86GenDAGISel) > | > V > manual isel (some to-be-design piece) > > The idea is that we keep the existing manual pieces where they are to > clean things up for TableGen-based isel and/or handle special cases. > Maybe we consider getting rid of some in the future but that's a > separate questions.I'm not sure what you mean here. Are you suggesting that these be completely separate passes over the dag? Why do "manual isel" and "table driven isel" as separate passes? If they are interlaced, then how is this different than what we already have?> The way things are now, if table-driven isel fails the codegen aborts. > In the above scheme we get one last chance to do manual lowering before > we give up. This helps the shuffle mask case by turning this: > > legalize > | > V > check shuffle mask legality (X86ISelLowering) > | > V > check shuffle mask legality (table-driven isel predicates)You're saying that we do this on a node-by-node basis? The reason that codegen aborts on unselectable operations is that they are invalid and should not be formed. Your example of vbroadcast is a great one: the X86ISD node for it *should not take a vector register input*. If it does, then the X86ISD node is incorrectly defined. -Chris
Chris Lattner <clattner at apple.com> writes:>> 1. We have special target-specific operators for certain shuffles in X86, >> such as X86unpckl.> It also eliminates a lot of fragility. Before doing this, X86 > legalize would have to be very careful to specifically form shuffles > that it knew isel would turn into (e.g.) unpck operations. Now > instead of forming specific carefully constructed shuffle masks (not > making sure other code doesn't violate them) it can just directly form > the X86ISD node.Right. What I've presented would reverse this. Rather than making Legalize have to know about what table-driven isel can and cannot do, have table-driven isel run first, see what it can do and then leave the rest for manual selection. We would still keep the existing pre-table-driven-isel passes so we'd still have a chance to do some cleanup before the main table-driven isel. Obviously a lot of details have to be worked out.>> 2. Sometimes DAGs are legal in some contexts but not others and it is a >> pain to deal with. A good example is VBROADCAST, where a <0,0,0,0> >> shuffle is natively supported if the source vector is in memory. >> Otherwise it's not legal and manual lowering is required. In this >> case the legality check is doing the DAG match by hand, replicating >> what TableGen-produced code already does.> Yes, this isn't good. Instead, the shuffle should be legalized to > something that takes a pointer (memory operand). That means that X86 > isel would form the *fully legal* X86ISD node, and nothing would be > able to break it and it could never fail to match.Well, it dopesn't _have_to_ form an X86ISD node. I don't do that now. But it's fragile in the sense that no one else should mess with that piece of the DAG. But the real point is that in forming the X86ISD node currently, I'm doing exaclty what the tblgen-generated code already does. If the shuffle doesn't take a memory operand, then I have to lower it to something else. Where I do that (before or after table-driven isel) doesn't matter. I do the same work either way. But by doing it after I avoid writing duplicate DAG matching code in the case where the operand is in memory.>> These two examples are related: we're duplicating functionality manually >> that's already available automatically. > > Not sure what you mean by this.I mean that in legalize/lowering we're massaging the DAG to get it into a state where tabel-driven isel can match it. There is a lot of code like this: if (shuffle_is_MOVL) do_nothing_and_return It's duplicating exactly the checks that the table-driven isel does later. In the VBROADCASTSS/D case, it's doing an entire DAG match to check whether it's implementable with VBROADCASTSS/D. Why not just let table-driven isel run first and take care of these checks just once? If something doesn't match, we then know it needs manual lowering and selection.>> legalize >> | >> V >> manual lowering (X86ISelLowering) >> | >> V >> manual isel (X86ISelDAGToDAG) >> | >> V >> table-driven isel (.td files/X86GenDAGISel) >> | >> V >> manual isel (some to-be-design piece) >> I'm not sure what you mean here. Are you suggesting that these be > completely separate passes over the dag? Why do "manual isel" and > "table driven isel" as separate passes? If they are interlaced, then > how is this different than what we already have?No, not as separate passes. Right now we have code like this in X86ISelDAGToDAG: X86DAGToDAGISel::Select(SDNode *Node) { ... switch (Opcode) { ...do a bunch of manual selection... } // If we get here we didn't select manually. SelectCode(); // Select via table-driven isel, abort if no match. } What I'm proposing is that we do this: X86DAGToDAGISel::Select(SDNode *Node) { ... switch (Opcode) { ...do a bunch of manual selection, less than before... } // If we get here we didn't select manually. result = SelectCode(); // Select via table-driven isel. if (result_is_good()) { return; } switch (Opcode) { ...do a bunch of manual selection, some that used to be above and in legalize/lowering... } cannot_select_abort(); }> You're saying that we do this on a node-by-node basis?You mean on an SDNode basis? I don't think so. As I said, details have to be worked out but I imagine we'd send the selection DAG to the tblgen-generated code as we do now and any "leftover" bits would have to be processed afterward by the manual selector. Now, there is a phase ordering issue in that some of the legalize/lowering code massages the tree so the tblgen stuff can make a "good" match. We probably still want to do that in some cases so we keep that code where it is. What I'm aiming at getting rid of is all of the code that does: if (this_is_already_matchable()) { return; }> The reason that codegen aborts on unselectable operations is that they > are invalid and should not be formed. Your example of vbroadcast is a > great one: the X86ISD node for it *should not take a vector register > input*. If it does, then the X86ISD node is incorrectly defined.What I'm saying is that there would be no X86ISD node. If the pattern is there, tblgen-produced code will match it. If not, we have to lower it manually and would just do what we'd have done in legalize/lowering before, the difference being we'd now do it after table-driven isel rather than before. -Dave