Dan Gohman
2008-Aug-05 17:39 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tue, August 5, 2008 8:32 am, David Greene wrote:
> On Monday 04 August 2008 17:56, Dan Gohman wrote:
>> The applymask approach leverages use-def information rather than
>> what can be thought of as duplicating a subset of it, making the IR
>
> I don't understand what you mean by "duplicating" here.

If you look just at the case where every instruction in a given use-def
sub-dag uses the same mask, adding that mask as an operand to all of them is
largely just duplicating the information about them all being connected.
This is the common case that applymask is aimed at.

In the case where multiple masks are used, applymask can still cope, but the
neat thing is that in this case it serves to mark the dataflow edges where
masks change.

> You need some kind of use-def information for the masks themselves because
> at some point they need to be register-allocated.

What I'm talking about here is just in LLVM IR. I agree that we want mask
registers as operands during register allocation, and probably also
instruction selection.

>> less cluttered. And, it makes it trivially straightforward to write
>> passes that work correctly on both masked and unmasked code.
>
> I had a thought on this, actually. Let's say the mask is the very last
> operand on masked instructions. Most passes don't care about the mask at
> all. They can just ignore it. Since they don't look at the extra operand
> right now, there shouldn't be many changes necessary (some asserts may
> need to be fixed, etc.).
>
> Think about instcombine. It's matching patterns. If the matcher doesn't
> look at masks, that may be ok most of the time (mod corner cases which I
> fully appreciate can be a real pain to track down). If we want fancy
> instcombine tricks that understand masks, we can add those later.

If masks are operands, instcombine will need to check whether all the
relevant masks match before many of the transformations it does, and it will
need to take care to put the mask operand in the instructions it creates.
With applymask, I believe instcombine wouldn't require any modifications,
except things like "case ApplyMaskInst: break;" in a few places. Applymask
makes masks in the IR so easy to reason about that most passes won't need to
do any special reasoning.

>>> Not all dependencies are readily expressed in the instructions. How
>>> would one express TableGen patterns for such things?
>>
>> The syntax above is an idea for LLVM IR. SelectionDAG doesn't necessarily
>> have to use the same approach.
>
> What do you mean by "ideal for LLVM IR?" This looks very much _not_ ideal
> to me from a debugging standpoint. It's difficult to understand. It took
> me reading through the proposal a few times to grok what you are talking
> about.

I said "idea", not "ideal" :-). But I just meant that LLVM IR and
SelectionDAG don't have to do the same thing.

>> I think we all recognize the need, and in the absence of better
>> alternatives are willing to accept the mask operand approach. It would
>> have a significant impact on everyone, even those that don't use masks.
>
> How do you define "significant impact?" Compile time? Development effort?
> Transition pain? All of the above? More?

With mask operands, many passes will need to explicitly check for masks even
if they don't care and just want to be conservatively correct. With
applymask, passes will often be able to operate on masked IR just as
aggressively as non-masked IR.
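
For the common case where one mask governs a whole expression tree, the two
forms might look roughly like this (pseudo-IR only; the applymask syntax is
not final, and the names here are made up for illustration):

  # mask as an explicit operand on every instruction
  %t1 = add %a, %b, %m
  %t2 = mul %t1, %c, %m

  # applymask: attach the mask to the incoming values once; the
  # arithmetic below inherits it through its use-def edges
  %a1 = applymask %a, %m
  %b1 = applymask %b, %m
  %c1 = applymask %c, %m
  %t1 = add %a1, %b1
  %t2 = mul %t1, %c1

In the first form, every instruction repeats the fact that it is governed by
%m; in the second, that fact is carried by the edges from the applymask
results, which is what I mean by not duplicating the use-def information.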

>> I don't want to stand in the way of progress, but this alternative
>> approach seems promising enough to be worth consideration.
>
> Alternatives are always welcome and worth considering. I'm looking at the
> kind of things the LLVM community is going to want to support, and I'm
> pretty sure masks are going to be a very big part of architectures in the
> future. We're done with clock speed improvements, so we need to rely on
> architecture more. Vectorization is a well-known technique for improving
> single-thread performance, and masks are critical to producing efficient
> vector code.
>
> If y'all agree with this premise, it seems to me that we want to support
> such architectures in as straightforward a way as possible so as to
> minimize future pain when we're all writing complex and beautiful vector
> hacks. :)

I think we basically agree here :-). For me, the fact that applymask
simplifies the reasoning that optimizers must do for masked instructions is
a large part of what motivates it for consideration.

> What can we learn from the IA64 and ARM backends? How do they handle
> their masks (scalar predication)? Is all the if-conversion done in
> target-specific passes?

It's in lib/CodeGen/IfConversion.cpp, but it wouldn't be usable for vectors.
If-conversion for vectors must be done as part of the vectorization (whether
that's done by the user/front-end or by the optimizer).

Dan
David Greene
2008-Aug-05 18:27 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tuesday 05 August 2008 12:39, Dan Gohman wrote:
> On Tue, August 5, 2008 8:32 am, David Greene wrote:
>> On Monday 04 August 2008 17:56, Dan Gohman wrote:
>>> The applymask approach leverages use-def information rather than
>>> what can be thought of as duplicating a subset of it, making the IR
>>
>> I don't understand what you mean by "duplicating" here.
>
> If you look just at the case where every instruction in a given use-def
> sub-dag uses the same mask, adding that mask as an operand to all of them
> is largely just duplicating the information about them all being
> connected. This is the common case that applymask is aimed at.

But why couldn't one argue the same thing for other operands and make LLVM
IR a stack machine? It just seems strange to me to treat a fundamental part
of the IR differently simply because it came along later.

> In the case where multiple masks are used, applymask can still cope, but
> the neat thing is that in this case it serves to mark the dataflow edges
> where masks change.

What advantage does that give over explicit masks in SSA form? PHI nodes do
the same thing.

>> You need some kind of use-def information for the masks themselves
>> because at some point they need to be register-allocated.
>
> What I'm talking about here is just in LLVM IR. I agree that we want mask
> registers as operands during register allocation, and probably also
> instruction selection.

So how does isel go from an implicitly masked LLVM IR to explicitly masked
MachineInstructions? And then wouldn't all Machine passes need to understand
mask operands?

>> Think about instcombine. It's matching patterns. If the matcher doesn't
>> look at masks, that may be ok most of the time (mod corner cases which I
>> fully appreciate can be a real pain to track down). If we want fancy
>> instcombine tricks that understand masks, we can add those later.
>
> If masks are operands, instcombine will need to check whether all the
> relevant masks match before many of the transformations it does,

I'm not sure about "many." Many of instcombine's tricks are based on
expression reorderings that are valid with or without masks. We might need
to do some mask manipulation (ANDing or ORing them, for example), but we'd
have to do that anyway with applymask if the incoming masks to the
expressions being combined don't match: we'd have to introduce a new
applymask that takes the result of the logical mask operation. So it seems
to me that instcombine needs to check and manipulate masks either way. We'd
also have to be careful about not introducing traps, but again, instcombine
will have to worry about that with applymask as well in the general case.

It strikes me that you're indicating that with applymask, instcombine can
just ignore the implicit mask on every instruction (because it isn't
explicit and instcombine doesn't look at it right now). I'm in fact saying
the exact same thing, except that masks would be explicit and instcombine
would just not look at those operands. Neither solution eliminates the need
for instcombine to be careful and consult masks from time to time.

Perhaps I'm totally missing something. Concrete examples would be helpful.

> and it'll need to take care to put the mask operand in the instructions
> it creates.

Or create a new applymask if it needs to generate new masks as the result of
combining.
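
To make that concrete, in pseudo-IR (made-up names, applymask syntax as
proposed), combining two values that carry different masks would force
something like:

  %m3 = and %m1, %m2          # combine the incoming masks
  %x1 = applymask %x, %m3
  %y1 = applymask %y, %m3
  %r  = add %x1, %y1          # combined operation under the new mask

Either way, the mask manipulation doesn't disappear; it just moves into new
applymask instructions.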

> With applymask, I believe instcombine wouldn't require any modifications,
> except things like "case ApplyMaskInst: break;" in a few places.
> Applymask makes masks in the IR so easy to reason about that most passes
> won't need to do any special reasoning.

I don't think this is quite right. See the comment above.

>>> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
>>> necessarily have to use the same approach.
>>
>> What do you mean by "ideal for LLVM IR?" This looks very much _not_ ideal
>> to me from a debugging standpoint. It's difficult to understand. It took
>> me reading through the proposal a few times to grok what you are talking
>> about.
>
> I said "idea", not "ideal" :-). But I just meant that LLVM IR and
> SelectionDAG don't have to do the same thing.

Oops, sorry about the misread. :)

I don't think SelectionDAG is really the issue. The issue is how to reason
about vector code and debug compilers. In my experience, explicit is usually
the better way to go for maintainability.

>> How do you define "significant impact?" Compile time? Development effort?
>> Transition pain? All of the above? More?
>
> With mask operands, many passes will need to explicitly check for masks
> even if they don't care and just want to be conservatively correct.

If passes need to worry about masks for correctness, that doesn't change
with the representation of masks. That's a program dataflow issue and
applies no matter how it's represented.

> With applymask, passes will often be able to operate on masked IR just as
> aggressively as non-masked IR.

And they should be able to do so with any other representation as well. The
compiler needs to maintain semantics, not syntax.

>> If y'all agree with this premise, it seems to me that we want to support
>> such architectures in as straightforward a way as possible so as to
>> minimize future pain when we're all writing complex and beautiful vector
>> hacks. :)
>
> I think we basically agree here :-). For me, the fact that applymask
> simplifies the reasoning that optimizers must do for masked instructions
> is a large part of what motivates it for consideration.

I'm not convinced that there is a simplification of reasoning, but I'm open
to changing my mind. Again, concrete examples would be really helpful.

I do agree that in some cases code rewrites might be avoided because
existing Instructions will retain the same number of operands they have
now. But that's not a "reasoning" issue; that's a mechanics issue (how you
implement the algorithm). Avoiding the one-time pain of converting this
kind of code doesn't seem worth the cost to maintainability to me.

The key with all of this is going to be introducing it incrementally. The
first step is to support vectors of i1 all the way through isel. Next is
vector select. Then we can start selectively adding masks to Instructions.
As we do this, if a vector operation can trap and a mask isn't available
for that Instruction yet, then we just can't vectorize that code yet.
There's no loss from what we have right now, which is a total inability to
vectorize conditional code in a reasonable way, save for a few tricks for
corner cases. If instcombine isn't ready to handle masks on some
Instructions, then we can't vectorize in those cases. We don't *need* to
convert instcombine or any other pass all at once. We can do it gradually
and introduce more vector capability as we go.
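
For concreteness, the first two steps amount to supporting IR along these
lines (the types are only for illustration; this assumes vector compares
produce <N x i1> and that select accepts a vector condition, which is part
of what's being proposed):

  %mask = icmp ult <4 x i32> %a, %b                             # vector of i1
  %res  = select <4 x i1> %mask, <4 x float> %x, <4 x float> %y

Once that works all the way through isel, masks on individual Instructions
can be added selectively on top of it.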

>> What can we learn from the IA64 and ARM backends? How do they handle
>> their masks (scalar predication)? Is all the if-conversion done in
>> target-specific passes?
>
> It's in lib/CodeGen/IfConversion.cpp, but it wouldn't be usable for
> vectors. If-conversion for vectors must be done as part of the
> vectorization (whether that's done by the user/front-end or by the
> optimizer).

I meant using them as examples of representation. I know that scalar
if-conversion is different from vector if-conversion. I'm curious about how
IfConversion.cpp represents the predicates it creates. Since it's in
CodeGen, I guess the MachineInstruction representation just includes them as
extra operands.

-Dave
David Greene
2008-Aug-07 19:13 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tuesday 05 August 2008 13:27, David Greene wrote:
> Neither solution eliminates the need for instcombine to be careful and
> consult masks from time to time.
>
> Perhaps I'm totally missing something. Concrete examples would be helpful.

Ok, so I took my own advice and thought about CSE and instcombine a bit. I
wrote the code by hand in a sort of pseudo-llvm language, so don't crucify
me for mistakes. :)

CSE is an optimization that needs to pay attention to masks. You can't
(easily) CSE an expression with different masks. Using a mask-per-operation
setup, code might look like this:

CSE Mask Example

Mask-per-operation:

  %m1 = ...
  %m2 = ...
  %r1 = add %r2, %r3, %m1
  %r4 = add %r2, %r3, %m2

We can only CSE the adds if %m1 == %m2. I'd think it would be sufficient to
just check whether the mask registers are the same, though you could imagine
walking back through the use-def chains to do more complex analysis.
Essentially, the mask operand becomes part of the expression you're trying
to match. CSE definitely needs to be changed to understand masks.

With applymask, things look something like this:

applymask:

  %m1 = ...
  %m2 = ...
  %r5 = applymask %r2, %m1
  %r6 = applymask %r3, %m1
  %r7 = applymask %r2, %m2
  %r8 = applymask %r3, %m2
  %r1 = add %r5, %r6
  %r4 = add %r7, %r8

This is interesting because CSE will not do the optimization, since the
operands of the adds are different. CSE does not have to change to
understand masks. Value numbering will have to understand applymask to make
sure it doesn't do something bad, because it doesn't care about the names of
operands, but the renaming of registers required by applymask is somewhat
attractive.

Now let's look at an instcombine example:

Instcombine example: -A + -B --> -(A + B)

Mask-per-operation:

  %m1 = ...
  %m2 = ...
  %m3 = ...
  %r1 = neg %r2, %m1
  %r1 = select %r1, %r2, %m1    # Define all elements
  %r3 = neg %r4, %m2
  %r3 = select %r3, %r4, %m2    # Define all elements
  %r5 = add %r1, %r3, %m3

Since the masks on the negs are different, we have to be careful in how we
do this. In the easiest case, we simply disallow combining here, which
requires instcombine to examine the masks and, as in the CSE example above,
potentially just look at the mask register names to determine equality.

One could imagine doing the combining this way:

  %m1 = ...
  %m2 = ...
  %m3 = ...

  # Do the combining where legal
  %m4 = and %m1, %m2
  %r6 = add %r2, %r4, %m4
  %r7 = neg %r6, %m4
  %r7 = select %r7, %r6, %m4    # Define all elements for final merge

  # Take care of the rest
  %m6 = xor %m1, %m4
  %m7 = xor %m2, %m4
  %m8 = xor %m3, %m4
  %r1 = neg %r2, %m6
  %r1 = select %r1, %r2, %m6    # Define all elements for add
  %r3 = neg %r4, %m7
  %r3 = select %r3, %r4, %m7    # Define all elements for add
  %r5 = add %r1, %r3, %m8
  %r5 = select %r5, %r1, %m7    # Define all elements for final merge

  # Do the final merge
  %r5 = select %r7, %r5, %m4

But this is obviously a loss. We may be able to get away with eliminating
some merges (I'm being conservative about undefined elements here), but it
would still be a loss.

applymask:

  %m1 = ...
  %m2 = ...
  %m3 = ...
  %r6 = applymask %r2, %m1
  %r7 = applymask %r4, %m2
  %r1 = neg %r6
  %r3 = neg %r7
  %r8 = applymask %r1, %m3
  %r9 = applymask %r3, %m3
  %r5 = add %r8, %r9

This is really no different from the mask-per-operation case. Instcombine
still has to consider masks when evaluating the legality and profitability
of combining.
Here instcombine would have to walk back through the def-use lists on the
operands of the neg and add to find if they are masked and then determine
the mask value (or mask register name for simpler equality testing). Again,
doing the combining in this case would be detrimental. My guess is that most
combining turns out worse code when operations are masked and the masks are
not equal.

Given these two examples, I see some benefit to applymask for
transformations like CSE and other pattern-matching algorithms, where the
renaming of registers that applymask requires converts previously matching
patterns into patterns that don't match. For something like instcombine or
value numbering that is based on value flow and/or pure semantics, I don't
see any benefit.

Of course, these are just two examples. One could repeat the exercise for
any transformation we're interested in.

I hope this is helpful in moving us along in the discussion. Feedback and
comments are welcome!

-Dave