Dan Gohman
2008-Aug-04 22:56 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Aug 4, 2008, at 2:02 PM, David Greene wrote:

> On Saturday 02 August 2008 16:47, Dan Gohman wrote:
>
>> * Vector Gather/Scatter
>>
>> This would complicate analyses that look at load and store addresses,
>> but if we really want to do gather/scatter without messes, this might
>> be an acceptable tradeoff.
>
> By "complicate" do you mean "need to look at multiple addresses from a
> single instruction?"  Or is there more than that?  I'm trying to
> understand all the implications.

I mean just that -- we have a fair amount of code built around looking
at the addresses of load and store nodes that in some cases would need
to be restructured if it is to cope with multiple addresses at a time.

>> While adding a mask operand to every instruction that needs it would
>> serve the intended purpose, it would also enlarge and complicate IR,
>> even in code that doesn't need masks. It's a common use-case to have a
>> single mask used by many adjacent instructions, so this would also be
>> highly redundant.
>
> But explicit is better than implicit in my experience.  It's also the
> LLVM philosophy to be as explicit as possible.
>
>> An alternative that exploits this common use-case is to add a new
>> applymask instruction:
>>
>>   %w = applymask <2 x f32> %v, <2 x i1> %m
>>
>> The semantics would be to copy %v into %w, and implicitly apply mask %m
>> to all users (recursively) of %w, unless overridden by another
>> applymask. For example:
>>
>>   %p = applymask <2 x f32*> %q, <2 x i1> %m
>>   %x = load <2 x f32*>* %p        ; implicitly masked by %m
>>   %y = add <2 x f32> %x, %w       ; implicitly masked by %m
>>   %z = mul <2 x f32> %y, %y       ; implicitly masked by %m
>
> Yuck.  I don't like this at all.  It makes reading the IR harder
> because now you need to worry about context.

I don't disagree with these. I think it's a trade-off, with LLVM
design philosophy and IR cleanliness arguments on both sides.

The applymask approach leverages use-def information rather than
what can be thought of as duplicating a subset of it, making the IR
less cluttered. And, it makes it trivially straightforward to write
passes that work correctly on both masked and unmasked code.

> Not all dependencies are readily expressed in the instructions.  How
> would one express TableGen patterns for such things?

The syntax above is an idea for LLVM IR. SelectionDAG doesn't
necessarily have to use the same approach.

> My understanding is that we came away with a general agreement to add
> mask support to operations that can trap and to memory operations.
> That would mean adding masks to floating-point arithmetic and memory
> operations.  As I recall, Chris expressed some interest in creating
> separate integer and fp arithmetic instructions anyway, so it doesn't
> seem to be a lot of additional work to add masks to the fp side since
> instcombine, et al. will need to know about entirely new operations
> anyway.

I think we all recognize the need, and in the absence of better
alternatives are willing to accept the mask operand approach. It would
have a significant impact on everyone, even those that don't use masks.

I don't want to stand in the way of progress, but this alternative
approach seems promising enough to be worth consideration.

> We concluded that operation results would be undefined for vector
> elements corresponding to a zero mask bit.
>
> We also talked about adding a vector select, which is crucial for any
> code that uses masks.

Right. This applymask idea doesn't conflict with these.

Dan
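
The implicit propagation that applymask relies on can be illustrated with a small toy model. This is only a sketch in Python, not LLVM code: the tuple encoding of instructions and the function name effective_masks are made up for illustration. The proposal as quoted does not say what happens when operands carry different masks, so the sketch treats that as an error, and it lets unmasked operands mix freely with masked ones; both choices are assumptions.

```python
# Toy model of the proposed applymask semantics; not LLVM code.
# Each instruction is (result, opcode, operands). All names are hypothetical.

def effective_masks(instructions):
    """Map each result name to the mask implicitly applied to it, or None."""
    masks = {}
    for result, opcode, operands in instructions:
        if opcode == "applymask":
            # applymask %v, %m copies %v and starts (or overrides) mask %m.
            _value, mask = operands
            masks[result] = mask
            continue
        # Any other instruction inherits the mask carried by its operands.
        inherited = {masks[op] for op in operands if masks.get(op) is not None}
        if len(inherited) > 1:
            # Assumption: mixing differently masked operands is rejected.
            raise ValueError(f"{result}: operands carry masks {sorted(inherited)}")
        masks[result] = inherited.pop() if inherited else None
    return masks


# The example from the proposal, with %w defined by its own applymask.
program = [
    ("%p", "applymask", ("%q", "%m")),
    ("%w", "applymask", ("%v", "%m")),
    ("%x", "load", ("%p",)),
    ("%y", "add", ("%x", "%w")),
    ("%z", "mul", ("%y", "%y")),
]
masks = effective_masks(program)
```

In this model %x, %y and %z all end up associated with %m without naming it, which is the redundancy the proposal is trying to avoid; a later applymask on %z with a different mask would switch its users to that mask.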
David Greene
2008-Aug-05 15:32 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Monday 04 August 2008 17:56, Dan Gohman wrote:

> > By "complicate" do you mean "need to look at multiple addresses from a
> > single instruction?"  Or is there more than that?  I'm trying to
> > understand all the implications.
>
> I mean just that -- we have a fair amount of code built around looking
> at the addresses of load and store nodes that in some cases would need
> to be restructured if it is to cope with multiple addresses at a time.

Ok.  I should think that this would be feasible to do.  In the worst case
it's an N^2 loop looking at all pairs.  And N is usually going to be small.

> >>   %p = applymask <2 x f32*> %q, <2 x i1> %m
> >>   %x = load <2 x f32*>* %p        ; implicitly masked by %m
> >>   %y = add <2 x f32> %x, %w       ; implicitly masked by %m
> >>   %z = mul <2 x f32> %y, %y       ; implicitly masked by %m
> >
> > Yuck.  I don't like this at all.  It makes reading the IR harder
> > because now you need to worry about context.
>
> I don't disagree with these. I think it's a trade-off, with LLVM
> design philosophy and IR cleanliness arguments on both sides.
>
> The applymask approach leverages use-def information rather than
> what can be thought of as duplicating a subset of it, making the IR

I don't understand what you mean by "duplicating" here.  You need some
kind of use-def information for the masks themselves because at some
point they need to be register-allocated.

> less cluttered. And, it makes it trivially straightforward to write
> passes that work correctly on both masked and unmasked code.

I had a thought on this, actually.  Let's say the mask is the very last
operand on masked instructions.  Most passes don't care about the mask
at all.  They can just ignore it.  Since they don't look at the extra
operand right now, there shouldn't be many changes necessary (some
asserts may need to be fixed, etc.).

Think about instcombine.  It's matching patterns.  If the matcher doesn't
look at masks, that may be ok most of the time (mod corner cases which
I fully appreciate can be a real pain to track down).  If we want fancy
instcombine tricks that understand masks, we can add those later.

> > Not all dependencies are readily expressed in the instructions.  How
> > would one express TableGen patterns for such things?
>
> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
> necessarily have to use the same approach.

What do you mean by "ideal for LLVM IR?"  This looks very much _not_
ideal to me from a debugging standpoint.  It's difficult to understand.
It took me reading through the proposal a few times to grok what you
are talking about.

> I think we all recognize the need, and in the absence of better
> alternatives are willing to accept the mask operand approach. It would
> have a significant impact on everyone, even those that don't use masks.

How do you define "significant impact?"  Compile time?  Development
effort?  Transition pain?  All of the above?  More?

For architectures that don't use masks, either the mask gets set to all
1's or we have non-masked versions of operators.  I honestly don't know
which is the desirable route to take.

My guess is that the optimizers will have to understand whether or not
the target architecture supports masks and not generate them (e.g. no
if-conversion) if the target doesn't support them.  I wonder if there is
some way to un-if-convert to eliminate masks if necessary.

I'm thinking about code portability and JIT issues when reading in LLVM
IR that was produced at some earlier time.  Perhaps this isn't an issue
we need to worry about right now.

> I don't want to stand in the way of progress, but this alternative
> approach seems promising enough to be worth consideration.

Alternatives are always welcome and worth considering.  I'm looking at
the kind of things the LLVM community is going to want to support and
I'm pretty sure masks are going to be a very big part of architectures
in the future.  We're done with clock speed improvements, so we need to
rely on architecture more.  Vectorization is a well-known technique to
improve single thread performance and masks are critical to producing
efficient vector code.

If y'all agree with this premise, it seems to me that we want to support
such architectures in as straightforward a way as possible so as to
minimize future pain when we're all writing complex and beautiful vector
hacks.  :)

What can we learn from the IA64 and ARM backends?  How do they handle
their masks (scalar predication)?  Is all the if-conversion done in
target-specific passes?

> > We concluded that operation results would be undefined for vector
> > elements corresponding to a zero mask bit.
> >
> > We also talked about adding a vector select, which is crucial for
> > any code that uses masks.
>
> Right. This applymask idea doesn't conflict with these.

Yep.  I just wanted to be thorough.

                                              -Dave
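
The worst-case "N^2 loop looking at all pairs" mentioned above for instructions with multiple addresses can be sketched roughly as follows. This is a toy in Python, not LLVM code: it assumes concrete byte addresses and a single access size per lane, whereas a real analysis would query symbolic addresses through alias analysis. The name may_alias and its parameters are hypothetical.

```python
from itertools import product


def may_alias(addrs_a, addrs_b, size):
    """Return True if any lane of access A may overlap any lane of access B.

    addrs_a, addrs_b: per-lane byte addresses of two gather/scatter accesses.
    size: bytes accessed per lane (assumed to be the same for both accesses).
    """
    # Worst case: compare every lane of A against every lane of B.
    for a, b in product(addrs_a, addrs_b):
        # [a, a+size) and [b, b+size) overlap iff each starts before the
        # other one ends.
        if a < b + size and b < a + size:
            return True
    return False


disjoint = may_alias([0, 16], [32, 48], 4)
overlapping = may_alias([0, 16], [18, 64], 4)
```

With N lanes per access this performs at most N * N comparisons, so for short vectors the cost stays small.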
Dan Gohman
2008-Aug-05 17:39 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tue, August 5, 2008 8:32 am, David Greene wrote:

> On Monday 04 August 2008 17:56, Dan Gohman wrote:
>> The applymask approach leverages use-def information rather than
>> what can be thought of as duplicating a subset of it, making the IR
>
> I don't understand what you mean by "duplicating" here.

If you look just at the case where every instruction in a given
use-def sub-dag uses the same mask, adding that mask as an operand
to all of them is largely just duplicating the information about
them all being connected. This is the common case that applymask
is aimed at.

In the case where multiple masks are used, applymask can still cope,
but the neat thing is that in this case it serves to mark the dataflow
edges where masks change.

> You need some kind of use-def information for the masks themselves
> because at some point they need to be register-allocated.

What I'm talking about here is just in LLVM IR. I agree that we want
mask registers as operands during register allocation, and probably
also instruction selection.

>> less cluttered. And, it makes it trivially straightforward to write
>> passes that work correctly on both masked and unmasked code.
>
> I had a thought on this, actually.  Let's say the mask is the very last
> operand on masked instructions.  Most passes don't care about the mask
> at all.  They can just ignore it.  Since they don't look at the extra
> operand right now, there shouldn't be many changes necessary (some
> asserts may need to be fixed, etc.).
>
> Think about instcombine.  It's matching patterns.  If the matcher
> doesn't look at masks, that may be ok most of the time (mod corner
> cases which I fully appreciate can be a real pain to track down).  If
> we want fancy instcombine tricks that understand masks, we can add
> those later.

If masks are operands, instcombine will need to check if all the
relevant masks match before many of the transformations it does,
and it'll need to take care to put the mask operand in the
instructions it creates. With applymask, I believe instcombine
wouldn't require any modifications, except things like
"case ApplyMaskInst: break;" in a few places. Applymask makes
masks in the IR so easy to reason about that most passes won't need
to do any special reasoning.

>> > Not all dependencies are readily expressed in the instructions.
>> > How would one express TableGen patterns for such things?
>>
>> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
>> necessarily have to use the same approach.
>
> What do you mean by "ideal for LLVM IR?"  This looks very much _not_
> ideal to me from a debugging standpoint.  It's difficult to understand.
> It took me reading through the proposal a few times to grok what you
> are talking about.

I said "idea", not "ideal" :-). But I just meant that LLVM IR and
SelectionDAG don't have to do the same thing.

>> I think we all recognize the need, and in the absence of better
>> alternatives are willing to accept the mask operand approach. It would
>> have a significant impact on everyone, even those that don't use masks.
>
> How do you define "significant impact?"  Compile time?  Development
> effort?  Transition pain?  All of the above?  More?

With mask operands, many passes will need to explicitly check for
masks even if they don't care and just want to be conservatively
correct. With applymask, passes will often be able to operate
on masked IR just as aggressively as non-masked IR.

>> I don't want to stand in the way of progress, but this alternative
>> approach seems promising enough to be worth consideration.
>
> Alternatives are always welcome and worth considering.  I'm looking at
> the kind of things the LLVM community is going to want to support and
> I'm pretty sure masks are going to be a very big part of architectures
> in the future.  We're done with clock speed improvements, so we need to
> rely on architecture more.  Vectorization is a well-known technique to
> improve single thread performance and masks are critical to producing
> efficient vector code.
>
> If y'all agree with this premise, it seems to me that we want to
> support such architectures in as straightforward a way as possible so
> as to minimize future pain when we're all writing complex and beautiful
> vector hacks.  :)

I think we basically agree here :-). For me, the fact that applymask
simplifies the reasoning that optimizers must do for masked
instructions is a large part of what motivates considering it.

> What can we learn from the IA64 and ARM backends?  How do they handle
> their masks (scalar predication)?  Is all the if-conversion done in
> target-specific passes?

It's in lib/CodeGen/IfConversion.cpp, but it wouldn't be usable
for vectors. If-conversion for vectors must be done as part of
the vectorization (whether that's the user/front-end or the
optimizer).

Dan
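
To make the vector if-conversion point concrete, here is a rough Python sketch of what it amounts to for a guarded division. It is not LLVM code: lists stand in for vector registers, None stands in for the undefined result of a lane whose mask bit is zero, and all function names are hypothetical.

```python
def masked_div(a, b, mask):
    """Lane-wise a / b; lanes with a zero mask bit are never computed.

    Masked-off lanes are modeled as None, standing in for "undefined".
    """
    return [x / y if m else None for x, y, m in zip(a, b, mask)]


def select(mask, on_true, on_false):
    """Vector select: take on_true where the mask bit is set, else on_false."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def scalar_loop(a, b, out):
    """Original control flow: divide only where the divisor is nonzero."""
    result = list(out)
    for i in range(len(a)):
        if b[i] != 0:
            result[i] = a[i] / b[i]
    return result


def if_converted(a, b, out):
    """Same computation with the branch replaced by a mask and a select."""
    mask = [y != 0 for y in b]         # vector compare produces the mask
    quotient = masked_div(a, b, mask)  # trapping op runs on enabled lanes only
    return select(mask, quotient, out)


a = [8.0, 1.0, 9.0, 5.0]
b = [2.0, 0.0, 3.0, 0.0]
out = [-1.0, -1.0, -1.0, -1.0]
converted = if_converted(a, b, out)
```

The mask keeps the division from trapping on the lanes where the divisor is zero, and the select merges the defined lanes with the previous contents, which is why a vector select is needed alongside masked operations.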