Dan Gohman
2008-Aug-02 21:47 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
Here are some ideas that came out of brainstorming during and after the dev meeting. Disclaimer: they're just ideas :-). * Vector Gather/Scatter One way to do gather/scatter would be to extend vector types to support pointer element types, and extend load, store, and getelementptr to operate on vectors of pointers. A typical gather sequence would then look like this: %vi = load <2 x i64>* %indices ; load a vector of indices %vp = gep <2 x f32*> %base, <2 x i64> %vi ; compute an address vector %vx = load <2 x f32*> %vp ; gather This is significantly less ugly than a build-it-yourself gather, and would be much friendlier to masks (below) than the "individually extract each i1 from the mask" approach. Note that this wouldn't support multiple alignments or multiple address spaces in a single gather/scatter. Similarly, volatile would be all-or-nothing. These don't seem like show-stoppers though. This would complicate analyses that look at load and store addresses, but if we really want to do gather/scatter without messes, this might be an acceptable tradeoff. * Masks While adding a mask operand to every instruction that needs it would serve the intended purpose, it would also enlarge and complicate IR, even in code that doesn't need masks. It's a common use-case to have a single mask used by many adjacent instructions, so this would also be highly redundant. An alternative that exploits this common use-case is to add a new applymask instruction: %w = applymask <2 x f32> %v, <2 x i1> %m The semantics would be to copy %v into %w, and implicitly apply mask %m to all users (recursively) of %w, unless overridden by another applymask. For example: %p = applymask <2 x f32*> %q, <2 x i1> %m %x = load <2 x f32*>* %p ; implicitly masked by %m %y = add <2 x f32> %x, %w ; implicitly masked by %m %z = mul <2 x f32> %y, %y ; implicitly masked by %m This approach minimizes enlargement of IR in code that doesn't use masks, and it reduces the impact on complexity -- a pass like instcombine that wants to combine an instruction and one of its operands into a single instruction could safely do so without knowing anything about masks. The applymask instruction could apply to scalars as well as vectors. The set of instructions requiring mask legalization would be found by recursively traversing the uses of all applymask instructions. extractvalue on a masked-out value yields undef. overriding a value's mask with a new applymask instruction that de-masks any elements leaves those elements set to undef. It would be illegal for an instruction with multiple operands to have multiple distinct masks. This includes PHI nodes. Insertelement and shufflevector with masked-out values would not be immune to the mask; assigning a value to a masked-out element of a vector does not make that element non-undef. Code wanting to fill in masked-out values must apply a new mask to the operands first. Masks would not themselves be maskable. That is, the result of an applymask could not be used (even indirectly) as the second operand of another applymask. Dan
Duncan Sands
2008-Aug-03 10:39 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
Hi,> The set of instructions requiring mask legalization would be found by > recursively traversing the uses of all applymask instructions.what if the applymask is in one function, but the use of it is in another? Ciao, Duncan.
Dan Gohman
2008-Aug-04 20:58 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Aug 3, 2008, at 3:39 AM, Duncan Sands wrote:> Hi, > >> The set of instructions requiring mask legalization would be found by >> recursively traversing the uses of all applymask instructions. > > what if the applymask is in one function, but the use of it is in > another?I think we could just disallow this. If we get to the point of wanting mask-aware library routines, it might make sense to introduce an unmask instruction, which would be a sort of inverse to applymask that would return the original value, with any masked-out portion set to undef. Front-ends would then explicitly pass the mask along with the unmasked arguments to library routines, which would reapply the mask as needed. It's conceivable that we might want something even fancier than that in the future, but we can figure that out later. Thanks, Dan
David Greene
2008-Aug-04 21:02 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Saturday 02 August 2008 16:47, Dan Gohman wrote:> * Vector Gather/Scatter > > One way to do gather/scatter would be to extend vector types to > support pointer element types, and extend load, store, and getelementptr > to operate on vectors of pointers. > > A typical gather sequence would then look like this: > > %vi = load <2 x i64>* %indices ; load a vector of indices > %vp = gep <2 x f32*> %base, <2 x i64> %vi ; compute an address vector > %vx = load <2 x f32*> %vp ; gatherThis looks very good to me. It will make vector gather/scatter code much cleaner than the "extract data and mask bits" stuff we talked about at the meeting.> Note that this wouldn't support multiple alignments or multiple > address spaces in a single gather/scatter. Similarly, volatile > would be all-or-nothing. These don't seem like show-stoppers though.Nope. If alignment is a concern, my assumption would be that the same alignment would be required on all elements, otherwise one would not be able to vectorize the gather/scatter. One doesn't generally mix data types in a gather/scatter operations.> This would complicate analyses that look at load and store addresses, > but if we really want to do gather/scatter without messes, this might be > an acceptable tradeoff.By "complicate" do you mean "need to look at multiple addresses from a single instruction?" Or is there more than that? I'm trying to understand all the implications.> While adding a mask operand to every instruction that needs it would > serve the intended purpose, it would also enlarge and complicate IR, > even in code that doesn't need masks. It's a common use-case to have a > single mask used by many adjacent instructions, so this would also be > highly redundant.But explicit is better than implicit in my experience. It's also the LLVM philosophy to be as explicit as possible.> An alternative that exploits this common use-case is to add a new > applymask instruction: > > %w = applymask <2 x f32> %v, <2 x i1> %m > > The semantics would be to copy %v into %w, and implicitly apply mask %m > to all users (recursively) of %w, unless overridden by another > applymask. For example: > > %p = applymask <2 x f32*> %q, <2 x i1> %m > %x = load <2 x f32*>* %p ; implicitly masked by %m > %y = add <2 x f32> %x, %w ; implicitly masked by %m > %z = mul <2 x f32> %y, %y ; implicitly masked by %mYuck. I don't like this at all. It makes reading the IR harder because now you need to worry about context. Not all dependencies are readily expressed in the instructions. How would one express TableGen patterns for such things? My understanding is that we came away with a general agreement to add mask support to operations that can trap and to memory operations, That would mean adding masks to floating-point arithmetic and memory operations. As I recall, Chris experssed some interest in create separate integer and fp arithmetic instructions anyway, so it doesn't seem to be a lot of additional work to add masks to the fp side since instcombine, et. al. will need to know about entirely new operations anyway. We concluded that operation results would be undefined for vector elements corresponding to a zero mask bit. We also talked about adding a vector select, which is crucial for any code that uses masks. -Dave
Evan Cheng
2008-Aug-04 22:02 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
I agree with David. I don't like the implicit use of masks. That seems to unnecessarily complicate LLVM IR. Evan On Aug 4, 2008, at 2:02 PM, David Greene wrote:> >> While adding a mask operand to every instruction that needs it would >> serve the intended purpose, it would also enlarge and complicate IR, >> even in code that doesn't need masks. It's a common use-case to >> have a >> single mask used by many adjacent instructions, so this would also be >> highly redundant. > > But explicit is better than implicit in my experience. It's also > the LLVM > philosophy to be as explicit as possible. > >> An alternative that exploits this common use-case is to add a new >> applymask instruction: >> >> %w = applymask <2 x f32> %v, <2 x i1> %m >> >> The semantics would be to copy %v into %w, and implicitly apply >> mask %m >> to all users (recursively) of %w, unless overridden by another >> applymask. For example: >> >> %p = applymask <2 x f32*> %q, <2 x i1> %m >> %x = load <2 x f32*>* %p ; implicitly masked by %m >> %y = add <2 x f32> %x, %w ; implicitly masked by %m >> %z = mul <2 x f32> %y, %y ; implicitly masked by %m > > Yuck. I don't like this at all. It makes reading the IR harder > because now > you need to worry about context. Not all dependencies are readily > expressed > in the instructions. How would one express TableGen patterns for such > things? > > My understanding is that we came away with a general agreement to add > mask support to operations that can trap and to memory operations, > That > would mean adding masks to floating-point arithmetic and memory > operations. > As I recall, Chris experssed some interest in create separate > integer and fp > arithmetic instructions anyway, so it doesn't seem to be a lot of > additional > work to add masks to the fp side since instcombine, et. al. will > need to know > about entirely new operations anyway. > > We concluded that operation results would be undefined for vector > elements > corresponding to a zero mask bit. > > We also talked about adding a vector select, which is crucial for > any code > that uses masks. > > -Dave > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Dan Gohman
2008-Aug-04 22:56 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Aug 4, 2008, at 2:02 PM, David Greene wrote:> On Saturday 02 August 2008 16:47, Dan Gohman wrote: > >> * Vector Gather/Scatter>> This would complicate analyses that look at load and store addresses, >> but if we really want to do gather/scatter without messes, this >> might be >> an acceptable tradeoff. > > By "complicate" do you mean "need to look at multiple addresses from a > single instruction?" Or is there more than that? I'm trying to > understand > all the implications.I mean just that -- we have a fair amount of code built around looking at the addresses of load and store nodes that in some cases would need to be restructured if it would cope with multiple addresses at a time.> > >> While adding a mask operand to every instruction that needs it would >> serve the intended purpose, it would also enlarge and complicate IR, >> even in code that doesn't need masks. It's a common use-case to >> have a >> single mask used by many adjacent instructions, so this would also be >> highly redundant. > > But explicit is better than implicit in my experience. It's also > the LLVM > philosophy to be as explicit as possible. > >> An alternative that exploits this common use-case is to add a new >> applymask instruction: >> >> %w = applymask <2 x f32> %v, <2 x i1> %m >> >> The semantics would be to copy %v into %w, and implicitly apply >> mask %m >> to all users (recursively) of %w, unless overridden by another >> applymask. For example: >> >> %p = applymask <2 x f32*> %q, <2 x i1> %m >> %x = load <2 x f32*>* %p ; implicitly masked by %m >> %y = add <2 x f32> %x, %w ; implicitly masked by %m >> %z = mul <2 x f32> %y, %y ; implicitly masked by %m > > Yuck. I don't like this at all. It makes reading the IR harder > because now > you need to worry about context.I don't disagree with these. I think it's a trade-off, with LLVM design philosophy and IR cleanliness arguments on both sides. The applymask approach leverages use-def information rather than what can be thought of as duplicating a subset of it, making the IR less cluttered. And, it makes it trivially straightforward to write passes that work correctly on both masked and unmasked code.> Not all dependencies are readily expressed > in the instructions. How would one express TableGen patterns for such > things?The syntax above is an idea for LLVM IR. SelectionDAG doesn't necessarily have to use the same approach.> > > My understanding is that we came away with a general agreement to add > mask support to operations that can trap and to memory operations, > That > would mean adding masks to floating-point arithmetic and memory > operations. > As I recall, Chris experssed some interest in create separate > integer and fp > arithmetic instructions anyway, so it doesn't seem to be a lot of > additional > work to add masks to the fp side since instcombine, et. al. will > need to know > about entirely new operations anyway.I think we all recognize the need, and in the absence of better alternatives are willing to accept the mask operand approach. It would have a significant impact on everyone, even those that don't use masks. I don't want to stand in the way of progress, but this alternative approach seems promising enough to be worth consideration.> > > We concluded that operation results would be undefined for vector > elements > corresponding to a zero mask bit. > > We also talked about adding a vector select, which is crucial for > any code > that uses masks.Right. This applymask idea doesn't conflict with these. Dan
Devang Patel
2008-Aug-12 21:00 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
Dan, On Aug 2, 2008, at 2:47 PM, Dan Gohman wrote:> Masks would not themselves be maskable. That is, the result of an > applymask could not be used (even indirectly) as the second operand > of another applymask.What is the motivation behind this restriction ? - Devang
Daniel Berlin
2008-Aug-12 21:51 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tue, Aug 12, 2008 at 5:00 PM, Devang Patel <dpatel at apple.com> wrote:> Dan, > > On Aug 2, 2008, at 2:47 PM, Dan Gohman wrote: > >> Masks would not themselves be maskable. That is, the result of an >> applymask could not be used (even indirectly) as the second operand >> of another applymask. > > What is the motivation behind this restriction ?It also seems hard to enforce and completely arbitrary :) Not to mention value numbering could easily be taught to combine masks when value numbering to simply dependent applymasks, etc.
Dan Gohman
2008-Aug-12 21:58 UTC
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tue, 2008-08-12 at 14:00 -0700, Devang Patel wrote:> Dan, > > On Aug 2, 2008, at 2:47 PM, Dan Gohman wrote: > > > Masks would not themselves be maskable. That is, the result of an > > applymask could not be used (even indirectly) as the second operand > > of another applymask. > > What is the motivation behind this restriction ?This is basically preventing the applymask instruction itself from being masked, which doesn't really make sense -- "conditionally use this mask using this other mask". Mask values are just vectors of i1, so the way to compute the intersection of two mask values is to use an 'and' instruction. Dan
Maybe Matching Threads
- [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
- [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
- [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
- [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
- [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR