thr3ads.net - llvm dev - [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR [Aug 2008]

If this information is useful, please help other people find it:
Share via:

Dan Gohman

2008-Aug-02 21:47 UTC

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Here are some ideas that came out of brainstorming during and after
the dev meeting. Disclaimer: they're just ideas :-).

* Vector Gather/Scatter

One way to do gather/scatter would be to extend vector types to
support pointer element types, and extend load, store, and getelementptr
to operate on vectors of pointers.

A typical gather sequence would then look like this:

  %vi = load <2 x i64>* %indices             ; load a vector of indices
  %vp = gep <2 x f32*> %base, <2 x i64> %vi  ; compute an address
vector
  %vx = load <2 x f32*> %vp                  ; gather

This is significantly less ugly than a build-it-yourself gather, and
would be much friendlier to masks (below) than the "individually extract
each i1 from the mask" approach.

Note that this wouldn't support multiple alignments or multiple
address spaces in a single gather/scatter. Similarly, volatile
would be all-or-nothing. These don't seem like show-stoppers though.

This would complicate analyses that look at load and store addresses,
but if we really want to do gather/scatter without messes, this might be
an acceptable tradeoff.

* Masks

While adding a mask operand to every instruction that needs it would
serve the intended purpose, it would also enlarge and complicate IR,
even in code that doesn't need masks. It's a common use-case to have a
single mask used by many adjacent instructions, so this would also be
highly redundant.

An alternative that exploits this common use-case is to add a new
applymask instruction:

  %w = applymask <2 x f32> %v, <2 x i1> %m

The semantics would be to copy %v into %w, and implicitly apply mask %m
to all users (recursively) of %w, unless overridden by another
applymask. For example:

  %p = applymask <2 x f32*> %q, <2 x i1> %m
  %x = load <2 x f32*>* %p                   ; implicitly masked by %m
  %y = add <2 x f32> %x, %w                  ; implicitly masked by %m
  %z = mul <2 x f32> %y, %y                  ; implicitly masked by %m

This approach minimizes enlargement of IR in code that doesn't use
masks, and it reduces the impact on complexity -- a pass like
instcombine that wants to combine an instruction and one of its operands
into a single instruction could safely do so without knowing anything
about masks.

The applymask instruction could apply to scalars as well as vectors.

The set of instructions requiring mask legalization would be found by
recursively traversing the uses of all applymask instructions.

extractvalue on a masked-out value yields undef. overriding a value's
mask with a new applymask instruction that de-masks any elements leaves
those elements set to undef.

It would be illegal for an instruction with multiple operands to have
multiple distinct masks. This includes PHI nodes.

Insertelement and shufflevector with masked-out values would not be
immune to the mask; assigning a value to a masked-out element of a
vector does not make that element non-undef. Code wanting to fill in
masked-out values must apply a new mask to the operands first.

Masks would not themselves be maskable. That is, the result of an
applymask could not be used (even indirectly) as the second operand
of another applymask.

Dan

Duncan Sands

2008-Aug-03 10:39 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Hi,
> The set of instructions requiring mask legalization would be found by
> recursively traversing the uses of all applymask instructions.
what if the applymask is in one function, but the use of it is in another?

Ciao,

Duncan.

Dan Gohman

2008-Aug-04 20:58 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

On Aug 3, 2008, at 3:39 AM, Duncan Sands wrote:
> Hi,
>
>> The set of instructions requiring mask legalization would be found by
>> recursively traversing the uses of all applymask instructions.
>
> what if the applymask is in one function, but the use of it is in  
> another?

I think we could just disallow this.

If we get to the point of wanting mask-aware library routines, it might
make sense to introduce an unmask instruction, which would be a sort of
inverse to applymask that would return the original value, with any
masked-out portion set to undef. Front-ends would then explicitly
pass the mask along with the unmasked arguments to library routines,
which would reapply the mask as needed. It's conceivable that we might
want something even fancier than that in the future, but we can figure
that out later.

Thanks,

Dan

David Greene

2008-Aug-04 21:02 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

On Saturday 02 August 2008 16:47, Dan Gohman wrote:
> * Vector Gather/Scatter
>
> One way to do gather/scatter would be to extend vector types to
> support pointer element types, and extend load, store, and getelementptr
> to operate on vectors of pointers.
>
> A typical gather sequence would then look like this:
>
>   %vi = load <2 x i64>* %indices             ; load a vector of
indices
>   %vp = gep <2 x f32*> %base, <2 x i64> %vi  ; compute an
address vector
>   %vx = load <2 x f32*> %vp                  ; gather
This looks very good to me.  It will make vector gather/scatter code much
cleaner than the "extract data and mask bits" stuff we talked about at
the
meeting.
> Note that this wouldn't support multiple alignments or multiple
> address spaces in a single gather/scatter. Similarly, volatile
> would be all-or-nothing. These don't seem like show-stoppers though.
Nope.  If alignment is a concern, my assumption would be that the same
alignment would be required on all elements, otherwise one would not be
able to vectorize the gather/scatter.  One doesn't generally mix data types
in a gather/scatter operations.
> This would complicate analyses that look at load and store addresses,
> but if we really want to do gather/scatter without messes, this might be
> an acceptable tradeoff.
By "complicate" do you mean "need to look at multiple addresses
from a
single instruction?"  Or is there more than that?  I'm trying to
understand
all the implications.
> While adding a mask operand to every instruction that needs it would
> serve the intended purpose, it would also enlarge and complicate IR,
> even in code that doesn't need masks. It's a common use-case to
have a
> single mask used by many adjacent instructions, so this would also be
> highly redundant.
But explicit is better than implicit in my experience.  It's also the LLVM
philosophy to be as explicit as possible.
> An alternative that exploits this common use-case is to add a new
> applymask instruction:
>
>   %w = applymask <2 x f32> %v, <2 x i1> %m
>
> The semantics would be to copy %v into %w, and implicitly apply mask %m
> to all users (recursively) of %w, unless overridden by another
> applymask. For example:
>
>   %p = applymask <2 x f32*> %q, <2 x i1> %m
>   %x = load <2 x f32*>* %p                   ; implicitly masked by
%m
>   %y = add <2 x f32> %x, %w                  ; implicitly masked by
%m
>   %z = mul <2 x f32> %y, %y                  ; implicitly masked by
%m
Yuck.  I don't like this at all.  It makes reading the IR harder because now
you need to worry about context.  Not all dependencies are readily expressed
in the instructions.  How would one express TableGen patterns for such
things?

My understanding is that we came away with a general agreement to add
mask support to operations that can trap and to memory operations,  That
would mean adding masks to floating-point arithmetic and memory operations.
As I recall, Chris experssed some interest in create separate integer and fp
arithmetic instructions anyway, so it doesn't seem to be a lot of additional
work to add masks to the fp side since instcombine, et. al. will need to know
about entirely new operations anyway.

We concluded that operation results would be undefined for vector elements 
corresponding to a zero mask bit.

We also talked about adding a vector select, which is crucial for any code 
that uses masks.

                                                                -Dave

Evan Cheng

2008-Aug-04 22:02 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

I agree with David. I don't like the implicit use of masks. That seems  
to unnecessarily complicate LLVM IR.

Evan

On Aug 4, 2008, at 2:02 PM, David Greene wrote:>
>> While adding a mask operand to every instruction that needs it would
>> serve the intended purpose, it would also enlarge and complicate IR,
>> even in code that doesn't need masks. It's a common use-case to
>> have a
>> single mask used by many adjacent instructions, so this would also be
>> highly redundant.
>
> But explicit is better than implicit in my experience.  It's also  
> the LLVM
> philosophy to be as explicit as possible.
>
>> An alternative that exploits this common use-case is to add a new
>> applymask instruction:
>>
>>  %w = applymask <2 x f32> %v, <2 x i1> %m
>>
>> The semantics would be to copy %v into %w, and implicitly apply  
>> mask %m
>> to all users (recursively) of %w, unless overridden by another
>> applymask. For example:
>>
>>  %p = applymask <2 x f32*> %q, <2 x i1> %m
>>  %x = load <2 x f32*>* %p                   ; implicitly masked
by %m
>>  %y = add <2 x f32> %x, %w                  ; implicitly masked
by %m
>>  %z = mul <2 x f32> %y, %y                  ; implicitly masked
by %m
>
> Yuck.  I don't like this at all.  It makes reading the IR harder  
> because now
> you need to worry about context.  Not all dependencies are readily  
> expressed
> in the instructions.  How would one express TableGen patterns for such
> things?
>
> My understanding is that we came away with a general agreement to add
> mask support to operations that can trap and to memory operations,   
> That
> would mean adding masks to floating-point arithmetic and memory  
> operations.
> As I recall, Chris experssed some interest in create separate  
> integer and fp
> arithmetic instructions anyway, so it doesn't seem to be a lot of  
> additional
> work to add masks to the fp side since instcombine, et. al. will  
> need to know
> about entirely new operations anyway.
>
> We concluded that operation results would be undefined for vector  
> elements
> corresponding to a zero mask bit.
>
> We also talked about adding a vector select, which is crucial for  
> any code
> that uses masks.
>
>                                                                -Dave
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Dan Gohman

2008-Aug-04 22:56 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

On Aug 4, 2008, at 2:02 PM, David Greene wrote:
> On Saturday 02 August 2008 16:47, Dan Gohman wrote:
>
>> * Vector Gather/Scatter
>> This would complicate analyses that look at load and store addresses,
>> but if we really want to do gather/scatter without messes, this  
>> might be
>> an acceptable tradeoff.
>
> By "complicate" do you mean "need to look at multiple
addresses from a
> single instruction?"  Or is there more than that?  I'm trying to  
> understand
> all the implications.
I mean just that -- we have a fair amount of code built around looking
at the addresses of load and store nodes that in some cases would need
to be restructured if it would cope with multiple addresses at a time.
>
>
>> While adding a mask operand to every instruction that needs it would
>> serve the intended purpose, it would also enlarge and complicate IR,
>> even in code that doesn't need masks. It's a common use-case to
>> have a
>> single mask used by many adjacent instructions, so this would also be
>> highly redundant.
>
> But explicit is better than implicit in my experience.  It's also  
> the LLVM
> philosophy to be as explicit as possible.
>
>> An alternative that exploits this common use-case is to add a new
>> applymask instruction:
>>
>>  %w = applymask <2 x f32> %v, <2 x i1> %m
>>
>> The semantics would be to copy %v into %w, and implicitly apply  
>> mask %m
>> to all users (recursively) of %w, unless overridden by another
>> applymask. For example:
>>
>>  %p = applymask <2 x f32*> %q, <2 x i1> %m
>>  %x = load <2 x f32*>* %p                   ; implicitly masked
by %m
>>  %y = add <2 x f32> %x, %w                  ; implicitly masked
by %m
>>  %z = mul <2 x f32> %y, %y                  ; implicitly masked
by %m
>
> Yuck.  I don't like this at all.  It makes reading the IR harder  
> because now
> you need to worry about context.
I don't disagree with these. I think it's a trade-off, with LLVM
design philosophy and IR cleanliness arguments on both sides.

The applymask approach leverages use-def information rather than
what can be thought of as duplicating a subset of it, making the IR
less cluttered. And, it makes it trivially straightforward to write
passes that work correctly on both masked and unmasked code.
>  Not all dependencies are readily expressed
> in the instructions.  How would one express TableGen patterns for such
> things?
The syntax above is an idea for LLVM IR. SelectionDAG doesn't  
necessarily
have to use the same approach.
>
>
> My understanding is that we came away with a general agreement to add
> mask support to operations that can trap and to memory operations,   
> That
> would mean adding masks to floating-point arithmetic and memory  
> operations.
> As I recall, Chris experssed some interest in create separate  
> integer and fp
> arithmetic instructions anyway, so it doesn't seem to be a lot of  
> additional
> work to add masks to the fp side since instcombine, et. al. will  
> need to know
> about entirely new operations anyway.
I think we all recognize the need, and in the absence of better
alternatives are willing to accept the mask operand approach. It would
have a significant impact on everyone, even those that don't use masks.
I don't want to stand in the way of progress, but this alternative
approach seems promising enough to be worth consideration.
>
>
> We concluded that operation results would be undefined for vector  
> elements
> corresponding to a zero mask bit.
>
> We also talked about adding a vector select, which is crucial for  
> any code
> that uses masks.
Right. This applymask idea doesn't conflict with these.

Dan

Devang Patel

2008-Aug-12 21:00 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Dan,

On Aug 2, 2008, at 2:47 PM, Dan Gohman wrote:
> Masks would not themselves be maskable. That is, the result of an
> applymask could not be used (even indirectly) as the second operand
> of another applymask.
What is the motivation behind this restriction ?
-
Devang

Daniel Berlin

2008-Aug-12 21:51 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

On Tue, Aug 12, 2008 at 5:00 PM, Devang Patel <dpatel at apple.com>
wrote:> Dan,
>
> On Aug 2, 2008, at 2:47 PM, Dan Gohman wrote:
>
>> Masks would not themselves be maskable. That is, the result of an
>> applymask could not be used (even indirectly) as the second operand
>> of another applymask.
>
> What is the motivation behind this restriction ?
It also seems hard to enforce and completely arbitrary :)

Not to mention value numbering could easily be taught to combine masks
when value numbering to simply dependent applymasks, etc.

Dan Gohman

2008-Aug-12 21:58 UTC

head link

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

On Tue, 2008-08-12 at 14:00 -0700, Devang Patel wrote:> Dan,
> 
> On Aug 2, 2008, at 2:47 PM, Dan Gohman wrote:
> 
> > Masks would not themselves be maskable. That is, the result of an
> > applymask could not be used (even indirectly) as the second operand
> > of another applymask.
> 
> What is the motivation behind this restriction ?
This is basically preventing the applymask instruction itself from being
masked, which doesn't really make
sense -- "conditionally use this mask using this other mask".

Mask values are just vectors of i1, so the way to compute the
intersection of two mask values is to use an 'and' instruction.

Dan

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Aug 2008 - [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Reasonably Related Threads