thr3ads.net - llvm dev - [LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR [Aug 2008]

Dan Gohman

2008-Aug-04 22:56 UTC

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

On Aug 4, 2008, at 2:02 PM, David Greene wrote:
> On Saturday 02 August 2008 16:47, Dan Gohman wrote:
>
>> * Vector Gather/Scatter
>> This would complicate analyses that look at load and store addresses,
>> but if we really want to do gather/scatter without messes, this might be
>> an acceptable tradeoff.
>
> By "complicate" do you mean "need to look at multiple addresses from a
> single instruction?"  Or is there more than that?  I'm trying to understand
> all the implications.

I mean just that -- we have a fair amount of code built around looking
at the addresses of load and store nodes that in some cases would need
to be restructured if it would cope with multiple addresses at a time.

>> While adding a mask operand to every instruction that needs it would
>> serve the intended purpose, it would also enlarge and complicate IR,
>> even in code that doesn't need masks. It's a common use-case to have a
>> single mask used by many adjacent instructions, so this would also be
>> highly redundant.
>
> But explicit is better than implicit in my experience.  It's also the LLVM
> philosophy to be as explicit as possible.
>
>> An alternative that exploits this common use-case is to add a new
>> applymask instruction:
>>
>>  %w = applymask <2 x f32> %v, <2 x i1> %m
>>
>> The semantics would be to copy %v into %w, and implicitly apply mask %m
>> to all users (recursively) of %w, unless overridden by another
>> applymask. For example:
>>
>>  %p = applymask <2 x f32*> %q, <2 x i1> %m
>>  %x = load <2 x f32*>* %p                   ; implicitly masked by %m
>>  %y = add <2 x f32> %x, %w                  ; implicitly masked by %m
>>  %z = mul <2 x f32> %y, %y                  ; implicitly masked by %m
>
> Yuck.  I don't like this at all.  It makes reading the IR harder because now
> you need to worry about context.

I don't disagree with these. I think it's a trade-off, with LLVM
design philosophy and IR cleanliness arguments on both sides.

The applymask approach leverages use-def information rather than
what can be thought of as duplicating a subset of it, making the IR
less cluttered. And, it makes it trivially straightforward to write
passes that work correctly on both masked and unmasked code.

> Not all dependencies are readily expressed
> in the instructions.  How would one express TableGen patterns for such
> things?

The syntax above is an idea for LLVM IR. SelectionDAG doesn't necessarily
have to use the same approach.

> My understanding is that we came away with a general agreement to add
> mask support to operations that can trap and to memory operations.  That
> would mean adding masks to floating-point arithmetic and memory operations.
> As I recall, Chris expressed some interest in creating separate integer and
> fp arithmetic instructions anyway, so it doesn't seem to be a lot of
> additional work to add masks to the fp side since instcombine, et al. will
> need to know about entirely new operations anyway.

I think we all recognize the need, and in the absence of better
alternatives are willing to accept the mask operand approach. It would
have a significant impact on everyone, even those that don't use masks.
I don't want to stand in the way of progress, but this alternative
approach seems promising enough to be worth consideration.

> We concluded that operation results would be undefined for vector elements
> corresponding to a zero mask bit.
>
> We also talked about adding a vector select, which is crucial for any code
> that uses masks.

Right. This applymask idea doesn't conflict with these.
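
A minimal sketch of those two conclusions, written in Python rather than LLVM IR. The names masked_op, vselect and the UNDEF marker are hypothetical, made up for illustration; none of this is proposed syntax. Lanes whose mask bit is zero are never computed (so they cannot trap) and their result is left undefined; a vector select then supplies a defined value for those lanes.

```python
import operator

UNDEF = object()  # hypothetical stand-in for an undefined lane value


def masked_op(op, a, b, mask):
    """Apply op lane by lane; lanes with a zero mask bit are skipped."""
    return [op(x, y) if m else UNDEF for x, y, m in zip(a, b, mask)]


def vselect(mask, if_true, if_false):
    """Per-lane select: take if_true where the mask bit is set, else if_false."""
    return [t if m else f for t, f, m in zip(if_true, if_false, mask)]


mask = [True, False]
# Lane 1 would divide by zero, but it is masked off and never evaluated.
quotient = masked_op(operator.floordiv, [8, 6], [2, 0], mask)
# Replace the undefined lane with a known value before anyone reads it.
result = vselect(mask, quotient, [0, 0])
```

The select is what makes the undefined lanes harmless: any consumer that needs every lane defined merges the masked result with a fallback vector first.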

Dan

David Greene

2008-Aug-05 15:32 UTC

On Monday 04 August 2008 17:56, Dan Gohman wrote:
> > By "complicate" do you mean "need to look at multiple addresses from a
> > single instruction?"  Or is there more than that?  I'm trying to
> > understand all the implications.
>
> I mean just that -- we have a fair amount of code built around looking
> at the addresses of load and store nodes that in some cases would need
> to be restructured if it would cope with multiple addresses at a time.

Ok.  I should think that this would be feasible to do.  In the worst case it's
an N^2 loop looking at all pairs.  And N is usually going to be small.
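
A rough sketch of that worst-case pairwise loop, in Python for brevity. It assumes concrete integer addresses and a fixed element size, which a real alias analysis would not have (it reasons about symbolic pointers); the function names are hypothetical.

```python
from itertools import product


def may_overlap(addr_a, size_a, addr_b, size_b):
    """True if the byte ranges [addr, addr + size) intersect."""
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a


def vector_accesses_may_alias(addrs_a, addrs_b, elem_size):
    """Compare every lane address of one access with every lane of the other.

    This is the N^2 loop over all pairs; N is the vector length.
    """
    return any(
        may_overlap(a, elem_size, b, elem_size)
        for a, b in product(addrs_a, addrs_b)
    )


# Two 2-lane accesses with 4-byte elements.
disjoint = vector_accesses_may_alias([0, 16], [4, 8], 4)
overlapping = vector_accesses_may_alias([0, 16], [32, 18], 4)
```

With 2 to 16 lanes per vector the quadratic cost stays small, which is the point being made above.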

> >>  %p = applymask <2 x f32*> %q, <2 x i1> %m
> >>  %x = load <2 x f32*>* %p                   ; implicitly masked by %m
> >>  %y = add <2 x f32> %x, %w                  ; implicitly masked by %m
> >>  %z = mul <2 x f32> %y, %y                  ; implicitly masked by %m
> >
> > Yuck.  I don't like this at all.  It makes reading the IR harder
> > because now
> > you need to worry about context.
>
> I don't disagree with these. I think it's a trade-off, with LLVM
> design philosophy and IR cleanliness arguments on both sides.
>
> The applymask approach leverages use-def information rather than
> what can be thought of as duplicating a subset of it, making the IR

I don't understand what you mean by "duplicating" here.  You need some
kind of use-def information for the masks themselves because at some
point they need to be register-allocated.

> less cluttered. And, it makes it trivially straightforward to write
> passes that work correctly on both masked and unmasked code.

I had a thought on this, actually.  Let's say the mask is the very last
operand on masked instructions.  Most passes don't care about the mask
at all.  They can just ignore it.  Since they don't look at the extra operand
right now, there shouldn't be many changes necessary (some asserts
may need to be fixed, etc.).

Think about instcombine.  It's matching patterns.  If the matcher doesn't
look at masks, that may be ok most of the time (mod corner cases which
I fully appreciate can be a real pain to track down).  If we want fancy
instcombine tricks that understand masks, we can add those later.

> >  Not all dependencies are readily expressed
> > in the instructions.  How would one express TableGen patterns for such
> > things?
>
> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
> necessarily
> have to use the same approach.

What do you mean by "ideal for LLVM IR?"  This looks very much _not_ ideal to
me from a debugging standpoint.  It's difficult to understand.  It took me
reading through the proposal a few times to grok what you are talking about.

> I think we all recognize the need, and in the absence of better
> alternatives are willing to accept the mask operand approach. It would
> have a significant impact on everyone, even those that don't use masks.

How do you define "significant impact?"  Compile time?  Development effort?
Transition pain?  All of the above?  More?

For architectures that don't use masks, either the mask gets set to all 1's or
we have non-masked versions of operators.  I honestly don't know which is
the desirable route to take.  My guess is that the optimizers will have to
understand whether or not the target architecture supports masks and not
generate them (e.g. no if-conversion) if the target doesn't support them.

I wonder if there is some way to un-if-convert to eliminate masks if
necessary.  I'm thinking about code portability and JIT issues when
reading in LLVM IR that was produced at some earlier time.  Perhaps
this isn't an issue we need to worry about right now.

> I don't want to stand in the way of progress, but this alternative
> approach seems promising enough to be worth consideration.

Alternatives are always welcome and worth considering.  I'm looking at the
kind of things the LLVM community is going to want to support and I'm
pretty sure masks are going to be a very big part of architectures in the
future.  We're done with clock speed improvements, so we need to rely on
architecture more.  Vectorization is a well-known technique to improve
single thread performance and masks are critical to producing efficient vector
code.

If y'all agree with this premise, it seems to me that we want to support
such architectures in as straightforward a way as possible so as to minimize
future pain when we're all writing complex and beautiful vector hacks.  :)

What can we learn from the IA64 and ARM backends?  How do they handle
their masks (scalar predication)?  Is all the if-conversion done in
target-specific passes?

> > We concluded that operation results would be undefined for vector
> > elements
> > corresponding to a zero mask bit.
> >
> > We also talked about adding a vector select, which is crucial for
> > any code
> > that uses masks.
>
> Right. This applymask idea doesn't conflict with these.

Yep.  I just wanted to be thorough.

                                                  -Dave

Dan Gohman

2008-Aug-05 17:39 UTC

On Tue, August 5, 2008 8:32 am, David Greene wrote:
> On Monday 04 August 2008 17:56, Dan Gohman wrote:
>> The applymask approach leverages use-def information rather than
>> what can be thought of as duplicating a subset of it, making the IR
>
> I don't understand what you mean by "duplicating" here.

If you look just at the case where every instruction in a
given use-def sub-dag uses the same mask, adding that mask as
an operand to all of them is largely just duplicating the
information about them all being connected. This is the common
case that applymask is aimed at.

In the case where multiple masks are used, applymask can still
cope, but the neat thing is that in this case it serves to
mark the dataflow edges where masks change.
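
To make the propagation concrete, here is a small Python sketch that recovers the effective mask of each instruction purely from use-def edges. The data model (a dict of name -> (opcode, operands) in definition order) and the function effective_masks are hypothetical. It also assumes that operands carrying no mask place no constraint on their user and that two different inherited masks on one instruction are an error; the proposal as written does not settle either point.

```python
def effective_masks(instrs):
    """Map each instruction name to the mask that implicitly applies to it.

    instrs: dict in definition order, name -> (opcode, [operand names]).
    """
    masks = {}
    for name, (opcode, operands) in instrs.items():
        if opcode == "applymask":
            _value, mask = operands
            # An applymask overrides whatever mask its input carried.
            masks[name] = mask
            continue
        # Other instructions inherit the mask of their operands.
        inherited = {masks[op] for op in operands if masks.get(op) is not None}
        if len(inherited) > 1:
            raise ValueError(f"{name} uses operands with different masks")
        masks[name] = inherited.pop() if inherited else None
    return masks


example = {
    "%w": ("applymask", ["%v", "%m"]),
    "%p": ("applymask", ["%q", "%m"]),
    "%x": ("load", ["%p"]),        # implicitly masked by %m
    "%y": ("add", ["%x", "%w"]),   # implicitly masked by %m
    "%z": ("mul", ["%y", "%y"]),   # implicitly masked by %m
    "%r": ("applymask", ["%z", "%n"]),  # the edge where the mask changes
    "%s": ("mul", ["%r", "%r"]),   # implicitly masked by %n
}
masks = effective_masks(example)
```

The mask is written down once per sub-dag, and the only place a second mask name appears is the applymask on the edge where it changes.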

> You need some
> kind of use-def information for the masks themselves because at some
> point they need to be register-allocated.

What I'm talking about here is just in LLVM IR. I agree that we want
mask registers as operands during register allocation, and probably
also instruction selection.

>> less cluttered. And, it makes it trivially straightforward to write
>> passes that work correctly on both masked and unmasked code.
>
> I had a thought on this, actually.  Let's say the mask is the very last
> operand on masked instructions.  Most passes don't care about the mask
> at all.  They can just ignore it.  Since they don't look at the extra
> operand
> right now, there shouldn't be many changes necessary (some asserts
> may need to be fixed, etc.).
>
> Think about instcombine.  It's matching patterns.  If the matcher doesn't
> look at masks, that may be ok most of the time (mod corner cases which
> I fully appreciate can be a real pain to track down).  If we want fancy
> instcombine tricks that understand masks, we can add those later.

If masks are operands, instcombine will need to check if all the
relevant masks match before many of the transformations it does,
and it'll need to take care to put the mask operand in the
instructions it creates.

With applymask, I believe instcombine wouldn't require any
modifications, except things like "case ApplyMaskInst: break;" in
a few places. Applymask makes masks in the IR so easy to reason
about, most passes won't need to do any special reasoning.
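
As an illustration of the extra bookkeeping in the mask-operand design, here is a toy Python fold of (x + c1) + c2 into x + (c1 + c2). The Add class and fold_add_of_add are hypothetical and have nothing to do with instcombine's actual code; the sketch only shows the two obligations described above: check that the masks match, and carry the mask onto the new instruction.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Add:
    lhs: Union["Add", str]      # another Add, or the name of a value
    rhs: int                    # constant right-hand side
    mask: Optional[str] = None  # explicit mask operand, None if unmasked


def fold_add_of_add(outer):
    """Rewrite (x + c1) + c2 as x + (c1 + c2) when both adds share a mask."""
    inner = outer.lhs
    if not isinstance(inner, Add):
        return outer
    if inner.mask != outer.mask:
        # Masks differ: stay conservative and leave the code alone.
        return outer
    # The new instruction must carry the mask operand too.
    return Add(inner.lhs, inner.rhs + outer.rhs, mask=outer.mask)


same_mask = fold_add_of_add(Add(Add("x", 1, "m"), 2, "m"))
mixed_mask = fold_add_of_add(Add(Add("x", 1, "m"), 2, "n"))
```

Under applymask the same fold would not mention masks at all, since both adds would simply be users of the same masked value.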

>> >  Not all dependencies are readily expressed
>> > in the instructions.  How would one express TableGen patterns for such
>> > things?
>>
>> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
>> necessarily
>> have to use the same approach.
>
> What do you mean by "ideal for LLVM IR?"  This looks very much _not_ ideal
> to me from a debugging standpoint.  It's difficult to understand.  It took
> me reading through the proposal a few times to grok what you are talking
> about.

I said "idea", not "ideal" :-). But I just meant that LLVM IR
and SelectionDAG don't have to do the same thing.

>> I think we all recognize the need, and in the absence of better
>> alternatives are willing to accept the mask operand approach. It would
>> have a significant impact on everyone, even those that don't use masks.
>
> How do you define "significant impact?"  Compile time?  Development
> effort?
> Transition pain?  All of the above?  More?

With mask operands, many passes will need to explicitly check for
masks even if they don't care and just want to be conservatively
correct.

With applymask, passes will often be able to operate on masked IR
just as aggressively as non-masked IR.

>> I don't want to stand in the way of progress, but this alternative
>> approach seems promising enough to be worth consideration.
>
> Alternatives are always welcome and worth considering.  I'm looking at the
> kind of things the LLVM community is going to want to support and I'm
> pretty sure masks are going to be a very big part of architectures in the
> future.  We're done with clock speed improvements, so we need to rely on
> architecture more.  Vectorization is a well-known technique to improve
> single thread performance and masks are critical to producing efficient
> vector code.
>
> If y'all agree with this premise, it seems to me that we want to support
> such architectures in as straightforward a way as possible so as to
> minimize future pain when we're all writing complex and beautiful vector
> hacks.  :)

I think we basically agree here :-). For me, the fact that applymask
simplifies the reasoning that optimizers must do for masked
instructions is a large part of what motivates it for
consideration.

> What can we learn from the IA64 and ARM backends?  How do they handle
> their masks (scalar predication)?  Is all the if-conversion done in
> target-specific passes?

It's in lib/CodeGen/IfConversion.cpp, but it wouldn't be usable for
vectors. If-conversion for vectors must be done as part of the
vectorization (whether that's the user/front-end or the optimizer).

Dan
