thr3ads.net - llvm dev - [LLVMdev] RFC: AVX Pattern Specification [LONG] [May 2009]

If this information is useful, please help other people find it:
Share via:

David Greene

2009-May-01 21:47 UTC

[LLVMdev] RFC: AVX Pattern Specification [LONG]

On Friday 01 May 2009 13:46, Chris Lattner wrote:
> Right, a lot of these problems can be solved by some nice refactoring
> stuff.  I'm also hoping that some of the complexity in defining
> shuffle matching code can be helped by making the definition of the
> shuffle patterns more declarative within the td file.  It would be
> really nice to say that "this shuffle does a  "1,0,3,2 shuffle
and has
> cost 42" and have tblgen generate all the required matching code.
That would be nice.  Any ideas how this would work?

> I agree, I think it is unfortunate that AVX decided to do this at an
> architectural level :).
I'm not sure what else would be done.
> > So what I've done is a little experiment to see if I can unify all
> > SSE and AVX
> > SIMD instructions under one framework.  I'll leave MMX and 3dNow
> > alone since
> > they're oddballs and hardly anyone uses them.
>
> Ok.  I agree that the similarity being factored here is across SSE1/2/
> AVX.
And SSE3/4.

> > I have a Perl script that auto-generates the necessary mutliclass
> > combinations
> > as well as the needed base classes depending on what's in the top-
> > level .td
> > file.  For now, I've named that top-level file X86InstrSIMD.td.
> >
> > The Perl script would only be need to run as X86InstrSIMD.td
> > changes.  Thus
> > its use would be similar to how we use autoconf today.  We only run
> > autoconf /
> > automake when we update the .ac files, not as part of the build
> > process.
>
> While I agree that we want to refactor this, I really don't think that
> we should autogenerate .td files from perl.  This has a number of
> significant logistical problems.  What is it that perl gives you that
> we can't enhance tblgen to do directly?
Well, mainly it's because we don't have whatever tblgen enhancements we
need.
I'll have to think on this some and see if I can come up with some tblgen
features that could help.

I was writing a lot of these base classes by hand at first, but there are a 
lot of them (they tend to be very small) and writing them is very mechanical.  
So we probably can enhance tblgen somehow.  I'm just not sure what that
looks
like right now.
> > Initially, X86InstrSIMD.td would define only AVX instructions so it
> > would not
> > impact existing SSE clients.  My intent is that X86InstrSIMD.td
> > essentially
> > become the canonical description of all SSE and AVX instructions and
> > X86InstrSSE.td would go away completely.
>
> Instead of slowly building it up and then cutting over, I'd prefer to
> incrementally move patterns into it, removing them from the other .td
> files at the same time.  This should be a nice clean and continuous
> refactoring that makes the code monotonically better (smaller).
Ok, that sounds like a pretty good idea.
> > The cons:
> > * Transition from X86InstrSSE.td
> > * A more complex class hierarchy
>
> I'm not worried about these.
Ok.
> > * A class-generating tool / indirection
>
> I really don't like this :).  If there is something higher level that
> you need, I think it would be very interesting to carefully consider
> what the root problem is and whether there is a good solution that we
> can directly implement in tblgen.  It is pretty clear that we can
> *improve* the current situation with no tblgen enhancements, but I
> agree that AVX is a nice forcing function that will greatly benefit
> from a *much improved* target description.
Your point is well taken.  Let me think on this a bit.

                             -Dave

David Greene

2009-May-01 22:01 UTC

head link

[LLVMdev] RFC: AVX Pattern Specification [LONG]

On Friday 01 May 2009 16:47, David Greene wrote:
> > Ok.  I agree that the similarity being factored here is across SSE1/2/
> > AVX.
>
> And SSE3/4.
What I really meant was SSE3/AVX, SSSE3/AVX and SSE4/AVX.  SSS?E3 and SSE4 
don't have too much in common with each other.

                              -Dave

Chris Lattner

2009-May-01 22:50 UTC

head link

[LLVMdev] RFC: AVX Pattern Specification [LONG]

On May 1, 2009, at 2:47 PM, David Greene wrote:> On Friday 01 May 2009 13:46, Chris Lattner wrote:
>> Right, a lot of these problems can be solved by some nice refactoring
>> stuff.  I'm also hoping that some of the complexity in defining
>> shuffle matching code can be helped by making the definition of the
>> shuffle patterns more declarative within the td file.  It would be
>> really nice to say that "this shuffle does a  "1,0,3,2
shuffle and
>> has
>> cost 42" and have tblgen generate all the required matching code.
>
> That would be nice.  Any ideas how this would work?
Nate is currently working on refactoring a bunch of shuffle related  
logic, which includes changing the X86 backend to canonicalize  
shuffles more like the ppc/altivec backend does.  Once that is done, I  
think it would make sense for tblgen to generate some C++ code that  
looks like this:

// MatchVectorShuffle - Matches a shuffle node against the available  
instructions,
// returning the lowest cost one as well as the actual cost of it.
unsigned MatchVectorShuffle(VectorShuffleSDNode *N) {
   unsigned LowestCost = ~0;

   if (N can be matched by movddup) {
     unsigned movddupcost = ...  // can be either constant, or  
callback into subtarget info
     if (LowestCost > movddupcost)
       LowestCost = movddupcost;
       operands = [whatever]
       opcode = X86::MOVDDUP;
     }
   }

   if (N can be matched by movhlps) {
     unsigned movhlpscost = ...
     if (LowestCost > movhlpscost)
       LowestCost = movhlpscost;
       operands = [whatever]
       opcode = X86::MOVHLPS;
     }
   }
   ...
}

The advantage of doing this is that it moves the current heuristics  
for match ordering (which is a poor way to model costs) into a  
declarative form in the .td file.  This is particularly important  
because different chips have different costs!

This generated function could then be called by the actual isel pass  
itself as well as from DAGCombine.  We'd like dagcombine to be able to  
merge two shuffles into one, but it should only do this when the cost  
of the resultant shuffle is less than the two original ones (a simple  
greedy algorithm).

This is vague and hand wavy, but hopefully the idea comes across.  We  
have this in the .td files right now:

;; we already have this
def MOVDDUPrr  : S3DI<0x12, MRMSrcReg, (outs VR128:$dst), (ins  
VR128:$src),
                       "movddup\t{$src, $dst|$dst, $src}",
                       [(set VR128:$dst,(v2f64 (movddup VR128:$src,  
(undef))))]>;

def movddup : PatFrag<(ops node:$lhs, node:$rhs),
                       (vector_shuffle node:$lhs, node:$rhs), [{
   return X86::isMOVDDUPMask(cast<ShuffleVectorSDNode>(N));
}]>;

The goal is to replace the pattern fragment and the C++ code for  
X86::isMOVDDUPMask with something like:

def movddup : PatFrag<(ops node:$lhs, node:$rhs),
                       (vector_shuffle node:$lhs, node:$rhs,
                                       0, 1, 0, 1, Cost<42>)

Alternatively, the cost could be put on the instructions etc, whatever  
makes the most sense.  incidentally, I'm not sure why movddup is  
currently defined to take a LHS/RHS: the RHS should always be undef so  
it should be coded into the movddup def.

Another possible syntax would be to add a special kind of shuffle node  
to give more natural and clean syntax.  This is probably the better  
solution:

def movddup : Shuffle4<VR128, undef, 0, 1, 0, 1>, Cost<42>;
>> While I agree that we want to refactor this, I really don't think  
>> that
>> we should autogenerate .td files from perl.  This has a number of
>> significant logistical problems.  What is it that perl gives you that
>> we can't enhance tblgen to do directly?
>
> Well, mainly it's because we don't have whatever tblgen
enhancements
> we need.
> I'll have to think on this some and see if I can come up with some  
> tblgen
> features that could help.
>
> I was writing a lot of these base classes by hand at first, but  
> there are a
> lot of them (they tend to be very small) and writing them is very  
> mechanical.
> So we probably can enhance tblgen somehow.  I'm just not sure what  
> that looks
> like right now.
Ok.
> Your point is well taken.  Let me think on this a bit.
Thanks Dave! I really appreciate you working in this area,

-Chris

Evan Cheng

2009-May-05 06:02 UTC

head link

[LLVMdev] RFC: AVX Pattern Specification [LONG]

On May 1, 2009, at 3:50 PM, Chris Lattner wrote:
>
> The goal is to replace the pattern fragment and the C++ code for
> X86::isMOVDDUPMask with something like:
>
> def movddup : PatFrag<(ops node:$lhs, node:$rhs),
>                       (vector_shuffle node:$lhs, node:$rhs,
>                                       0, 1, 0, 1, Cost<42>)
>
> Alternatively, the cost could be put on the instructions etc, whatever
> makes the most sense.  incidentally, I'm not sure why movddup is
> currently defined to take a LHS/RHS: the RHS should always be undef so
> it should be coded into the movddup def.
>
> Another possible syntax would be to add a special kind of shuffle node
> to give more natural and clean syntax.  This is probably the better
> solution:
>
> def movddup : Shuffle4<VR128, undef, 0, 1, 0, 1>, Cost<42>;
What does "cost" mean here? Currently isel cost means complexity of  
the matched pattern. It's hard to compute this by hand so the current  
hack is  to allow manual cost adjustments.

I think it makes sense for isel to use HW cost (instruction latency,  
code size) as a late tie breaker. In that case, shouldn't cost be part  
of instruction itinerary?

Evan

David Greene

2009-May-07 16:38 UTC

head link

[LLVMdev] RFC: AVX Pattern Specification [LONG]

On Friday 01 May 2009 16:47, David Greene wrote:
> > While I agree that we want to refactor this, I really don't think
that
> > we should autogenerate .td files from perl.  This has a number of
> > significant logistical problems.  What is it that perl gives you that
> > we can't enhance tblgen to do directly?
>
> Well, mainly it's because we don't have whatever tblgen
enhancements we
> need. I'll have to think on this some and see if I can come up with
some
> tblgen features that could help.
>
> I was writing a lot of these base classes by hand at first, but there are a
> lot of them (they tend to be very small) and writing them is very
> mechanical. So we probably can enhance tblgen somehow.  I'm just not
sure
> what that looks like right now.
So I've been thinking about this some more and the major obtacle here is
that
the Perl generator has a lot of X86-specific knowledge coded into it.

For example, it knows:

* "SS" instructions need to use ths XS encoding class, but only for
SSE1
  and AVX
* "SD" instructions need to use the XD encoding class, but only for
SSE2
  and AVX
* A vector instruction never uses XS or XD encoding
* A scalar instruction never uses OpSize or TB
* AVX uses rrm and mrr encoding, SSE uses rm and mr
* rrm expands to rrr and rrm encoding, rm expands to rr and rm
* mr only expands to mr, mrr only expands to mrr
* and on and on...

I'm not sure how to conveniently encapsulate all of that detailed knowledge
in a set of TableGen classes and/or feature extensions.  Basically, TableGen
would need to look at

defm ADD : 
sse1_sse2_avx_avx3_binary_scalar_xs_xd_vector_tb_ostb_node_intrinsic_rm_rrm<
  0x58, "add", fadd, "add", 1>;
and understand all of the SS/PS/SD/PD/VEX/etc. combinations that implies.  Or 
at least have support to conveniently express inheritance from multiclasses
that provide the valid SS/PS, etc. combinations and (this is critical) a 
convenient way to transform class/multiclass arguments to specialize them for 
the various instruction sets/encodings.  If we only want to specify patterns 
and arguments once, we need a way to specify transformations on them and pass 
the results down to base (multi)classes.  

That's what the intermediate classes generated by the Perl script do.  They
serve two functions:

* Provide valid format combinations (e.g. SS/PS/SD/PD, SS/SD, PS/PD, etc.)
* Provide argument transformation to specialize for various formats and 
  encoding

I've got a few ideas rolling in my head but I need to do some more thinking.

Oh, and I guess I'll go ahead and add XOP support too:

http://forums.amd.com/devblog/blogpost.cfm?threadid=112934&catid=208

                                -Dave

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - May 2009 - [LLVMdev] RFC: AVX Pattern Specification [LONG]

[LLVMdev] RFC: AVX Pattern Specification [LONG]

[LLVMdev] RFC: AVX Pattern Specification [LONG]

[LLVMdev] RFC: AVX Pattern Specification [LONG]

[LLVMdev] RFC: AVX Pattern Specification [LONG]

[LLVMdev] RFC: AVX Pattern Specification [LONG]

Possibly Parallel Threads