thr3ads.net - llvm dev - [LLVMdev] AVX Shuffles & PatLeaf Help Needed [Dec 2009]

David Greene

2009-Dec-17 23:10 UTC

[LLVMdev] AVX Shuffles & PatLeaf Help Needed

I'm working on debugging AVX shuffles and I ran into an interesting
problem.

The current isSHUFPMask predicate in X86ISelLowering needs to be
generalized to operate on 128-bit or 256-bit masks.  There are
probably lots of other things to change too (LowerVECTOR_SHUFFLE_4wide,
etc.) but I'll worry about that later.

The generalized rule is:

1. For the low 64 bits of the result vector, the source can be from
   the low 128 bits of vector 1.

2. For the next 64 bits, the source can be from the low 128 bits of
   vector 2.

3. For the 3rd 64 bits, the source is the high 128 bits of vector 1.

4. For the high 64 bits, the source is the high 128 bits of vector 2.

For 128 bit vectors, steps 3 and 4 are ignored since there are no high
128 bits.

Determining the answer boils down to knowing how big a vector element
is.  Then we can map operand values to ranges within 64-bit and 128-bit
chunks and determine the proper index ranges to look for.  For example,
for 64-bit elements, result element zero must come from index 0 or 1.
For 32-bit elements, result element zero must come from index 0-3.
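As an illustration of the rule above (not code from the thread or from LLVM), here is a minimal standalone sketch.  It assumes the mask is given as a plain vector of source indices, with 0..N-1 selecting from vector 1, N..2N-1 from vector 2, and a negative value meaning "undef".  The name isShufpLikeMask and the explicit EltBits parameter are hypothetical.  It checks only the source-range rule stated here, not any further encoding constraints of the instructions.

```cpp
#include <vector>

// Hypothetical helper, not LLVM's isSHUFPMask.
// Mask[i] is the source index for result element i:
//   0 .. N-1   -> element of vector 1
//   N .. 2N-1  -> element of vector 2
//   negative   -> undef (matches anything)
// EltBits is the element size in bits (32 or 64).
bool isShufpLikeMask(const std::vector<int> &Mask, unsigned EltBits) {
  if (EltBits != 32 && EltBits != 64)
    return false;

  const unsigned NumElts = static_cast<unsigned>(Mask.size());
  const unsigned VecBits = NumElts * EltBits;
  if (VecBits != 128 && VecBits != 256)
    return false;

  const unsigned EltsPerLane = 128 / EltBits; // elements per 128-bit chunk
  const unsigned EltsPerHalf = 64 / EltBits;  // elements per 64-bit chunk

  for (unsigned i = 0; i != NumElts; ++i) {
    const int M = Mask[i];
    if (M < 0)
      continue; // undef element

    // Which 128-bit chunk of the result this element lives in.
    const unsigned Lane = i / EltsPerLane;
    // Low 64 bits of each chunk read vector 1, high 64 bits read vector 2.
    const bool FromV2 = (i % EltsPerLane) >= EltsPerHalf;

    // The source must lie in the same 128-bit chunk of the chosen vector.
    const unsigned Lo = (FromV2 ? NumElts : 0) + Lane * EltsPerLane;
    const unsigned Idx = static_cast<unsigned>(M);
    if (Idx < Lo || Idx >= Lo + EltsPerLane)
      return false;
  }
  return true;
}
```

With 32-bit elements, a four-element mask such as {0, 3, 4, 7} passes, while {4, 1, 0, 5} fails because result element zero would have to come from vector 2.  The same four-element masks are interpreted differently with EltBits = 64, which is why the element size has to be known.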

In isSHUFPMask all we have is the SDNode of the shufflevector index vector.
Unfortunately, this tells us nothing about the type of the result vector.
If we have two operands, we obviously have a v2i64/v2f64 vector.  For eight
operands, we have a v8i32/v8f32 vector.  But for four operands we could have
a v4i32/v4f32 or a v4i64/v4f64.  So we can't know the vector element size and thus
we can't map shufflevector indices to valid ranges for output vector elements.

If I change isSHUFPMask to take an extra argument which is the result
vector type (or element type, or something similar), how would I express
that extra argument in the .td file?  Right now we do:

SHUFP_shuffle_mask:$src3

to add the predicate to check the 3rd (mask) operand for conformance to
something SHUFPS/D can handle.

Is there some way currently to add another "argument" to the PatLeaf 
invocation?

                           -Dave

Nate Begeman

2009-Dec-17 23:16 UTC

On Dec 17, 2009, at 3:10 PM, David Greene wrote:
> I'm working on debugging AVX shuffles and I ran into an interesting
> problem.
> 
> The current isSHUFPMask predicate in X86ISelLowering needs to be
> generalized to operate on 128-bit or 256-bit masks.  There are
> probably lots of other things to change too (LowerVECTOR_SHUFFLE_4wide,
> etc.) but I'll worry about that later.
> 
> The generalized rule is:
> 
> 1. For the low 64 bits of the result vector, the source can be from
>   the low 128 bits of vector 1.
> 
> 2. For the next 64 bits, the source can be from the low 128 bits of
>   vector 2.
> 
> 3. For the 3rd 64 bits, the source is the high 128 bits of vector 1.
> 
> 4. For the high 64 bits, the source is the high 128 bits of vector 2.
> 
> For 128 bit vectors, steps 3 and 4 are ignored since there are no high
> 128 bits.
> 
> Determining the answer boils down to knowing how big a vector element
> is.  Then we can map operand values to ranges within 64-bit and 128-bit
> chunks and determine the proper index ranges to look for.  For example,
> for 64-bit elements, result element zero must come from index 0 or 1.
> For 32-bit elements, result element zero must come from index 0-3.
David, this is probably the wrong approach, based on the accreted awfulness of
the X86 shuffle lowering code, which Eli and I have hacked on to improve
somewhat.  The correct approach is probably a rewrite based around what AltiVec
does: Canonicalize to byte ops, and write all the patterns once rather than
having to look for 6 different variants of the same pattern.

Nate
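
One possible reading of "canonicalize to byte ops" (a sketch for illustration, not code from the thread or from the PowerPC backend): every element-level shuffle mask is rewritten as a byte-level mask, so that masks for different element types become directly comparable and a pattern only needs to be written once.  The helper name toByteMask and the use of -1 for undef are assumptions.

```cpp
#include <vector>

// Hypothetical helper: expand an element-level shuffle mask into a
// byte-level mask.  EltBytes is the element size in bytes; a negative
// element index (undef) expands to EltBytes copies of -1.
std::vector<int> toByteMask(const std::vector<int> &EltMask,
                            unsigned EltBytes) {
  std::vector<int> ByteMask;
  ByteMask.reserve(EltMask.size() * EltBytes);
  for (int M : EltMask) {
    for (unsigned B = 0; B != EltBytes; ++B) {
      // Byte B of source element M sits at byte offset M * EltBytes + B.
      ByteMask.push_back(M < 0 ? -1
                               : M * static_cast<int>(EltBytes) +
                                     static_cast<int>(B));
    }
  }
  return ByteMask;
}
```

For example, the two-element mask {1, 0} with 8-byte elements and the four-element mask {2, 3, 0, 1} with 4-byte elements expand to the same 16-entry byte mask, so a single byte-level predicate would cover both.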

David Greene

2009-Dec-17 23:30 UTC

On Thursday 17 December 2009 17:16, Nate Begeman wrote:
> David, this is probably the wrong approach, based on the accreted awfulness
> of the X86 shuffle lowering code, 
Ha!  I have no issue believing this statement.  :)
> The correct approach is probably a rewrite based around what
> AltiVec does: Canonicalize to byte ops, and write all the patterns once
> rather than having to look for 6 different variants of the same pattern.
Can you expand on this with an example?  There seem to be an awful lot of
shuffle patterns and predicates in PPCInstrAltivec.td.  What do you mean by
"Canonicalize to byte ops"?  Can you walk me through how that works with
AltiVec?

Since I'm rewriting all of the SSE patterns to clean them up and incorporate
AVX functionality anyway, a complete rewrite of shuffles is not additional 
work.  :)

Thanks.

                            -Dave
