I'm working on debugging AVX shuffles and I ran into an interesting problem.

The current isSHUFPMask predicate in X86ISelLowering needs to be generalized to operate on 128-bit or 256-bit masks. There are probably lots of other things to change too (LowerVECTOR_SHUFFLE_4wide, etc.) but I'll worry about that later.

The generalized rule is:

1. For the low 64 bits of the result vector, the source can be from the low 128 bits of vector 1.

2. For the next 64 bits, the source can be from the low 128 bits of vector 2.

3. For the 3rd 64 bits, the source is the high 128 bits of vector 1.

4. For the high 64 bits, the source is the high 128 bits of vector 2.

For 128-bit vectors, steps 3 and 4 are ignored since there are no high 128 bits.

Determining the answer boils down to knowing how big a vector element is. Then we can map operand values to ranges within 64-bit and 128-bit chunks and determine the proper index ranges to look for. For example, for 64-bit elements, result element zero must come from index 0 or 1. For 32-bit elements, result element zero must come from index 0-3.

In isSHUFPMask all we have is the SDNode of the shufflevector index vector. Unfortunately, this tells us nothing about the type of the result vector. If we have two operands, we obviously have a v2i/f64 vector. For eight operands, we have a v8i/f32 vector. But for four operands we could have a v4f/i32 or a v4f/i64. So we can't know the vector element size, and thus we can't map shufflevector indices to valid ranges for output vector elements.

If I change isSHUFPMask to take an extra argument which is the result vector type (or element type, or something similar), how would I express that extra argument in the .td file? Right now we do:

    SHUFP_shuffle_mask:$src3

to add the predicate to check the 3rd (mask) operand for conformance to something SHUFPS/D can handle. Is there some way currently to add another "argument" to the PatLeaf invocation?

-Dave
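The four-part rule above can be sketched as a standalone range check. This is only an illustration, not the LLVM predicate: the function name isShufpLikeMask, the std::vector<int> mask representation (indices 0..N-1 select from vector 1, N..2N-1 from vector 2, negative means undef), and the explicit EltBits parameter are all assumptions made for the example. It checks only the range rule as stated and ignores any further constraints the instruction encoding may impose.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone check, not the real isSHUFPMask signature.
// Mask holds one index per result element: 0..N-1 picks from vector 1,
// N..2N-1 picks from vector 2, and a negative value means "undef".
// EltBits is the element width in bits (32 or 64), i.e. the extra
// information the mask alone does not provide.
bool isShufpLikeMask(const std::vector<int> &Mask, unsigned EltBits) {
  if (EltBits != 32 && EltBits != 64)
    return false;

  const unsigned NumElts = static_cast<unsigned>(Mask.size());
  const unsigned VecBits = NumElts * EltBits;
  if (VecBits != 128 && VecBits != 256)
    return false;

  const unsigned EltsPerLane = 128 / EltBits; // elements per 128-bit chunk
  const unsigned EltsPerHalf = 64 / EltBits;  // elements per 64-bit chunk

  for (unsigned i = 0; i != NumElts; ++i) {
    const int M = Mask[i];
    if (M < 0)
      continue; // undef matches anything

    // Which 128-bit chunk of the result this element lives in.
    const unsigned Lane = i / EltsPerLane;
    // Low 64 bits of each chunk read vector 1, high 64 bits read vector 2.
    const bool FromV2 = (i % EltsPerLane) >= EltsPerHalf;

    // Allowed source range: the same 128-bit chunk of the chosen vector.
    const unsigned Lo = (FromV2 ? NumElts : 0) + Lane * EltsPerLane;
    const unsigned Hi = Lo + EltsPerLane;
    const unsigned Idx = static_cast<unsigned>(M);
    if (Idx < Lo || Idx >= Hi)
      return false;
  }
  return true;
}
```

Note how the same four-entry mask {0, 1, 4, 5} passes with 32-bit elements but fails with 64-bit elements, which is exactly the ambiguity described above for four-operand masks.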
On Dec 17, 2009, at 3:10 PM, David Greene wrote:

> I'm working on debugging AVX shuffles and I ran into an interesting
> problem.
>
> The current isSHUFPMask predicate in X86ISelLowering needs to be
> generalized to operate on 128-bit or 256-bit masks. There are
> probably lots of other things to change too (LowerVECTOR_SHUFFLE_4wide,
> etc.) but I'll worry about that later.
>
> The generalized rule is:
>
> 1. For the low 64 bits of the result vector, the source can be from
> the low 128 bits of vector 1.
>
> 2. For the next 64 bits, the source can be from the low 128 bits of
> vector 2.
>
> 3. For the 3rd 64 bits, the source is the high 128 bits of vector 1.
>
> 4. For the high 64 bits, the source is the high 128 bits of vector 2.
>
> For 128 bit vectors, steps 3 and 4 are ignored since there are no high
> 128 bits.
>
> Determining the answer boils down to knowing how big a vector element
> is. Then we can map operand values to ranges within 64-bit and 128-bit
> chunks and determine the proper index ranges to look for. For example,
> for 64-bit elements, result element zero must come from index 0 or 1.
> For 32-bit elements, result element zero must come from index 0-3.

David, this is probably the wrong approach, based on the accreted awfulness of the X86 shuffle lowering code, which Eli and I have hacked on to improve somewhat. The correct approach is probably a rewrite based around what AltiVec does: Canonicalize to byte ops, and write all the patterns once rather than having to look for 6 different variants of the same pattern.

Nate
On Thursday 17 December 2009 17:16, Nate Begeman wrote:

> David, this is probably the wrong approach, based on the accreted awfulness
> of the X86 shuffle lowering code,

Ha! I have no issue believing this statement. :)

> The correct approach is probably a rewrite based around what
> AltiVec does: Canonicalize to byte ops, and write all the patterns once
> rather than having to look for 6 different variants of the same pattern.

Can you expand on this with an example? There seems to be an awful lot of shuffle patterns and predicates in PPCInstrAltivec.td. What do you mean by, "Canonicalize to byte ops?" Can you walk me through how that works with Altivec?

Since I'm rewriting all of the SSE patterns to clean them up and incorporate AVX functionality anyway, a complete rewrite of shuffles is not additional work. :)

Thanks.

-Dave
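The thread does not spell out what "canonicalize to byte ops" involves, so the following is only one possible reading, sketched under assumptions: every element-level shuffle mask is expanded into a mask of byte indices, so that a single set of checks on the byte mask covers all element widths. The helper name toByteMask and the std::vector<int> representation (negative means undef) are hypothetical and are not taken from the PowerPC or X86 backends.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API: expand an element-level shuffle
// mask into a byte-level one. Element index M with EltBytes bytes per
// element becomes byte indices M*EltBytes .. M*EltBytes + EltBytes - 1.
// Negative (undef) entries expand to EltBytes undef bytes.
std::vector<int> toByteMask(const std::vector<int> &Mask, unsigned EltBytes) {
  std::vector<int> Bytes;
  Bytes.reserve(Mask.size() * EltBytes);
  for (int M : Mask) {
    for (unsigned B = 0; B != EltBytes; ++B) {
      if (M < 0)
        Bytes.push_back(-1);
      else
        Bytes.push_back(M * static_cast<int>(EltBytes) + static_cast<int>(B));
    }
  }
  return Bytes;
}
```

Under this reading, a 4 x 32-bit mask {0, 1, 4, 5} and a 2 x 64-bit mask {0, 2} expand to the same 16-entry byte mask, so a predicate written once against byte indices would accept or reject both together instead of needing one variant per element type.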