On Thursday 17 December 2009 18:04, Anton Korobeynikov wrote:
> Hello, David
>
> > Can you expand on this with an example? There seems to be an awful lot
> > of shuffle patterns and predicates in PPCInstrAltivec.td. What do you
> > mean by, "Canonicalize to byte ops?" Can you walk me through how that
> > works with Altivec?
>
> The basic idea is quite simple - lower everything to vNi8 and write
> all the patterns using only these types.

Yeah, I figured that out after thinking a bit more. However, I think in this
case we only want to lower to vNi32 since there are no immediate-mask shuffles
in X86 that operate on smaller element types. Doing it at the byte level
would just be more confusing, I think.

PSHUFB is really a completely different instruction than PSHUFD, for example.

-Dave

On Dec 17, 2009, at 4:12 PM, David Greene wrote:
> On Thursday 17 December 2009 18:04, Anton Korobeynikov wrote:
>> Hello, David
>>
>>> Can you expand on this with an example? There seems to be an awful lot
>>> of shuffle patterns and predicates in PPCInstrAltivec.td. What do you
>>> mean by, "Canonicalize to byte ops?" Can you walk me through how that
>>> works with Altivec?
>>
>> The basic idea is quite simple - lower everything to vNi8 and write
>> all the patterns using only these types.
>
> Yeah, I figured that out after thinking a bit more. However, I think in this
> case we only want to lower to vNi32 since there are no immediate-mask shuffles
> in X86 that operate on smaller element types. Doing it at the byte level
> would just be more confusing, I think.
>
> PSHUFB is really a completely different instruction than PSHUFD, for example.

Aside from consuming one of its inputs, which is a regalloc problem, it isn't
really different. It's just a one-input immediate shuffle, where the immediate
is not encoded in the instruction. From the perspective of the shuffle
instruction, all the x86 shuffles are just various byte shuffles. Writing them
all in one canonical form would substantially simplify the code, especially
given layering AVX on top of the existing barely-understandable code would
probably result in something almost unmaintainable.

Nate
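
To make the "every shuffle is a byte shuffle" point concrete, here is a minimal sketch (not LLVM code) that decodes a PSHUFD immediate into four dword indices and then widens that mask to the 16 byte indices a PSHUFB-style control would use. The function names are hypothetical, and the sketch ignores PSHUFB's zeroing bit (bit 7 of each control byte).

```python
def pshufd_imm_to_dword_mask(imm8):
    """Decode a PSHUFD immediate: result dword i comes from source
    dword (imm8 >> 2*i) & 3. Hypothetical helper, for illustration only."""
    return [(imm8 >> (2 * i)) & 0x3 for i in range(4)]


def widen_mask_to_bytes(mask, elt_bytes):
    """Expand each element index into elt_bytes consecutive byte indices,
    i.e. rewrite an element-level shuffle mask in canonical byte form."""
    return [idx * elt_bytes + b for idx in mask for b in range(elt_bytes)]


imm = 0x1B                                       # 0b00_01_10_11: reverse the four dwords
dword_mask = pshufd_imm_to_dword_mask(imm)       # [3, 2, 1, 0]
byte_mask = widen_mask_to_bytes(dword_mask, 4)   # 16 byte indices
print(dword_mask)
print(byte_mask)
```

The 4-entry and 16-entry masks describe the same permutation; only the granularity differs, which is what a single canonical byte form would rely on.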

On Thursday 17 December 2009 23:48, Nate Begeman wrote:
> > PSHUFB is really a completely different instruction than PSHUFD, for
> > example.
>
> Aside from consuming one of its inputs, which is a regalloc problem, it
> isn't really different. It's just a one-input immediate shuffle, where the
> immediate is not encoded in the instruction. From the perspective of the
> shuffle instruction, all the x86 shuffles are just various byte shuffles.
> Writing them all in one canonical form would substantially simplify the
> code, especially given layering AVX on top of the existing
> barely-understandable code would probably result in something almost
> unmaintainable.

But I don't think it can share a pattern with the immediate-mask shuffles
because the operands are of different class (register vs. immediate). I guess
for PSHUFB the instruction works for all masks so we don't even need a
predicate to check for validity.

I'll think some more about this, but casting a vector to bytes is really going
to make debugging harder. It's easier to think about 4 or 8 index values than
16 or 32.

-Dave
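
One way to picture the predicate question is the sketch below. It is an illustration under stated assumptions, not the LLVM implementation. The hypothetical helper `narrow_byte_mask` checks whether a byte-level mask moves only whole, aligned elements. If it does, the mask can be shown as 4 or 8 indices, which is also the condition an immediate-mask shuffle such as PSHUFD would need. If it does not, only a fully general byte shuffle such as PSHUFB can express it.

```python
def narrow_byte_mask(byte_mask, elt_bytes):
    """Return the element-level mask if byte_mask moves whole, aligned
    elements of elt_bytes bytes each; otherwise return None.
    Hypothetical helper, for illustration only."""
    if len(byte_mask) % elt_bytes != 0:
        return None
    narrowed = []
    for i in range(0, len(byte_mask), elt_bytes):
        first = byte_mask[i]
        # The group must start on an element boundary...
        if first % elt_bytes != 0:
            return None
        # ...and its bytes must be consecutive.
        if any(byte_mask[i + b] != first + b for b in range(elt_bytes)):
            return None
        narrowed.append(first // elt_bytes)
    return narrowed


dword_reverse = [12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3]
byte_reverse = list(range(15, -1, -1))

print(narrow_byte_mask(dword_reverse, 4))  # expressible with 4 dword indices
print(narrow_byte_mask(byte_reverse, 4))   # None: needs a true byte shuffle
```

A check like this would let a byte-canonical representation still be printed at the coarsest granularity that fits, which might ease the debugging concern.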