Cameron McInally via llvm-dev
2020-Nov-19 18:11 UTC
[llvm-dev] Complex proposal v3 + roundtable agenda
On Wed, Nov 18, 2020 at 4:47 PM Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Complex type would pose another issue for vectorization: in general it's better to have a vector of the real parts and a vector of the imaginary parts for the purpose of arithmetic, than having vectors of complex elements (i.e. real and imaginary parts interleaved).

Is that universally true? I think it depends on the target. Let's take Florian's FCMLA example. The inputs and output are interleaved. And if you need just the reals/imags from an interleaved vector for something else, LD2/ST2 should be pretty fast on recent chips. On the other hand, if we had a non-interleaved complex representation and wanted to use FCMLA, we'd need some number of zips and unzips to interleave and deinterleave between the load and store. Those probably aren't cheap in aggregate.

I haven't studied this across all targets, but my intuition says we should leave the representation decision up to the targets. Maybe we should have a larger discussion about it.

Although, it's worth noting that predication would likely be *much* easier with a non-interleaved representation. I think. Again, I haven't thought this completely through, but it's probably worth talking about.
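
For readers following the layout question, here is a minimal C++ sketch (not from the thread) of the two representations being compared. The names `SplitComplex`, `deinterleave`, `interleave`, and `multiplySplit` are hypothetical and chosen only for illustration. The scalar loops stand in for what unzip/zip shuffles or de-interleaving loads and stores would do in vector hardware; they say nothing about actual instruction cost on any target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split ("non-interleaved") layout: one vector of real parts and one of
// imaginary parts. Hypothetical type, for illustration only.
struct SplitComplex {
    std::vector<double> re;
    std::vector<double> im;
};

// Interleaved {re0, im0, re1, im1, ...} -> split layout.
// This is the data movement an unzip or a de-interleaving load performs.
SplitComplex deinterleave(const std::vector<double>& v) {
    assert(v.size() % 2 == 0);
    SplitComplex out;
    for (std::size_t i = 0; i < v.size(); i += 2) {
        out.re.push_back(v[i]);
        out.im.push_back(v[i + 1]);
    }
    return out;
}

// Split layout -> interleaved {re0, im0, re1, im1, ...}.
// This is the data movement a zip or an interleaving store performs.
std::vector<double> interleave(const SplitComplex& s) {
    assert(s.re.size() == s.im.size());
    std::vector<double> out;
    for (std::size_t i = 0; i < s.re.size(); ++i) {
        out.push_back(s.re[i]);
        out.push_back(s.im[i]);
    }
    return out;
}

// Elementwise complex multiply on the split layout:
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
// Each output lane depends only on same-index input lanes, so no
// cross-lane shuffles are needed here.
SplitComplex multiplySplit(const SplitComplex& a, const SplitComplex& b) {
    assert(a.re.size() == b.re.size());
    SplitComplex out;
    for (std::size_t i = 0; i < a.re.size(); ++i) {
        out.re.push_back(a.re[i] * b.re[i] - a.im[i] * b.im[i]);
        out.im.push_back(a.re[i] * b.im[i] + a.im[i] * b.re[i]);
    }
    return out;
}
```

If the data lives in memory interleaved and the arithmetic wants the split form, every operation is bracketed by a `deinterleave`/`interleave` pair. An instruction that consumes interleaved operands directly would instead need those conversions whenever the in-register form is split, which is the cost in question above.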
David Greene via llvm-dev
2020-Dec-01 19:54 UTC
[llvm-dev] Complex proposal v3 + roundtable agenda
Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Although, it's worth noting that predication would likely be *much*
> easier with a non-interleaved representation. I think. Again, I
> haven't thought this completely through, but it's probably worth
> talking about.

This is one place where a complex type seems particularly beneficial. Personally, I think of "vector of complex" as a vector of individual complex values, not a vector of interleaved real and imaginary elements. A predicated vector of complex should look like a predicated vector of double from a masking standpoint. With an interleaved viewpoint, you'd basically double-up all of the mask bits, e.g.:

    <4 x complex_double> a
    <4 x i1> amask = {1, 0, 1, 1}

vs.

    <8 x double> a
    <8 x i1> amask = {1, 1, 0, 0, 1, 1, 1, 1}

Of course lowering may require transforming the first mask into the second depending on what the hardware has available. This is mostly an issue in mixed-data loops where you end up having to either track extra masks or spend time converting between masks.

I am not sure what this looks like with intrinsics. Do the intrinsics accept the first kind of mask or the second?

-David
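
The mask transformation described above can be sketched in a few lines of C++. This is an illustration only, not taken from the thread or from any LLVM API; `expandComplexMask` is a hypothetical name, and the sketch assumes the interleaved layout places each element's real and imaginary parts in adjacent lanes.

```cpp
#include <cassert>
#include <vector>

// Expand a per-complex-element mask (one bit per complex value) into a
// per-lane mask for an interleaved {re, im, re, im, ...} vector by
// duplicating each bit for the real and imaginary lanes.
std::vector<bool> expandComplexMask(const std::vector<bool>& perComplex) {
    std::vector<bool> perLane;
    perLane.reserve(perComplex.size() * 2);
    for (bool bit : perComplex) {
        perLane.push_back(bit);  // lane holding the real part
        perLane.push_back(bit);  // lane holding the imaginary part
    }
    return perLane;
}
```

Applied to the `<4 x i1>` mask `{1, 0, 1, 1}`, this yields the `<8 x i1>` mask `{1, 1, 0, 0, 1, 1, 1, 1}` from the example. In a mixed-data loop, a conversion like this (or its inverse) would be needed each time a mask crosses between complex and non-complex operations.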