David Greene via llvm-dev
2020-Dec-01 19:54 UTC
[llvm-dev] Complex proposal v3 + roundtable agenda
Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Although, it's worth noting that predication would likely be *much*
> easier with a non-interleaved representation. I think. Again, I
> haven't thought this completely through, but it's probably worth
> talking about.

This is one place where a complex type seems particularly beneficial. Personally, I think of "vector of complex" as a vector of individual complex values, not a vector of interleaved real and imaginary elements. A predicated vector of complex should look like a predicated vector of double from a masking standpoint. With an interleaved viewpoint, you'd basically double up all of the mask bits, e.g.:

  <4 x complex_double> a
  <4 x i1> amask = {1, 0, 1, 1}

vs.

  <8 x double> a
  <8 x i1> amask = {1, 1, 0, 0, 1, 1, 1, 1}

Of course lowering may require transforming the first mask into the second depending on what the hardware has available. This is mostly an issue in mixed-data loops where you end up having to either track extra masks or spend time converting between masks.

I am not sure what this looks like with intrinsics. Do the intrinsics accept the first kind of mask or the second?

-David
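The mask transformation described in this message can be sketched in a few lines of C++. This is only an illustration of the bit-doubling idea, not how LLVM lowers anything; the function name `expandMaskForInterleaved` is hypothetical and the masks are modeled as `std::vector<bool>` for simplicity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): turn a mask with one bit per
// complex element into a mask with one bit per scalar lane, assuming the
// interleaved layout {re0, im0, re1, im1, ...}.
std::vector<bool> expandMaskForInterleaved(const std::vector<bool>& complexMask) {
    std::vector<bool> scalarMask;
    scalarMask.reserve(complexMask.size() * 2);
    for (bool bit : complexMask) {
        scalarMask.push_back(bit);  // lane holding the real part
        scalarMask.push_back(bit);  // lane holding the imaginary part
    }
    return scalarMask;
}
```

Applied to the four-element mask {1, 0, 1, 1} from the message, this yields the eight-element mask {1, 1, 0, 0, 1, 1, 1, 1}. In a mixed-data loop, the cost being discussed is either running a conversion like this repeatedly or keeping both masks live.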
Steve (Numerics) Canon via llvm-dev
2020-Dec-04 19:23 UTC
[llvm-dev] Complex proposal v3 + roundtable agenda
(Late to the party)

I think there are a lot of good questions in this thread, but I also want Florian to get started landing some patches and then everyone can iterate on that. Slight pushback on this one question:

> On Nov 19, 2020, at 18:11, Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Wed, Nov 18, 2020 at 4:47 PM Krzysztof Parzyszek via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> Complex type would pose another issue for vectorization: in general it's better to have a vector of the real parts and a vector of the imaginary parts for the purpose of arithmetic, than having vectors of complex elements (i.e. real and imaginary parts interleaved).
>
> Is that universally true? I think it depends on the target. Let's take
> Florian's FCMLA example. The inputs and output are interleaved. And if
> you need just the reals/imags from an interleaved vector for something
> else, LD2/ST2 should be pretty fast on recent chips.

FCMLA makes it so _if your data is already interleaved and you can’t change it_, you can still operate efficiently in that format. But if you have control, and you’re doing anything beyond basic arithmetic, it’s still advantageous to use a planar [SoA] layout instead of interleaved [AoS]. For a simple example, if you want to vectorize a complex exponential function, you’ll want to compute cos and sin of the imaginary parts, and exp of the real parts; FCMLA doesn’t help here.

So Florian’s proposal:

> In the short-term I think to get things rolling it would be good to focus on the layout as defined by frontends (e.g. Clang/C++).

Sounds great to me.

– Steve
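The complex exponential example can be made concrete with a small C++ sketch, using the identity exp(a + bi) = exp(a) * (cos(b) + i sin(b)). The names `PlanarComplex`, `deinterleave`, and `planarExp` are hypothetical and chosen only for illustration; this is scalar code that shows the data access pattern, not a vectorized implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical planar (SoA) container: all real parts, then all imaginary parts.
struct PlanarComplex {
    std::vector<double> re, im;
};

// Split an interleaved (AoS) buffer {re0, im0, re1, im1, ...} into planar form.
// Assumes the input has an even number of elements.
PlanarComplex deinterleave(const std::vector<double>& interleaved) {
    PlanarComplex out;
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        out.re.push_back(interleaved[i]);
        out.im.push_back(interleaved[i + 1]);
    }
    return out;
}

// Elementwise complex exponential on planar data. Each math call reads one
// contiguous array: exp over the real parts, cos and sin over the imaginary
// parts, with no shuffling between real and imaginary lanes.
PlanarComplex planarExp(const PlanarComplex& z) {
    const std::size_t n = z.re.size();
    PlanarComplex out{std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        const double magnitude = std::exp(z.re[i]);
        out.re[i] = magnitude * std::cos(z.im[i]);
        out.im[i] = magnitude * std::sin(z.im[i]);
    }
    return out;
}
```

With interleaved input, a step like `deinterleave` (or an LD2-style load) has to run before the exp/cos/sin calls can operate on contiguous data, which is the overhead the planar layout avoids.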