thr3ads.net - llvm dev - [llvm-dev] Complex proposal v3 + roundtable agenda [Dec 2020]

Cameron McInally via llvm-dev

2020-Nov-19 18:11 UTC

[llvm-dev] Complex proposal v3 + roundtable agenda

On Wed, Nov 18, 2020 at 4:47 PM Krzysztof Parzyszek via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Complex type would pose another issue for vectorization: in general
> it's better to have a vector of the real parts and a vector of the imaginary
> parts for the purpose of arithmetic, than having vectors of complex elements
> (i.e. real and imaginary parts interleaved).
Is that universally true? I think it depends on the target. Let's take
Florian's FCMLA example. The inputs and output are interleaved. And if
you need just the reals/imags from an interleaved vector for something
else, LD2/ST2 should be pretty fast on recent chips.

On the other hand, if we had a non-interleaved complex representation
and wanted to use FCMLA, we'd need some number of zips and unzips to
interleave and deinterleave between the load and store. Those probably
aren't cheap in aggregate.
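
As a rough illustration of that round trip, here is a minimal sketch in
Python, with plain lists standing in for vector registers. The helper
names (deinterleave, interleave, mul_split) are hypothetical, and this is
not how any backend actually lowers FCMLA; it only shows the data movement
needed to go between the interleaved and split layouts around an
elementwise complex multiply.

```python
def deinterleave(v):
    """Split [re0, im0, re1, im1, ...] into (reals, imags), like an unzip."""
    return v[0::2], v[1::2]


def interleave(re, im):
    """Merge (reals, imags) back into [re0, im0, re1, im1, ...], like a zip."""
    out = []
    for r, i in zip(re, im):
        out.extend((r, i))
    return out


def mul_split(a_re, a_im, b_re, b_im):
    """Complex multiply on the split layout: each lane is independent."""
    lanes = list(zip(a_re, a_im, b_re, b_im))
    re = [ar * br - ai * bi for ar, ai, br, bi in lanes]
    im = [ar * bi + ai * br for ar, ai, br, bi in lanes]
    return re, im


a = [1.0, 2.0, 3.0, 4.0]  # (1+2i), (3+4i), interleaved
b = [5.0, 6.0, 7.0, 8.0]  # (5+6i), (7+8i), interleaved

# Interleaved input -> split arithmetic -> interleaved output.
a_re, a_im = deinterleave(a)
b_re, b_im = deinterleave(b)
product = interleave(*mul_split(a_re, a_im, b_re, b_im))
```

Whichever layout is chosen as canonical, the conversion steps show up on
the side that does not match what the target instruction expects.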

I haven't studied this across all targets, but my intuition says we
should leave the representation decision up to the targets. Maybe we
should have a larger discussion about it.

Although, it's worth noting that predication would likely be *much*
easier with a non-interleaved representation. I think. Again, I
haven't thought this completely through, but it's probably worth
talking about.

David Greene via llvm-dev

2020-Dec-01 19:54 UTC

[llvm-dev] Complex proposal v3 + roundtable agenda

Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Although, it's worth noting that predication would likely be *much*
> easier with a non-interleaved representation. I think. Again, I
> haven't thought this completely through, but it's probably worth
> talking about.
This is one place where a complex type seems particularly beneficial.
Personally, I think of "vector of complex" as a vector of individual
complex values, not a vector of interleaved real and imaginary elements.
A predicated vector of complex should look like a predicated vector of
double from a masking standpoint.  With an interleaved viewpoint, you'd
basically double-up all of the mask bits.

e.g.:

<4 x complex_double> a
<4 x i1> amask = {1, 0, 1, 1}

vs.

<8 x double> a
<8 x i1> amask = {1, 1, 0, 0, 1, 1, 1, 1}

Of course lowering may require transforming the first mask into the
second depending on what the hardware has available.  This is mostly an
issue in mixed-data loops where you end up having to either track extra
masks or spend time converting between masks.
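
The conversion between the two mask forms can be sketched as follows.
This is a minimal Python model, not LLVM IR; expand_mask and
collapse_mask are hypothetical helper names, and collapse_mask assumes
the real and imaginary lanes of each element carry the same bit.

```python
def expand_mask(mask):
    """Per-complex-element mask -> per-scalar mask for interleaved data.

    Each bit is duplicated so it covers both the real and imaginary lane.
    """
    wide = []
    for bit in mask:
        wide.extend((bit, bit))
    return wide


def collapse_mask(wide):
    """Per-scalar interleaved mask -> per-complex-element mask.

    Assumption: an element is active only if both of its lanes are active.
    """
    return [re_bit & im_bit for re_bit, im_bit in zip(wide[0::2], wide[1::2])]


amask = [1, 0, 1, 1]             # <4 x i1> for <4 x complex_double>
wide_mask = expand_mask(amask)   # <8 x i1> for <8 x double>
```

Here wide_mask comes out as {1, 1, 0, 0, 1, 1, 1, 1}, matching the second
form in the example above.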

I am not sure what this looks like with intrinsics.  Do the intrinsics
accept the first kind of mask or the second?

              -David
