thr3ads.net - llvm dev - [llvm-dev] Extending SLP Vectorizer to deal with aggregates? [Oct 2015]

If this information is useful, please help other people find it:
Share via:

Robison, Arch via llvm-dev

2015-Oct-14 15:40 UTC

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

I'm looking for a sanity check on extending SLP Vectorizer to deal with
aggregates.

I'd like to vectorize Julia tuple operations.  The Julia compiler lowers
tuples to LLVM arrays, not LLVM vectors.  I've tried making Julia lower
tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was not
applicable, because of extraction/insertion overhead.  I.e., the Julia lowering
is too early to make the right choice on vector vs. array representation of
tuples.  So instead I'd like to make SLP Vectorizer vectorize idioms
involving aggregates.  A sample of the idiom is at
https://gist.github.com/ArchRobison/e762cd55b8cfc0e019a3 .

1. Identify stores of arrays.  E.g. "store [4 x float]".
2. Walk chains backwards from the array stores, similar to the way SLPVectorizer
already walks chains.
3. If vectorization is applicable, replace array construction/load/store with
vector construction/load/store.  Vector load/stores will be unaligned.

Does this sound like a reasonable approach?  If it sounds too Julia-specific, I
could do it as a custom Julia pass.

- Arch D. Robison
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151014/89f6dcd5/attachment.html>

Tom Stellard via llvm-dev

2015-Oct-14 16:16 UTC

head link

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

On Wed, Oct 14, 2015 at 03:40:49PM +0000, Robison, Arch via llvm-dev
wrote:> I'm looking for a sanity check on extending SLP Vectorizer to deal with
aggregates.
> 
> I'd like to vectorize Julia tuple operations.  The Julia compiler
lowers tuples to LLVM arrays, not LLVM vectors.  I've tried making Julia
lower tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was
not applicable, because of extraction/insertion overhead.  I.e., the Julia
lowering is too early to make the right choice on vector vs. array
representation of tuples.  So instead I'd like to make SLP Vectorizer
vectorize idioms involving aggregates.  A sample of the idiom is at
https://gist.github.com/ArchRobison/e762cd55b8cfc0e019a3 .
> 
> 1. Identify stores of arrays.  E.g. "store [4 x float]".
> 2. Walk chains backwards from the array stores, similar to the way
SLPVectorizer already walks chains.
> 3. If vectorization is applicable, replace array construction/load/store
with vector construction/load/store.  Vector load/stores will be unaligned.
> 
> Does this sound like a reasonable approach?  If it sounds too
Julia-specific, I could do it as a custom Julia pass.
> 
Hi Arch,

This kind of vectorization would be useful for the AMDGPU backend
especially for allocas.

We already have a custom pass in the backend called AMDGPUPromoteAllocas
which replaces loads and stores to alloca pointers with vector
operations in some very simple cases.  It would be great to have something more
generic that handles more cases.

-Tom> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Renato Golin via llvm-dev

2015-Oct-14 16:30 UTC

head link

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

On 14 October 2015 at 16:40, Robison, Arch via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> 1. Identify stores of arrays.  E.g. “store [4 x float]”.
>
> 2. Walk chains backwards from the array stores, similar to the way
> SLPVectorizer already walks chains.
>
> 3. If vectorization is applicable, replace array construction/load/store
> with vector construction/load/store.  Vector load/stores will be unaligned.
I like this idea, too. But unaligned access may need to be constrained
for the targets that support it. Not a big deal.

Another potential problem is the cost model. Right now, it's tuned to
make scalar vs vector loads and shuffles be "sensible", but array
patterns are slightly different. With luck, you can add gep costs to
balance individual allocas.

That IR you sent could be reduced to a few vector loads, one mul-add,
one vector store. :)

IIRC, AVX has the indirect shuffle, so this could even reduce the
number of loads.

Seems like a good candidate for vectorizing!

cheers,
-renato

Robison, Arch via llvm-dev

2015-Oct-14 20:12 UTC

head link

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

Will the cost model used by SLPVectorizer shoot down unaligned loads/stores on
targets that don't support them (or support them slowly)?

I started working out details to my design, and for a first round I think
it's safest to address only cases where the leaf of the chain is an array
load that can be rewritten as an unaligned vector load.  I.e., similar to the
CanReuseExtract idea in SLPVectorizer, but further constrained to be resulting
from an array load.

- Arch

-----Original Message-----
From: Renato Golin [mailto:renato.golin at linaro.org] 
Sent: Wednesday, October 14, 2015 11:31 AM
To: Robison, Arch
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Extending SLP Vectorizer to deal with aggregates?

On 14 October 2015 at 16:40, Robison, Arch via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 1. Identify stores of arrays.  E.g. “store [4 x float]”.
>
> 2. Walk chains backwards from the array stores, similar to the way 
> SLPVectorizer already walks chains.
>
> 3. If vectorization is applicable, replace array 
> construction/load/store with vector construction/load/store.  Vector
load/stores will be unaligned.
I like this idea, too. But unaligned access may need to be constrained for the
targets that support it. Not a big deal.

Another potential problem is the cost model. Right now, it's tuned to make
scalar vs vector loads and shuffles be "sensible", but array patterns
are slightly different. With luck, you can add gep costs to balance individual
allocas.

That IR you sent could be reduced to a few vector loads, one mul-add, one vector
store. :)

IIRC, AVX has the indirect shuffle, so this could even reduce the number of
loads.

Seems like a good candidate for vectorizing!

cheers,
-renato

Arnold Schwaighofer via llvm-dev

2015-Oct-14 20:44 UTC

head link

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

I don’t see anything that is special to Julia in your example. I think this can
be part of the SLPVectorizer.

It will need a pattern to start its tree building when it sees a store of array
of insert values:

partial_value = insert_value(v1 ...)
full_value = insert_value(v2, partial_value)
store [2 x double] full_value, dest

Start a tree at (v1,v2).

A peephole to merge 

partial_value = insert_value(extract_element(v, 0), undef)
full_value = insert_value(extract_element(v, 1), partial_value)
store [2 x double] full_value, dest

into

store <2 x double> v, dest

And adjust the cost modeling in the SLPVectorizer to account for this peephole.

> On Oct 14, 2015, at 8:40 AM, Robison, Arch via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> I’m looking for a sanity check on extending SLP Vectorizer to deal with
aggregates.
>  
> I’d like to vectorize Julia tuple operations.  The Julia compiler lowers
tuples to LLVM arrays, not LLVM vectors.  I’ve tried making Julia lower tuples
to LLVM vectors, but that hurt performance when SLP Vectorizer was not
applicable, because of extraction/insertion overhead.  I.e., the Julia lowering
is too early to make the right choice on vector vs. array representation of
tuples.  So instead I’d like to make SLP Vectorizer vectorize idioms involving
aggregates.  A sample of the idiom is at
https://gist.github.com/ArchRobison/e762cd55b8cfc0e019a3 .
>  
> 1. Identify stores of arrays.  E.g. “store [4 x float]”. 
> 2. Walk chains backwards from the array stores, similar to the way
SLPVectorizer already walks chains.
> 3. If vectorization is applicable, replace array construction/load/store
with vector construction/load/store.  Vector load/stores will be unaligned.
>  
> Does this sound like a reasonable approach?  If it sounds too
Julia-specific, I could do it as a custom Julia pass.
>  
> - Arch D. Robison
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

llvm dev - Oct 2015 - Extending SLP Vectorizer to deal with aggregates?

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?

[llvm-dev] Extending SLP Vectorizer to deal with aggregates?