Robison, Arch via llvm-dev
2015-Oct-14 15:40 UTC
[llvm-dev] Extending SLP Vectorizer to deal with aggregates?
I'm looking for a sanity check on extending SLP Vectorizer to deal with aggregates. I'd like to vectorize Julia tuple operations. The Julia compiler lowers tuples to LLVM arrays, not LLVM vectors. I've tried making Julia lower tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was not applicable, because of extraction/insertion overhead. I.e., the Julia lowering is too early to make the right choice on vector vs. array representation of tuples. So instead I'd like to make SLP Vectorizer vectorize idioms involving aggregates. A sample of the idiom is at https://gist.github.com/ArchRobison/e762cd55b8cfc0e019a3 . 1. Identify stores of arrays. E.g. "store [4 x float]". 2. Walk chains backwards from the array stores, similar to the way SLPVectorizer already walks chains. 3. If vectorization is applicable, replace array construction/load/store with vector construction/load/store. Vector load/stores will be unaligned. Does this sound like a reasonable approach? If it sounds too Julia-specific, I could do it as a custom Julia pass. - Arch D. Robison -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151014/89f6dcd5/attachment.html>
Tom Stellard via llvm-dev
2015-Oct-14 16:16 UTC
[llvm-dev] Extending SLP Vectorizer to deal with aggregates?
On Wed, Oct 14, 2015 at 03:40:49PM +0000, Robison, Arch via llvm-dev wrote:> I'm looking for a sanity check on extending SLP Vectorizer to deal with aggregates. > > I'd like to vectorize Julia tuple operations. The Julia compiler lowers tuples to LLVM arrays, not LLVM vectors. I've tried making Julia lower tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was not applicable, because of extraction/insertion overhead. I.e., the Julia lowering is too early to make the right choice on vector vs. array representation of tuples. So instead I'd like to make SLP Vectorizer vectorize idioms involving aggregates. A sample of the idiom is at https://gist.github.com/ArchRobison/e762cd55b8cfc0e019a3 . > > 1. Identify stores of arrays. E.g. "store [4 x float]". > 2. Walk chains backwards from the array stores, similar to the way SLPVectorizer already walks chains. > 3. If vectorization is applicable, replace array construction/load/store with vector construction/load/store. Vector load/stores will be unaligned. > > Does this sound like a reasonable approach? If it sounds too Julia-specific, I could do it as a custom Julia pass. >Hi Arch, This kind of vectorization would be useful for the AMDGPU backend especially for allocas. We already have a custom pass in the backend called AMDGPUPromoteAllocas which replaces loads and stores to alloca pointers with vector operations in some very simple cases. It would be great to have something more generic that handles more cases. -Tom> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Renato Golin via llvm-dev
2015-Oct-14 16:30 UTC
[llvm-dev] Extending SLP Vectorizer to deal with aggregates?
On 14 October 2015 at 16:40, Robison, Arch via llvm-dev <llvm-dev at lists.llvm.org> wrote:> 1. Identify stores of arrays. E.g. “store [4 x float]”. > > 2. Walk chains backwards from the array stores, similar to the way > SLPVectorizer already walks chains. > > 3. If vectorization is applicable, replace array construction/load/store > with vector construction/load/store. Vector load/stores will be unaligned.I like this idea, too. But unaligned access may need to be constrained for the targets that support it. Not a big deal. Another potential problem is the cost model. Right now, it's tuned to make scalar vs vector loads and shuffles be "sensible", but array patterns are slightly different. With luck, you can add gep costs to balance individual allocas. That IR you sent could be reduced to a few vector loads, one mul-add, one vector store. :) IIRC, AVX has the indirect shuffle, so this could even reduce the number of loads. Seems like a good candidate for vectorizing! cheers, -renato
Robison, Arch via llvm-dev
2015-Oct-14 20:12 UTC
[llvm-dev] Extending SLP Vectorizer to deal with aggregates?
Will the cost model used by SLPVectorizer shoot down unaligned loads/stores on targets that don't support them (or support them slowly)? I started working out details to my design, and for a first round I think it's safest to address only cases where the leaf of the chain is an array load that can be rewritten as an unaligned vector load. I.e., similar to the CanReuseExtract idea in SLPVectorizer, but further constrained to be resulting from an array load. - Arch -----Original Message----- From: Renato Golin [mailto:renato.golin at linaro.org] Sent: Wednesday, October 14, 2015 11:31 AM To: Robison, Arch Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] Extending SLP Vectorizer to deal with aggregates? On 14 October 2015 at 16:40, Robison, Arch via llvm-dev <llvm-dev at lists.llvm.org> wrote:> 1. Identify stores of arrays. E.g. “store [4 x float]”. > > 2. Walk chains backwards from the array stores, similar to the way > SLPVectorizer already walks chains. > > 3. If vectorization is applicable, replace array > construction/load/store with vector construction/load/store. Vector load/stores will be unaligned.I like this idea, too. But unaligned access may need to be constrained for the targets that support it. Not a big deal. Another potential problem is the cost model. Right now, it's tuned to make scalar vs vector loads and shuffles be "sensible", but array patterns are slightly different. With luck, you can add gep costs to balance individual allocas. That IR you sent could be reduced to a few vector loads, one mul-add, one vector store. :) IIRC, AVX has the indirect shuffle, so this could even reduce the number of loads. Seems like a good candidate for vectorizing! cheers, -renato
Arnold Schwaighofer via llvm-dev
2015-Oct-14 20:44 UTC
[llvm-dev] Extending SLP Vectorizer to deal with aggregates?
I don’t see anything that is special to Julia in your example. I think this can be part of the SLPVectorizer. It will need a pattern to start its tree building when it sees a store of array of insert values: partial_value = insert_value(v1 ...) full_value = insert_value(v2, partial_value) store [2 x double] full_value, dest Start a tree at (v1,v2). A peephole to merge partial_value = insert_value(extract_element(v, 0), undef) full_value = insert_value(extract_element(v, 1), partial_value) store [2 x double] full_value, dest into store <2 x double> v, dest And adjust the cost modeling in the SLPVectorizer to account for this peephole.> On Oct 14, 2015, at 8:40 AM, Robison, Arch via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I’m looking for a sanity check on extending SLP Vectorizer to deal with aggregates. > > I’d like to vectorize Julia tuple operations. The Julia compiler lowers tuples to LLVM arrays, not LLVM vectors. I’ve tried making Julia lower tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was not applicable, because of extraction/insertion overhead. I.e., the Julia lowering is too early to make the right choice on vector vs. array representation of tuples. So instead I’d like to make SLP Vectorizer vectorize idioms involving aggregates. A sample of the idiom is at https://gist.github.com/ArchRobison/e762cd55b8cfc0e019a3 . > > 1. Identify stores of arrays. E.g. “store [4 x float]”. > 2. Walk chains backwards from the array stores, similar to the way SLPVectorizer already walks chains. > 3. If vectorization is applicable, replace array construction/load/store with vector construction/load/store. Vector load/stores will be unaligned. > > Does this sound like a reasonable approach? If it sounds too Julia-specific, I could do it as a custom Julia pass. > > - Arch D. Robison > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Apparently Analagous Threads
- [LLVMdev] Extend SLPVectorizer to struct operations that are isomorphic to vector operations?
- [LLVMdev] Improving SLPVectorizer for Julia
- [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
- [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
- [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.