thr3ads.net - llvm dev - [llvm-dev] PSLP: Padded SLP Automatic Vectorization [Oct 2020]

If this information is useful, please help other people find it:
Share via:

Matt P. Dziubinski via llvm-dev

2020-Oct-02 14:33 UTC

[llvm-dev] PSLP: Padded SLP Automatic Vectorization

On 9/29/2020 14:37, David Chisnall via llvm-dev wrote:> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote:
>> Hey, I noticed this talk from the EuroLLVM 2015 
>> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) 
>> on the PSLP vectorization algorithm (CGO 2015 paper: 
>> http://vporpo.me/papers/pslp_cgo2015.pdf).
>>
>> Is anyone working on implementing it?
>>
>> If so, are there Phab reviews I can subscribe to?
> 
> The CGO paper was based on a very old LLVM and the last I heard, moving 
> the transform to a newer LLVM and rerunning the benchmarks made the 
> speedups go away.  It's not clear what the cause of this was and the 
> team responsible didn't have the time to do any root cause analysis.
Thank you for the reply; interesting!

Incidentally, would you happen to know whether VW-SLP fares any better?

I'm referring to "VW-SLP: Auto-Vectorization with Adaptive Vector
Width"
from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also 
presented as "Extending the SLP vectorizer to support variable vector 
widths" at the 2018 LLVM Developers’ Meeting, 
http://llvm.org/devmtg/2018-10/).

I'm wondering whether it subsumes PSLP or whether there are areas where 
PSLP still works (or worked) better, given the (brief) comparison in the 
paper (from which it seems like it addresses the problem possibly in a 
more general manner):

"The widely used bottom-up SLP algorithm has been improved in several 
ways. Porpodas et al. [32] propose  PSLP, a technique that pads the 
scalar code with redundant instructions, to convert non-isomorphic 
instruction sequences into isomorphic ones, thus extending the 
applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when 
some of the lanes differ, but it is most effective when the 
non-isomorphic parts are only a short section of the instruction chain. 
VW-SLP, on the other hand, works even if the chain never converges."

Best,
Matt

Vasileios Porpodas via llvm-dev

2020-Oct-02 20:57 UTC

head link

[llvm-dev] PSLP: Padded SLP Automatic Vectorization

Hi,

Some of the PSLP functionality was added to the SLP vectorizer with the
support for vectorizing alternate sequences several years ago. So some
of the most common patterns, like bundles of alternating add-subs, are
already being taken care of by the current SLP pass. However, it won't
do the more sophisticated padding across graphs that PSLP does.

There is indeed some overlap between VW-SLP and PSLP since they are both
trying to form isomorphic graphs out of non-isomorphic code. But each
one works better in different cases, so they complement each other.

For example, let's say you are in the middle of vectorizing 4-wide and
you encounter opcodes like this: [A, A, B, B].
Then you could either vectorize this with:
1. Padding the lanes and emitting shuffles (PSLP-style), or
2. Falling back to 2-way vectorization like in VW-SLP.
Determining which one is better depends on the rest of the code.
If the rest of the code is 4-wide isomorphic, then the PSLP option is
better, if it is 2-wide isomorphic then the VW-SLP option is better.

As it is mentioned in the paper, the current SLP pass can change the
vector-width but it cannot do so in all cases. The main focus of the
VW-SLP paper is to highlight some of these cases and show a design for
the SLP pass where width switching is done in a more complete way.

Performance fluctuates a lot as the compiler changes, so it is hard to
tell what to expect from applying either of these techniques to the
current snapshot.

Regards,
Vasileios

"Matt P. Dziubinski" <matdzb at gmail.com> writes:
> On 9/29/2020 14:37, David Chisnall via llvm-dev wrote:
>> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote:
>>> Hey, I noticed this talk from the EuroLLVM 2015 
>>>
(https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf)
>>> on the PSLP vectorization algorithm (CGO 2015 paper: 
>>> http://vporpo.me/papers/pslp_cgo2015.pdf).
>>>
>>> Is anyone working on implementing it?
>>>
>>> If so, are there Phab reviews I can subscribe to?
>> 
>> The CGO paper was based on a very old LLVM and the last I heard, moving
>> the transform to a newer LLVM and rerunning the benchmarks made the 
>> speedups go away.  It's not clear what the cause of this was and
the
>> team responsible didn't have the time to do any root cause
analysis.
>
> Thank you for the reply; interesting!
>
> Incidentally, would you happen to know whether VW-SLP fares any better?
>
> I'm referring to "VW-SLP: Auto-Vectorization with Adaptive Vector
Width"
> from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also 
> presented as "Extending the SLP vectorizer to support variable vector 
> widths" at the 2018 LLVM Developers’ Meeting, 
> http://llvm.org/devmtg/2018-10/).
>
> I'm wondering whether it subsumes PSLP or whether there are areas where
> PSLP still works (or worked) better, given the (brief) comparison in the 
> paper (from which it seems like it addresses the problem possibly in a 
> more general manner):
>
> "The widely used bottom-up SLP algorithm has been improved in several 
> ways. Porpodas et al. [32] propose  PSLP, a technique that pads the 
> scalar code with redundant instructions, to convert non-isomorphic 
> instruction sequences into isomorphic ones, thus extending the 
> applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when 
> some of the lanes differ, but it is most effective when the 
> non-isomorphic parts are only a short section of the instruction chain. 
> VW-SLP, on the other hand, works even if the chain never converges."
>
> Best,
> Matt

Matt P. Dziubinski via llvm-dev

2020-Oct-06 11:49 UTC

head link

[llvm-dev] PSLP: Padded SLP Automatic Vectorization

Hi,

Thank you for your reply, this is very informative.
Good to know about the VW-SLP & PSLP trade-offs, I appreciate your 
explanation!

Best,
Matt

On 10/2/2020 22:57, Vasileios Porpodas wrote:> Hi,
> 
> Some of the PSLP functionality was added to the SLP vectorizer with the
> support for vectorizing alternate sequences several years ago. So some
> of the most common patterns, like bundles of alternating add-subs, are
> already being taken care of by the current SLP pass. However, it won't
> do the more sophisticated padding across graphs that PSLP does.
> 
> There is indeed some overlap between VW-SLP and PSLP since they are both
> trying to form isomorphic graphs out of non-isomorphic code. But each
> one works better in different cases, so they complement each other.
> 
> For example, let's say you are in the middle of vectorizing 4-wide and
> you encounter opcodes like this: [A, A, B, B].
> Then you could either vectorize this with:
> 1. Padding the lanes and emitting shuffles (PSLP-style), or
> 2. Falling back to 2-way vectorization like in VW-SLP.
> Determining which one is better depends on the rest of the code.
> If the rest of the code is 4-wide isomorphic, then the PSLP option is
> better, if it is 2-wide isomorphic then the VW-SLP option is better.
> 
> As it is mentioned in the paper, the current SLP pass can change the
> vector-width but it cannot do so in all cases. The main focus of the
> VW-SLP paper is to highlight some of these cases and show a design for
> the SLP pass where width switching is done in a more complete way.
> 
> Performance fluctuates a lot as the compiler changes, so it is hard to
> tell what to expect from applying either of these techniques to the
> current snapshot.
> 
> Regards,
> Vasileios
> 
> "Matt P. Dziubinski" <matdzb at gmail.com> writes:
> 
>> On 9/29/2020 14:37, David Chisnall via llvm-dev wrote:
>>> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote:
>>>> Hey, I noticed this talk from the EuroLLVM 2015
>>>>
(https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf)
>>>> on the PSLP vectorization algorithm (CGO 2015 paper:
>>>> http://vporpo.me/papers/pslp_cgo2015.pdf).
>>>>
>>>> Is anyone working on implementing it?
>>>>
>>>> If so, are there Phab reviews I can subscribe to?
>>>
>>> The CGO paper was based on a very old LLVM and the last I heard,
moving
>>> the transform to a newer LLVM and rerunning the benchmarks made the
>>> speedups go away.  It's not clear what the cause of this was
and the
>>> team responsible didn't have the time to do any root cause
analysis.
>>
>> Thank you for the reply; interesting!
>>
>> Incidentally, would you happen to know whether VW-SLP fares any better?
>>
>> I'm referring to "VW-SLP: Auto-Vectorization with Adaptive
Vector Width"
>> from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also
>> presented as "Extending the SLP vectorizer to support variable
vector
>> widths" at the 2018 LLVM Developers’ Meeting,
>> http://llvm.org/devmtg/2018-10/).
>>
>> I'm wondering whether it subsumes PSLP or whether there are areas
where
>> PSLP still works (or worked) better, given the (brief) comparison in
the
>> paper (from which it seems like it addresses the problem possibly in a
>> more general manner):
>>
>> "The widely used bottom-up SLP algorithm has been improved in
several
>> ways. Porpodas et al. [32] propose  PSLP, a technique that pads the
>> scalar code with redundant instructions, to convert non-isomorphic
>> instruction sequences into isomorphic ones, thus extending the
>> applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when
>> some of the lanes differ, but it is most effective when the
>> non-isomorphic parts are only a short section of the instruction chain.
>> VW-SLP, on the other hand, works even if the chain never
converges."
>>
>> Best,
>> Matt

llvm dev - Oct 2020 - PSLP: Padded SLP Automatic Vectorization

[llvm-dev] PSLP: Padded SLP Automatic Vectorization

[llvm-dev] PSLP: Padded SLP Automatic Vectorization

[llvm-dev] PSLP: Padded SLP Automatic Vectorization