Matt P. Dziubinski via llvm-dev
2020-Oct-02 14:33 UTC
[llvm-dev] PSLP: Padded SLP Automatic Vectorization
On 9/29/2020 14:37, David Chisnall via llvm-dev wrote:> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >> Hey, I noticed this talk from the EuroLLVM 2015 >> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >> on the PSLP vectorization algorithm (CGO 2015 paper: >> http://vporpo.me/papers/pslp_cgo2015.pdf). >> >> Is anyone working on implementing it? >> >> If so, are there Phab reviews I can subscribe to? > > The CGO paper was based on a very old LLVM and the last I heard, moving > the transform to a newer LLVM and rerunning the benchmarks made the > speedups go away. It's not clear what the cause of this was and the > team responsible didn't have the time to do any root cause analysis.Thank you for the reply; interesting! Incidentally, would you happen to know whether VW-SLP fares any better? I'm referring to "VW-SLP: Auto-Vectorization with Adaptive Vector Width" from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also presented as "Extending the SLP vectorizer to support variable vector widths" at the 2018 LLVM Developers’ Meeting, http://llvm.org/devmtg/2018-10/). I'm wondering whether it subsumes PSLP or whether there are areas where PSLP still works (or worked) better, given the (brief) comparison in the paper (from which it seems like it addresses the problem possibly in a more general manner): "The widely used bottom-up SLP algorithm has been improved in several ways. Porpodas et al. [32] propose PSLP, a technique that pads the scalar code with redundant instructions, to convert non-isomorphic instruction sequences into isomorphic ones, thus extending the applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when some of the lanes differ, but it is most effective when the non-isomorphic parts are only a short section of the instruction chain. VW-SLP, on the other hand, works even if the chain never converges." Best, Matt
Vasileios Porpodas via llvm-dev
2020-Oct-02 20:57 UTC
[llvm-dev] PSLP: Padded SLP Automatic Vectorization
Hi, Some of the PSLP functionality was added to the SLP vectorizer with the support for vectorizing alternate sequences several years ago. So some of the most common patterns, like bundles of alternating add-subs, are already being taken care of by the current SLP pass. However, it won't do the more sophisticated padding across graphs that PSLP does. There is indeed some overlap between VW-SLP and PSLP since they are both trying to form isomorphic graphs out of non-isomorphic code. But each one works better in different cases, so they complement each other. For example, let's say you are in the middle of vectorizing 4-wide and you encounter opcodes like this: [A, A, B, B]. Then you could either vectorize this with: 1. Padding the lanes and emitting shuffles (PSLP-style), or 2. Falling back to 2-way vectorization like in VW-SLP. Determining which one is better depends on the rest of the code. If the rest of the code is 4-wide isomorphic, then the PSLP option is better, if it is 2-wide isomorphic then the VW-SLP option is better. As it is mentioned in the paper, the current SLP pass can change the vector-width but it cannot do so in all cases. The main focus of the VW-SLP paper is to highlight some of these cases and show a design for the SLP pass where width switching is done in a more complete way. Performance fluctuates a lot as the compiler changes, so it is hard to tell what to expect from applying either of these techniques to the current snapshot. Regards, Vasileios "Matt P. Dziubinski" <matdzb at gmail.com> writes:> On 9/29/2020 14:37, David Chisnall via llvm-dev wrote: >> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >>> Hey, I noticed this talk from the EuroLLVM 2015 >>> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >>> on the PSLP vectorization algorithm (CGO 2015 paper: >>> http://vporpo.me/papers/pslp_cgo2015.pdf). >>> >>> Is anyone working on implementing it? >>> >>> If so, are there Phab reviews I can subscribe to? >> >> The CGO paper was based on a very old LLVM and the last I heard, moving >> the transform to a newer LLVM and rerunning the benchmarks made the >> speedups go away. It's not clear what the cause of this was and the >> team responsible didn't have the time to do any root cause analysis. > > Thank you for the reply; interesting! > > Incidentally, would you happen to know whether VW-SLP fares any better? > > I'm referring to "VW-SLP: Auto-Vectorization with Adaptive Vector Width" > from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also > presented as "Extending the SLP vectorizer to support variable vector > widths" at the 2018 LLVM Developers’ Meeting, > http://llvm.org/devmtg/2018-10/). > > I'm wondering whether it subsumes PSLP or whether there are areas where > PSLP still works (or worked) better, given the (brief) comparison in the > paper (from which it seems like it addresses the problem possibly in a > more general manner): > > "The widely used bottom-up SLP algorithm has been improved in several > ways. Porpodas et al. [32] propose PSLP, a technique that pads the > scalar code with redundant instructions, to convert non-isomorphic > instruction sequences into isomorphic ones, thus extending the > applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when > some of the lanes differ, but it is most effective when the > non-isomorphic parts are only a short section of the instruction chain. > VW-SLP, on the other hand, works even if the chain never converges." > > Best, > Matt
Matt P. Dziubinski via llvm-dev
2020-Oct-06 11:49 UTC
[llvm-dev] PSLP: Padded SLP Automatic Vectorization
Hi, Thank you for your reply, this is very informative. Good to know about the VW-SLP & PSLP trade-offs, I appreciate your explanation! Best, Matt On 10/2/2020 22:57, Vasileios Porpodas wrote:> Hi, > > Some of the PSLP functionality was added to the SLP vectorizer with the > support for vectorizing alternate sequences several years ago. So some > of the most common patterns, like bundles of alternating add-subs, are > already being taken care of by the current SLP pass. However, it won't > do the more sophisticated padding across graphs that PSLP does. > > There is indeed some overlap between VW-SLP and PSLP since they are both > trying to form isomorphic graphs out of non-isomorphic code. But each > one works better in different cases, so they complement each other. > > For example, let's say you are in the middle of vectorizing 4-wide and > you encounter opcodes like this: [A, A, B, B]. > Then you could either vectorize this with: > 1. Padding the lanes and emitting shuffles (PSLP-style), or > 2. Falling back to 2-way vectorization like in VW-SLP. > Determining which one is better depends on the rest of the code. > If the rest of the code is 4-wide isomorphic, then the PSLP option is > better, if it is 2-wide isomorphic then the VW-SLP option is better. > > As it is mentioned in the paper, the current SLP pass can change the > vector-width but it cannot do so in all cases. The main focus of the > VW-SLP paper is to highlight some of these cases and show a design for > the SLP pass where width switching is done in a more complete way. > > Performance fluctuates a lot as the compiler changes, so it is hard to > tell what to expect from applying either of these techniques to the > current snapshot. > > Regards, > Vasileios > > "Matt P. Dziubinski" <matdzb at gmail.com> writes: > >> On 9/29/2020 14:37, David Chisnall via llvm-dev wrote: >>> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >>>> Hey, I noticed this talk from the EuroLLVM 2015 >>>> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >>>> on the PSLP vectorization algorithm (CGO 2015 paper: >>>> http://vporpo.me/papers/pslp_cgo2015.pdf). >>>> >>>> Is anyone working on implementing it? >>>> >>>> If so, are there Phab reviews I can subscribe to? >>> >>> The CGO paper was based on a very old LLVM and the last I heard, moving >>> the transform to a newer LLVM and rerunning the benchmarks made the >>> speedups go away. It's not clear what the cause of this was and the >>> team responsible didn't have the time to do any root cause analysis. >> >> Thank you for the reply; interesting! >> >> Incidentally, would you happen to know whether VW-SLP fares any better? >> >> I'm referring to "VW-SLP: Auto-Vectorization with Adaptive Vector Width" >> from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also >> presented as "Extending the SLP vectorizer to support variable vector >> widths" at the 2018 LLVM Developers’ Meeting, >> http://llvm.org/devmtg/2018-10/). >> >> I'm wondering whether it subsumes PSLP or whether there are areas where >> PSLP still works (or worked) better, given the (brief) comparison in the >> paper (from which it seems like it addresses the problem possibly in a >> more general manner): >> >> "The widely used bottom-up SLP algorithm has been improved in several >> ways. Porpodas et al. [32] propose PSLP, a technique that pads the >> scalar code with redundant instructions, to convert non-isomorphic >> instruction sequences into isomorphic ones, thus extending the >> applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when >> some of the lanes differ, but it is most effective when the >> non-isomorphic parts are only a short section of the instruction chain. >> VW-SLP, on the other hand, works even if the chain never converges." >> >> Best, >> Matt