thr3ads.net - llvm dev - [llvm-dev] Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors) [Nov 2020]


Vineet Kumar via llvm-dev

2020-Nov-02 15:52 UTC

[llvm-dev] Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

Hi all,

At the Barcelona Supercomputing Center, we have been working on an
end-to-end vectorizer using scalable vectors for the RISC-V Vector
extension in the context of the EPI Project
<https://www.european-processor-initiative.eu/accelerator/>. We earlier
shared a demo of our prototype implementation
(https://repo.hca.bsc.es/epic/z/9eYRIF, see below) with the folks
involved with LLVM SVE/SVE2 development. Since there was interest in
looking at the source code during the discussions in the subsequent LLVM
SVE/SVE2 sync-up meetings, we are also publishing a public copy of our
repository.

It is available at https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi and
will sync with our ongoing development on a weekly basis. Note that this
is very much a work in progress and the code in this repository is for
reference purposes only. Please see the README
<https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/-/blob/EPI/README.md>
file in the repo for details on our approach, design decisions, and
limitations.

We welcome any questions and feedback.


Thanks and Regards,
Vineet Kumar - vineet.kumar at bsc.es
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


On 2020-07-29 3:10 a.m., Vineet Kumar wrote:
> Hi all,
>
> Following up on the discussion in the last meeting about
> auto-vectorization for the RISC-V Vector extension (scalable vectors) at
> the Barcelona Supercomputing Center, here are some additional details.
>
> We have a working prototype for end-to-end compilation targeting the
> RISC-V Vector extension. The auto-vectorizer supports two strategies to
> generate LLVM IR using scalable vectors:
>
> 1) Generate a vector loop using VF (vscale x k) = whole vector register
> width, followed by a scalar tail loop.
>
> 2) Generate only a vector loop with active vector length controlled by
> the RISC-V `vsetvli` instruction and using Vector Predicated intrinsics
> (https://reviews.llvm.org/D57504). (Of course, intrinsics come with
> their own limitations but we feel it serves as a good proof of concept
> for our use case.) We also extend the VPlan to generate VPInstructions
> that are expanded using predicated intrinsics.
>
> We also considered a third hybrid approach of having a vector loop with
> VF = whole register width, followed by a vector tail loop using
> predicated intrinsics. For now though, based on project requirements,
> we favoured the second approach.
>
> We have also taken care to not break any fixed-vector implementation.
> All the scalable vector IR gen is guarded by conditions set by TTI.
>
> For shuffles, the most used case is broadcast, which is supported by the
> current semantics of the `shufflevector` instruction. For other cases like
> reverse, concat, etc., we have defined our own intrinsics.
>
> Current limitations:
> The cost model for scalable vectors doesn't do much other than always
> deciding to vectorize with VF based on TargetWidestType/SmallestType.
> We also do not support interleaving yet.
>
> Demo:
> The current implementation is very much in alpha and eventually, once
> it's more polished and thoroughly verified, we will put out patches on
> Phabricator. Till then, we have set up a Compiler Explorer server
> against our development branch to showcase the generated code.
>
> You can see and experiment with the generated LLVM IR and VPlan for a
> set of examples, with a predicated vector loop
> (`-mprefer-predicate-over-epilog`) at https://repo.hca.bsc.es/epic/z/JB4ZoJ
> and with a scalar epilog (`-mno-prefer-predicate-over-epilog`) at
> https://repo.hca.bsc.es/epic/z/0WoDGt.
> Note that you can remove the `-emit-llvm` option to see the generated
> RISC-V assembly.
>
> We welcome any questions and feedback.
>
> Thanks and Regards,
> Vineet Kumar - vineet.kumar at bsc.es
> Barcelona Supercomputing Center - Centro Nacional de Supercomputación
>
>
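
The two strategies listed in the quoted message can be illustrated with a small model. This is only a sketch, not the EPI implementation: Python lists stand in for vector registers, `VLMAX` and all function names are hypothetical, and `set_vl` simplifies `vsetvli` to "grant min(remaining, VLMAX) elements".

```python
# Illustrative model only: plain Python lists stand in for vector registers.
# VLMAX and every function name here are hypothetical, not from the EPI code.

VLMAX = 4  # assumed number of elements per vector register


def add_with_scalar_tail(a, b):
    """Strategy 1: whole-register vector loop, then a scalar tail loop."""
    n = len(a)
    out = [0] * n
    i = 0
    # Vector loop: every iteration processes exactly VLMAX elements.
    while i + VLMAX <= n:
        out[i:i + VLMAX] = [x + y for x, y in zip(a[i:i + VLMAX], b[i:i + VLMAX])]
        i += VLMAX
    # Scalar tail loop: the remaining n % VLMAX elements, one at a time.
    while i < n:
        out[i] = a[i] + b[i]
        i += 1
    return out


def set_vl(remaining):
    """Simplified stand-in for `vsetvli`: at most VLMAX elements per iteration."""
    return min(remaining, VLMAX)


def add_with_active_vl(a, b):
    """Strategy 2: one vector loop whose active length is set each iteration."""
    n = len(a)
    out = [0] * n
    vl_trace = []  # active vector length used by each iteration
    i = 0
    while i < n:
        vl = set_vl(n - i)
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        vl_trace.append(vl)
        i += vl
    return out, vl_trace
```

For 10 elements, the second function runs three iterations with active lengths 4, 4 and 2, so no separate tail loop is needed; the first function handles the last two elements in its scalar loop.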

http://bsc.es/disclaimer

Renato Golin via llvm-dev

2020-Nov-02 16:43 UTC


Hi Vineet,

Thanks for sharing! I haven't looked at the code yet, just read the README
file you have and it has already answered a lot of questions that I
initially had. Some general comments...

I'm very happy to see that Simon's predication changes were useful to your
work. It's a nice validation of their work and hopefully will help SVE, too.

Your main approach to strip-mine + fuse tail loop is what I was going to
propose for now. It matches well with the bite-sized approach VPlan has and
could build on existing vector formats. For example, you always try to
strip-mine (for scalable and non-scalable) and then only for scalable, you
try to fuse the scalar loops, which would improve the solution and give
RVV/SVE an edge over the other extensions on the same hardware.

There were also in the past proposals to vectorise the tail loop,
which could be a similar step. For example, in case the main vector body is
8-way or 16-way, the tail loop would be 7-way or 15-way, which is horribly
inefficient. The idea was to further vectorise the 7-way as 4+2+1 ways,
same for 15. If those loops are then unrolled, you end up with a nice
decaling down pattern. On scalable vectors, this becomes a noop.
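
The 4+2+1 decomposition of a tail can be sketched as follows. This is an illustration under assumptions, not code from LLVM: the function name is hypothetical and the sketch only computes the sequence of vector widths, not the loops themselves.

```python
def tail_widths(remainder):
    """Split a tail of `remainder` iterations into descending power-of-two widths."""
    widths = []
    # Find the largest power of two that does not exceed the remainder.
    width = 1
    while width * 2 <= remainder:
        width *= 2
    # Greedily peel off one chunk per width, from widest to narrowest.
    while remainder > 0:
        if width <= remainder:
            widths.append(width)
            remainder -= width
        width //= 2
    return widths
```

A 7-iteration tail after an 8-way body yields widths 4, 2 and 1, and a 15-iteration tail after a 16-way body yields 8, 4, 2 and 1. With an active vector length, as on scalable vectors, the whole tail is covered by a single shorter iteration instead.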

There is a separate thread on the vectorisation cost model [1] which talks
about some of the challenges there. I think we need to include scalable
vectors in consideration when thinking about it.

The NEON vs RISCV register shadowing is interesting. It is true we mostly
ignored 64-bit vectors in the vectoriser, but LLVM can still generate them
with the (SLP) region vectoriser. IIRC, support for that kind of aliasing
is not trivial (and why GCC's description of NEON registers sucked for so
long), but the motivation of register pressure inside hot loops is indeed
important. I'm adding Arai Masaki in CC as this is something he was working
on.

Otherwise, I think working with the current folks on VPlan and scalable
extensions will be a good way to upstream all the ideas you guys had in
your work.

Thanks!
--renato

[1] http://lists.llvm.org/pipermail/llvm-dev/2020-October/146236.html




Vineet Kumar via llvm-dev

2020-Nov-05 01:36 UTC


Hi Renato,

Thanks a lot for your comments!

(more inline.)


Thanks and Regards,

Vineet


On 2020-11-02 5:43 p.m., Renato Golin wrote:
> Hi Vineet,
>
> Thanks for sharing! I haven't looked at the code yet, just read the 
> README file you have and it has already answered a lot of questions 
> that I initially had. Some general comments...
>
> I'm very happy to see that Simon's predication changes were useful to
> your work. It's a nice validation of their work and hopefully will
> help SVE, too.
Simon's vector predication ideas fit really nicely with our approach to
predicated vectorization, especially the support for the EVL parameter. We
look forward to more discussions around it.
>
> Your main approach to strip-mine + fuse tail loop is what I was going
> to propose for now. It matches well with the bite-sized approach VPlan
> has and could build on existing vector formats. For example, you
> always try to strip-mine (for scalable and non-scalable) and then only
> for scalable, you try to fuse the scalar loops, which would improve
> the solution and give RVV/SVE an edge over the other extensions on
> the same hardware.
While our implemented approach with tail folding and predication is
guided by the research interests of the EPI project, I agree that for a
more general implementation your proposed approach makes more sense for
now, before moving on to better predication support and exploring other
approaches.
>
> There were also in the past proposals to vectorise the tail loop, 
> which could be a similar step. For example, in case the main vector 
> body is 8-way or 16-way, the tail loop would be 7-way or 15-way, which 
> is horribly inefficient. The idea was to further vectorise the 7-way 
> as 4+2+1 ways, same for 15. If those loops are then unrolled, you end 
> up with a nice decaling down pattern. On scalable vectors, this 
> becomes a noop.
>
> There is a separate thread for vectorisation cost model [1] which
> talks about some of the challenges there, I think we need to include
> scalable vectors in consideration when thinking about it.
Agreed. It would be very useful to think about a scalable-vector-aware
cost model right from the beginning, now that there is effort already
underway to integrate it into VPlan. There was also a discussion around
it in the latest SVE/SVE2 sync-up meeting and I think almost everyone
was in agreement.
>
> The NEON vs RISCV register shadowing is interesting. It is true we 
> mostly ignored 64-bit vectors in the vectoriser, but LLVM can still 
> generate them with the (SLP) region vectoriser. IIRC, support for that 
> kind of aliasing is not trivial (and why GCC's description of NEON 
> registers sucked for so long), but the motivation of register pressure 
> inside hot loops is indeed important. I'm adding Arai Masaki in CC as 
> this is something he was working on.
Thanks for adding Arai! I will be happy to pick their brain on the
topic.

One specific place where we have to deal with it is when computing a
feasible max VF. I am currently experimenting with an approach that has
the user specify (via a command line flag) a vector register width
multiplier: the factor by which the operating vector register width is a
multiple of the minimum vector register width. Based on that, we estimate
the highest VF that won't spill registers (this relies on TTI for
information about the number of registers in relation to register
width). This is definitely not a generic solution and probably not
elegant either, but personally it serves as a starting point to think
about the broader issue.
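
One way such an estimate could work is sketched below. Everything in it is an assumption made for illustration, not the prototype's code: the function and parameter names are hypothetical, register pressure is reduced to a single count of simultaneously live vector values, and grouping `multiplier` registers is assumed to divide the number of usable registers by `multiplier`.

```python
def feasible_max_vf(min_vreg_bits, multiplier, num_vregs,
                    widest_type_bits, live_vector_values, max_k=64):
    """Return the largest k (as in `vscale x k`) estimated not to spill, or 0.

    All inputs are per unit of vscale. In a real compiler the register
    counts and widths would come from target information such as TTI;
    here they are plain integers supplied by the caller.
    """
    operating_bits = min_vreg_bits * multiplier  # width of one grouped register
    usable_regs = num_vregs // multiplier        # grouping leaves fewer registers
    best_k = 0
    k = 1
    while k <= max_k:
        # Grouped registers needed to hold one value of k elements, rounded up.
        regs_per_value = (k * widest_type_bits + operating_bits - 1) // operating_bits
        if live_vector_values * regs_per_value <= usable_regs:
            best_k = k
        k *= 2
    return best_k
```

With 32 registers of 64 bits per vscale unit, 64-bit elements, and 6 live values, a multiplier of 2 gives k = 4, while a multiplier of 8 leaves only 4 usable registers and returns 0, meaning even the narrowest vector would spill under this model.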
>
> Otherwise, I think working with the current folks on VPlan and
> scalable extensions will be a good way to upstream all the ideas
> you guys had in your work.
That's the plan!
>
> Thanks!
> --renato
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2020-October/146236.html
