thr3ads.net - llvm dev - [llvm-dev] [RFC] Supporting ARM's SVE in LLVM [Nov 2016]

Renato Golin via llvm-dev

2016-Nov-27 16:25 UTC

[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

On 27 November 2016 at 16:10, Amara Emerson <amara.emerson at gmail.com> wrote:
> No. Let's make one thing clear now: we don't expect the VL to be
> changed on the fly, once the process is started it's fixed. Otherwise
> things like stack frames with SVE objects will be invalid.

This then forbids different lengths on shared objects, which in turn
forces all objects in the same OS to have the same length. Still a
kernel option (like VMA or page table sizes), but not a
per-process thing.

Like ARM's ABI, it only makes sense to go that far on public interfaces.

Given how conservative distros are about their deployments, this can force
"normal" people using SVE (i.e. not super-computers) to either accept
the lowest common denominator or to optimise local code better.

Or distros will base that on the hardware default value (reported via
kernel interface) and the deployment will be per-process... or there
will be multilib.

In any case, SVE is pretty cool, but deployment is likely to be *very*
complicated. Nothing we haven't seen with AArch64, or worse, on ARM,
so...

> 1. The vectorizer will have to deal with loop carried dependences as
> normal, if it doesn't have a guarantee about the VL then it has to
> either avoid vectorizing some loops, or it can cap the effective
> vectorization factor by restricting the loop predicate to a safe
> value.

This should be easily done via the predicate register.
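
To make the point about capping the vectorization factor concrete, here is a minimal scalar simulation in Python, not real SVE code. It assumes a loop of the form a[i] = a[i - dist] + 1, so the loop-carried dependence distance is `dist`. The function names and the `cap` parameter are made up for illustration, and `hw_lanes` stands in for whatever VL the hardware reports. The predicate is modelled simply as a count of active lanes.

```python
def scalar_reference(a, dist):
    """Plain scalar loop: a[i] = a[i - dist] + 1."""
    out = list(a)
    for i in range(dist, len(out)):
        out[i] = out[i - dist] + 1
    return out


def predicated_loop(a, dist, hw_lanes, cap=True):
    """Simulated predicated vector loop over the same recurrence.

    hw_lanes is the (unknown at compile time) number of lanes per vector.
    With cap=True the predicate never enables more than `dist` lanes, so a
    vector load can never read an element that the same iteration writes.
    """
    out = list(a)
    n = len(out)
    max_active = min(hw_lanes, dist) if cap else hw_lanes
    i = dist
    while i < n:
        # Predicate: enable at most max_active lanes, fewer on the tail.
        active = min(max_active, n - i)
        # The whole vector is loaded before any lane is stored.
        loaded = [out[i + lane - dist] for lane in range(active)]
        for lane in range(active):
            out[i + lane] = loaded[lane] + 1
        i += active
    return out
```

With the cap in place the result matches the scalar loop for every value of `hw_lanes`. With `cap=False` and `hw_lanes` larger than `dist`, the vector load picks up stale values and the result is wrong, which is the case the vectorizer has to avoid.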

> 2. Yes the cost model is more complicated, but it's not necessarily
> the case that we assume the smallest VL. We can cross that bridge when
> we get to it though.

Ok.

cheers,
--renato

Amara Emerson via llvm-dev

2016-Nov-27 16:44 UTC

Libraries do not require the vector length to be encoded inside them,
so there is no restriction here. You don't pick a VL for them, the
point of vector length agnosticism is that the code generated runs
across ALL vector lengths. However, once your process starts then the
VL can be assumed to be constant, whatever it is. You could still
theoretically have two different processes using the same shared
libraries running with different VLs.
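
As a rough illustration of what "vector length agnostic" means here, the following Python sketch simulates a library routine that never hard-codes a lane count. The routine name and the `lanes_per_vector` parameter are hypothetical. In real SVE code the lane count would be read from the hardware at run time, and here it is just passed in, fixed for the lifetime of each simulated process.

```python
def vla_axpy(alpha, x, y, lanes_per_vector):
    """Compute alpha * x + y one vector at a time.

    lanes_per_vector models the VL of the current process. The loop only
    uses it as a stride and to build the tail predicate, so the same code
    is valid for any positive lane count.
    """
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        # Tail predicate: the last vector may be only partly active.
        active = min(lanes_per_vector, n - i)
        for lane in range(active):
            out[i + lane] = alpha * x[i + lane] + y[i + lane]
        i += lanes_per_vector
    return out


# Two simulated processes sharing the same routine with different VLs.
x = [1, 2, 3, 4, 5]
y = [10, 10, 10, 10, 10]
result_small_vl = vla_axpy(2, x, y, lanes_per_vector=4)
result_large_vl = vla_axpy(2, x, y, lanes_per_vector=16)
```

Both calls produce the same values. Only the number of loop iterations differs, which is why nothing about the VL needs to be encoded in the library itself.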

On 27 November 2016 at 16:25, Renato Golin <renato.golin at linaro.org> wrote:
> On 27 November 2016 at 16:10, Amara Emerson <amara.emerson at gmail.com> wrote:
>> No. Let's make one thing clear now: we don't expect the VL to be
>> changed on the fly, once the process is started it's fixed. Otherwise
>> things like stack frames with SVE objects will be invalid.
>
> This then forbids different lengths on shared objects, which in turn
> forces all objects in the same OS to have the same length. Still a
> kernel option (like VMA or page table sizes), but not a
> per-process thing.
>
> Like ARM's ABI, it only makes sense to go that far on public interfaces.
>
> Given how conservative distros are about their deployments, this can force
> "normal" people using SVE (i.e. not super-computers) to either accept
> the lowest common denominator or to optimise local code better.
>
> Or distros will base that on the hardware default value (reported via
> kernel interface) and the deployment will be per-process... or there
> will be multilib.
>
> In any case, SVE is pretty cool, but deployment is likely to be *very*
> complicated. Nothing we haven't seen with AArch64, or worse, on ARM,
> so...
>
>> 1. The vectorizer will have to deal with loop carried dependences as
>> normal, if it doesn't have a guarantee about the VL then it has to
>> either avoid vectorizing some loops, or it can cap the effective
>> vectorization factor by restricting the loop predicate to a safe
>> value.
>
> This should be easily done via the predicate register.
>
>> 2. Yes the cost model is more complicated, but it's not necessarily
>> the case that we assume the smallest VL. We can cross that bridge when
>> we get to it though.
>
> Ok.
>
> cheers,
> --renato

Renato Golin via llvm-dev

2016-Nov-27 16:52 UTC

On 27 November 2016 at 16:44, Amara Emerson <amara.emerson at gmail.com> wrote:
> Libraries do not require the vector length to be encoded inside them,
> so there is no restriction here. You don't pick a VL for them, the
> point of vector length agnosticism is that the code generated runs
> across ALL vector lengths. However, once your process starts then the
> VL can be assumed to be constant, whatever it is. You could still
> theoretically have two different processes using the same shared
> libraries running with different VLs.

I see. That would only work well if the arguments are passed directly
via SVE vectors. Encoding that in a soft-float variant would be
madness.

But also, it would require one to pass the predicate, as it's possible
that the original vectors are restricted. This also fits nicely with
the data/predicate pairs that are always required for SVE vectors.

A simple way would be to mandate z0/p0, z1/p1 to be paired for PCS
purposes. Interesting...
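
A small Python sketch of that pairing idea, purely as a thought experiment: it does not describe any actual procedure call standard. The `PredicatedVector` type, the helper names, and the rule "argument k goes in zk/pk" are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PredicatedVector:
    """A vector argument together with the predicate that governs it."""
    data: List[int]
    pred: List[bool]


def assign_paired_registers(num_vector_args: int) -> List[Tuple[str, str]]:
    """Hypothetical rule: the k-th vector argument travels in (zk, pk)."""
    return [(f"z{k}", f"p{k}") for k in range(num_vector_args)]


def callee_add_one(arg: PredicatedVector) -> PredicatedVector:
    """Callee that honours the caller's predicate.

    Active lanes are incremented, inactive lanes are passed through
    unchanged, and the predicate is handed back with the result.
    """
    new_data = [v + 1 if active else v for v, active in zip(arg.data, arg.pred)]
    return PredicatedVector(new_data, list(arg.pred))
```

The point of the sketch is only that the callee cannot know which lanes are meaningful unless the predicate is passed alongside the data.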

cheers,
--renato
