C Bergström via llvm-dev
2016-Nov-27 15:58 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
On Sun, Nov 27, 2016 at 11:40 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 27 November 2016 at 15:35, C Bergström <cbergstrom at pathscale.com> wrote:
>> While the VL can vary... in practice, wouldn't the cost of vectorization
>> and width be tied more to the hardware implementation than anything
>> else? The cost of vectorizing thread 1 vs 2 isn't likely to change?
>> (Am I drunk and mistaken?)
>
> Mistaken. :)
>
> The scale of the vector can change between two processes on the same
> machine and it's up to the kernel (I guess) to make sure they're
> correct.
>
> In theory, it could even change in the same process, for instance, as
> a result of PGO or if some loops have fewer loop-carried dependencies
> than others.
>
> The three important premises are:
>
> 1. The vectorizer still has the duty to restrict the vector length to
> whatever makes it cope with the loop dependencies. SVE *has* to be
> able to cope with that by restricting the number of lanes "per
> access".
>
> 2. The cost analysis will have to assume the smallest possible vector
> size and "hope" that anything larger will only mean profit. This seems
> straightforward enough.
>
> 3. Hardware flags and target features must be able to override the
> minimum size, maximum size, etc., and it's up to the users to make
> sure that's meaningful in their hardware.

I'll bite my tongue on negative comments, but it seems that for anything other than trivial loops this is going to put the burden entirely on the user. Are you telling me the *kernel* is really going to be able to make these decisions on the fly, correctly?

Won't this block loop transformations?
Amara Emerson via llvm-dev
2016-Nov-27 16:10 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
No. Let's make one thing clear now: we don't expect the VL to be changed on the fly; once the process is started, it's fixed. Otherwise things like stack frames with SVE objects would be invalid.

Responding to Renato's mail earlier:

1. The vectorizer will have to deal with loop-carried dependences as normal: if it doesn't have a guarantee about the VL, then it has to either avoid vectorizing some loops, or it can cap the effective vectorization factor by restricting the loop predicate to a safe value.

2. Yes, the cost model is more complicated, but it's not necessarily the case that we assume the smallest VL. We can cross that bridge when we get to it, though.

Amara

On 27 November 2016 at 15:58, C Bergström via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [earlier message quoted in full; snipped]
Renato Golin via llvm-dev
2016-Nov-27 16:25 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
On 27 November 2016 at 16:10, Amara Emerson <amara.emerson at gmail.com> wrote:
> No. Let's make one thing clear now: we don't expect the VL to be
> changed on the fly, once the process is started it's fixed. Otherwise
> things like stack frames with SVE objects will be invalid.

This then forbids different lengths across shared objects, which in turn forces all objects in the same OS to use the same length. It's still a kernel option (like VMA or page-table sizes), but not a per-process one. As with ARM's ABI, it only makes sense to go that far on public interfaces.

Given how conservative distros are in their deployments, this can force "normal" people using SVE (i.e. not supercomputers) either to accept the lowest common denominator or to optimise local code better. Or distros will base that on the hardware's default value (reported via a kernel interface) and the deployment will be per-process... or there will be multilib.

In any case, SVE is pretty cool, but deployment is likely to be *very* complicated. Nothing we haven't seen with AArch64, or worse, on ARM, so...

> 1. The vectorizer will have to deal with loop carried dependences as
> normal, if it doesn't have a guarantee about the VL then it has to
> either avoid vectorizing some loops, or it can cap the effective
> vectorization factor by restricting the loop predicate to a safe
> value.

This should be easily done via the predicate register.

> 2. Yes the cost model is more complicated, but it's not necessarily
> the case that we assume the smallest VL. We can cross that bridge when
> we get to it though.

Ok.

cheers,
--renato