thr3ads.net - llvm dev - [llvm-dev] [RFC] Supporting ARM's SVE in LLVM [Nov 2016]

Graham Hunter via llvm-dev

2016-Nov-22 14:49 UTC

[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Hi Renato,

Sorry for the delay in responding. We've been busy rethinking some of our
changes after the feedback we've received thus far (particularly from the
devmeeting). The incremental patches will use our revised design (which should be
less invasive), and I'll be updating our document to match.

On 16/11/2016, 12:46, "Renato Golin" <renato.golin at linaro.org> wrote:
>  This email is long and hard to read. I'm not surprised no one replied
>  yet. I think your PDF attached is a good start away from the
>  complexity, but we're not going to get far if we try to do things in
>  one step.
>  Based on your repository, the number of changes is so great, and the
>  changes so invasive, that we really should look back at what we need
>  to do, one step at a time, and only perform the refactoring changes
>  that are needed for each step.
We don't intend to do this all in one go; we fully expect that we'll
need to refactor a few times based on community feedback as we incrementally add
support for scalable vectors.
>  > * This is a warts-and-all release of our development tree, with
>  >   plenty of TODOs and unfinished experiments
>  > * We haven't posted our clang changes yet
>
>  I don't mind FIXMEs or TODOs, but I did see a lot of spurious name
>  changes, enum value moves (breaking old binaries) and a lot of new
>  high-level passes (LoopVectorisationAnalysis) which will need a long
>  review on their own before we even start thinking about SVE.
>
>  I recommend you guys separate the refactoring from the implementation
>  and try to upstream the initial and uncontroversial refactorings (name
>  changes, etc), as well as move out the current functionality into new
>  passes, so then you can extend for SVE as a refactoring, not
>  move-and-extend in the same pass.
So our highest priority is getting basic support for SVE into the codebase
(types, codegen, assembler/disassembler, simple vectorization); after that is
in, we'll be happy to discuss our other changes like separating out loop
vectorization legality, controlling loops via predication, or adding search loop
vectorization.
>  We want to minimise the number of changes, so that we can revert
>  breakages more easily, and have a steady progress, rather than a
>  break-the-world situation.
Same for us. The individual patches will be relatively small; this repo was just
for context if needed when discussing the smaller patches.
>  Finally, *every* test change needs to be scrutinised and guaranteed to
>  make sense. We really dislike spurious test changes, unless we can
>  prove that the test was unstable to begin with, in which case we
>  change it to a better test.
Yep, makes sense.

Thanks,

-Graham

Graham Hunter via llvm-dev

2016-Nov-24 15:39 UTC

[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Hi,

Paul Walker has now uploaded the first set of IR support patches to phabricator,
which use our revised design. We managed to remove the need for new instructions
for basic scalable vectorization in favor of adding two new constant classes;
here's a subset of the revised documentation describing just those constants:

## *vscale*

### Syntax:
> `vscale`

### Overview:

This complex constant represents the runtime value of `n` for any scalable type
`<n x m x ty>`. This is primarily used to increment induction variables and
generate offsets.

### Interface:

```cpp
  Constant *VScaleValue::get(Type *Ty);
```

### Example:

The following shows how an induction variable would be incremented for a
scalable vector of type `<n x 4 x i32>`.

```llvm
  %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
```
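
As a rough illustration of the increment above (a sketch only, not code from the patch set), the following C++ treats `vscale` as an ordinary runtime integer and lists the values `%index` would take across a loop. The helper name `inductionValues` and its parameters are made up for this example, and `vscale` is assumed to be at least 1.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the values the induction variable takes when
// every iteration advances by vscale * 4 lanes (the <n x 4 x i32> case).
// Assumes vscale >= 1; a value of 0 would never make progress.
std::vector<std::uint64_t> inductionValues(std::uint64_t tripCount,
                                           std::uint64_t vscale) {
  const std::uint64_t lanes = vscale * 4;  // mul (i64 vscale, i64 4)
  std::vector<std::uint64_t> values;
  for (std::uint64_t index = 0; index < tripCount; index += lanes) {
    values.push_back(index);  // index += lanes models %index.next
  }
  return values;
}
```

The same loop body covers every vector length: only the runtime value of `vscale` changes how many iterations are needed.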

## *stepvector*

### Syntax:
> `stepvector`

### Overview:

This complex constant represents the runtime value of a vector of increasing
integers in the arithmetic series:
> `<0, 1, 2, ... num_elements-1>`

This is the basis for a scalable form of vector constants. Adding a splat
changes the effective starting point, and multiplying changes the step. The
main uses for this are:

* Predicate creation using vector compares for fully predicated loops (see also:
  [*propff*](#propff), [*test*](#test)).
* Creating offset vectors for gather/scatter via `getelementptr`.
* Creating masks for `shufflevector`.

For the following loop, a `stepvector` constant would be added to a splat of the
loop induction variable to create the data vector to store:

```cpp
  unsigned a[LIMIT];

  for (unsigned i = 0; i < LIMIT; i++) {
    a[i] = i;
  }
```
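
A scalar model of how that loop might look once vectorized, offered as a sketch under assumptions rather than the output of any real pass: `lanes` stands in for the runtime element count (`vscale * 4` for `<n x 4 x i32>`), the data vector is `splat(i) + stepvector`, and the predicate is a vector compare of that same value against the limit. The function name `fillWithIndex` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Models `a[i] = i` for a vector with `lanes` elements (lanes >= 1 assumed).
std::vector<std::uint32_t> fillWithIndex(std::size_t limit, std::size_t lanes) {
  std::vector<std::uint32_t> a(limit, 0);
  for (std::size_t i = 0; i < limit; i += lanes) {
    // The inner loop stands in for one vector operation over all lanes.
    for (std::size_t lane = 0; lane < lanes; ++lane) {
      // data = splat(i) + stepvector, where stepvector[lane] == lane
      const std::size_t value = i + lane;
      // predicate = (splat(i) + stepvector) < splat(limit)
      const bool active = value < limit;
      if (active) {
        a[value] = static_cast<std::uint32_t>(value);  // predicated store
      }
    }
  }
  return a;
}
```

Because inactive lanes are masked off by the predicate, the result does not depend on `lanes`, and no scalar tail loop is needed in this model.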

### Interface:

```cpp
  Constant *StepVectorValue::get(Type *Ty);
```

### Example:

The following shows the construction of a scalable vector of the form
`<start, start-2, start-4, ...>`:

```llvm
  %elt = insertelement <n x 4 x i32> undef, i32 %start, i32 0
  %widestart = shufflevector <n x 4 x i32> %elt, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
  %step = insertelement <n x 4 x i32> undef, i32 -2, i32 0
  %widestep = shufflevector <n x 4 x i32> %step, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
  %stridevec = mul <n x 4 x i32> stepvector, %widestep
  %finalvec = add <n x 4 x i32> %widestart, %stridevec
```
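
For readers who prefer to see the arithmetic lane by lane, here is a small C++ sketch of what the IR above computes, assuming the element count is known at run time as `lanes`. The name `stridedVector` is invented for this example; it is not part of the proposed interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds <start, start + step, start + 2*step, ...> with `lanes` elements,
// mirroring: %finalvec = %widestart + (stepvector * %widestep).
std::vector<std::int32_t> stridedVector(std::int32_t start, std::int32_t step,
                                        std::size_t lanes) {
  std::vector<std::int32_t> result(lanes);
  for (std::size_t lane = 0; lane < lanes; ++lane) {
    const std::int32_t stepElt = static_cast<std::int32_t>(lane);  // stepvector
    const std::int32_t stride = stepElt * step;                    // %stridevec
    result[lane] = start + stride;                                 // %finalvec
  }
  return result;
}
```

Calling `stridedVector(10, -2, 4)` gives the four-lane instance of the `<start, start-2, start-4, ...>` pattern with `start` equal to 10.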




Current patch set:
https://reviews.llvm.org/D27101
https://reviews.llvm.org/D27102
https://reviews.llvm.org/D27103
https://reviews.llvm.org/D27105

-Graham




James Molloy via llvm-dev

2016-Nov-24 20:49 UTC

[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Hi Graham,

One high level comment without reading the patchset too much - it seems
'vscale' in particular could be just as easy to implement as an intrinsic,
which would be a less invasive patch.

Is there a reason you didn't go down the intrinsic route?

James

Renato Golin via llvm-dev

2016-Nov-25 13:39 UTC

[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Hi Graham,

I'll look into the patches next, but first some questions after
reading the available white papers on the net.

On 24 November 2016 at 15:39, Graham Hunter <Graham.Hunter at arm.com> wrote:
> This complex constant represents the runtime value of `n` for any scalable type
> `<n x m x ty>`. This is primarily used to increment induction variables and
> generate offsets.

What do you mean by "complex constant"? Surely not Complex, but this
is not really a constant either.

From what I read around (and this is why releasing the spec is
important, because I'm basing my reviews on guesswork), the
length of a vector is not constant, even on the same machine.

In theory, according to a post in the ARM forums (which now I forget),
the kernel could choose the vector length per process, meaning this is
not known even at link time.

But that's ok, because the SVE instructions completely (I'm guessing,
again) bypass the need for that "constant" to be constant at all, ie,
the use of `incw/incp`. Since you can fail half-way through, the width
that you need to increment to the induction variable is not even known
at run time! Meaning, that's not a constant at all!

Example: a[i] = b[ c[i] ];
  ld1w  z0.s, p0/z, [ c, i, lsl 2 ]
  ld1w  z1.s, p0/z, [ b, z0.s, sxtw 2 ]

Now, z0.s load may have failed with seg fault somewhere, and it's up
to the FFR to tell brka/brkb how to deal with this.

Each iteration will have:
  * The same vector length *per process* for accessing c[]
  * A potentially *different* vector length, *per iteration*, for accessing b[]

So, while <n x m x i32> could be constant on some vectors, even at
compile time (if we have a flag that forces certain length), it could
be unknown *per iteration* at run time.
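
To make that concern concrete, here is a toy C++ model of it. It is only a sketch of the scenario as guessed at above, not a description of how SVE hardware is specified to behave: `completed[k]` is an assumed count of lanes that finished in iteration `k`, and the function name `inductionWithPartialProgress` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model: returns the induction values when iteration k only advances by
// completed[k] lanes. Iterations without an entry are assumed to complete all
// `lanes` lanes. Each iteration advances by at least one lane.
std::vector<std::size_t> inductionWithPartialProgress(
    std::size_t tripCount, std::size_t lanes,
    const std::vector<std::size_t>& completed) {
  std::vector<std::size_t> starts;
  std::size_t index = 0;
  for (std::size_t k = 0; index < tripCount; ++k) {
    starts.push_back(index);
    std::size_t done = k < completed.size() ? completed[k] : lanes;
    // Clamp to [1, lanes] so the model always makes forward progress.
    done = std::max<std::size_t>(1, std::min(done, lanes));
    index += done;
  }
  return starts;
}
```

With full progress every iteration the step is the fixed `lanes` value; as soon as one iteration completes fewer lanes, the following induction values are no longer multiples of it, which is the situation described above.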

> ```llvm
>   %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
> ```
Right, this would be translated to:
  incw   x2

Now, the question is, why do we need "mul (i64 vscale, i64 4)" in the
IR?

There is no semantic analysis you can do on a value that can change on
every iteration of the loop. You can't elide, hoist, combine or const
fold.

If I got it right (from random documents on the web), `incX` refers
to a family of "increment induction" instructions. `incw` is probably
"increment W", i.e. 32 bits, while `incp` is "increment predicate", i.e.
whatever the size of the predicate you use:

Examples:
  incw  x2          # increments x2 to 4*(FFR successful lanes)
  incp  x2, p0.b  # increments x2 to 1*(FFR successful lanes)

So, this IR semantics is valid for the first case, but irrelevant for
the second. Also, I'm worried that we'll end up ignoring the
multiplier altogether, if we change the vector types (from byte to
word, for example), or make the process of doing so more complex.

> The following shows the construction of a scalable vector of the form
> <start, start-2, start-4, ...>:
>
> ```llvm
>   %elt = insertelement <n x 4 x i32> undef, i32 %start, i32 0
>   %widestart = shufflevector <n x 4 x i32> %elt, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
>   %step = insertelement <n x 4 x i32> undef, i32 -2, i32 0
>   %widestep = shufflevector <n x 4 x i32> %step, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
>   %stridevec = mul <n x 4 x i32> stepvector, %widestep
>   %finalvec = add <n x 4 x i32> %widestart, %stridevec
> ```
This is really fragile and confusing, and I agree with James, an
intrinsic here would be *much* better.

Something like

%const_vec = <n x 4 x i32> @llvm.sve.constant_vector(i32 %start, i32 %step)

cheers,
--renato
