thr3ads.net - llvm dev - [llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR [May 2020]

Chris Tetreault via llvm-dev

2020-Mar-09 19:05 UTC

[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Hi,

I am helping with the effort to implement scalable vectors in the codebase in
order to add support for generating SVE code in the Arm backend. I would like to
propose a refactor of the Type class hierarchy in order to eliminate issues
related to the misuse of SequentialType::getNumElements(). I would like to
introduce a new class FixedVectorType that inherits from SequentialType and
VectorType. VectorType would no longer inherit from SequentialType, instead
directly inheriting from Type. After this change, it will be statically
impossible to accidentally call SequentialType::getNumElements() via a
VectorType pointer.

Background:

Recently, scalable vectors have been introduced into the codebase. Previously,
vectors were written <n x ty> in IR, where n is a fixed number of elements known
at compile time, and ty is some type. Scalable vectors are written
<vscale x n x ty>, where vscale is a runtime constant value. A new function,
getElementCount(), has been added to VectorType (defined in
llvm/IR/DerivedTypes.h). It returns an ElementCount, which is defined as follows
in llvm/Support/TypeSize.h:

class ElementCount {
public:
  unsigned Min;
  bool Scalable;
  ...
};

Min is the minimum number of elements in the vector (the "n" in
<vscale x n x ty>), and Scalable is true if the vector is scalable (true for
<vscale x n x ty>, false for <n x ty>). The idea is that if a vector is not
scalable, then Min is exactly equal to the number of vector elements, but if the
vector is scalable, then the number of vector elements is equal to some
runtime-constant multiple of Min. The key takeaway here is that scalable vectors
and fixed length vectors need to be treated differently by the compiler. For a
fixed length vector, it is valid to iterate over the vector elements, but this
is impossible for a scalable vector, because its element count is not known at
compile time.

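To make the relationship between Min, Scalable and the real element count concrete, here is a minimal sketch. It assumes a stand-in struct with only the two fields quoted above; runtimeNumElements() and its VScale parameter are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount, reduced to the two fields
// quoted above. It is not the real class from llvm/Support/TypeSize.h.
struct ElementCount {
  unsigned Min;  // the "n" in <n x ty> or <vscale x n x ty>
  bool Scalable; // true only for <vscale x n x ty>
};

// Illustrative helper (an assumption, not an LLVM API): the number of
// elements a vector has once vscale is known. The compiler only ever sees
// Min and Scalable; VScale is passed in here purely to show the relation.
inline std::uint64_t runtimeNumElements(ElementCount EC, unsigned VScale) {
  assert(VScale >= 1 && "vscale is assumed to be a positive constant");
  return EC.Scalable ? std::uint64_t(EC.Min) * VScale : EC.Min;
}
```

For a fixed vector the result ignores VScale entirely, which is why code that only reads Min works for <n x ty> and silently undercounts for <vscale x n x ty>.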
Discussion:

The trouble is that all instances of VectorType have getNumElements() inherited
from SequentialType. Prior to the introduction of scalable vectors, this
function was guaranteed to return the number of elements in a vector or array.
Today, a comment documents that it returns only the minimum number of elements
for scalable vectors; however, a great deal of code in the codebase now misuses
getNumElements(). Some examples:

auto *V = VectorType::get(Ty, SomeOtherVec->getNumElements());

This code was previously perfectly fine but is incorrect for scalable vectors.
When scalable vectors were introduced, VectorType::get() was refactored to take
a bool that says whether the vector is scalable. This bool has a default value
of false. In this example, get() returns a non-scalable vector even if
SomeOtherVec was scalable. This will manifest later in some unrelated code as a
type mismatch between a scalable and a fixed length vector.
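
The failure mode in the first example can be reproduced in miniature. This is only a sketch: MiniVectorType and both cloneShape functions are hypothetical, and the element type is left out so that the defaulted bool is the only thing on display.

```cpp
#include <cassert>

// Hypothetical miniature of a vector type; not the real LLVM class.
struct MiniVectorType {
  unsigned MinElts;
  bool Scalable;

  // Same shape as the refactored get() described above: a trailing bool
  // that defaults to false.
  static MiniVectorType get(unsigned NumElts, bool Scalable = false) {
    return MiniVectorType{NumElts, Scalable};
  }

  // Returns only the minimum element count when the vector is scalable.
  unsigned getNumElements() const { return MinElts; }
};

// Buggy pattern: the defaulted bool silently drops scalability.
inline MiniVectorType cloneShapeBuggy(const MiniVectorType &Other) {
  return MiniVectorType::get(Other.getNumElements());
}

// Corrected pattern: the scalable flag is forwarded explicitly.
inline MiniVectorType cloneShape(const MiniVectorType &Other) {
  return MiniVectorType::get(Other.getNumElements(), Other.Scalable);
}
```

Both functions compile without complaint, which is the point: nothing at the call site hints that the first one changes the kind of vector.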

for (unsigned I = 0; I < SomeVec->getNumElements(); ++I) { ... }

Previously, since there was no notion of scalable vectors, this was perfectly
reasonable code. However, for scalable vectors, this is always a bug.

With vigilance in code review and good test coverage, we will eventually find
and squash most of these bugs. Unfortunately, code review is hard, and test
coverage isn't perfect. Bugs will continue to slip through as long as it's
easier to do the wrong thing.

One other factor to consider is that there is a great deal of code which deals
exclusively with fixed length vectors. Any backend for which there are no
scalable vectors should not need to care about their existence. Even in Arm, if
Neon code is being generated, then the vectors will never be scalable. In this
code, the status quo is perfectly fine, and any code related to checking whether
the vector is scalable is just noise.

Proposal:

In order to support users who only need fixed width vectors, and to ensure that
nobody can accidentally call getNumElements() on a scalable vector, I am
proposing the introduction of a new FixedVectorType which inherits from both
VectorType and SequentialType. In turn, VectorType will no longer inherit from
SequentialType. An example of what this will look like, with some misc.
functions omitted for clarity:

class VectorType : public Type {
public:
  static VectorType *get(Type *ElementType, ElementCount EC);

  Type *getElementType() const;
  ElementCount getElementCount() const;
  bool isScalable() const;
};

class FixedVectorType : public VectorType, public SequentialType {
public:
  static FixedVectorType *get(Type *ElementType, unsigned NumElts);
};

class SequentialType : public CompositeType {
public:
  uint64_t getNumElements() const { return NumElements; }
};

In this proposed architecture, VectorType does not have a getNumElements()
function because it does not inherit from SequentialType. In generic code, users
will call VectorType::get() to obtain a new instance of VectorType just as they
always have. VectorType implements the safe subset of functionality of fixed and
scalable vectors that is suitable for use anywhere. If the user passes false to
the scalable parameter of get(), they will get an instance of FixedVectorType
back. Users can then inspect its type and cast it to FixedVectorType using the
usual mechanisms. In code that deals exclusively in fixed length vectors, the
user can call FixedVectorType::get() to directly get an instance of
FixedVectorType, and their code can remain largely unchanged from how it was
prior to the introduction of scalable vectors. At this time, there exists no use
case that is only valid for scalable vectors, so no ScalableVectorType is being
added.

With this change, in generic code it is now impossible to accidentally call
getNumElements() on a scalable vector. If a user tries to pass a scalable vector
to a function that expects a fixed length vector, they will encounter a
compilation failure at the site of the bug, rather than a runtime error in some
unrelated code. If a user attempts to cast a scalable vector to FixedVectorType,
the cast will fail at the call site. This will make it easier to track down all
the places that are currently incorrect, and will prevent future developers from
introducing bugs by misusing getNumElements().
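
The compile-time guarantee can be sketched with a hypothetical miniature of the hierarchy. None of these classes are the real LLVM ones, SequentialType is left out to keep the sketch free of the diamond, and detectGetNumElements() and countLanes() are illustrative names only.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical miniature of the proposed hierarchy.
class Type {
public:
  virtual ~Type() = default;
};

class VectorType : public Type {
public:
  VectorType(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}
  bool isScalable() const { return Scalable; }
  // Deliberately no getNumElements() on the generic vector type.
protected:
  unsigned Min;
  bool Scalable;
};

class FixedVectorType : public VectorType {
public:
  explicit FixedVectorType(unsigned NumElts) : VectorType(NumElts, false) {}
  unsigned getNumElements() const { return Min; }
};

// Detects at compile time whether T exposes getNumElements().
template <typename T>
auto detectGetNumElements(int)
    -> decltype(std::declval<const T &>().getNumElements(), std::true_type{});
template <typename T> std::false_type detectGetNumElements(...);

static_assert(!decltype(detectGetNumElements<VectorType>(0))::value,
              "generic vectors must not expose getNumElements()");
static_assert(decltype(detectGetNumElements<FixedVectorType>(0))::value,
              "fixed vectors keep getNumElements()");

// Accepts only fixed length vectors; passing a VectorType does not compile.
inline unsigned countLanes(const FixedVectorType &FVT) {
  assert(!FVT.isScalable() && "fixed vectors are never scalable");
  return FVT.getNumElements();
}
```

Calling countLanes() with a plain VectorType is rejected by the compiler because there is no implicit base-to-derived conversion, which is the "failure at the site of the bug" behaviour described above.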

Outstanding design choice:

One issue with this architecture as proposed is the fact that SequentialType (by
way of CompositeType) inherits from Type. This introduces a diamond inheritance
in FixedVectorType. Unfortunately, llvm::cast uses a C-style cast internally, so
we cannot use virtual inheritance to resolve this issue. Thus, we have a few
options:

  1.  Break CompositeType's inheritance on Type and introduce functions to
convert from a Type to a CompositeType and vice versa. The conversion from
CompositeType is always safe because all instances of CompositeType (StructType,
ArrayType, and FixedVectorType) are instances of Type. A CompositeType can be
cast to the most derived class, then back to Type. The other way is not always
safe, so a function will need to be added to check if a Type is an instance of
CompositeType. This change is not that big, and I have a prototype
implementation up at https://reviews.llvm.org/D75486 ([SVE] Make CompositeType
not inherit from Type)
     *   Pros: this approach would result in minimal changes to the codebase. If
the llvm casts can be made to work for the conversion functions, then it would
touch very few files.
     *   Cons: There are those who think that CompositeType adds little value
and should be removed. Now would be an ideal time to do this. Additionally, the
conversion functions would be more complicated if we left CompositeType in.
  2.  Remove CompositeType and break SequentialType's inheritance of Type.
Add functions to convert a SequentialType to and from Type. The conversion
functions would work the same as those in option 1 above. Currently, there
exists only one class that derives directly from CompositeType: StructType. The
functionality of CompositeType can be directly moved into StructType, and APIs
that use CompositeType can directly use Type and cast appropriately. We feel
that this would be a fairly simple change, and we have a prototype
implementation up at https://reviews.llvm.org/D75660 (Remove CompositeType
class)
     *   Pros: Removing CompositeType would simplify the type hierarchy. Leaving
SequentialType in would simplify some code and be more typesafe than having a
getSequentialNumElements on Type.
     *   Cons: The value of SequentialType has also been called into question.
If we wanted to remove it, now would be a good time. Conversion functions add
complexity to the design. Introduces additional casting from Type.
  3.  Remove CompositeType and SequentialType. Roll the functions directly into
the most derived classes. A helper function can be added to Type to handle
choosing from FixedVectorType and ArrayType and calling getNumElements():

// Member function of Type (it uses this, so it cannot be static).
unsigned getSequentialNumElements() const {
  // isSequentialType() already exists and does the right thing.
  assert(isSequentialType());
  if (auto *AT = dyn_cast<ArrayType>(this))
    return AT->getNumElements();
  return cast<FixedVectorType>(this)->getNumElements();
}

A prototype implementation of this strategy can be found at
https://reviews.llvm.org/D75661 (Remove SequentialType from the type heirarchy.)

     *   Pros: By removing the multiple inheritance completely, we greatly
simplify the design and eliminate the need for any conversion functions. The
value of CompositeType and SequentialType has been called into question, and
removing them now might be of benefit to the codebase.
     *   Cons: getSequentialNumElements() has similar issues to those that we
are trying to solve in the first place and potentially subverts the whole
design. Omitting getSequentialNumElements() would add lots of code duplication.
Introduces additional casting from Type.

I believe that all three of these options are reasonable. My personal preference
is currently option 2. I think that option 3's getSequentialNumElements()
subverts the design: because every Type has getSequentialNumElements(), it is
tempting to just call it. However, the cast will fail at the call site in debug
builds, and in release builds it will return a garbage value rather than a value
that works most of the time. For option 1, the existence of CompositeType
complicates the conversion logic for little benefit.

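The diamond described under "Outstanding design choice" can be reproduced with empty stand-ins for the classes. This is a sketch under that assumption; the two asTypeVia...() helpers are hypothetical and exist only to show that there are two distinct Type subobjects.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins that mirror only the inheritance shape.
struct Type { int ID = 0; };
struct CompositeType : Type {};
struct SequentialType : CompositeType {};
struct VectorType : Type {};
struct FixedVectorType : VectorType, SequentialType {};

// With non-virtual inheritance, FixedVectorType contains two Type
// subobjects, so the implicit upcast to Type* is ambiguous.
static_assert(!std::is_convertible<FixedVectorType *, Type *>::value,
              "upcast to the repeated base is ambiguous");
static_assert(std::is_convertible<FixedVectorType *, VectorType *>::value,
              "upcast to a unique base is still fine");

// Each path has to be named explicitly, and each reaches a different
// Type subobject.
inline Type *asTypeViaVector(FixedVectorType *FVT) {
  assert(FVT && "expected a non-null pointer");
  return static_cast<VectorType *>(FVT);
}
inline Type *asTypeViaSequential(FixedVectorType *FVT) {
  assert(FVT && "expected a non-null pointer");
  return static_cast<SequentialType *>(FVT);
}
```

All three options remove this ambiguity by making sure only one path from FixedVectorType to Type remains.
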
Conclusion:

Thank you for your time in reviewing this RFC. Your feedback on my work is
greatly appreciated.

Thank you,
Christopher Tetreault

Nicolai Hähnle via llvm-dev

2020-Mar-09 21:09 UTC

[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Hi Chris,

Guarding against future bugs through type-safety is a welcome
initiative, thank you!

On Mon, Mar 9, 2020 at 8:05 PM Chris Tetreault via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Proposal:
>
> In order to support users who only need fixed width vectors, and to ensure
> that nobody can accidentally call getNumElements() on a scalable vector, I am
> proposing the introduction of a new FixedVectorType which inherits from both
> VectorType and SequentialType. In turn, VectorType will no longer inherit from
> SequentialType. An example of what this will look like, with some misc.
> functions omitted for clarity:
>
> class VectorType : public Type {
> public:
>   static VectorType *get(Type *ElementType, ElementCount EC);
>   Type *getElementType() const;
>   ElementCount getElementCount() const;
>   bool isScalable() const;
> };
>
> class FixedVectorType : public VectorType, public SequentialType {
> public:
>   static FixedVectorType *get(Type *ElementType, unsigned NumElts);
> };
>
> class SequentialType : public CompositeType {
> public:
>   uint64_t getNumElements() const { return NumElements; }
> };
[snip]
> Outstanding design choice:
>
> One issue with this architecture as proposed is the fact that SequentialType
> (by way of CompositeType) inherits from Type. This introduces a diamond
> inheritance in FixedVectorType. Unfortunately, llvm::cast uses a C-style cast
> internally, so we cannot use virtual inheritance to resolve this issue. Thus,
> we have a few options:
[snip]
> 2. Remove CompositeType and break SequentialType's inheritance of Type. Add
> functions to convert a SequentialType to and from Type. The conversion
> functions would work the same as those in option 1 above. Currently, there
> exists only one class that derives directly from CompositeType: StructType.
> The functionality of CompositeType can be directly moved into StructType, and
> APIs that use CompositeType can directly use Type and cast appropriately. We
> feel that this would be a fairly simple change, and we have a prototype
> implementation up at https://reviews.llvm.org/D75660 (Remove CompositeType
> class)
>
> Pros: Removing CompositeType would simplify the type hierarchy. Leaving
> SequentialType in would simplify some code and be more typesafe than having a
> getSequentialNumElements on Type.
> Cons: The value of SequentialType has also been called into question. If we
> wanted to remove it, now would be a good time. Conversion functions add
> complexity to the design. Introduces additional casting from Type.
>
> 3. Remove CompositeType and SequentialType. Roll the functions directly into
> the most derived classes. A helper function can be added to Type to handle
> choosing from FixedVectorType and ArrayType and calling getNumElements():
>
> static unsigned getSequentialNumElements() {
>   assert(isSequentialType()); // This already exists and does the
>                               // right thing
>   if (auto *AT = dyn_cast<ArrayType>(this))
>     return AT->getNumElements();
>   return cast<FixedVectorType>(this)->getNumElements();
> }
>
> A prototype implementation of this strategy can be found at
> https://reviews.llvm.org/D75661 (Remove SequentialType from the type
> heirarchy.)
>
> Pros: By removing the multiple inheritance completely, we greatly simplify
> the design and eliminate the need for any conversion functions. The value of
> CompositeType and SequentialType has been called into question, and removing
> them now might be of benefit to the codebase
> Cons: getSequentialNumElements() has similar issues to those that we are
> trying to solve in the first place and potentially subverts the whole design.
> Omitting getSequentialNumElements() would add lots of code duplication.
> Introduces additional casting from Type.

Are the issues of getSequentialNumElements() really bigger than those
of cast<SequentialType>(foo)->getNumElements()?

FWIW, the removal of CompositeType is small enough that I'm between
option 2 and 3, personally.

Cheers,
Nicolai

-- 
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Sander De Smalen via llvm-dev

2020-Mar-11 21:43 UTC

[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Hi Chris,

Thanks for writing this up! I strongly support the proposal to add a
FixedVectorType class to distinguish that type from the more generic (possibly
scalable) VectorType. By having the code-base operate on
'FixedVectorType' rather than 'VectorType', we can gradually
work to upgrade the code-base to support scalable vectors. This avoids bugs and
it seems right conceptually.

On these three options, my first thought was "can we start by breaking
FixedVectorType away from SequentialType" (adding a separate
'getNumElements()' method to FixedVectorType), until I figured this
wouldn't be that much different from D75661. SequentialType will at that
point be a pointless layer on top of ArrayType, so they could be squashed. It
would however leave ArrayType and StructType as two independent types under
CompositeType (is this an option 4?)

I'd be in favour of removing SequentialType and CompositeType altogether.
They seem little used in practice and the places where they are used seem like
they can be relatively easily updated to distinguish ArrayType, StructType and
FixedVectorType separately.

Even when it requires some code duplication, I prefer being more explicit in
distinguishing these types (option 3), over adding conversion functions between
Type and Composite/SequentialType (options 1 and 2), especially when the
conversion may not always be safe. I'm concerned that requiring conversion
functions makes the code less readable, like this example from D75486:

  if (auto *STy = dyn_cast_or_null<llvm::SequentialType>(
                          llvm::CompositeType::get(OrigTy, false)))

Here the use of CompositeType::get(Type*) in the context of
dyn_cast_or_null<llvm::SequentialType> seems a bit obscure to me.

If we choose option 3, I'd suggest removing the
'Type::getSequentialNumElements()' interface entirely and replacing it with
explicit `Type::getFixedVectorNumElements()` and `Type::getArrayNumElements()`,
thus removing any methods that mimic the old design.
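
As a rough sketch of what explicit accessors could look like, assuming a single kind-tagged miniature of Type (the real class stores its data differently, and tryGetNumElements() is a hypothetical caller, not a proposed API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of Type with one element count per instance.
class Type {
public:
  enum Kind { Array, FixedVector, ScalableVector };
  Type(Kind K, std::uint64_t N) : K(K), N(N) {}

  bool isArray() const { return K == Array; }
  bool isFixedVector() const { return K == FixedVector; }

  // Each accessor names the one kind it is valid for and asserts on it.
  std::uint64_t getArrayNumElements() const {
    assert(isArray() && "not an array type");
    return N;
  }
  unsigned getFixedVectorNumElements() const {
    assert(isFixedVector() && "not a fixed length vector type");
    return static_cast<unsigned>(N);
  }

private:
  Kind K;
  std::uint64_t N;
};

// A caller has to say which kinds it handles; scalable vectors fall
// through because their element count is not known at compile time.
inline bool tryGetNumElements(const Type &Ty, std::uint64_t &Out) {
  if (Ty.isArray()) {
    Out = Ty.getArrayNumElements();
    return true;
  }
  if (Ty.isFixedVector()) {
    Out = Ty.getFixedVectorNumElements();
    return true;
  }
  return false;
}
```

The cost is the explicit branch at each generic call site; the benefit is that no accessor name suggests it works for every sequential-looking type.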

Thanks,

Sander

David Greene via llvm-dev

2020-Mar-12 16:13 UTC

[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Sander De Smalen via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Even when it requires some code duplication, I prefer being more
> explicit in distinguishing these types (option 3), over adding
> conversion functions between Type and Composite/SequentialType
> (options 1 and 2), especially when the conversion may not always be
> safe. I'm concerned that requiring conversion functions makes the code
> less readable, like this example from D75486:
>
>   if (auto *STy = dyn_cast_or_null<llvm::SequentialType>(
>                           llvm::CompositeType::get(OrigTy, false)))
>
> Here the use of CompositeType::get(Type*) in the context of
> dyn_cast_or_null<llvm::SequentialType> seems a bit obscure to me.
>
> If we are to choose for option 3, I'd suggest removing the interface
> to 'Type::getSequentialNumElements()' entirely, and replacing it by
> explicit `Type::getFixedVectorNumElements()` and
> `Type::getArrayNumElements()`, thus removing any methods that mimic
> the old design.
+1.

                      -David

Chris Tetreault via llvm-dev

2020-Mar-12 16:50 UTC

[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Sander,

   Thank you for your reply. Allow me to address some of your points:


* Regarding the conversion functions

   We discussed it internally and our conclusion was that my
CompositeType::get() and CompositeType::is() might be unpalatable to the
community. We think it might be possible to specialize the casting templates
such that cast(), dyn_cast(), and isa() work. Code like

  if (auto *STy = dyn_cast_or_null<llvm::SequentialType>(
                          llvm::CompositeType::get(OrigTy, false)))

... represents a more egregious case of this. But if I can get cast working,
this will become

if (auto *STy = dyn_cast<llvm::SequentialType>(OrigTy))

... which is much nicer. If we can make this work, then the conversions will
be just as safe as they ever were. Unfortunately, accomplishing this requires
writing some pretty painful template code, and it's not really documented.
The cast documentation calls out clang::Decl and clang::DeclContext as an
example to emulate, but provides no further guidance. I suppose this might be a
good exercise. Alternatively, I could add a SequentialType::get() and
SequentialType::is(), and bypass the dyn_cast_or_null() call in your example.
It's just a bunch of boilerplate I didn't want to write for potentially
throw-away prototype code.


* Regarding just breaking FixedVectorType away from SequentialType, but leaving
ArrayType a subclass

    I think this is not a good option. We will still have to rewrite all code
that is generic over FixedVectorType and ArrayType, so we gain nothing, and the
amount of work is likely the same, in addition to the drawbacks that you
mentioned.


* Regarding using option 3 without getSequentialNumElements()

   This will result in a bunch of code that looks like this:

if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
   doSomething(ArrTy->getNumElements(), Foo);
else
   doSomething(cast<FixedVectorType>(Ty)->getNumElements(), Foo);

   I count 8 places in https://reviews.llvm.org/D75661 where we call
getSequentialNumElements(). 8 isn't _that many_ places, but it's enough
to be annoying. The resulting branches would do literally the same thing; it
just screams code duplication. I think I may have been a bit melodramatic in
claiming it "subverts the design." Realistically, the implementation of
getSequentialNumElements() never tries to cast to VectorType, only ArrayType
and FixedVectorType, so it will assert or return a garbage value at runtime. It
also only calls getNumElements(), so it won't work on a scalable vector. I
suppose the implementation of cast uses a C-style cast, which will eventually
resort to a reinterpret_cast, so it may happen that the data layouts of
FixedVectorType and VectorType are such that the VectorType's
ElementCount::Min and FixedVectorType::NumElements are at the same offset. I
don't think we should defensively handle this situation; either we accept that
UB exists, or we reject the idea of getSequentialNumElements(). However, I
assume enough people develop with asserts enabled that this won't be an issue.

   My personal preference is that we keep getSequentialNumElements() if we
choose to go with option 3.

Thanks,
   Christopher Tetreault

-----Original Message-----
From: Sander De Smalen <Sander.DeSmalen at arm.com>
Sent: Wednesday, March 11, 2020 2:44 PM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of VectorType in
the IR

Hi Chris,

Thanks for writing this up! I strongly support the proposal to add a
FixedVectorType class to distinguish that type from the more generic (possibly
scalable) VectorType. By having the code-base operate on
'FixedVectorType' rather than 'VectorType', we can gradually
work to upgrade the code-base to support scalable vectors. This avoids bugs and
it seems right conceptually.

On these three options, my first thought was "can we start by breaking
FixedVectorType away from SequentialType" (adding a separate
'getNumElements()' method to FixedVectorType), until I figured this
wouldn't be that much different from D75661. SequentialType will at that
point be a pointless layer on top of ArrayType, so they could be squashed. It
would however leave ArrayType and StructType as two independent types under
CompositeType (is this an option 4?)

I'd be in favour of removing SequentialType and CompositeType altogether.
They seem little used in practice and the places where they are used seem like
they can be relatively easily updated to distinguish ArrayType, StructType and
FixedVectorType separately.

Even when it requires some code duplication, I prefer being more explicit in
distinguishing these types (option 3), over adding conversion functions between
Type and Composite/SequentialType (options 1 and 2), especially when the
conversion may not always be safe. I'm concerned that requiring conversion
functions makes the code less readable, like this example from D65486:

  if (auto *STy = dyn_cast_or_null<llvm::SequentialType>(
                          llvm::CompositeType::get(OrigTy, false)))

Here the use of CompositeType::get(Type*) in the context of
dyn_cast_or_null<llvm::SequentialType> seems a bit obscure to me.

If we are to choose option 3, I'd suggest removing the interface to
'Type::getSequentialNumElements()' entirely, and replacing it with
explicit `Type::getFixedVectorNumElements()` and `Type::getArrayNumElements()`,
thus removing any methods that mimic the old design.
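
A rough sketch of what such explicit accessors could look like, using toy
classes rather than the real LLVM types. As a simplification, FixedVectorType
derives directly from Type here, and the countElements() helper is a
hypothetical call site added only for illustration.

```cpp
#include <cassert>

// Toy model of the hierarchy after option 3: no SequentialType, and each
// concrete class owns its element count.
class Type {
public:
  enum TypeID { IntegerTyID, ArrayTyID, FixedVectorTyID };
  TypeID getTypeID() const { return ID; }

  // Explicit per-kind accessors; each asserts that the kind matches.
  unsigned getArrayNumElements() const;
  unsigned getFixedVectorNumElements() const;

protected:
  explicit Type(TypeID ID) : ID(ID) {}

private:
  TypeID ID;
};

class ArrayType : public Type {
public:
  explicit ArrayType(unsigned N) : Type(ArrayTyID), NumElements(N) {}
  unsigned getNumElements() const { return NumElements; }

private:
  unsigned NumElements;
};

class FixedVectorType : public Type {
public:
  explicit FixedVectorType(unsigned N)
      : Type(FixedVectorTyID), NumElements(N) {}
  unsigned getNumElements() const { return NumElements; }

private:
  unsigned NumElements;
};

inline unsigned Type::getArrayNumElements() const {
  assert(getTypeID() == ArrayTyID && "not an array type");
  return static_cast<const ArrayType *>(this)->getNumElements();
}

inline unsigned Type::getFixedVectorNumElements() const {
  assert(getTypeID() == FixedVectorTyID && "not a fixed-width vector type");
  return static_cast<const FixedVectorType *>(this)->getNumElements();
}

// A call site must now say which kind it expects.
unsigned countElements(const Type &Ty) {
  return Ty.getTypeID() == Type::ArrayTyID ? Ty.getArrayNumElements()
                                           : Ty.getFixedVectorNumElements();
}
```

The trade-off is visible in countElements(): the caller has to branch on the
kind, but there is no longer a single method that invites being called on any
Type.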

Thanks,

Sander

> On 9 Mar 2020, at 19:05, Chris Tetreault via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
>                 I am helping with the effort to implement scalable vectors
in the codebase in order to add support for generating SVE code in the Arm
backend. I would like to propose a refactor of the Type class hierarchy in order
to eliminate issues related to the misuse of SequentialType::getNumElements(). I
would like to introduce a new class FixedVectorType that inherits from
SequentialType and VectorType. VectorType would no longer inherit from
SequentialType, instead directly inheriting from Type. After this change, it
will be statically impossible to accidentally call
SequentialType::getNumElements() via a VectorType pointer.
>
> Background:
>
>                 Recently, scalable vectors have been introduced into the
codebase. Previously, vectors have been written <n x ty> in IR, where n is
a fixed number of elements known at compile time, and ty is some type. Scalable
vectors are written <vscale x n x ty> where vscale is a runtime constant
value. A new function has been added to VectorType (defined in
llvm/IR/DerivedTypes.h), getElementCount(), that returns an ElementCount, which
is defined as such in llvm/Support/TypeSize.h:
>
>                 class ElementCount {
> public:
>   unsigned Min;
>   bool Scalable;
>   …
> }
>
>                 Min is the minimum number of elements in the vector (the
“n” in <vscale x n x ty>), and Scalable is true if the vector is scalable
(true for <vscale x n x ty>, false for <n x ty>). The idea is that if
a vector is not scalable, then Min is exactly equal to the number of vector
elements, but if the vector is scalable, then the number of vector elements is
equal to some runtime-constant multiple of Min. The key takeaway here is that
scalable vectors and fixed length vectors need to be treated differently by the
compiler. For a fixed length vector, it is valid to iterate over the vector
elements, but this is impossible for a scalable vector.
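
As a small illustration of the Min/Scalable relationship described above, here
is a self-contained sketch. The struct copies only the two quoted fields;
runtimeNumElements() and hasCompileTimeCount() are hypothetical helpers written
for this example and are not part of llvm/Support/TypeSize.h.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two fields quoted above; everything else here is illustrative.
struct ElementCount {
  unsigned Min;
  bool Scalable;
};

// Hypothetical helper: the actual element count once vscale is known.
// The compiler cannot evaluate this for a scalable vector, because vscale
// is only known at run time.
std::uint64_t runtimeNumElements(ElementCount EC, unsigned VScale) {
  assert(VScale >= 1 && "vscale is a positive multiplier");
  return EC.Scalable ? std::uint64_t(EC.Min) * VScale : EC.Min;
}

// Code that needs a compile-time count may only proceed for fixed vectors.
bool hasCompileTimeCount(ElementCount EC) { return !EC.Scalable; }
```

For a fixed vector the result never depends on VScale, which is why iterating
over Min elements is only meaningful when Scalable is false.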
>
> Discussion:
>
> The trouble is that all instances of VectorType have getNumElements()
inherited from SequentialType. Prior to the introduction of scalable vectors,
this function was guaranteed to return the number of elements in a vector or
array. Today, there is a comment that documents the fact that this returns only
the minimum number of elements for scalable vectors, however there exists a ton
of code in the codebase that is now misusing getNumElements(). Some examples:
>
>                 auto *V = VectorType::get(Ty,
> SomeOtherVec->getNumElements());
>
>                 This code was previously perfectly fine but is incorrect
for scalable vectors. When scalable vectors were introduced VectorType::get()
was refactored to take a bool to tell if the vector is scalable. This bool has a
default value of false. In this example, get() is returning a non-scalable
vector even if SomeOtherVec was scalable. This will manifest later in some
unrelated code as a type mismatch between a scalable and fixed length vector.
>
>                 for (unsigned I = 0; I < SomeVec->getNumElements();
> ++I) { … }
>
>                 Previously, since there was no notion of scalable vectors,
this was perfectly reasonable code. However, for scalable vectors, this is
always a bug.
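
The first failure mode can be reproduced in miniature. In the sketch below,
VecTy is a toy type whose get() imitates the defaulted Scalable parameter
described above; it is not the real VectorType::get(), and sameShapeBuggy() and
sameShapeFixed() are hypothetical names for the two patterns.

```cpp
#include <cassert>

// Toy vector type; `get` mimics the defaulted `Scalable` parameter described
// above but is not the real VectorType::get().
struct VecTy {
  unsigned MinElts;
  bool Scalable;

  static VecTy get(unsigned NumElts, bool Scalable = false) {
    return VecTy{NumElts, Scalable};
  }
  // Returns only the minimum count when the vector is scalable.
  unsigned getNumElements() const { return MinElts; }
  bool operator==(const VecTy &O) const {
    return MinElts == O.MinElts && Scalable == O.Scalable;
  }
};

// Buggy pattern: silently turns <vscale x n x ty> into <n x ty>.
VecTy sameShapeBuggy(const VecTy &Other) {
  return VecTy::get(Other.getNumElements());
}

// Fixed pattern: forward scalability together with the count.
VecTy sameShapeFixed(const VecTy &Other) {
  return VecTy::get(Other.MinElts, Other.Scalable);
}
```

The buggy version still compiles and still works for every fixed-width input,
which is what makes this class of bug easy to miss in review.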
>
>                 With vigilance in code review, and good test coverage we
will eventually find and squash most of these bugs. Unfortunately, code review
is hard, and test coverage isn’t perfect. Bugs will continue to slip through as
long as it’s easier to do the wrong thing.
>
>                 One other factor to consider, is that there is a great deal
of code which deals exclusively with fixed length vectors. Any backend for which
there are no scalable vectors should not need to care about their existence.
Even in Arm, if Neon code is being generated, then the vectors will never be
scalable. In this code, the current status quo is perfectly fine, and any code
related to checking if the vector is scalable is just noise.
>
> Proposal:
>
>                 In order to support users who only need fixed width
vectors, and to ensure that nobody can accidentally call getNumElements() on a
scalable vector, I am proposing the introduction of a new FixedVectorType which
inherits from both VectorType and SequentialType. In turn, VectorType will no
longer inherit from SequentialType. An example of what this will look like, with
some misc. functions omitted for clarity:
>
> class VectorType : public Type {
> public:
>   static VectorType *get(Type *ElementType, ElementCount EC);
>
>   Type *getElementType() const;
>   ElementCount getElementCount() const;
>   bool isScalable() const;
> };
>
> class FixedVectorType : public VectorType, public SequentialType {
> public:
>   static FixedVectorType *get(Type *ElementType, unsigned NumElts);
> };
>
> class SequentialType : public CompositeType {
> public:
>   uint64_t getNumElements() const { return NumElements; }
> };
>
>                 In this proposed architecture, VectorType does not have a
getNumElements() function because it does not inherit from SequentialType. In
generic code, users will call VectorType::get() to obtain a new instance of
VectorType just as they always have. VectorType implements the safe subset of
functionality of fixed and scalable vectors that is suitable for use anywhere.
If the user passes false to the scalable parameter of get(), they will get an
instance of FixedVectorType back. Users can then inspect its type and cast it to
FixedVectorType using the usual mechanisms. In code that deals exclusively in
fixed length vectors, the user can call FixedVectorType::get() to directly get
an instance of FixedVectorType, and their code can remain largely unchanged from
how it was prior to the introduction of scalable vectors. At this time, there
exists no use case that is only valid for scalable vectors, so no
ScalableVectorType is being added.
>
>                 With this change, in generic code it is now impossible to
accidentally call getNumElements() on a scalable vector. If a user tries to pass
a scalable vector to a function that expects a fixed length vector, they will
encounter a compilation failure at the site of the bug, rather than a runtime
error in some unrelated code. If a user attempts to cast a scalable vector to
FixedVectorType, the cast will fail at the call site. This will make it easier
to track down all the places that are currently incorrect, and will prevent
future developers from introducing bugs by misusing getNumElements().
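
The compile-time guarantee can be sketched with toy classes. This is a
simplified model that deliberately leaves out SequentialType and the diamond;
getMinNumElements(), HasGetNumElements and sumOfLaneIndices() are hypothetical
names introduced only for this example.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Simplified: the diamond through SequentialType is left out on purpose.
class VectorType {
public:
  VectorType(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}
  bool isScalable() const { return Scalable; }
  unsigned getMinNumElements() const { return Min; } // hypothetical accessor

private:
  unsigned Min;
  bool Scalable;
};

class FixedVectorType : public VectorType {
public:
  explicit FixedVectorType(unsigned N) : VectorType(N, /*Scalable=*/false) {}
  unsigned getNumElements() const { return getMinNumElements(); }
};

// Detects at compile time whether T has a callable getNumElements().
template <typename T, typename = void>
struct HasGetNumElements : std::false_type {};
template <typename T>
struct HasGetNumElements<
    T, decltype(void(std::declval<const T &>().getNumElements()))>
    : std::true_type {};

static_assert(!HasGetNumElements<VectorType>::value,
              "generic vectors expose no element count");
static_assert(HasGetNumElements<FixedVectorType>::value,
              "fixed-width vectors do");

// Only accepts fixed-width vectors, so iterating over elements is safe.
unsigned sumOfLaneIndices(const FixedVectorType &VTy) {
  unsigned Sum = 0;
  for (unsigned I = 0; I < VTy.getNumElements(); ++I)
    Sum += I;
  return Sum;
}
```

Passing a plain VectorType to sumOfLaneIndices() is rejected by the compiler,
which is the "failure at the site of the bug" behaviour the proposal is after.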
>
> Outstanding design choice:
>
>                 One issue with this architecture as proposed is the fact
that SequentialType (by way of CompositeType) inherits from Type. This
introduces a diamond inheritance in FixedVectorType. Unfortunately, llvm::cast
uses a c-style cast internally, so we cannot use virtual inheritance to resolve
this issue. Thus, we have a few options:
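
The diamond itself is easy to demonstrate in isolation. The sketch below uses
empty toy classes with the same names and, as an assumption made to keep it
short, folds CompositeType into SequentialType; hasTwoTypeSubobjects() is a
hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

// Simplified shape of the proposal (CompositeType is folded into
// SequentialType here to keep the sketch short).
struct Type { unsigned ID = 0; };
struct VectorType : Type {};
struct SequentialType : Type {};
struct FixedVectorType : VectorType, SequentialType {};

// Type is a base class, but it appears twice, so the implicit pointer
// conversion FixedVectorType* -> Type* is ambiguous and rejected.
static_assert(std::is_base_of<Type, FixedVectorType>::value, "");
static_assert(!std::is_convertible<FixedVectorType *, Type *>::value, "");

// The two Type subobjects live at different addresses inside one object.
bool hasTwoTypeSubobjects(FixedVectorType &FVT) {
  Type *ViaVector = static_cast<VectorType *>(&FVT);
  Type *ViaSequential = static_cast<SequentialType *>(&FVT);
  return ViaVector != ViaSequential;
}
```

With two distinct Type subobjects, a FixedVectorType cannot be passed where a
Type* is expected without first choosing a path, which is the problem the
options below try to avoid.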
>
> 1. Break CompositeType’s inheritance on Type and introduce functions to
convert from a Type to a CompositeType and vice versa. The conversion from
CompositeType is always safe because all instances of CompositeType (StructType,
ArrayType, and FixedVectorType) are instances of Type. A CompositeType can be
cast to the most derived class, then back to Type. The other way is not always
safe, so a function will need to be added to check if a Type is an instance of
CompositeType. This change is not that big, and I have a prototype
implementation up at https://reviews.llvm.org/D75486 ([SVE] Make CompositeType
not inherit from Type)
>    * Pros: This approach would result in minimal changes to the codebase. If
the llvm casts can be made to work for the conversion functions, then it would
touch very few files.
>    * Cons: There are those who think that CompositeType adds little value and
should be removed. Now would be an ideal time to do this. Additionally, the
conversion functions would be more complicated if we left CompositeType in.
> 2. Remove CompositeType and break SequentialType’s inheritance of Type. Add
functions to convert a SequentialType to and from Type. The conversion functions
would work the same as those in option 1 above. Currently, there exists only one
class that derives directly from CompositeType: StructType. The functionality of
CompositeType can be directly moved into StructType, and APIs that use
CompositeType can directly use Type and cast appropriately. We feel that this
would be a fairly simple change, and we have a prototype implementation up at
https://reviews.llvm.org/D75660 (Remove CompositeType class)
>    * Pros: Removing CompositeType would simplify the type hierarchy. Leaving
SequentialType in would simplify some code and be more typesafe than having a
getSequentialNumElements on Type.
>    * Cons: The value of SequentialType has also been called into question. If
we wanted to remove it, now would be a good time. Conversion functions add
complexity to the design. Introduces additional casting from Type.
> 3. Remove CompositeType and SequentialType. Roll the functions directly into
the most derived classes. A helper function can be added to Type to handle
choosing from FixedVectorType and ArrayType and calling getNumElements():
>
> // Non-static member of Type, since it inspects `this`.
> unsigned getSequentialNumElements() const {
>   assert(isSequentialType()); // This already exists and does the
>                               // right thing
>   if (auto *AT = dyn_cast<ArrayType>(this))
>     return AT->getNumElements();
>   return cast<FixedVectorType>(this)->getNumElements();
> }
>
> A prototype implementation of this strategy can be found at
> https://reviews.llvm.org/D75661 (Remove SequentialType from the type
> heirarchy.)
>
>    * Pros: By removing the multiple inheritance completely, we greatly
simplify the design and eliminate the need for any conversion functions. The
value of CompositeType and SequentialType has been called into question, and
removing them now might be of benefit to the codebase.
>    * Cons: getSequentialNumElements() has similar issues to those that we are
trying to solve in the first place and potentially subverts the whole design.
Omitting getSequentialNumElements() would add lots of code duplication.
Introduces additional casting from Type.
> I believe that all three of these options are reasonable. My personal
preference is currently option 2. I think that option 3’s
getSequentialNumElements() subverts the design: because every Type has
getSequentialNumElements(), it is tempting to just call it. However, the cast
will fail at the call site in debug, and in release it will return a garbage
value rather than a value that works most of the time. For option 1, the
existence of CompositeType complicates the conversion logic for little benefit.
>
> Conclusion:
>
>                 Thank you for your time in reviewing this RFC. Your
feedback on my work is greatly appreciated.
>
>
>
> Thank you,
>
>                 Christopher Tetreault

Chris Tetreault via llvm-dev

2020-Mar-17 17:24 UTC


[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Hi,

   I just wanted to ping the mailing list on this RFC. So far, feedback has been
pretty limited. Regarding the open question of how to handle the SequentialType
subclass, I have received one vote for removing it entirely and not having a
getSequentialNumElements() function.

   Are there any other thoughts on this topic?

Thank you,
   Christopher Tetreault



John McCall via llvm-dev

2020-May-21 18:14 UTC


[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

On 9 Mar 2020, at 15:05, Chris Tetreault via llvm-dev wrote:
> Hi,
>                 I am helping with the effort to implement scalable 
> vectors in the codebase in order to add support for generating SVE 
> code in the Arm backend. I would like to propose a refactor of the 
> Type class hierarchy in order to eliminate issues related to the 
> misuse of SequentialType::getNumElements(). I would like to introduce 
> a new class FixedVectorType that inherits from SequentialType and 
> VectorType. VectorType would no longer inherit from SequentialType, 
> instead directly inheriting from Type. After this change, it will be 
> statically impossible to accidentally call 
> SequentialType::getNumElements() via a VectorType pointer.
I’m sorry that I missed this thread when you posted it.  I’m very 
much in favor of changing the type hierarchy to statically distinguish 
fixed from scalable vector types, but I think that making VectorType the 
common base type is unnecessarily disruptive.  Practically speaking, 
this is going to break every out-of-tree frontend, backend, or 
optimization pass that supports SIMD types.  Relatively little LLVM code 
will just naturally support scalable vector types without any 
adjustment.  Following the principle of iterative development, as well 
as just good conservative coding practice, it’s much better for code 
that does support both to explicitly opt in by checking for and handling 
the more general type, rather than being implicitly “volunteered” to 
support both by having the `VectorType` type semantically repurposed out 
from under them.

I understand the argument that `VectorType` is a better name for the 
abstract base type, but in this case I don’t think that consideration 
justifies the disruption for the vast majority of LLVM developers.  
There are plenty of names you could give the abstract base type that
adequately express that it’s a more general type, and the historical
baggage of `VectorType` being slightly misleadingly named if you’re
aware of a particular largely-vendor-specific extension does not seem
overbearing.
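
One way to picture this alternative is the sketch below. It is only an
illustration under stated assumptions: `VectorBaseType` and
`ScalableVectorType` are names invented for the example (no specific name is
proposed above), and the classes are toys rather than LLVM's.

```cpp
#include <cassert>

// Hypothetical naming: `VectorBaseType` is invented for this sketch.
class VectorBaseType {
public:
  bool isScalable() const { return Scalable; }
  unsigned getMinNumElements() const { return Min; }

protected:
  VectorBaseType(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}

private:
  unsigned Min;
  bool Scalable;
};

// Keeps its historical meaning: a vector with a compile-time element count.
class VectorType : public VectorBaseType {
public:
  explicit VectorType(unsigned N) : VectorBaseType(N, /*Scalable=*/false) {}
  unsigned getNumElements() const { return getMinNumElements(); }
};

// Hypothetical scalable sibling; deliberately has no getNumElements().
class ScalableVectorType : public VectorBaseType {
public:
  explicit ScalableVectorType(unsigned Min)
      : VectorBaseType(Min, /*Scalable=*/true) {}
};

// Existing code written against VectorType compiles and behaves as before.
unsigned legacyLaneCount(const VectorType &VTy) { return VTy.getNumElements(); }

// Code that wants to handle both kinds opts in by naming the general type.
unsigned minLaneCount(const VectorBaseType &VTy) {
  return VTy.getMinNumElements();
}
```

Under this arrangement, out-of-tree code that mentions `VectorType` keeps its
old semantics, and only code that names the general type sees scalable vectors.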

John.
> Background:
>                 Recently, scalable vectors have been introduced into 
> the codebase. Previously, vectors have been written <n x ty> in IR, 
> where n is a fixed number of elements known at compile time, and ty is 
> some type. Scalable vectors are written <vscale x n x ty> where vscale
> is a runtime constant value. A new function has been added to
> VectorType (defined in llvm/IR/DerivedTypes.h), getElementCount(), 
> that returns an ElementCount, which is defined as such in 
> llvm/Support/TypeSize.h:
>                 class ElementCount {
> public:
>   unsigned Min;
>   bool Scalable;
>   ...
> }
>                 Min is the minimum number of elements in the vector 
> (the "n" in <vscale x n x ty>), and Scalable is true if the vector is
> scalable (true for <vscale x n x ty>, false for <n x ty>). The idea is
> that if a vector is not scalable, then Min is exactly equal to the
> number of vector elements, but if the vector is scalable, then the 
> number of vector elements is equal to some runtime-constant multiple 
> of Min. The key takeaway here is that scalable vectors and fixed 
> length vectors need to be treated differently by the compiler. For a 
> fixed length vector, it is valid to iterate over the vector elements, 
> but this is impossible for a scalable vector.
> Discussion:
> The trouble is that all instances of VectorType have getNumElements() 
> inherited from SequentialType. Prior to the introduction of scalable 
> vectors, this function was guaranteed to return the number of elements 
> in a vector or array. Today, there is a comment that documents the 
> fact that this returns only the minimum number of elements for 
> scalable vectors, however there exists a ton of code in the codebase 
> that is now misusing getNumElements(). Some examples:
>                 auto *V = VectorType::get(Ty,
> SomeOtherVec->getNumElements());
>                 This code was previously perfectly fine but is 
> incorrect for scalable vectors. When scalable vectors were introduced 
> VectorType::get() was refactored to take a bool to tell if the vector 
> is scalable. This bool has a default value of false. In this example, 
> get() is returning a non-scalable vector even if SomeOtherVec was 
> scalable. This will manifest later in some unrelated code as a type 
> mismatch between a scalable and fixed length vector.
>                 for (unsigned I = 0; I < SomeVec->getNumElements(); 
> ++I) { ... }
>                 Previously, since there was no notion of scalable 
> vectors, this was perfectly reasonable code. However, for scalable 
> vectors, this is always a bug.
>                 With vigilance in code review, and good test coverage 
> we will eventually find and squash most of these bugs. Unfortunately, 
> code review is hard, and test coverage isn't perfect. Bugs will 
> continue to slip through as long as it's easier to do the wrong thing.
>                 One other factor to consider, is that there is a great 
> deal of code which deals exclusively with fixed length vectors. Any 
> backend for which there are no scalable vectors should not need to 
> care about their existence. Even in Arm, if Neon code is being 
> generated, then the vectors will never be scalable. In this code, the 
> current status quo is perfectly fine, and any code related to checking 
> if the vector is scalable is just noise.
> Proposal:
>                 In order to support users who only need fixed width 
> vectors, and to ensure that nobody can accidentally call 
> getNumElements() on a scalable vector, I am proposing the introduction 
> of a new FixedVectorType which inherits from both VectorType and 
> SequentialType. In turn, VectorType will no longer inherit from 
> SequentialType. An example of what this will look like, with some 
> misc. functions omitted for clarity:
> class VectorType : public Type {
> public:
>   static VectorType *get(Type *ElementType, ElementCount EC);
>
>   Type *getElementType() const;
>   ElementCount getElementCount() const;
>   bool isScalable() const;
> };
>
> class FixedVectorType : public VectorType, public SequentialType {
> public:
>   static FixedVectorType *get(Type *ElementType, unsigned NumElts);
> };
>
> class SequentialType : public CompositeType {
> public:
>   uint64_t getNumElements() const { return NumElements; }
> };
>                 In this proposed architecture, VectorType does not 
> have a getNumElements() function because it does not inherit from 
> SequentialType. In generic code, users will call VectorType::get() to 
> obtain a new instance of VectorType just as they always have. 
> VectorType implements the safe subset of functionality of fixed and 
> scalable vectors that is suitable for use anywhere. If the user passes 
> false to the scalable parameter of get(), they will get an instance of 
> FixedVectorType back. Users can then inspect its type and cast it to 
> FixedVectorType using the usual mechanisms. In code that deals 
> exclusively in fixed length vectors, the user can call 
> FixedVectorType::get() to directly get an instance of FixedVectorType, 
> and their code can remain largely unchanged from how it was prior to 
> the introduction of scalable vectors. At this time, there exists no 
> use case that is only valid for scalable vectors, so no 
> ScalableVectorType is being added.
>                 With this change, in generic code it is now impossible 
> to accidentally call getNumElements() on a scalable vector. If a user 
> tries to pass a scalable vector to a function that expects a fixed 
> length vector, they will encounter a compilation failure at the site 
> of the bug, rather than a runtime error in some unrelated code. If a 
> user attempts to cast a scalable vector to FixedVectorType, the cast 
> will fail at the call site. This will make it easier to track down all 
> the places that are currently incorrect, and will prevent future 
> developers from introducing bugs by misusing getNumElements().
> Outstanding design choice:
>                 One issue with this architecture as proposed is the 
> fact that SequentialType (by way of CompositeType) inherits from Type. 
> This introduces a diamond inheritance in FixedVectorType. 
> Unfortunately, llvm::cast uses a c-style cast internally, so we cannot 
> use virtual inheritance to resolve this issue. Thus, we have a few 
> options:
>
>   1.  Break CompositeType's inheritance on Type and introduce 
> functions to convert from a Type to a CompositeType and vice versa. 
> The conversion from CompositeType is always safe because all instances 
> of CompositeType (StructType, ArrayType, and FixedVectorType) are 
> instances of Type. A CompositeType can be cast to the most derived 
> class, then back to Type. The other way is not always safe, so a 
> function will need to be added to check if a Type is an instance of 
> CompositeType. This change is not that big, and I have a prototype 
> implementation up at https://reviews.llvm.org/D75486 ([SVE] Make 
> CompositeType not inherit from Type)
>      *   Pros: this approach would result in minimal changes to the 
> codebase. If the llvm casts can be made to work for the conversion 
> functions, then it would touch very few files.
>      *   Cons: There are those who think that CompositeType adds 
> little value and should be removed. Now would be an ideal time to do 
> this. Additionally, the conversion functions would be more complicated 
> if we left CompositeType in.
>   2.  Remove CompositeType and break SequentialType's inheritance of 
> Type. Add functions to convert a SequentialType to and from Type. The 
> conversion functions would work the same as those in option 1 above. 
> Currently, there exists only one class that derives directly from 
> CompositeType: StructType. The functionality of CompositeType can be 
> directly moved into StructType, and APIs that use CompositeType can 
> directly use Type and cast appropriately. We feel that this would be a 
> fairly simple change, and we have a prototype implementation up at 
> https://reviews.llvm.org/D75660 (Remove CompositeType class)
>      *   Pros: Removing CompositeType would simplify the type 
> hierarchy. Leaving SequentialType in would simplify some code and be 
> more typesafe than having a getSequentialNumElements on Type.
>      *   Cons: The value of SequentialType has also been called into 
> question. If we wanted to remove it, now would be a good time. 
> Conversion functions add complexity to the design. Introduces 
> additional casting from Type.
>   3.  Remove CompositeType and SequentialType. Roll the functions 
> directly into the most derived classes. A helper function can be added 
> to Type to handle choosing from FixedVectorType and ArrayType and 
> calling getNumElements():
> unsigned getSequentialNumElements() const {
>   assert(isSequentialType()); // This already exists and does the
>                               // right thing
>   if (auto *AT = dyn_cast<ArrayType>(this))
>     return AT->getNumElements();
>   return cast<FixedVectorType>(this)->getNumElements();
> }
> A prototype implementation of this strategy can be found at 
> https://reviews.llvm.org/D75661 (Remove SequentialType from the type 
> heirarchy.)
>
>      *   Pros: By removing the multiple inheritance completely, we 
> greatly simplify the design and eliminate the need for any conversion 
> functions. The value of CompositeType and SequentialType has been 
> called into question, and removing them now might be of benefit to the 
> codebase
>      *   Cons: getSequentialNumElements() has similar issues to those 
> that we are trying to solve in the first place and potentially 
> subverts the whole design. Omitting getSequentialNumElements() would 
> add lots of code duplication. Introduces additional casting from Type.
> I believe that all three of these options are reasonable. My personal 
> preference is currently option 2. I think that option 3's 
> getSequentialNumElements() subverts the design: because every Type has 
> getSequentialNumElements(), it is tempting to just call it. However, 
> the cast will fail at the call site in debug, and in release it will 
> return a garbage value rather than a value that works most of the 
> time. For option 1, the existence of CompositeType complicates the 
> conversion logic for little benefit.
> Conclusion:
>                 Thank you for your time in reviewing this RFC. Your 
> feedback on my work is greatly appreciated.
>
> Thank you,
>                 Christopher Tetreault

Chris Tetreault via llvm-dev

2020-May-21 20:01 UTC


[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Hi John,

   I’d like to address some points in your message.
> Practically speaking, this is going to break every out-of-tree frontend,
> backend, or optimization pass that supports SIMD types.
My understanding is that LLVM policy is not to let considerations for
downstream and out-of-tree codebases affect the pace of development. The C++ API
is explicitly unstable. I maintain a downstream fork of LLVM myself, so I know
the pain this causes because I have to fix all the issues in my internal
codebase. However, the fix for downstream codebases is simple: find every place
that says VectorType and change it to say FixedVectorType.
> … by having the VectorType type semantically repurposed out from under
> them.
The documented semantics of VectorType prior to my RFC were that it is a
generalization of all vector types. A VectorType contains an ElementCount,
which is a pair of (bool, unsigned). If the bool is true, then the return value
of getNumElements() is the minimum number of vector elements. If the bool is
false, then it is the actual number of elements. My RFC has not changed these
semantics. It will eventually delete a function that has been pervasively
misused throughout the codebase, but the semantics remain the same. You are
proposing a semantic change to VectorType to have it specifically be a fixed
width vector.
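
A minimal sketch of these semantics, using a self-contained stand-in rather than
the real LLVM classes. The helper tryGetExactNumElements is a hypothetical name
introduced only for illustration; it is not an LLVM API.

```cpp
#include <cassert>

// Stand-in for the ElementCount described in the RFC: Min is the "n" in
// <n x ty> or <vscale x n x ty>, and Scalable says which form applies.
struct ElementCount {
  unsigned Min;
  bool Scalable;
};

// Hypothetical helper (not an LLVM API). The exact element count is only
// known at compile time for fixed-length vectors. For scalable vectors Min
// is just a lower bound, so the helper reports failure and leaves Out alone.
inline bool tryGetExactNumElements(ElementCount EC, unsigned &Out) {
  if (EC.Scalable)
    return false;
  Out = EC.Min;
  return true;
}
```

The point of the sketch is that a caller has to handle the failure case
explicitly, instead of silently treating the minimum as the exact count.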
> … a particular largely-vendor-specific extension …
All SIMD vectors are vendor-specific extensions. The fact that most popular
architectures have them does not change that. AArch64 and RISC-V both have
scalable vectors, so this is not limited to one architecture. It is the
responsibility of all developers to ensure that they use the types correctly. It
would be nice if the obvious thing to do were also the correct thing to do.
> … it’s much better for code that does support both to explicitly opt in by
> checking for and handling the more general type …
This is how it will work. I am in the process of fixing up call sites that make
fixed width assumptions so that they use FixedVectorType.
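
A rough sketch of what such a call site might look like. The classes below are
toy stand-ins that only mirror the shape of the proposed hierarchy, and
dynamic_cast stands in for llvm::dyn_cast. The names getMinNumElements and
visitLanes are assumptions made for this example.

```cpp
#include <cassert>

// Toy stand-ins for the proposed hierarchy; these are NOT the LLVM classes.
class VectorType {
public:
  VectorType(unsigned Min, bool IsScalable)
      : MinElts(Min), Scalable(IsScalable) {}
  virtual ~VectorType() = default;
  bool isScalable() const { return Scalable; }
  unsigned getMinNumElements() const { return MinElts; } // hypothetical name

private:
  unsigned MinElts;
  bool Scalable;
};

class FixedVectorType : public VectorType {
public:
  explicit FixedVectorType(unsigned NumElts) : VectorType(NumElts, false) {}
  // Only the fixed-length type exposes an exact element count.
  unsigned getNumElements() const { return getMinNumElements(); }
};

// A call site that makes fixed width assumptions opts in explicitly.
// Returns the number of lanes visited, or -1 when the vector is not a
// FixedVectorType and the per-lane loop must not run.
inline int visitLanes(const VectorType &VTy) {
  const auto *FVTy = dynamic_cast<const FixedVectorType *>(&VTy);
  if (!FVTy)
    return -1;
  int Visited = 0;
  for (unsigned I = 0; I < FVTy->getNumElements(); ++I)
    ++Visited;
  return Visited;
}
```

Because getNumElements() exists only on the fixed type in this sketch, a loop
over lanes cannot be written against the base type by accident.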

   I think that it is important to ensure that things have clear, sensible names,
and to clean up historical baggage when the opportunity presents itself. The advantage
of an unstable API is that you are not prevented from changing this sort of
thing by external entities. Now is the time to fix the names of the vector
types; a new type of vector exists, and we have the choice to change the names
to reflect the new reality or to accumulate some technical debt. If we wait to
address this issue later, there will just be more code that needs to be
addressed. The refactor is fairly easy right now because pretty much everything
is making fixed width assumptions.  The changes are all fairly mechanical. If we
wait until scalable vectors are well supported throughout the codebase, the
change will just be that much harder.

Thanks,
   Christopher Tetreault

From: John McCall <rjmccall at apple.com>
Sent: Thursday, May 21, 2020 11:15 AM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of VectorType in
the IR

On 9 Mar 2020, at 15:05, Chris Tetreault via llvm-dev wrote:

Hi,
I am helping with the effort to implement scalable vectors in the codebase in
order to add support for generating SVE code in the Arm backend. I would like to
propose a refactor of the Type class hierarchy in order to eliminate issues
related to the misuse of SequentialType::getNumElements(). I would like to
introduce a new class FixedVectorType that inherits from SequentialType and
VectorType. VectorType would no longer inherit from SequentialType, instead
directly inheriting from Type. After this change, it will be statically
impossible to accidentally call SequentialType::getNumElements() via a
VectorType pointer.

I’m sorry that I missed this thread when you posted it. I’m very much in favor
of changing the type hierarchy to statically distinguish fixed from scalable
vector types, but I think that making VectorType the common base type is
unnecessarily disruptive. Practically speaking, this is going to break every
out-of-tree frontend, backend, or optimization pass that supports SIMD types.
Relatively little LLVM code will just naturally support scalable vector types
without any adjustment. Following the principle of iterative development, as
well as just good conservative coding practice, it’s much better for code that
does support both to explicitly opt in by checking for and handling the more
general type, rather than being implicitly “volunteered” to support both by
having the VectorType type semantically repurposed out from under them.

I understand the argument that VectorType is a better name for the abstract base
type, but in this case I don’t think that consideration justifies the disruption
for the vast majority of LLVM developers. There are plenty of names you could
give the abstract base type that adequately express that it’s a more general type,
and the historical baggage of VectorType being slightly misleadingly named if
you’re aware of a particular largely-vendor-specific extension does not seem
overbearing.

John.

Background:
Recently, scalable vectors have been introduced into the codebase. Previously,
vectors have been written <n x ty> in IR, where n is a fixed number of
elements known at compile time, and ty is some type. Scalable vectors are
written <vscale x n x ty> where vscale is a runtime constant value. A new
function has been added to VectorType (defined in llvm/IR/DerivedTypes.h),
getElementCount(), that returns an ElementCount, which is defined as such in
llvm/Support/TypeSize.h:
class ElementCount {
public:
  unsigned Min;
  bool Scalable;
  ...
};
Min is the minimum number of elements in the vector (the "n" in
<vscale x n x ty>), and Scalable is true if the vector is scalable (true
for <vscale x n x ty>, false for <n x ty>). The idea is that if a
vector is not scalable, then Min is exactly equal to the number of vector
elements, but if the vector is scalable, then the number of vector elements is
equal to some runtime-constant multiple of Min. The key takeaway here is that
scalable vectors and fixed length vectors need to be treated differently by the
compiler. For a fixed length vector, it is valid to iterate over the vector
elements, but this is impossible for a scalable vector.
Discussion:
The trouble is that all instances of VectorType have getNumElements() inherited
from SequentialType. Prior to the introduction of scalable vectors, this
function was guaranteed to return the number of elements in a vector or array.
Today, there is a comment that documents the fact that this returns only the
minimum number of elements for scalable vectors; however, a great deal of code
in the codebase now misuses getNumElements(). Some examples:
auto *V = VectorType::get(Ty, SomeOtherVec->getNumElements());
This code was previously perfectly fine but is incorrect for scalable vectors.
When scalable vectors were introduced VectorType::get() was refactored to take a
bool to tell if the vector is scalable. This bool has a default value of false.
In this example, get() is returning a non-scalable vector even if SomeOtherVec
was scalable. This will manifest later in some unrelated code as a type mismatch
between a scalable and fixed length vector.
for (unsigned I = 0; I < SomeVec->getNumElements(); ++I) { ... }
Previously, since there was no notion of scalable vectors, this was perfectly
reasonable code. However, for scalable vectors, this is always a bug.
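
To make the first example concrete, here is a small self-contained sketch of
the defaulted-flag problem. ToyVectorType, getVector, cloneShapeBuggy and
cloneShapeFixed are hypothetical names used only for illustration; the sketch
mirrors the shape of the factory described above, not its real signature.

```cpp
#include <cassert>

// Toy model of a vector type descriptor; NOT the LLVM VectorType.
struct ToyVectorType {
  unsigned MinElts;
  bool Scalable;
};

// Mirrors the shape of the factory described above: the scalable flag
// defaults to false.
inline ToyVectorType getVector(unsigned NumElts, bool Scalable = false) {
  return ToyVectorType{NumElts, Scalable};
}

// Buggy pattern: only the element count is forwarded, so the result is
// always fixed-length, even when the source vector is scalable.
inline ToyVectorType cloneShapeBuggy(const ToyVectorType &Src) {
  return getVector(Src.MinElts);
}

// Corrected pattern: forward the scalable flag together with the count.
inline ToyVectorType cloneShapeFixed(const ToyVectorType &Src) {
  return getVector(Src.MinElts, Src.Scalable);
}
```

The buggy version compiles cleanly and produces a plausible-looking type, which
is why the mismatch tends to surface later in unrelated code.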
With vigilance in code review and good test coverage, we will eventually find
and squash most of these bugs. Unfortunately, code review is hard, and test
coverage isn't perfect. Bugs will continue to slip through as long as
it's easier to do the wrong thing.
One other factor to consider is that there is a great deal of code which deals
exclusively with fixed length vectors. Any backend for which there are no
scalable vectors should not need to care about their existence. Even in Arm, if
Neon code is being generated, then the vectors will never be scalable. In this
code, the current status quo is perfectly fine, and any code related to checking
if the vector is scalable is just noise.
Proposal:
In order to support users who only need fixed width vectors, and to ensure that
nobody can accidentally call getNumElements() on a scalable vector, I am
proposing the introduction of a new FixedVectorType which inherits from both
VectorType and SequentialType. In turn, VectorType will no longer inherit from
SequentialType. An example of what this will look like, with some misc.
functions omitted for clarity:
class VectorType : public Type {
public:
  static VectorType *get(Type *ElementType, ElementCount EC);

  Type *getElementType() const;
  ElementCount getElementCount() const;
  bool isScalable() const;
};

class FixedVectorType : public VectorType, public SequentialType {
public:
  static FixedVectorType *get(Type *ElementType, unsigned NumElts);
};

class SequentialType : public CompositeType {
public:
  uint64_t getNumElements() const { return NumElements; }
};
In this proposed architecture, VectorType does not have a getNumElements()
function because it does not inherit from SequentialType. In generic code, users
will call VectorType::get() to obtain a new instance of VectorType just as they
always have. VectorType implements the safe subset of functionality of fixed and
scalable vectors that is suitable for use anywhere. If the user passes false to
the scalable parameter of get(), they will get an instance of FixedVectorType
back. Users can then inspect its type and cast it to FixedVectorType using the
usual mechanisms. In code that deals exclusively in fixed length vectors, the
user can call FixedVectorType::get() to directly get an instance of
FixedVectorType, and their code can remain largely unchanged from how it was
prior to the introduction of scalable vectors. At this time, there exists no use
case that is only valid for scalable vectors, so no ScalableVectorType is being
added.
With this change, in generic code it is now impossible to accidentally call
getNumElements() on a scalable vector. If a user tries to pass a scalable vector
to a function that expects a fixed length vector, they will encounter a
compilation failure at the site of the bug, rather than a runtime error in some
unrelated code. If a user attempts to cast a scalable vector to FixedVectorType,
the cast will fail at the call site. This will make it easier to track down all
the places that are currently incorrect, and will prevent future developers from
introducing bugs by misusing getNumElements().
Outstanding design choice:
One issue with this architecture as proposed is the fact that SequentialType (by
way of CompositeType) inherits from Type. This introduces a diamond inheritance
in FixedVectorType. Unfortunately, llvm::cast uses a c-style cast internally, so
we cannot use virtual inheritance to resolve this issue. Thus, we have a few
options:

1. Break CompositeType's inheritance on Type and introduce functions to
convert from a Type to a CompositeType and vice versa. The conversion from
CompositeType is always safe because all instances of CompositeType (StructType,
ArrayType, and FixedVectorType) are instances of Type. A CompositeType can be
cast to the most derived class, then back to Type. The other way is not always
safe, so a function will need to be added to check if a Type is an instance of
CompositeType. This change is not that big, and I have a prototype
implementation up at https://reviews.llvm.org/D75486 ([SVE] Make CompositeType
not inherit from Type)
* Pros: this approach would result in minimal changes to the codebase. If the
llvm casts can be made to work for the conversion functions, then it would touch
very few files.
* Cons: There are those who think that CompositeType adds little value and
should be removed. Now would be an ideal time to do this. Additionally, the
conversion functions would be more complicated if we left CompositeType in.
2. Remove CompositeType and break SequentialType's inheritance of Type. Add
functions to convert a SequentialType to and from Type. The conversion functions
would work the same as those in option 1 above. Currently, there exists only one
class that derives directly from CompositeType: StructType. The functionality of
CompositeType can be directly moved into StructType, and APIs that use
CompositeType can directly use Type and cast appropriately. We feel that this
would be a fairly simple change, and we have a prototype implementation up at
https://reviews.llvm.org/D75660 (Remove CompositeType class)
* Pros: Removing CompositeType would simplify the type hierarchy. Leaving
SequentialType in would simplify some code and be more typesafe than having a
getSequentialNumElements on Type.
* Cons: The value of SequentialType has also been called into question. If we
wanted to remove it, now would be a good time. Conversion functions add
complexity to the design. Introduces additional casting from Type.
3. Remove CompositeType and SequentialType. Roll the functions directly into the
most derived classes. A helper function can be added to Type to handle choosing
from FixedVectorType and ArrayType and calling getNumElements():
unsigned getSequentialNumElements() const {
  assert(isSequentialType()); // This already exists and does the
                              // right thing
  if (auto *AT = dyn_cast<ArrayType>(this))
    return AT->getNumElements();
  return cast<FixedVectorType>(this)->getNumElements();
}
A prototype implementation of this strategy can be found at
https://reviews.llvm.org/D75661 (Remove SequentialType from the type heirarchy.)

* Pros: By removing the multiple inheritance completely, we greatly simplify the
design and eliminate the need for any conversion functions. The value of
CompositeType and SequentialType has been called into question, and removing
them now might be of benefit to the codebase
* Cons: getSequentialNumElements() has similar issues to those that we are
trying to solve in the first place and potentially subverts the whole design.
Omitting getSequentialNumElements() would add lots of code duplication.
Introduces additional casting from Type.
I believe that all three of these options are reasonable. My personal preference
is currently option 2. I think that option 3's getSequentialNumElements()
subverts the design: because every Type has getSequentialNumElements(), it is
tempting to just call it. However, the cast will fail at the call site in debug,
and in release it will return a garbage value rather than a value that works
most of the time. For option 1, the existence of CompositeType complicates the
conversion logic for little benefit.
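
The diamond that motivates all three options can be illustrated with a small
sketch. The structs below are toy stand-ins that mirror only the inheritance
shape of the proposal, with CompositeType left out for brevity, and
hasTwoTypeSubobjects is a hypothetical helper, not part of LLVM.

```cpp
#include <cassert>

// Toy stand-ins that mirror only the inheritance shape; NOT the LLVM classes.
struct Type {
  int ID = 0;
};
struct VectorType : Type {};
struct SequentialType : Type {}; // via CompositeType in the real proposal
struct FixedVectorType : VectorType, SequentialType {};

// Without virtual inheritance a FixedVectorType holds two distinct Type
// subobjects, so the path taken to reach Type changes the resulting pointer.
// Writing `Type *T = &FVT;` directly would not compile, because the
// conversion to the base class is ambiguous.
inline bool hasTwoTypeSubobjects(FixedVectorType &FVT) {
  Type *ViaVector = static_cast<VectorType *>(&FVT);
  Type *ViaSequential = static_cast<SequentialType *>(&FVT);
  return ViaVector != ViaSequential;
}
```

Each option above removes one of the two paths to Type, so that only a single
Type subobject remains.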
Conclusion:
Thank you for your time in reviewing this RFC. Your feedback on my work is
greatly appreciated.

Thank you,
Christopher Tetreault


Chris Tetreault via llvm-dev

2020-May-28 18:18 UTC


[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

All,

   I’d like to summarize the outcome of our discussion in the SVE call regarding
this. We discussed some pros and cons of my approach:

Pros:
- Having VectorType be the base, and FixedVectorType be the fixed vector type
presents a nicer API
- It would be less work for those performing the refactor and working on this
feature in upstream to continue on the current course
- A significant portion of code that currently operates on VectorType already
just works with scalable vector types

Cons:
- This refactor causes an API breakage which creates work for downstream
codebases that need to adapt to the new API and resolve merge conflicts

   Since there was not much support put forth for John’s plan to have VectorType
be the fixed vector type, we have decided that we will move forward with my plan
to have VectorType be the base. John has requested that we have a proper
deprecation period, so before removing any more functions, I will be marking
them as deprecated. They will stay deprecated until final binaries for LLVM 11
are shipped. This will allow downstream and out of tree codebases time to
migrate to the new APIs.

   While I feel very strongly that the plan I have put forth in my RFC is the
right way to go, it is possible that significant opposition to this plan appears
when LLVM 11 ships and client codebases that don’t track master see this change.
I believe that we should stay the course, but if it becomes necessary to
reevaluate this RFC, I believe that it will be easier to reverse course when
this work is completed rather than mid-stream.

Thank you,
   Christopher Tetreault

From: John McCall <rjmccall at apple.com>
Sent: Thursday, May 21, 2020 11:15 AM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of VectorType in
the IR


On 9 Mar 2020, at 15:05, Chris Tetreault via llvm-dev wrote:

Hi,
I am helping with the effort to implement scalable vectors in the codebase in
order to add support for generating SVE code in the Arm backend. I would like to
propose a refactor of the Type class hierarchy in order to eliminate issues
related to the misuse of SequentialType::getNumElements(). I would like to
introduce a new class FixedVectorType that inherits from SequentialType and
VectorType. VectorType would no longer inherit from SequentialType, instead
directly inheriting from Type. After this change, it will be statically
impossible to accidentally call SequentialType::getNumElements() via a
VectorType pointer.

I’m sorry that I missed this thread when you posted it. I’m very much in favor
of changing the type hierarchy to statically distinguish fixed from scalable
vector types, but I think that making VectorType the common base type is
unnecessarily disruptive. Practically speaking, this is going to break every
out-of-tree frontend, backend, or optimization pass that supports SIMD types.
Relatively little LLVM code will just naturally support scalable vector types
without any adjustment. Following the principle of iterative development, as
well as just good conservative coding practice, it’s much better for code that
does support both to explicitly opt in by checking for and handling the more
general type, rather than being implicitly “volunteered” to support both by
having the VectorType type semantically repurposed out from under them.

I understand the argument that VectorType is a better name for the abstract base
type, but in this case I don’t think that consideration justifies the disruption
for the vast majority of LLVM developers. There are plenty of names you could
give the abstract base type that adequately express that it’s a more general type,
and the historical baggage of VectorType being slightly misleadingly named if
you’re aware of a particular largely-vendor-specific extension does not seem
overbearing.

John.

Background:
Recently, scalable vectors have been introduced into the codebase. Previously,
vectors have been written <n x ty> in IR, where n is a fixed number of
elements known at compile time, and ty is some type. Scalable vectors are
written <vscale x n x ty> where vscale is a runtime constant value. A new
function has been added to VectorType (defined in llvm/IR/DerivedTypes.h),
getElementCount(), that returns an ElementCount, which is defined as such in
llvm/Support/TypeSize.h:
class ElementCount {
public:
  unsigned Min;
  bool Scalable;
  ...
};
Min is the minimum number of elements in the vector (the "n" in
<vscale x n x ty>), and Scalable is true if the vector is scalable (true
for <vscale x n x ty>, false for <n x ty>). The idea is that if a
vector is not scalable, then Min is exactly equal to the number of vector
elements, but if the vector is scalable, then the number of vector elements is
equal to some runtime-constant multiple of Min. The key takeaway here is that
scalable vectors and fixed length vectors need to be treated differently by the
compiler. For a fixed length vector, it is valid to iterate over the vector
elements, but this is impossible for a scalable vector.
Discussion:
The trouble is that all instances of VectorType have getNumElements() inherited
from SequentialType. Prior to the introduction of scalable vectors, this
function was guaranteed to return the number of elements in a vector or array.
Today, there is a comment that documents the fact that this returns only the
minimum number of elements for scalable vectors; however, a great deal of code
in the codebase now misuses getNumElements(). Some examples:
auto *V = VectorType::get(Ty, SomeOtherVec->getNumElements());
This code was previously perfectly fine but is incorrect for scalable vectors.
When scalable vectors were introduced VectorType::get() was refactored to take a
bool to tell if the vector is scalable. This bool has a default value of false.
In this example, get() is returning a non-scalable vector even if SomeOtherVec
was scalable. This will manifest later in some unrelated code as a type mismatch
between a scalable and fixed length vector.
for (unsigned I = 0; I < SomeVec->getNumElements(); ++I) { ... }
Previously, since there was no notion of scalable vectors, this was perfectly
reasonable code. However, for scalable vectors, this is always a bug.
With vigilance in code review and good test coverage, we will eventually find
and squash most of these bugs. Unfortunately, code review is hard, and test
coverage isn't perfect. Bugs will continue to slip through as long as
it's easier to do the wrong thing.
One other factor to consider is that there is a great deal of code which deals
exclusively with fixed length vectors. Any backend for which there are no
scalable vectors should not need to care about their existence. Even in Arm, if
Neon code is being generated, then the vectors will never be scalable. In this
code, the current status quo is perfectly fine, and any code related to checking
if the vector is scalable is just noise.
Proposal:
In order to support users who only need fixed width vectors, and to ensure that
nobody can accidentally call getNumElements() on a scalable vector, I am
proposing the introduction of a new FixedVectorType which inherits from both
VectorType and SequentialType. In turn, VectorType will no longer inherit from
SequentialType. An example of what this will look like, with some misc.
functions omitted for clarity:
class VectorType : public Type {
public:
  static VectorType *get(Type *ElementType, ElementCount EC);

  Type *getElementType() const;
  ElementCount getElementCount() const;
  bool isScalable() const;
};

class FixedVectorType : public VectorType, public SequentialType {
public:
  static FixedVectorType *get(Type *ElementType, unsigned NumElts);
};

class SequentialType : public CompositeType {
public:
  uint64_t getNumElements() const { return NumElements; }
};
In this proposed architecture, VectorType does not have a getNumElements()
function because it does not inherit from SequentialType. In generic code, users
will call VectorType::get() to obtain a new instance of VectorType just as they
always have. VectorType implements the safe subset of functionality of fixed and
scalable vectors that is suitable for use anywhere. If the user passes false to
the scalable parameter of get(), they will get an instance of FixedVectorType
back. Users can then inspect its type and cast it to FixedVectorType using the
usual mechanisms. In code that deals exclusively in fixed length vectors, the
user can call FixedVectorType::get() to directly get an instance of
FixedVectorType, and their code can remain largely unchanged from how it was
prior to the introduction of scalable vectors. At this time, there exists no use
case that is only valid for scalable vectors, so no ScalableVectorType is being
added.
With this change, in generic code it is now impossible to accidentally call
getNumElements() on a scalable vector. If a user tries to pass a scalable vector
to a function that expects a fixed length vector, they will encounter a
compilation failure at the site of the bug, rather than a runtime error in some
unrelated code. If a user attempts to cast a scalable vector to FixedVectorType,
the cast will fail at the call site. This will make it easier to track down all
the places that are currently incorrect, and will prevent future developers from
introducing bugs by misusing getNumElements().
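The compile-time guarantees claimed above can be checked with a few static assertions. This is a minimal sketch using toy classes, not the real LLVM types; hasGetNumElements(), getMinNumElements() and countElements() are hypothetical names introduced only for the illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Toy stand-ins for the proposed classes; not the real LLVM declarations.
class VectorType {
public:
  VectorType(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}
  unsigned getMinNumElements() const { return Min; } // hypothetical accessor
  bool isScalable() const { return Scalable; }

private:
  unsigned Min;
  bool Scalable;
};

class FixedVectorType : public VectorType {
public:
  explicit FixedVectorType(unsigned NumElts) : VectorType(NumElts, false) {}
  unsigned getNumElements() const { return getMinNumElements(); }
};

// Detects whether T has a callable const getNumElements() member.
template <typename T>
auto hasGetNumElements(int)
    -> decltype(std::declval<const T &>().getNumElements(), std::true_type{});
template <typename T> std::false_type hasGetNumElements(...);

// The generic type exposes no element count; only the fixed type does.
static_assert(!decltype(hasGetNumElements<VectorType>(0))::value,
              "VectorType must not expose getNumElements()");
static_assert(decltype(hasGetNumElements<FixedVectorType>(0))::value,
              "FixedVectorType exposes getNumElements()");

// A VectorType pointer does not implicitly convert to FixedVectorType, so
// passing one to a fixed-only function is rejected by the compiler.
static_assert(!std::is_convertible<VectorType *, FixedVectorType *>::value,
              "an explicit, checked cast is required");
static_assert(std::is_convertible<FixedVectorType *, VectorType *>::value,
              "fixed vectors are still usable in generic code");

// A function that is only meaningful for fixed-length vectors.
unsigned countElements(const FixedVectorType *FVT) {
  return FVT->getNumElements();
}
```

Calling countElements() with a plain VectorType pointer fails to compile at the call site, which is the behavior the paragraph above describes.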
Outstanding design choice:
One issue with this architecture as proposed is the fact that SequentialType (by
way of CompositeType) inherits from Type. This introduces diamond inheritance
in FixedVectorType. Unfortunately, llvm::cast uses a C-style cast internally,
and such a cast cannot convert a pointer to a virtual base class into a pointer
to a derived class, so we cannot use virtual inheritance to resolve this issue.
Thus, we have a few options:

1. Break CompositeType's inheritance on Type and introduce functions to
convert from a Type to a CompositeType and vice versa. The conversion from
CompositeType is always safe because all instances of CompositeType (StructType,
ArrayType, and FixedVectorType) are instances of Type. A CompositeType can be
cast to the most derived class, then back to Type. The other way is not always
safe, so a function will need to be added to check if a Type is an instance of
CompositeType. This change is not that big, and I have a prototype
implementation up at https://reviews.llvm.org/D75486 ([SVE] Make CompositeType
not inherit from Type)
* Pros: this approach would result in minimal changes to the codebase. If the
llvm casts can be made to work for the conversion functions, then it would touch
very few files.
* Cons: There are those who think that CompositeType adds little value and
should be removed. Now would be an ideal time to do this. Additionally, the
conversion functions would be more complicated if we left CompositeType in.
2. Remove CompositeType and break SequentialType's inheritance of Type. Add
functions to convert a SequentialType to and from Type. The conversion functions
would work the same as those in option 1 above. Currently, there exists only one
class that derives directly from CompositeType: StructType. The functionality of
CompositeType can be directly moved into StructType, and APIs that use
CompositeType can directly use Type and cast appropriately. We feel that this
would be a fairly simple change, and we have a prototype implementation up at
https://reviews.llvm.org/D75660 (Remove CompositeType class)
* Pros: Removing CompositeType would simplify the type hierarchy. Leaving
SequentialType in would simplify some code and be more typesafe than having a
getSequentialNumElements on Type.
* Cons: The value of SequentialType has also been called into question. If we
wanted to remove it, now would be a good time. Conversion functions add
complexity to the design. Introduces additional casting from Type.
3. Remove CompositeType and SequentialType. Roll the functions directly into the
most derived classes. A helper function can be added to Type to handle choosing
from FixedVectorType and ArrayType and calling getNumElements():
unsigned getSequentialNumElements() const {
  // isSequentialType() already exists and does the right thing.
  assert(isSequentialType());
  if (auto *AT = dyn_cast<ArrayType>(this))
    return AT->getNumElements();
  return cast<FixedVectorType>(this)->getNumElements();
}
A prototype implementation of this strategy can be found at
https://reviews.llvm.org/D75661 (Remove SequentialType from the type heirarchy.)

* Pros: By removing the multiple inheritance completely, we greatly simplify the
design and eliminate the need for any conversion functions. The value of
CompositeType and SequentialType has been called into question, and removing
them now might be of benefit to the codebase.
* Cons: getSequentialNumElements() has similar issues to those that we are
trying to solve in the first place and potentially subverts the whole design.
Omitting getSequentialNumElements() would add lots of code duplication.
Introduces additional casting from Type.
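The conversion functions mentioned in options 1 and 2 could look roughly like the following. This is a sketch under assumptions, not the code in the linked reviews: the classes are toy stand-ins, VectorType is omitted, and toType(), toSequentialType() and the Kind tag are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for llvm::Type.
class Type {
public:
  enum TypeID { IntegerTyID, ArrayTyID, FixedVectorTyID };
  explicit Type(TypeID ID) : ID(ID) {}
  TypeID getTypeID() const { return ID; }

private:
  TypeID ID;
};

// SequentialType no longer inherits from Type, so there is no diamond.
class SequentialType {
public:
  enum class Kind { Array, FixedVector }; // hypothetical tag
  std::uint64_t getNumElements() const { return NumElements; }
  Kind getKind() const { return K; }

protected:
  SequentialType(Kind K, std::uint64_t NumElements)
      : K(K), NumElements(NumElements) {}

private:
  Kind K;
  std::uint64_t NumElements;
};

class ArrayType : public Type, public SequentialType {
public:
  explicit ArrayType(std::uint64_t N)
      : Type(ArrayTyID), SequentialType(Kind::Array, N) {}
};

class FixedVectorType : public Type, public SequentialType {
public:
  explicit FixedVectorType(unsigned N)
      : Type(FixedVectorTyID), SequentialType(Kind::FixedVector, N) {}
};

// Always safe: cast to the most derived class, then back up to Type.
const Type *toType(const SequentialType *ST) {
  if (ST->getKind() == SequentialType::Kind::Array)
    return static_cast<const ArrayType *>(ST);
  return static_cast<const FixedVectorType *>(ST);
}

// Not always safe, so the type ID is checked first; null on mismatch.
const SequentialType *toSequentialType(const Type *T) {
  switch (T->getTypeID()) {
  case Type::ArrayTyID:
    return static_cast<const ArrayType *>(T);
  case Type::FixedVectorTyID:
    return static_cast<const FixedVectorType *>(T);
  default:
    return nullptr;
  }
}
```

Both directions go through the most derived class because Type and SequentialType are unrelated bases here, so neither pointer can be converted to the other directly.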
I believe that all three of these options are reasonable. My personal preference
is currently option 2. I think that option 3's getSequentialNumElements()
subverts the design: because every Type has getSequentialNumElements(), it is
tempting to just call it. However, the cast will fail at the call site in debug
builds, and in release builds it will return a garbage value rather than a
value that works most of the time. For option 1, the existence of CompositeType
complicates the conversion logic for little benefit.
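To illustrate the concern with option 3, here is a self-contained toy version of the helper. It is a sketch only: the classes are stand-ins for the LLVM types, plain static_cast replaces llvm::cast, and tryGetNumElements() is a hypothetical wrapper showing the check a careful caller would need.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for llvm::Type with the option 3 helper on it.
class Type {
public:
  enum TypeID { IntegerTyID, ArrayTyID, FixedVectorTyID };
  explicit Type(TypeID ID) : ID(ID) {}
  TypeID getTypeID() const { return ID; }
  bool isSequentialType() const {
    return ID == ArrayTyID || ID == FixedVectorTyID;
  }
  // Available on every Type, which is what makes it tempting to call.
  std::uint64_t getSequentialNumElements() const;

private:
  TypeID ID;
};

class ArrayType : public Type {
public:
  explicit ArrayType(std::uint64_t N) : Type(ArrayTyID), NumElements(N) {}
  std::uint64_t getNumElements() const { return NumElements; }

private:
  std::uint64_t NumElements;
};

class FixedVectorType : public Type {
public:
  explicit FixedVectorType(unsigned N) : Type(FixedVectorTyID), NumElements(N) {}
  unsigned getNumElements() const { return NumElements; }

private:
  unsigned NumElements;
};

std::uint64_t Type::getSequentialNumElements() const {
  // Fires in debug builds; compiled out when NDEBUG is defined, in which
  // case a wrong call falls through to an invalid cast below.
  assert(isSequentialType() && "not an array or fixed-length vector");
  if (getTypeID() == ArrayTyID)
    return static_cast<const ArrayType *>(this)->getNumElements();
  return static_cast<const FixedVectorType *>(this)->getNumElements();
}

// Hypothetical defensive wrapper: check before asking for the count.
bool tryGetNumElements(const Type &T, std::uint64_t &Out) {
  if (!T.isSequentialType())
    return false;
  Out = T.getSequentialNumElements();
  return true;
}
```

Nothing in the type system forces callers through a check like tryGetNumElements(), which is why the helper reintroduces part of the original problem.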
Conclusion:
Thank you for your time in reviewing this RFC. Your feedback on my work is
greatly appreciated.

Thank you,
Christopher Tetreault


