Kothari, Akash via llvm-dev
2021-Nov-15 18:18 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
For those who may have been having trouble viewing the RFC in plain text format,
we have our proposal in a Google doc:
https://docs.google.com/document/d/1IW6VIJ4lMYbGRTOle7S5QXP7Sb5UlucZ3gf-L-4Ccfs/edit?usp=sharing.
It would be great if y’all could comment in the google doc or respond via email.
Thanks,
Akash Kothari
On Nov 12, 2021, at 1:28 PM, Kothari, Akash <akashk4 at illinois.edu> wrote:
**** Proposal for TLX: Tensor LLVM eXtensions
==================================================================================
Authors: Akash Kothari (UIUC), Abdul Rafae Noor (UIUC), Dounia Khaldi (Intel),
Vikram Adve (UIUC), Yuanke Luo (Intel), Sudipta Sengupta (Amazon AWS),
Milind Girkar (Intel), Charith Mendis (UIUC)
------------------------------------------------------------------------------------
Florian Hahn via llvm-dev
2021-Nov-23 17:32 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Hi,

Thanks for sharing the proposal! I think the matrix extension has shown that it is
feasible to use a ‘flat vector’ encoding to support more complex operations.
Decoupling the shape information from the ‘operational’ intrinsics seems very neat!
Below are some additional initial questions.

* The proposal itself is very big, both in terms of text and in the code that will
be required to implement it. Have you thought about how to bring up support in a
way that allows using a (smaller) subset of intrinsics end-to-end?
* What will the hardware-specific lowering look like? I think you mentioned you are
planning to support a set of different hardware architectures. Will this require a
separate lowering pass for each of those?
* What’s the motivation for some intrinsics returning a vector and others returning
a token type? Could all intrinsics return vector? This would be more consistent,
and the type info is associated with the value itself in any case.
* Will variable shapes/sizes be supported? IIRC you mentioned that the type
intrinsic can take arbitrary values as arguments. But some intrinsics return
vectors, so they would need a fixed size?
* You mentioned Julia and Halide as potential adopters. Do you know if there’s
concrete interest from the maintainers to switch to using the new intrinsics? What
would the anticipated timeframe be? I think this could be a crucial argument for
having this in LLVM directly, if we have people who are going to use this ASAP.
* What will Clang support for arbitrary tensors look like? If Clang won’t support
arbitrary tensors, why not?
* AFAICT this should completely subsume the matrix extension, and if we decide to
add the more general extension the matrix extension should be removed. How will the
transition from the current matrix intrinsics to the new tensor intrinsics work?
Can existing IR be auto-upgraded?

Cheers,
Florian
Chris Lattner via llvm-dev
2021-Nov-28 01:57 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Thank you for the interesting proposal Akash (et al). I have a few other questions:

Florian pointed out that this is a very large proposal which introduces a bunch of
new concepts, which makes it difficult to review. My major concern with it is that
it is proposing a single tensor model for LLVM, something that is inappropriate for
a wide variety of frameworks and doesn’t appear to be very general. For example, it
isn’t clear how to model the strided tensor model of PyTorch, it doesn’t appear to
support dynamic shapes or sparse tensors, and it isn’t clear (in a quick reading)
what the op-extensibility story is. Further, there are a bunch of design decisions
inherent to this approach (e.g. putting the layout information on the ops instead
of in the types) that make certain optimizations (e.g. layout transformations) more
complicated.

This isn’t to say that this is the _wrong_ design, merely that it is only one of
many plausible and important designs. Standardizing "one thing" in LLVM can have a
chilling effect on innovation (particularly for such a rapidly evolving field),
which is one of the reasons that MLIR favors an “open extensibility” approach.

In terms of detailed design, it isn’t clear to me that representing heap-allocated
things like this as a token type will work out well. There have been a variety of
proposals over the years (incl. adding F90-style arrays as a first-class entity)
that haven’t worked well because of a wide variety of design assumptions in LLVM.
The token type <https://llvm.org/docs/LangRef.html#token-type> in particular is not
composable with control flow, function calls and other things, and ML models
frequently have loops and other control flow in them - how do you plan to represent
that?

In your convolution operation, it doesn’t look like you’re handling the various
edge conditions (replicating, mirroring, etc) common in ML frameworks. How do you
plan to handle that? Similarly, how do you handle quantization?

As per the motivation section, you point out "Crucially, however, MLIR does not
have a low-level code generation framework that is retargetable to diverse
hardware: it relies on LLVM for this purpose.” I happen to agree with you, but the
lack of this in MLIR isn’t evidence that LLVM IR is the natural place to put matrix
lowering support. Why do you think LLVM IR is a better place to put this than a
high-level IR? Whether it is MLIR, XLA, or something else, it seems that there is a
very clear separation of concerns here, and (as you point out) LLVM is being
successfully used as the backend for a wide variety of tensor compilers already.

Finally, I’m also a bit concerned because the IR extensions are not really the meat
of this proposal - this is effectively proposing something akin to the entire LLVM
“CodeGen” framework, but for tensors. The IR abstractions and framework need to be
co-designed together, and it isn’t clear how general or powerful the framework will
turn out to be. We’ve seen a *LOT* of ML compiler frameworks (incl. notably Glow,
XLA, TVM, etc) that are successful handling important subsets of the ML inference
space, but very few have scaled up to solving the full generality of the problem.
-Chris
Kothari, Akash via llvm-dev
2021-Dec-03 23:59 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Hi Chris,
Thank you for your questions and comments. I have reordered your
questions/comments to respond to them in a logical progression. In particular,
you have some questions about the technical design of TLX (extensibility, memory
allocation, etc.) and some about *whether a tensor / matrix code-gen framework
belongs in LLVM in the first place*. I thought I should address the latter
first.
-Akash
On Nov 27, 2021, at 7:57 PM, Chris Lattner <clattner at nondot.org> wrote:
Thank you for the interesting proposal Akash (et al). I have a few other
questions:
As per the motivation section, you point out "Crucially, however, MLIR does
not have a low-level code generation framework that is retargetable to diverse
hardware: it relies on LLVM for this purpose.” I happen to agree with you, but
the lack of this in MLIR isn’t evidence that LLVM IR is the natural place to put
matrix lowering support. Why do you think LLVM IR is a better place to put this
than a high level IR? Whether it is MLIR, XLA, or something else, it seems that
there is a very clear separation of concerns here, and (as you point out) LLVM
is being successfully used as the backend for a wide variety of tensor compilers
already.
I think LLVM is the natural place to put support for tensor lowering for the
following reasons:
* Compilers such as TVM, Halide, XLA, Glow, etc. use LLVM for backend code
generation for different hardware architectures, so it makes sense to add tensor
lowering support in an abstraction layer shared across multiple compilers.
Today, compilers such as TVM and Halide have separate backends to generate
target-specific intrinsics for different targets, which is a serious weakness.
These compilers could instead target a common set of target-agnostic intrinsics
to reach multiple tensor architectures and benefit from community-wide shared
improvements and efforts.
* Languages such as C/C++, Rust, DPC++, Julia, etc. do not yet have frontends
for compilers like MLIR, XLA, TVM, etc. Developing frontends for these languages
for production use requires non-trivial engineering effort. Extending LLVM with
our extensions and getting the existing language frontends to target them would
require comparatively less engineering effort and time.
* TLX could be added to the LLVM dialect in MLIR. Also, lessons learned from
the experience of supporting retargetable code generation in LLVM for modern
tensor architectures could be very valuable and could help inspire ideas for new
dialects and abstractions to make MLIR retargetable, too.
Finally, I’m also a bit concerned because the IR extensions are not really the
meat of this proposal - this is effectively proposing something akin to the
entire LLVM “CodeGen” framework but for tensors. The IR abstractions and
framework need to be co-designed together, and it isn’t clear how general or
powerful the framework will turn out to be. We’ve seen a *LOT* of ML compiler
frameworks (incl notably Glow, XLA, TVM, etc) that are successful handling
important subsets of the ML inference space, but very few have scaled up to
solving the full generality of the problem.
I agree with you that the IR extensions and the code generation framework must be
co-designed together. We have done exactly that on our end, in collaboration with
folks from Intel, Qualcomm, IBM and AWS. There are three main parts to this
end-to-end tensor support in LLVM: (1) the tensor IR (TLX); (2) code generation,
which includes lowering from N-d to 2-d and the legalization support from
target-agnostic to target-specific intrinsics; and (3) extensions to Target
Transform Info (TTI) for efficient matrix (2-d) code generation. In the future, we
will post two more RFCs about the TTI enhancements and lowering strategies. Our
core proposal is here:
https://docs.google.com/document/d/1IW6VIJ4lMYbGRTOle7S5QXP7Sb5UlucZ3gf-L-4Ccfs/edit?usp=sharing.
People who may be interested in learning more details about the IR extensions
and the other parts of this support can take a look at the specification
document here:
https://docs.google.com/document/d/1A3xbrtouckRsPz94v2XttjoaTSqQlz1pSzVe80-Jmro/edit?usp=sharing.
Florian pointed out that this is a very large proposal which is introducing a
bunch of new concepts which makes it difficult to review. My major concern with
it is that it is proposing a single tensor model for LLVM, something that is
inappropriate for a wide variety of frameworks, and doesn’t appear to be very
general. For example, it isn’t clear how to model the strided tensor model of
PyTorch, it doesn’t appear to support dynamic shapes or sparse tensors, and it isn’t
clear (in a quick reading) what the op-extensibility story is. Further, there
are a bunch of design decisions inherent to this approach (e.g. putting the
layout information on the ops, instead of in the types) that make certain
optimizations (e.g. layout transformations) more complicated.
I think there is some misunderstanding about some of the extensions we are
proposing. We do not propose that the tensor operations carry layout information;
instead, we propose that an intrinsic called llvm.tensor.typeinfo carry the layout
information (and other tensor type information), which helps decouple tensor type
information from the tensor operations. The tensor load intrinsic also has layout
information embedded. The current matrix extension, by contrast, embeds matrix type
information in the intrinsics for matrix operations such as transpose, matrix
multiply, etc. We also support strided tensor loads and stores — these are akin to
strided tensors in PyTorch.
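As a purely illustrative sketch (the strided-load intrinsic name and signature here
are placeholders, not the ones defined in the specification document, and the
layout encoding <i32 0, i32 1> is likewise only meant to suggest row-major), a
strided 2-d load followed by the typeinfo intrinsic might look like:

  ; Illustrative only: load a 16x16 tile of i8 from %ptr with a row stride of 64
  ; elements and a unit column stride, then attach shape/layout/padding info.
  %tile = call <256 x i8> @llvm.tensor.load.strided(i8* %ptr,
              <2 x i32> <i32 16, i32 16>, <2 x i32> <i32 64, i32 1>)
  %tile_info = call token @llvm.tensor.typeinfo(<256 x i8> %tile,
              <2 x i32> <i32 16, i32 16>, <2 x i32> <i32 0, i32 1>,
              <2 x i32> zeroinitializer)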
Decoupling the tensor type information from the intrinsics for tensor operations
allows us to extend the tensor type information, if needed, without having to make
any changes to the other intrinsics for tensor operations. We do not propose
extensions for sparse tensors, but one could support sparse tensors by introducing
a new variant of the typeinfo intrinsic to describe sparse tensors and continue
using the intrinsics for tensor operations we propose. Supporting sparse tensors
would also require adding new intrinsics for operations such as coalescing, for
example. The same applies to ragged tensors.
Because llvm.tensor.typeinfo represents shape information as a vector of dimension
sizes, dynamic shapes can be represented by passing a vector of SSA values for the
dimension sizes (not just constant values).
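As a minimal sketch (assuming the typeinfo signature used in the examples below), a
dynamically shaped tensor would build its shape vector from runtime SSA values:

  ; Build the shape vector from runtime dimension sizes rather than constants.
  %s0 = insertelement <3 x i32> undef, i32 %d0, i32 0
  %s1 = insertelement <3 x i32> %s0, i32 %d1, i32 1
  %shape = insertelement <3 x i32> %s1, i32 %d2, i32 2
  ; Attach the dynamic shape (plus layout and padding) to the flat tensor value.
  %tensor_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor,
                     <3 x i32> %shape, <3 x i32> %layout, <3 x i32> %padding)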
This isn’t to say that this is the _wrong_ design, merely that it is only one of
many plausible and important designs. Standardizing "one thing" in
LLVM can have a chilling effect on innovation (particularly for such a rapidly
evolving field) which is one of the reasons that MLIR favors an “open
extensibility” approach.
While it’s true that this is only one of many plausible designs, that will be
true of every LLVM extension that is ever proposed. We do not think that adding
TLX is going to stifle innovation. It is absolutely true that LLVM itself has
only limited support for extensibility, whereas MLIR is inherently more
extensible, but that is not a reason not to add new functionality in LLVM within
the limitations of what is possible. The tensor extensions we propose are
extensible for the reasons described above. We expect these extensions to be
experimented with and refined further by the community, and we are open to any
ideas that you and other folks in the LLVM community may have for making them more
general and more extensible. Note that we have added a section on the methodology
for extending TLX in our proposal document:
https://docs.google.com/document/d/1IW6VIJ4lMYbGRTOle7S5QXP7Sb5UlucZ3gf-L-4Ccfs/edit#heading=h.ltfq7r4wczwl.
In terms of detailed design, it isn’t clear to me that representing heap-allocated
things like this as a token type will work out well. There have been a variety of
proposals over the years (incl. adding F90-style arrays as a first-class entity)
that haven’t worked well because of a wide variety of design assumptions in LLVM.
The token type <https://llvm.org/docs/LangRef.html#token-type> in particular is not
composable with control flow, function calls and other things, and ML models
frequently have loops and other control flow in them - how do you plan to represent
that?
We are *not* proposing to represent heap-allocated objects using token type.
Values of token type merely represent SSA tensor values. Tensor loads and stores
contain the information about the tensors they read from or write to memory. We
have implemented these intrinsics in LLVM already and have not encountered any
problems so far.
Cases where tensor information from block1 and block2 has to be used in block3 can
be handled by placing phi nodes over the underlying tensor value and over each
piece of type information, and then re-invoking llvm.tensor.typeinfo:

block1:
  ...
  %tensor1_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor1,
                      <3 x i32> %shape1, <3 x i32> %layout1, <3 x i32> %padding1)
  br label %block3

block2:
  ...
  %tensor2_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor2,
                      <3 x i32> %shape2, <3 x i32> %layout2, <3 x i32> %padding2)
  br label %block3

block3:
  ...
  %tensor3 = phi <256 x i8> [%tensor1, %block1], [%tensor2, %block2]
  %shape3 = phi <3 x i32> [%shape1, %block1], [%shape2, %block2]
  %layout3 = phi <3 x i32> [%layout1, %block1], [%layout2, %block2]
  %padding3 = phi <3 x i32> [%padding1, %block1], [%padding2, %block2]
  %tensor3_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor3,
                      <3 x i32> %shape3, <3 x i32> %layout3, <3 x i32> %padding3)
  ...
We do not discuss in the proposal how tensor information could be passed across
function call boundaries, but we could use three new parameter attributes:
tensorshape, tensorlayout, and tensorpad. These attributes indicate which property
of a tensor parameter they represent. We could also introduce an attribute named
tensorargid to give each set of parameters representing a tensor and its shape,
layout and padding a unique ID.
define void @callee(<256 x i8> tensorargid 0 %tensor1,
                    <2 x i32> tensorshape tensorargid 0 %shape1,
                    <2 x i32> tensorlayout tensorargid 0 %layout1,
                    <2 x i32> tensorpad tensorargid 0 %pad1,
                    <256 x i8> tensorargid 1 %tensor2,
                    <2 x i32> tensorshape tensorargid 1 %shape2,
                    <2 x i32> tensorlayout tensorargid 1 %layout2,
                    <2 x i32> tensorpad tensorargid 1 %pad2) {
  ; Define typed input tensors
  %typed_tensor1 = call <256 x i8> @llvm.tensor.typeinfo(<256 x i8> %tensor1,
                       <2 x i32> %shape1, <2 x i32> %layout1, <2 x i32> %pad1)
  %typed_tensor2 = call <256 x i8> @llvm.tensor.typeinfo(<256 x i8> %tensor2,
                       <2 x i32> %shape2, <2 x i32> %layout2, <2 x i32> %pad2)
  ...
}
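For illustration, a call site under this scheme would simply pass each flat tensor
value alongside its shape, layout, and padding vectors; the concrete shape values,
the layout encoding <i32 0, i32 1> (meant to suggest row-major), and the zero
padding are assumptions made only for this sketch:

  ; Hypothetical call site: two 16x16 i8 tensors in the flat <256 x i8> encoding,
  ; each accompanied by its shape, (assumed row-major) layout, and zero padding.
  call void @callee(<256 x i8> %A, <2 x i32> <i32 16, i32 16>,
                    <2 x i32> <i32 0, i32 1>, <2 x i32> zeroinitializer,
                    <256 x i8> %B, <2 x i32> <i32 16, i32 16>,
                    <2 x i32> <i32 0, i32 1>, <2 x i32> zeroinitializer)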
In your convolution operation, it doesn’t look like you’re handling the various
edge conditions (replicating, mirroring, etc) common in ML frameworks. How do
you plan to handle that? Similarly, how do you handle quantization?
We could extend our convolution intrinsic to include one more operand for the
number of feature groups along every outer dimension to support depthwise
convolutions. I am not sure what you mean by other edge conditions such as
mirroring, replicating, etc.
We do not propose any intrinsics for quantization in this proposal, but
quantization could be added as a set of additional intrinsics that use the same
tensor typeinfo intrinsic. We are looking for feedback and ideas from the LLVM
community on which specific intrinsics absolutely should be added along with this
RFC. As Aditya Atluri pointed out in the comment section of the specification
document, we may have to think more about how to allow vendors to support custom
types and to specify the legal conversions between custom types and existing LLVM
types. So this is an open question that merits more discussion. However, this
concern is relatively minor and should not impact most of the rest of the design
details in the RFC.
-Chris