Kothari, Akash via llvm-dev
2021-Nov-15 18:18 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
For those who may have been having trouble viewing the RFC in plain text format, we have our proposal in a Google doc: https://docs.google.com/document/d/1IW6VIJ4lMYbGRTOle7S5QXP7Sb5UlucZ3gf-L-4Ccfs/edit?usp=sharing. It would be great if y’all could comment in the Google doc or respond via email.

Thanks,
Akash Kothari

On Nov 12, 2021, at 1:28 PM, Kothari, Akash <akashk4 at illinois.edu> wrote:

> **** Proposal for TLX: Tensor LLVM eXtensions
> ==================================================================================
>
> Authors: Akash Kothari (UIUC), Abdul Rafae Noor (UIUC), Dounia Khaldi (Intel),
>          Vikram Adve (UIUC), Yuanke Luo (Intel), Sudipta Sengupta (Amazon AWS),
>          Milind Girkar (Intel), Charith Mendis (UIUC)
>
> ------------------------------------------------------------------------------------
Florian Hahn via llvm-dev
2021-Nov-23 17:32 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Hi,

Thanks for sharing the proposal! I think the matrix extension has shown that it is feasible to use a ‘flat vector’ encoding to support more complex operations. Decoupling the shape information from the ‘operational’ intrinsics seems very neat! Below are some additional initial questions.

* The proposal itself is very big, both in terms of text and in the code that will be required to implement it. Have you thought about how to bring up support in a way that allows using a (smaller) subset of intrinsics end-to-end?

* What will the hardware-specific lowering look like? I think you mentioned you are planning to support a set of different hardware architectures. Will this require a separate lowering pass for each of those?

* What’s the motivation for some intrinsics returning a vector and others returning a token type? Could all intrinsics return vectors? This would be more consistent, and the type info is associated with the value itself in any case.

* Will variable shapes/sizes be supported? IIRC you mentioned that the type intrinsic can take arbitrary values as arguments. But some intrinsics return vectors, so they would need a fixed size?

* You mentioned Julia and Halide as potential adopters. Do you know if there’s concrete interest from the maintainers to switch to using the new intrinsics? What would the anticipated timeframe be? I think this could be a crucial argument for having this in LLVM directly, if we have people who are going to use this ASAP.

* What will Clang support for arbitrary tensors look like? If Clang won’t support arbitrary tensors, why not?

* AFAICT this should completely subsume the matrix extension, and if we decide to add the more general extension, the matrix extension should be removed. How will the transition from the current matrix intrinsics to the new tensor intrinsics work? Can existing IR be auto-upgraded?

Cheers,
Florian
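For context, the ‘flat vector’ encoding referred to above is the one used by LLVM’s existing llvm.matrix.* intrinsics: a matrix lives in an ordinary fixed-width vector (column-major by convention) and its shape is repeated as immediate operands on every operation, which is exactly the coupling the proposed llvm.tensor.typeinfo intrinsic is meant to factor out. A minimal fragment of the existing intrinsics (the overload suffixes depend on the concrete vector types):

  ; A 2x3 matrix times a 3x2 matrix, both carried in flat fixed-width
  ; vectors; the shapes are immediate i32 operands on the call itself.
  %c = call <4 x float> @llvm.matrix.multiply.v4f32.v6f32.v6f32(<6 x float> %a, <6 x float> %b, i32 2, i32 3, i32 2)

  ; Transpose of the 2x2 result, with the shape repeated on the call again.
  %t = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %c, i32 2, i32 2)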
Chris Lattner via llvm-dev
2021-Nov-28 01:57 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Thank you for the interesting proposal Akash (et al). I have a few other questions:

Florian pointed out that this is a very large proposal which introduces a bunch of new concepts, which makes it difficult to review. My major concern is that it proposes a single tensor model for LLVM, something that is inappropriate for a wide variety of frameworks and doesn’t appear to be very general. For example, it isn’t clear how to model the strided tensor model of PyTorch, it doesn’t appear to support dynamic shapes or sparse tensors, and it isn’t clear (on a quick reading) what the op-extensibility story is. Further, there are a bunch of design decisions inherent to this approach (e.g. putting the layout information on the ops, instead of in the types) that make certain optimizations (e.g. layout transformations) more complicated.

This isn’t to say that this is the _wrong_ design, merely that it is only one of many plausible and important designs. Standardizing "one thing" in LLVM can have a chilling effect on innovation (particularly for such a rapidly evolving field), which is one of the reasons that MLIR favors an “open extensibility” approach.

In terms of detailed design, it isn’t clear to me that representing heap-allocated things like this as a token type will work out well. There have been a variety of proposals over the years (incl. adding F90-style arrays as a first-class entity) that haven’t worked well because of a wide variety of design assumptions in LLVM. The token type <https://llvm.org/docs/LangRef.html#token-type> in particular is not composable with control flow, function calls and other things, and ML models frequently have loops and other control flow in them - how do you plan to represent that?

In your convolution operation, it doesn’t look like you’re handling the various edge conditions (replicating, mirroring, etc.) common in ML frameworks. How do you plan to handle that? Similarly, how do you handle quantization?

As per the motivation section, you point out "Crucially, however, MLIR does not have a low-level code generation framework that is retargetable to diverse hardware: it relies on LLVM for this purpose.” I happen to agree with you, but the lack of this in MLIR isn’t evidence that LLVM IR is the natural place to put matrix lowering support. Why do you think LLVM IR is a better place to put this than a high-level IR? Whether it is MLIR, XLA, or something else, it seems that there is a very clear separation of concerns here, and (as you point out) LLVM is being successfully used as the backend for a wide variety of tensor compilers already.

Finally, I’m also a bit concerned because the IR extensions are not really the meat of this proposal - this is effectively proposing something akin to the entire LLVM “CodeGen” framework, but for tensors. The IR abstractions and framework need to be co-designed together, and it isn’t clear how general or powerful the framework will turn out to be. We’ve seen a *LOT* of ML compiler frameworks (incl. notably Glow, XLA, TVM, etc.) that are successful handling important subsets of the ML inference space, but very few have scaled up to solving the full generality of the problem.
-Chris
Kothari, Akash via llvm-dev
2021-Dec-03 23:59 UTC
[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Hi Chris,

Thank you for your questions and comments. I have reordered your questions/comments to respond to them in a logical progression. In particular, you have some questions about the technical design of TLX (extensibility, memory allocation, etc.) and some about *whether a tensor/matrix code-gen framework belongs in LLVM in the first place*. I thought I should address the latter first.

-Akash

On Nov 27, 2021, at 7:57 PM, Chris Lattner <clattner at nondot.org> wrote:

> Thank you for the interesting proposal Akash (et al). I have a few other questions:
>
> As per the motivation section, you point out "Crucially, however, MLIR does not have a low-level code generation framework that is retargetable to diverse hardware: it relies on LLVM for this purpose.” I happen to agree with you, but the lack of this in MLIR isn’t evidence that LLVM IR is the natural place to put matrix lowering support. Why do you think LLVM IR is a better place to put this than a high-level IR? Whether it is MLIR, XLA, or something else, it seems that there is a very clear separation of concerns here, and (as you point out) LLVM is being successfully used as the backend for a wide variety of tensor compilers already.

I think LLVM is the natural place to put support for tensor lowering for the following reasons:

* Compilers such as TVM, Halide, XLA, Glow, etc. use LLVM for backend code generation for different hardware architectures, so it makes sense to add tensor lowering support in an abstraction layer shared across multiple compilers. Today, compilers such as TVM and Halide have separate backends to generate target-specific intrinsics for different targets, which is a serious weakness. These compilers could instead target a common set of target-agnostic intrinsics to reach multiple tensor architectures and benefit from community-wide shared improvements and efforts.

* Languages such as C/C++, Rust, DPC++, Julia, etc. do not yet have frontends for compilers like MLIR, XLA, TVM, etc. Developing frontends for these languages for production use requires non-trivial engineering effort. Extending LLVM with our extensions and getting the existing language frontends to target them would require comparatively less engineering effort and time.

* TLX could be added to the LLVM dialect in MLIR. Also, lessons learned from supporting retargetable code generation in LLVM for modern tensor architectures could be very valuable and could help inspire ideas for new dialects and abstractions to make MLIR retargetable, too.

> Finally, I’m also a bit concerned because the IR extensions are not really the meat of this proposal - this is effectively proposing something akin to the entire LLVM “CodeGen” framework, but for tensors. The IR abstractions and framework need to be co-designed together, and it isn’t clear how general or powerful the framework will turn out to be. We’ve seen a *LOT* of ML compiler frameworks (incl. notably Glow, XLA, TVM, etc.) that are successful handling important subsets of the ML inference space, but very few have scaled up to solving the full generality of the problem.

I agree with you that the IR extensions and the code generation framework must be co-designed together. We have done exactly that on our end, in collaboration with folks from Intel, Qualcomm, IBM and AWS.
There are three main parts to this end-to-end tensor support in LLVM: (1) the tensor IR (TLX); (2) code generation, which includes lowering from N-d to 2-d operations and the legalization support for mapping target-agnostic to target-specific intrinsics; and (3) extensions to Target Transform Info (TTI) for efficient matrix (2-d) code generation. In the future, we will post two more RFCs about the TTI enhancements and lowering strategies. Our core proposal is here: https://docs.google.com/document/d/1IW6VIJ4lMYbGRTOle7S5QXP7Sb5UlucZ3gf-L-4Ccfs/edit?usp=sharing. People who are interested in more details about the IR extensions and the other parts of this support can take a look at the specification document here: https://docs.google.com/document/d/1A3xbrtouckRsPz94v2XttjoaTSqQlz1pSzVe80-Jmro/edit?usp=sharing.

> Florian pointed out that this is a very large proposal which introduces a bunch of new concepts, which makes it difficult to review. My major concern is that it proposes a single tensor model for LLVM, something that is inappropriate for a wide variety of frameworks and doesn’t appear to be very general. For example, it isn’t clear how to model the strided tensor model of PyTorch, it doesn’t appear to support dynamic shapes or sparse tensors, and it isn’t clear (on a quick reading) what the op-extensibility story is. Further, there are a bunch of design decisions inherent to this approach (e.g. putting the layout information on the ops, instead of in the types) that make certain optimizations (e.g. layout transformations) more complicated.

I think there is some misunderstanding about some of the extensions we are proposing. We do not propose that the tensor operations carry layout information; instead, we propose an intrinsic called llvm.tensor.typeinfo that holds the layout information (and other tensor type information) and decouples the tensor type information from the tensor operations. The tensor load intrinsic also has layout information embedded. (By contrast, the current matrix extensions embed the type information for matrices in the intrinsics for matrix operations such as transpose, matrix multiply, etc.) We also support strided tensor loads and stores — these are akin to strided tensors in PyTorch.

Decoupling the tensor type information from the intrinsics for tensor operations allows us to extend the tensor type information, if needed, without having to make any changes to the other intrinsics. We do not propose extensions for sparse tensors, but one could support sparse tensors by introducing a new variant of the typeinfo intrinsic to describe them and continue using the intrinsics for tensor operations we propose. Supporting sparse tensors would also require adding new intrinsics for operations such as coalescing, for example. The same applies to ragged tensors. Because llvm.tensor.typeinfo represents the shape as a vector of dimension sizes, dynamic shapes can be expressed by passing a vector of SSA values for the dimension sizes (and not just constant values).
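As a concrete (and purely hypothetical, since TLX is only proposed) fragment of what a dynamically shaped tensor could look like under this scheme: the typeinfo signature follows the examples later in this message, %rows/%cols stand for dimension sizes known only at run time, and %t stands for a tensor value already flattened into a <256 x i8> vector.

  ; Build the shape vector from run-time SSA values rather than constants.
  %shape.0 = insertelement <2 x i32> undef, i32 %rows, i32 0
  %shape   = insertelement <2 x i32> %shape.0, i32 %cols, i32 1
  ; Layout and padding are left as constants here; their exact encoding is
  ; defined in the proposal document.
  %t.info  = call token @llvm.tensor.typeinfo(<256 x i8> %t, <2 x i32> %shape, <2 x i32> <i32 0, i32 1>, <2 x i32> zeroinitializer)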
> This isn’t to say that this is the _wrong_ design, merely that it is only one of many plausible and important designs. Standardizing "one thing" in LLVM can have a chilling effect on innovation (particularly for such a rapidly evolving field), which is one of the reasons that MLIR favors an “open extensibility” approach.

While it’s true that this is only one of many plausible designs, that will be true of every LLVM extension that is ever proposed. We do not think that adding TLX is going to stifle innovation. It is absolutely true that LLVM itself has only limited support for extensibility, whereas MLIR is inherently more extensible, but that is not a reason not to add new functionality to LLVM within the limitations of what is possible. The tensor extensions we propose are extensible for the reasons described above. We expect these extensions to be experimented with and refined further by the community, and we are open to any ideas that you and other folks in the LLVM community may have to make them more general and more extensible. Note that we have added a section on the methodology for extending TLX to our proposal document: https://docs.google.com/document/d/1IW6VIJ4lMYbGRTOle7S5QXP7Sb5UlucZ3gf-L-4Ccfs/edit#heading=h.ltfq7r4wczwl.

> In terms of detailed design, it isn’t clear to me that representing heap-allocated things like this as a token type will work out well. There have been a variety of proposals over the years (incl. adding F90-style arrays as a first-class entity) that haven’t worked well because of a wide variety of design assumptions in LLVM. The token type (https://llvm.org/docs/LangRef.html#token-type) in particular is not composable with control flow, function calls and other things, and ML models frequently have loops and other control flow in them - how do you plan to represent that?

We are *not* proposing to represent heap-allocated objects using the token type. Values of token type merely represent typed SSA tensor values; tensor loads and stores contain the information about the tensors they read from or write to memory. We have implemented these intrinsics in LLVM already and have not encountered any problems so far. In order to handle cases where tensor information from block1 and block2 has to be used in block3, we place phis on the tensor value and its shape/layout/padding operands and materialize a new typeinfo in the join block:

block1:
  ...
  %tensor1_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor1, <3 x i32> %shape1, <3 x i32> %layout1, <3 x i32> %padding1)
  br label %block3

block2:
  ...
  %tensor2_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor2, <3 x i32> %shape2, <3 x i32> %layout2, <3 x i32> %padding2)
  br label %block3

block3:
  ...
  %tensor3 = phi <256 x i8> [%tensor1, %block1], [%tensor2, %block2]
  %shape3 = phi <3 x i32> [%shape1, %block1], [%shape2, %block2]
  %layout3 = phi <3 x i32> [%layout1, %block1], [%layout2, %block2]
  %padding3 = phi <3 x i32> [%padding1, %block1], [%padding2, %block2]
  %tensor3_info = call token @llvm.tensor.typeinfo(<256 x i8> %tensor3, <3 x i32> %shape3, <3 x i32> %layout3, <3 x i32> %padding3)
  ...

We do not discuss in the proposal how tensor information could be passed across function call boundaries, but we could use three new parameter attributes: tensorshape, tensorlayout, and tensorpad.
These attributes indicate which property of a tensor parameter they represent. We could also introduce an attribute named tensorargid to give each set of parameters representing a tensor and its shape, layout and padding a unique ID (a hypothetical call-site counterpart of this example is sketched at the end of this message):

define void @callee(<256 x i8> tensorargid 0 %tensor1,
                    <2 x i32> tensorshape tensorargid 0 %shape1,
                    <2 x i32> tensorlayout tensorargid 0 %layout1,
                    <2 x i32> tensorpad tensorargid 0 %pad1,
                    <256 x i8> tensorargid 1 %tensor2,
                    <2 x i32> tensorshape tensorargid 1 %shape2,
                    <2 x i32> tensorlayout tensorargid 1 %layout2,
                    <2 x i32> tensorpad tensorargid 1 %pad2) {
  ; Define typed input tensors
  %typed_tensor1 = call <256 x i8> @llvm.tensor.typeinfo(<256 x i8> %tensor1, <2 x i32> %shape1, <2 x i32> %layout1, <2 x i32> %pad1)
  %typed_tensor2 = call <256 x i8> @llvm.tensor.typeinfo(<256 x i8> %tensor2, <2 x i32> %shape2, <2 x i32> %layout2, <2 x i32> %pad2)
  ...
}

> In your convolution operation, it doesn’t look like you’re handling the various edge conditions (replicating, mirroring, etc.) common in ML frameworks. How do you plan to handle that? Similarly, how do you handle quantization?

We could extend our convolution intrinsic with one more operand giving the number of feature groups along every outer dimension to support depthwise convolutions. I am not sure what you mean by other edge conditions such as mirroring, replicating, etc. We do not propose any intrinsics for quantization in this proposal, but quantization could be added as a set of additional intrinsics that use the same tensor typeinfo intrinsic. We are looking for feedback/ideas from the LLVM community on which specific intrinsics should definitely be added along with this RFC.

As Aditya Atluri pointed out in the comment section of the specification document, we may have to think more about how to allow vendors to support custom types and to specify the legal conversions between custom and existing LLVM types. So this is an open question that merits more discussion. However, this concern is relatively minor and should not impact most of the rest of the design details in the RFC.
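To round out the parameter-attribute example above, here is a hypothetical call site for @callee using the same proposed tensorshape/tensorlayout/tensorpad/tensorargid attributes. None of these attributes exist in LLVM today, and %tensorA, %shapeA, etc. are placeholder SSA values the caller would already have in hand.

  ; Hypothetical call site: each tensor argument travels together with its
  ; shape, layout and padding operands, grouped by the proposed tensorargid.
  call void @callee(<256 x i8> tensorargid 0 %tensorA,
                    <2 x i32> tensorshape tensorargid 0 %shapeA,
                    <2 x i32> tensorlayout tensorargid 0 %layoutA,
                    <2 x i32> tensorpad tensorargid 0 %padA,
                    <256 x i8> tensorargid 1 %tensorB,
                    <2 x i32> tensorshape tensorargid 1 %shapeB,
                    <2 x i32> tensorlayout tensorargid 1 %layoutB,
                    <2 x i32> tensorpad tensorargid 1 %padB)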