David A. Greene
2010-Aug-23 15:52 UTC
[LLVMdev] Upstream PTX backend that uses target independent code generator if possible
Che-Liang Chiou <clchiou at gmail.com> writes:

> Hi there,
>
> Thanks to Nick for kindly reviewing the patch. Here is the link to the
> source code of the PTX backend; it would help Nick review the patch.
> http://lime.csie.ntu.edu.tw/~clchiou/llvm-ptx-backend.tar.gz

Great!

> I decided to take the code generator approach (referred to as codegen
> approach) rather than the C backend approach (referred to as cbe approach)
> for the following reasons (in fact, I had my first prototype in the cbe
> approach, but later I abandoned it and rewrote it in the codegen approach).
> This would partly answer previous questions about the comparison between
> the two approaches.

I think the codegen approach is the right one long-term, but I don't
necessarily agree with all of your reasons. :)

> * LLVM should not rely on nVidia's design of its CUDA toolchain. To
> my knowledge, nVidia does not make any commitment on how much
> optimization would be implemented in its graphics driver compiler. A
> backend with little optimization support would suffer if nVidia
> decides to move most of the optimizer to its CUDA compiler from its
> graphics driver compiler.

This is true.

> * nVidia's CUDA compiler has a non-trivial optimizer; this should
> suggest that late optimization alone is not sufficient. If LLVM's PTX
> backend is trying to provide a comparable alternative to nVidia's CUDA
> compiler, the backend should have a good code optimizer. In my
> experiment, the prototype PTX backend generates better optimized code
> than nVidia's CUDA compiler in some cases.

LLVM will never completely replace the CUDA compiler because PTX is not
the final ISA. We'll always need some piece of the CUDA compiler to
translate to the metal ISA.

> * PTX is a virtual instruction set that is not designed for an
> optimizer; for one, it is not even in SSA form. So the graphics driver
> compiler's optimizer might not do its job very well, and I would
> suggest we should not rely on its optimization.

Not being in SSA form is no problem. Converting to SSA is a well-known
transformation. LLVM IR doesn't start out in SSA either.

> * The codegen approach is actually simpler than the cbe approach. PTX
> is mostly RISC-based; that said, the codegen approach leverages
> most of *.td and the implementations of existing mature RISC
> backends such as ARM, PowerPC, and Sparc. Besides, I guess most
> developers would be more familiar with *.td than with the C backend.
> In fact, it only took me two weeks to write a working prototype from
> scratch -- and I had no prior experience with LLVM's codegen.

I believe that. PTX is a really simple instruction set and quite
orthogonal.

> * So far my backend is less complete than other backends based on the
> cbe approach, but considering the simplicity of the codegen approach, a
> backend based on the codegen approach should catch up with them in a
> short time.

The one thing we'll have to add is mask support.

> * Masked operation, as well as branch folding and the like, is much
> easier to implement in the codegen approach. I am not sure how much
> performance improvement could be achieved from these optimizations,
> but it is worth trying.

I'm not sure why these would be easier with one model over another.
It's a lot of hand-lowering and manual optimization either way. Can you
explain?

> All in all, I would propose a PTX backend in the codegen approach after
> I have implemented both.

The fact that PTX is a moving target seals the deal for me. It's really
easy to generate variants of PTX using TableGen's predicate approach.

-Dave
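Dave's closing remark about generating PTX variants through predicates can be pictured with a small sketch. This is only a conceptual analogy written in C++, not TableGen and not code from the posted backend: in LLVM targets the gating is written in *.td files with `Predicate` definitions attached to patterns via `Requires<[...]>`. The names `Pattern`, `patterns_for`, `example_table`, `op_base`, and `op_newer`, and the integer version numbers, are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction pattern guarded by a minimum PTX version.
// Real backends express this guard as a TableGen predicate instead.
struct Pattern {
    std::string name;
    int min_version;
};

// Keep only the patterns whose guard holds for the requested variant.
std::vector<std::string> patterns_for(const std::vector<Pattern>& table,
                                      int target_version) {
    std::vector<std::string> enabled;
    for (const Pattern& p : table) {
        if (target_version >= p.min_version) {
            enabled.push_back(p.name);
        }
    }
    return enabled;
}

// Made-up table: one pattern available everywhere, one only in a newer variant.
std::vector<Pattern> example_table() {
    return {{"op_base", 1}, {"op_newer", 2}};
}
```

The point of the design is that a single instruction table serves every variant; supporting a new PTX revision means adding guarded entries rather than forking the backend.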
Che-Liang Chiou
2010-Aug-26 12:37 UTC
[LLVMdev] Upstream PTX backend that uses target independent code generator if possible
Thanks David for the comments. Sorry for the late reply.

On Mon, Aug 23, 2010 at 11:52 PM, David A. Greene <greened at obbligato.org> wrote:

> Che-Liang Chiou <clchiou at gmail.com> writes:
>
>> Hi there,
>>
>> Thanks to Nick for kindly reviewing the patch. Here is the link to the
>> source code of the PTX backend; it would help Nick review the patch.
>> http://lime.csie.ntu.edu.tw/~clchiou/llvm-ptx-backend.tar.gz
>
> Great!
>
>> I decided to take the code generator approach (referred to as codegen
>> approach) rather than the C backend approach (referred to as cbe approach)
>> for the following reasons (in fact, I had my first prototype in the cbe
>> approach, but later I abandoned it and rewrote it in the codegen approach).
>> This would partly answer previous questions about the comparison between
>> the two approaches.
>
> I think the codegen approach is the right one long-term, but I don't
> necessarily agree with all of your reasons. :)
>
>> * LLVM should not rely on nVidia's design of its CUDA toolchain. To
>> my knowledge, nVidia does not make any commitment on how much
>> optimization would be implemented in its graphics driver compiler. A
>> backend with little optimization support would suffer if nVidia
>> decides to move most of the optimizer to its CUDA compiler from its
>> graphics driver compiler.
>
> This is true.
>
>> * nVidia's CUDA compiler has a non-trivial optimizer; this should
>> suggest that late optimization alone is not sufficient. If LLVM's PTX
>> backend is trying to provide a comparable alternative to nVidia's CUDA
>> compiler, the backend should have a good code optimizer. In my
>> experiment, the prototype PTX backend generates better optimized code
>> than nVidia's CUDA compiler in some cases.
>
> LLVM will never completely replace the CUDA compiler because PTX is not
> the final ISA. We'll always need some piece of the CUDA compiler to
> translate to the metal ISA.
>
>> * PTX is a virtual instruction set that is not designed for an
>> optimizer; for one, it is not even in SSA form. So the graphics driver
>> compiler's optimizer might not do its job very well, and I would
>> suggest we should not rely on its optimization.
>
> Not being in SSA form is no problem. Converting to SSA is a well-known
> transformation. LLVM IR doesn't start out in SSA either.
>
>> * The codegen approach is actually simpler than the cbe approach. PTX
>> is mostly RISC-based; that said, the codegen approach leverages
>> most of *.td and the implementations of existing mature RISC
>> backends such as ARM, PowerPC, and Sparc. Besides, I guess most
>> developers would be more familiar with *.td than with the C backend.
>> In fact, it only took me two weeks to write a working prototype from
>> scratch -- and I had no prior experience with LLVM's codegen.
>
> I believe that. PTX is a really simple instruction set and quite
> orthogonal.
>
>> * So far my backend is less complete than other backends based on the
>> cbe approach, but considering the simplicity of the codegen approach, a
>> backend based on the codegen approach should catch up with them in a
>> short time.
>
> The one thing we'll have to add is mask support.

I'm a little bit confused here. Does "masked operation" mean the same
thing as "predicated operation"?

>> * Masked operation, as well as branch folding and the like, is much
>> easier to implement in the codegen approach. I am not sure how much
>> performance improvement could be achieved from these optimizations,
>> but it is worth trying.
>
> I'm not sure why these would be easier with one model over another.
> It's a lot of hand-lowering and manual optimization either way. Can you
> explain?

The codegen is smart enough to translate a simple if-else block like

    if (pred) return A; else return B;

into one instruction

    selp A, B, pred

Also, codegen has branch-folding support, so it would be easier (this is
my guess; I've not yet started). I didn't try many examples, but I was
convinced that it should be easier.

>> All in all, I would propose a PTX backend in the codegen approach after
>> I have implemented both.
>
> The fact that PTX is a moving target seals the deal for me. It's really
> easy to generate variants of PTX using TableGen's predicate approach.
>
> -Dave

By the way, what should I do to upstream this backend? I submitted a
small patch to the llvm-commits mailing list. On average, how long should
I expect to wait for code review? Thanks.

Regards,
Che-Liang
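The if-else example in this message can be sketched in plain C++ to show why the two forms are interchangeable. This is a minimal illustration, not output from the backend: the function names are hypothetical, and the ternary stands in for what a single `selp A, B, pred` computes (the first operand when the predicate is true, the second otherwise).

```cpp
#include <cassert>

// Branchy form, as written in the message:
//   if (pred) return A; else return B;
int select_branchy(bool pred, int a, int b) {
    if (pred) {
        return a;
    } else {
        return b;
    }
}

// If-converted form: a single select with no control flow in the source.
int select_if_converted(bool pred, int a, int b) {
    return pred ? a : b;
}

// Hypothetical self-check: both forms agree for either predicate value.
void check_select_equivalence() {
    for (int p = 0; p < 2; ++p) {
        const bool pred = (p != 0);
        assert(select_branchy(pred, 10, 20) ==
               select_if_converted(pred, 10, 20));
    }
}
```

Calling `check_select_equivalence()` from a test driver exercises both predicate values. The equivalence is only this simple because both arms are side-effect-free values.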
David A. Greene
2010-Aug-26 22:01 UTC
[LLVMdev] Upstream PTX backend that uses target independent code generator if possible
Che-Liang Chiou <clchiou at gmail.com> writes:

>> The one thing we'll have to add is mask support.
>
> I'm a little bit confused here. Does "masked operation" mean the same
> thing as "predicated operation"?

Yep. "Mask" is the traditional vector term since the predicates for
each element can be different.

>>> * Masked operation, as well as branch folding and the like, is much
>>> easier to implement in the codegen approach. I am not sure how much
>>> performance improvement could be achieved from these optimizations,
>>> but it is worth trying.
>>
>> I'm not sure why these would be easier with one model over another.
>> It's a lot of hand-lowering and manual optimization either way. Can you
>> explain?
>
> The codegen is smart enough to translate a simple if-else block like
>     if (pred) return A; else return B;
> into one instruction
>     selp A, B, pred

That's a really simple case. The general case is much more involved.
It has to handle multiple levels of branching. It can be done pretty
mechanically, but it all has to be hand-coded in LLVM.

There are really two things going on here. The first is the ability to
take branchy code and if-convert it. That's usually pretty
straightforward for the common cases. The second is handling code
that's already in predicated form. Since right now there's no publicly
available vectorizer with mask support for LLVM, we aren't going to see
that for a while.

This kind of thing really calls for an extension to the LLVM IR to
handle general predication.

The ARM target codegen may or may not be a good place to look. I don't
know how involved its predication is.

> Also, codegen has branch-folding support, so it would be easier (this is
> my guess; I've not yet started).

Ok, that's probably true.

> By the way, what should I do to upstream this backend? I submitted a
> small patch to the llvm-commits mailing list. On average, how long
> should I expect to wait for code review? Thanks.

I must have missed it. Can you send it again?

-Dave
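To make the "multiple levels of branching" point concrete, here is a rough C++ sketch of if-converting a two-level branch into per-element masks. It is an illustration under stated assumptions, not LLVM or PTX code: the lane count `kLanes`, the type aliases, and both function names are hypothetical, and the particular arithmetic in each arm is made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical vector width
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Branchy scalar reference with two levels of branching.
int scalar_branchy(int x) {
    if (x > 0) {
        if (x > 10) {
            return x - 10;
        }
        return x + 1;
    }
    return -x;
}

// If-converted form: each nesting level contributes one mask, and an
// inner mask is always ANDed with the mask of the branch enclosing it.
Vec vector_masked(const Vec& x) {
    Mask outer{};
    Mask inner{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        outer[i] = x[i] > 0;
        inner[i] = outer[i] && (x[i] > 10);
    }
    Vec out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        // Every arm is computed; the masks only choose which result is kept.
        const int deep = x[i] - 10;
        const int shallow = x[i] + 1;
        const int other = -x[i];
        out[i] = inner[i] ? deep : (outer[i] ? shallow : other);
    }
    return out;
}
```

Each lane carries its own predicate, which is why "mask" is the natural vector term. Note that computing every arm is only safe here because the arms have no side effects; arms that load, store, or trap need genuinely predicated instructions, which is the harder case described above.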