Hi All,

I am going to submit a GSoC proposal for LLVM this year, and I would like to first post it here to get constructive feedback before submitting it by the April 8 deadline. This is the first time I have submitted a GSoC proposal, so please be brutal with the feedback. :)

Additionally, Che-Liang Chiou (the code owner of the PTX back-end) has agreed to be my mentor if this is accepted. What does he need to do to become an official mentor?

=======
Overview
=======

The NVidia Parallel Thread eXecution (PTX) language is an assembly-like language that is used as an intermediate format for all GPU programs that execute on NVidia hardware. It is similar to many other three-address assembly formats, and hence is a great target for the LLVM code generation framework. Having a supported PTX code generator back-end in LLVM would allow users of LLVM to generate GPU code directly from LLVM IR, with appropriate use of PTX-specific intrinsics to support features such as thread/block id queries, texture sampling, and prefetching.

=====
Status
=====

For the last month, I have been working with Che-Liang Chiou to implement basic support for PTX code generation within the LLVM source tree. Currently, the back-end is capable of handling a small subset of LLVM IR, including integer and floating-point arithmetic, loads/stores, and basic branching. While this is enough to support basic computational kernels, there is still much to be done to support arbitrary LLVM IR.

=============
Qualifications
=============

As I have already contributed significant portions of code to the current PTX back-end, the learning curve for this project would be minimal. I am already comfortable working with the core LLVM libraries, as well as the LLVM code generation and selection DAG libraries. I have also been working with C/C++ for over 15 years.

I am currently a PhD student at the Ohio State University, pursuing a degree in Computer Science and Engineering. My research focus is high-performance code generation for multi-core and many-core architectures, specifically current GPU architectures, and I am primarily interested in the compiler technology to drive this. My interest in the PTX back-end started with a research interest in generating high-performance GPU code for stencil computations. While the PTX back-end is not my research focus, it is an important part of the infrastructure needed for a planned research compiler. I also have a personal interest in GPU code generation for graphics applications.

=======
Proposal
=======

For the 2011 Google Summer of Code program, I propose to implement the pieces of the PTX back-end that are currently missing or error-prone. This includes, but is not limited to:

* Implementing efficient instruction selection for floating-point IR instructions
  - e.g., selecting the most efficient instructions for different hardware
* Implementing the full range of integer and floating-point comparison instructions
* Implementing function calls
* Implementing jump tables
* Implementing the full range of LLVM intrinsics needed for "special" PTX instructions
  - e.g., texture mapping, prefetching
* Implementing support for v4f32 and similar vector types

In addition to these basic milestones, the driving goal would be to allow the PTX back-end to generate correct and efficient code for LLVM IR versions of the samples contained in the NVidia GPU Computing SDK.
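To make the target concrete, here is a minimal sketch of the kind of SDK-style kernel I have in mind (my own illustrative example, not taken from the SDK; the kernel name and shapes are arbitrary, and it assumes the source-level modifications for Clang discussed below):

  // Minimal SDK-style CUDA kernel: each thread computes one element of the
  // output vector. The threadIdx/blockIdx/blockDim reads are exactly the
  // "thread/block id queries" that map onto PTX special registers and would
  // be exposed to LLVM IR through PTX-specific intrinsics, while the loads,
  // stores, and arithmetic map directly onto ordinary PTX instructions.
  __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
      c[i] = a[i] + b[i];
  }

Kernels of roughly this complexity are what the back-end can already handle today; the milestones above are about extending that to the full range of constructs the SDK samples use.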
In other words, I want to be able to take the CUDA code from the SDK samples, generate LLVM IR with Clang (with appropriate source-level syntactic modifications), and generate efficient PTX code that is close in performance to that produced by the NVidia nvcc compiler. My limited testing so far has shown that code generated by the PTX back-end in its current form comes within 10% of the performance of identical code compiled with nvcc, and in some cases even marginally beats nvcc.

To accomplish this goal, I propose a two-phase implementation. In the first phase, I will implement as much of the PTX ISA as is representable in LLVM IR, and provide LLVM IR intrinsics for the rest. The goal of the first phase is to generate correct PTX code for arbitrary LLVM IR input. Some exceptions will be necessary; for example, it is currently not feasible to implement exception handling within PTX. After the code generator is able to produce correct code for a large set of complex LLVM IR input (including real-world computational kernels originally written in CUDA), I will begin phase two. In phase two, I will optimize the PTX back-end to generate efficient code. This will involve work on the instruction scheduler to take advantage of the instruction pipeline on the GPU hardware, and potentially on the register allocator as well.

=================
Advantage for LLVM
=================

The advantage of this project for the LLVM community would be the creation and maintenance of a functionally complete code generator for NVidia GPU hardware that can eventually be tied to the OpenCL and CUDA front-ends for Clang. It would be the first LLVM code generator for GPU architectures that is part of upstream LLVM, expanding the reach of LLVM to GPU architectures out-of-the-box. Additionally, the work in this proposal should be complete within the LLVM 3.0 timeline.

==========
Future Work
==========

In the future, the PTX back-end can be tied to the up-and-coming CUDA and OpenCL front-ends within Clang. This would provide a completely open-source implementation of both OpenCL and CUDA for NVidia hardware, with the only dependency being the NVidia CUDA SDK. While this integration work is outside the scope of this proposal, it is a good future use-case for the PTX back-end. However, I do not know the timelines for the implementation of these two front-ends, so I am unable to make any guarantees about that integration in this GSoC proposal.

=====
Mentor
=====

The code owner of the PTX back-end, Che-Liang Chiou, has agreed to mentor me for this project if it is accepted this year. However, I would welcome feedback from others working on the back-end code generators within LLVM.

--
Thanks,
Justin Holewinski
On Mar 28, 2011, at 6:12 AM, Justin Holewinski wrote:

> Hi All,
>
> I am going to submit a GSoC proposal for LLVM this year, and I would like to first post it here to get constructive feedback before I submit it before the April 8 deadline. This is the first time I have submitted a GSoC proposal, so please be brutal with the feedback. :)
>
> Additionally, Che-Liang Chiou (the code owner of the PTX back-end) has agreed to be my mentor if this is accepted. What does he need to do to become an official mentor?

He needs to sign up to be a mentor:
http://www.google-melange.com/gsoc/org/google/gsoc2011/llvm

Scroll down to "Or register as a mentor"

-Tanya
Hi, Justin

> I am going to submit a GSoC proposal for LLVM this year, and I would like to
> first post it here to get constructive feedback before I submit it before
> the April 8 deadline. This is the first time I have submitted a GSoC
> proposal, so please be brutal with the feedback. :)

Can I join this project, if possible? I am also interested in the PTX backend.

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer System Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel: 886-2-2788-3799 #1667
On 03/28/2011 09:00 PM, 陳韋任 wrote:
> Hi, Justin
>
>> I am going to submit a GSoC proposal for LLVM this year, and I would like to
>> first post it here to get constructive feedback before I submit it before
>> the April 8 deadline. This is the first time I have submitted a GSoC
>> proposal, so please be brutal with the feedback. :)
>
> Can I join this project, if possible? I am also interested in PTX
> backend.

Hi chenwj,

you can obviously contribute to the LLVM PTX backend, and I am sure Justin will help to review your patches. However, if you are asking in respect of the Google Summer of Code, it is not possible for two students to apply together for one project. You can apply for the same project as Justin, and LLVM is free to take both of you; however, most probably the better application will win.

Another option is that you discuss with Justin and Che-Liang Chiou whether it is possible for both of you to focus on different parts of PTX code generation, such that you work in the same area, but on different sub-projects. Another project closely related to PTX could, for example, be work on the Clang CUDA or OpenCL front end, which Justin pointed out is still unfinished; see the "Future Work" section of his proposal.

Cheers
Tobi
On 03/28/2011 09:12 AM, Justin Holewinski wrote:
> Hi All,
>
> I am going to submit a GSoC proposal for LLVM this year, and I would
> like to first post it here to get constructive feedback before I submit
> it before the April 8 deadline. This is the first time I have submitted
> a GSoC proposal, so please be brutal with the feedback. :)

Hi Justin,

I think this is a great idea. I am highly interested in PTX code generation.

[...Proposal...]

The proposal is nice and shows that you already have a good idea of your project. Here are some ideas for how you can further improve it:

1. Milestones / timeline

You already have a two-phase development plan. I believe it would be nice if you could further split it into a set of smaller milestones. Each could include a short description of what you plan to deliver, how long its implementation will take, and when you plan to implement it during the summer of code. Those milestones could be sorted into the time frame you have for the GSoC. In addition, you could define "success criteria" for the midterm/final evaluation.

This will make it easy to see during GSoC whether you are on track with your project, and will allow you and your mentor to readjust your milestones if necessary.

When developing milestones and success criteria, it is better to be conservative and only add items you are confident you can implement during GSoC. You can additionally add a set of "if time permits" milestones, where you put the stuff that is not 100% needed, but that would be good to have.

2. It would be nice to include a description of the examples you have already tested.

3. Define the exceptions

It would be good to know what parts you definitely do not plan to implement, and ideally why not (postponed, impossible, not relevant, ...). Like this, people can understand to what extent your backend will be usable after the GSoC.

4. Phase two is currently a little short

What kind of optimizations do you plan? Do you already have an idea, or will you investigate this when you get to that point? How much time do you plan to spend on phase two? If it is more than two weeks, it would be good to elaborate a little on what exactly you plan to do there.

So that's all for the moment. As the application was already nice, I just did some conceptual nitpicking. ;-)

Cheers
Tobi
On Mon, Mar 28, 2011 at 10:19 PM, Tobias Grosser <grosser at fim.uni-passau.de> wrote:

> [...]
>
> So that's all for the moment. As the application was already nice, I just
> did some conceptual nitpicking. ;-)

Thanks for the comments! I have updated the proposal; it can be found at:
https://sites.google.com/site/justinholewinski/projects/gsoc/llvm-ptx-back-end-2011

Please let me know if you have any comments before I submit it to Melange in the next few days!

--
Thanks,
Justin Holewinski