thr3ads.net - llvm dev - [LLVMdev] Upstream PTX backend that uses target independent code generator if possible [Aug 2010]


David A. Greene

2010-Aug-11 21:53 UTC

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

Che-Liang Chiou <clchiou at gmail.com> writes:
> My implementation of predicated instructions is similar to the ARM
> backend. I traced the ARM and PowerPC backends for reference.
Cool.
> If, David, you meant an implementation of predication in LLVM IR,
> I didn't do that, partly because I was not (and am still not)
> very familiar with LLVM's design, so I didn't know how to do it.
No, I wouldn't have expected you to do that, but I think long-term we
will want to consider it.

                               -Dave

Che-Liang Chiou

2010-Aug-19 11:54 UTC


[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

Hi there,

Thanks to Nick for kindly reviewing the patch.  Here is a link to the
source code of the PTX backend, which should help Nick review the patch.
http://lime.csie.ntu.edu.tw/~clchiou/llvm-ptx-backend.tar.gz

The source code at the above link is a working prototype, so it will
not be upstreamed as is; I will refactor it and add the unimplemented
features while upstreaming it.  In other words, the source code at the
above link
* is not guaranteed to compile on other machines,
* is not stable or bug-free, and
* should not be considered the final version for upstream.

I decided to take the code generator approach (the "codegen
approach") rather than the C backend approach (the "cbe approach")
for the following reasons.  (In fact, my first prototype used the cbe
approach, but I later abandoned it and rewrote it using the codegen
approach.)  This should partly answer earlier questions comparing the
two approaches.

* LLVM should not rely on nVidia's design of its CUDA toolchain.  To
my knowledge, nVidia makes no commitment about how much optimization
will be implemented in its graphics driver compiler.  A backend with
little optimization support would be in trouble if nVidia decides to
move most of its optimizer from the graphics driver compiler into its
CUDA compiler.

* nVidia's CUDA compiler has a non-trivial optimizer, which suggests
that late optimization alone is not sufficient.  If LLVM's PTX backend
is meant to be a comparable alternative to nVidia's CUDA compiler, it
should have a good code optimizer.  In my experiments, the prototype
PTX backend generated better-optimized code than nVidia's CUDA
compiler in some cases.

* PTX is a virtual instruction set that is not designed for an
optimizer; for one thing, it is not even in SSA form.  So the graphics
driver compiler's optimizer might not do its job very well, and I
would suggest we not rely on its optimization.

* The codegen approach is actually simpler than the cbe approach.  PTX
is mostly RISC-like, so the codegen approach can reuse most of the *.td
machinery and borrow from the implementations of existing mature RISC
backends such as ARM, PowerPC, and Sparc.  Besides, I guess most
developers are more familiar with *.td files than with the C backend.
In fact, it took me only two weeks to write a working prototype from
scratch, and I had no prior experience with LLVM's codegen.

* So far my backend is less complete than other backends based on the
cbe approach, but given the simplicity of the codegen approach, a
backend based on it should catch up with them in a short time.

* Masked operation, as well as branch folding and the like, is much
easier to implement in the codegen approach.  I am not sure how much
performance improvement these optimizations could achieve, but they
are worth trying.
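Masked (predicated) operation replaces a branch with instructions guarded by a per-thread predicate, the same idea as ARM-style predication. The sketch below is a toy Python model under stated assumptions, not backend code: a "lane" is just a list index, and `setp_gt` and `predicated` are hypothetical helpers standing in for a PTX compare that writes a predicate and for `@p`/`@!p`-guarded instructions. It if-converts a simple two-way branch and checks that the result matches the branchy version.

```python
# Toy model of predicated ("masked") execution across SIMT lanes.
# Assumptions: a lane is a list index; setp_gt and predicated are
# hypothetical helpers, not part of any backend or PTX toolchain.

def setp_gt(xs, threshold):
    """Compare per lane and produce one predicate bit per lane (like setp)."""
    return [x > threshold for x in xs]


def predicated(pred, op, dst, *srcs):
    """Apply op in lanes where pred is true; other lanes keep their dst value."""
    return [op(*args) if p else d for p, d, *args in zip(pred, dst, *srcs)]


def branchy(xs):
    """Reference version with a per-lane branch."""
    return [x * 2 if x > 0 else -x for x in xs]


def if_converted(xs):
    """Same computation with the branch replaced by two guarded operations."""
    p = setp_gt(xs, 0)
    not_p = [not b for b in p]
    y = [0] * len(xs)
    y = predicated(p, lambda x: x * 2, y, xs)   # roughly "@p  mul" in PTX
    y = predicated(not_p, lambda x: -x, y, xs)  # roughly "@!p neg" in PTX
    return y


xs = [3, -1, 0, 5, -4]
print(if_converted(xs))  # [6, 1, 0, 10, 4]
```

Both guarded operations execute in every lane; the predicate only decides which lanes commit a result, which is why no branch (and no divergence) is needed.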

All in all, having implemented both, I would propose a PTX backend
based on the codegen approach.

Regards,
Che-Liang

David A. Greene

2010-Aug-23 15:52 UTC


[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

Che-Liang Chiou <clchiou at gmail.com> writes:
> Hi there,
>
> Thanks to Nick for kindly reviewing the patch.  Here is a link to the
> source code of the PTX backend, which should help Nick review the patch.
> http://lime.csie.ntu.edu.tw/~clchiou/llvm-ptx-backend.tar.gz
Great!
> I decided to take the code generator approach (the "codegen
> approach") rather than the C backend approach (the "cbe approach")
> for the following reasons.  (In fact, my first prototype used the cbe
> approach, but I later abandoned it and rewrote it using the codegen
> approach.)  This should partly answer earlier questions comparing the
> two approaches.
I think the codegen approach is the right one long-term, but I don't
necessarily agree with all of your reasons.  :)
> * LLVM should not rely on nVidia's design of its CUDA toolchain.  To
> my knowledge, nVidia makes no commitment about how much optimization
> will be implemented in its graphics driver compiler.  A backend with
> little optimization support would be in trouble if nVidia decides to
> move most of its optimizer from the graphics driver compiler into its
> CUDA compiler.
This is true.
> * nVidia's CUDA compiler has a non-trivial optimizer, which suggests
> that late optimization alone is not sufficient.  If LLVM's PTX backend
> is meant to be a comparable alternative to nVidia's CUDA compiler, it
> should have a good code optimizer.  In my experiments, the prototype
> PTX backend generated better-optimized code than nVidia's CUDA
> compiler in some cases.
LLVM will never completely replace the CUDA compiler because PTX is not
the final ISA.  We'll always need some piece of the CUDA compiler to
translate PTX to the native (metal) ISA.
> * PTX is a virtual instruction set that is not designed for an
> optimizer; for one thing, it is not even in SSA form.  So the graphics
> driver compiler's optimizer might not do its job very well, and I
> would suggest we not rely on its optimization.
Not being in SSA form is no problem.  Converting to SSA is a well-known
transformation.  LLVM IR doesn't start out in SSA either.
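For straight-line code, the conversion to SSA reduces to renaming: give each assignment a fresh versioned name and rewrite every later use to the newest version. (In LLVM, front ends typically emit loads and stores to stack slots, and the mem2reg pass promotes them into SSA values.) The minimal Python sketch below shows only the renaming step; the `(dest, op, operands)` tuple format is an assumption for illustration, and code with branches would additionally need phi nodes at join points (for example, placed using dominance frontiers), which this sketch deliberately omits.

```python
# Minimal SSA renaming for straight-line code (no branches, so no phi nodes).
# Assumed instruction format: (dest, op, operands), where each operand is
# a variable name (str) or a constant (int). Names never assigned are inputs.

def to_ssa(instrs):
    version = {}  # original variable name -> latest version number
    out = []

    def use(v):
        # Rewrite a use to the newest definition; leave inputs/constants alone.
        if isinstance(v, str) and v in version:
            return f"{v}.{version[v]}"
        return v

    for dest, op, args in instrs:
        new_args = [use(a) for a in args]  # read operands before redefining dest
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}.{version[dest]}", op, new_args))
    return out


prog = [
    ("x", "add", ["a", 1]),
    ("x", "mul", ["x", 2]),    # redefines x
    ("y", "add", ["x", "x"]),
    ("x", "sub", ["y", "a"]),  # redefines x again
]
for dest, op, args in to_ssa(prog):
    print(dest, "=", op, *args)
# x.0 = add a 1
# x.1 = mul x.0 2
# y.0 = add x.1 x.1
# x.2 = sub y.0 a
```

After renaming, every name is defined exactly once, which is the property SSA-based optimizers rely on.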
> * The codegen approach is actually simpler than the cbe approach.  PTX
> is mostly RISC-like, so the codegen approach can reuse most of the *.td
> machinery and borrow from the implementations of existing mature RISC
> backends such as ARM, PowerPC, and Sparc.  Besides, I guess most
> developers are more familiar with *.td files than with the C backend.
> In fact, it took me only two weeks to write a working prototype from
> scratch, and I had no prior experience with LLVM's codegen.
I believe that.  PTX is a really simple instruction set and quite
orthogonal.
> * So far my backend is less complete than other backends based on the
> cbe approach, but given the simplicity of the codegen approach, a
> backend based on it should catch up with them in a short time.
The one thing we'll have to add is mask support.
> * Masked operation, as well as branch folding and the like, is much
> easier to implement in the codegen approach.  I am not sure how much
> performance improvement these optimizations could achieve, but they
> are worth trying.
I'm not sure why these would be easier with one model over another.
It's a lot of hand-lowering and manual optimization either way.  Can you
explain?
> All in all, having implemented both, I would propose a PTX backend
> based on the codegen approach.
The fact that PTX is a moving target seals the deal for me.  It's really
easy to generate variants of PTX using TableGen's predicate approach.
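In a TableGen .td file, an instruction definition can carry `let Predicates = [...]`, so the instruction selector only uses that definition when the subtarget provides the listed features; this is what makes PTX variants cheap to describe. The Python model below is a rough sketch of that idea, not LLVM code: the feature name `HasPTX20` and the mnemonics `foo.v20` / `foo.v14` are hypothetical placeholders, and list order stands in for preference, whereas LLVM's generated selector ranks patterns by its own criteria.

```python
from dataclasses import dataclass

# Toy model of TableGen-style predicates guarding instruction variants.
# Assumptions: the feature name "HasPTX20" and the mnemonics "foo.v20" /
# "foo.v14" are hypothetical placeholders, not real PTX instructions.


@dataclass(frozen=True)
class Pattern:
    op: str              # generic operation being selected
    asm: str             # text the emitter would print
    requires: frozenset  # like `let Predicates = [...]` on a .td definition


PATTERNS = [
    Pattern("foo", "foo.v20", frozenset({"HasPTX20"})),  # newer PTX only
    Pattern("foo", "foo.v14", frozenset()),              # always available
]


def select(op, features):
    """Return the first pattern for op whose required features are all present."""
    for p in PATTERNS:  # list order encodes preference in this sketch
        if p.op == op and p.requires <= features:
            return p.asm
    raise LookupError(f"no pattern for {op!r} with features {sorted(features)}")


print(select("foo", frozenset({"HasPTX20"})))  # foo.v20
print(select("foo", frozenset()))              # foo.v14
```

Adding support for a new PTX revision then means adding definitions with a new predicate rather than touching the selection logic.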

                            -Dave
