thr3ads.net - llvm dev - [LLVMdev] Upstream PTX backend that uses target independent code generator if possible [Aug 2010]

Che-Liang Chiou

2010-Aug-10 00:01 UTC

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

Hi David,

Thanks for asking.

On Mon, Aug 9, 2010 at 3:25 PM, David A. Greene <greened at obbligato.org> wrote:
> Che-Liang Chiou <clchiou at gmail.com> writes:
>
>> Hi there,
>>
>> I have a working prototype of PTX backend, and I would like to
>> upstream it if possible.  This backend is implemented by LLVM's target
>> independent code generator framework; I think this will make it easier
>> to maintain.
>
> How does this relate, at all, to the backend here:
>
> http://sourceforge.net/projects/llvmptxbackend/
>
> If they are unrelated, can you do a comparison of the two?  Perhaps
> there are holes in each that can be filled by the other.  It would be
> a shame to have two completely different PTX backends.
>

I skimmed their code, and it seems that they didn't use the code generator.
That means their design should be similar to CBackend or CPPBackend.
So I guess it can't generate some machine instructions like MAD,
and there are some PTX instruction set features that are hard to exploit
without the code generator.

But I didn't study their code thoroughly, so I might be wrong about this.
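
To illustrate the MAD point, here is a minimal sketch of how an instruction selector can fuse a multiply feeding an add into a single instruction by pattern matching on the expression tree. This is a toy in Python, not LLVM's actual selector (which uses TableGen patterns over a SelectionDAG); the `Reg`, `Op`, and `select` names and the `%t` temporaries are hypothetical, and only the `mad.f32`, `add.f32`, and `mul.f32` mnemonics are taken from PTX.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Reg:
    """A leaf: a value that already lives in a register."""
    name: str


@dataclass(frozen=True)
class Op:
    """An interior node: a binary operation ("add" or "mul")."""
    opcode: str
    lhs: "Node"
    rhs: "Node"


Node = Union[Reg, Op]


def select(node: Node, out: List[str]) -> str:
    """Append pseudo-instructions for `node` to `out`; return the result register."""
    if isinstance(node, Reg):
        return node.name

    if node.opcode == "add":
        # Try the fused pattern in both operand orders:
        # add(mul(a, b), c) or add(c, mul(a, b)).
        for prod, addend in ((node.lhs, node.rhs), (node.rhs, node.lhs)):
            if isinstance(prod, Op) and prod.opcode == "mul":
                a = select(prod.lhs, out)
                b = select(prod.rhs, out)
                c = select(addend, out)
                dst = f"%t{len(out)}"
                out.append(f"mad.f32 {dst}, {a}, {b}, {c};")
                return dst

    # No fused pattern matched: emit the plain binary instruction.
    lhs = select(node.lhs, out)
    rhs = select(node.rhs, out)
    dst = f"%t{len(out)}"
    out.append(f"{node.opcode}.f32 {dst}, {lhs}, {rhs};")
    return dst
```

A backend that prints one instruction per IR instruction, in the style of CBackend, never sees the multiply and the add together, which is why this kind of fusion is harder there.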
>> I have tested this backend to translate a work-efficient parallel scan
>> kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
>> ) into PTX code.  The generated PTX code was then executed on real
>> hardware, and the result is correct.
>
> How much of the LLVM IR does this support?  What's missing?

I still have to add some intrinsics, calling conventions, and address spaces.
I would say these are relatively small changes.
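
For readers unfamiliar with the test kernel, the work-efficient (Blelloch) exclusive scan can be sketched as follows. This is a sequential Python simulation, not the CUDA kernel that was actually compiled; the function name is hypothetical, and the input length is assumed to be a power of two. Each pass of an inner loop corresponds to work that the threads of the real kernel would do in parallel in one step.

```python
def blelloch_exclusive_scan(values):
    """Return the exclusive prefix sum of `values` (length must be a power of two)."""
    n = len(values)
    if n == 0:
        return []
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    data = list(values)

    # Up-sweep (reduce): build partial sums in place, doubling the stride.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Clear the root, then down-sweep: push prefixes back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data
```

Both sweeps together perform O(n) additions, which is what makes the scan work-efficient compared with the naive O(n log n) parallel version.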
>> So far I have to hack clang to generate bitcode for this backend, but
>> I will try to patch clang to parse CUDA (or OpenCL) while I am
>> upstreaming this backend.
>
> I think it's a lot of work to do CUDA support for not much benefit.
> The OpenMP committee is working on accelerator directives and that's
> the better long-term approach, IMHO.  Clang/LLVM would be a great
> vehicle to generate/test ideas for such directives.

Thanks for the suggestion.

> http://openmp.org/wp/
> http://www.pgroup.com/lit/articles/insider/v2n2a5.htm
>
>                         -Dave
Regards,
Che-Liang

Helge Rhodin

2010-Aug-10 11:15 UTC

[LLVMdev] PTX backend, BSD license

Hi!

>>> Hi there,
>>>
>>> I have a working prototype of PTX backend, and I would like to
>>> upstream it if possible.  This backend is implemented by LLVM's target
>>> independent code generator framework; I think this will make it easier
>>> to maintain.
>>>
>> How does this relate, at all, to the backend here:
>>
>> http://sourceforge.net/projects/llvmptxbackend/
>>
>> If they are unrelated, can you do a comparison of the two?  Perhaps
>> there are holes in each that can be filled by the other.  It would be
>> a shame to have two completely different PTX backends.
>>
>>     
> I surfed their code, and it seems that they didn't use code generator.
> That means there design should be similar to CBackend or CPPBackend.
> So I guess it can't generate some machine instructions like MAD,
> and there are some PTX instruction set features that are hard to exploit
> if not using code generator.
>
> But I didn't study their code thoroughly, so I might be wrong about this.

Yes, we don't use the target-independent code generator, and the backend
is based on the CBackend. We decided not to use the code generator because
PTX code is also an intermediate language. The graphics driver contains a
compiler which compiles PTX code to machine code targeting a particular
GPU architecture. It performs register allocation, instruction scheduling,
dead-code elimination, and other late optimizations. Thus we don't need
most of the target-independent code generator features in the PTXBackend.

We already support most of the PTX instruction set: texture lookup,
structs and arrays, function calls, vector types, different address spaces,
and many intrinsics. Not all intrinsics are implemented yet because our
application does not require them, but they are easy to add. Only the fused
operations (e.g. MAD) are not supported, and adding them will probably not
be as easy as in the target-independent code generator. But it might be
that the graphics driver compiler also inserts them. I'm not sure about
that, but I remember seeing it once (indeed, it would make sense to do
that during instruction selection). Too bad that NVIDIA does not release
any detailed information on their compiler.

> How does this relate, at all, to the backend here:
>
> http://sourceforge.net/projects/llvmptxbackend/
>
>
> If they are unrelated, can you do a comparison of the two?  Perhaps
> there are holes in each that can be filled by the other.  It would be
> a shame to have two completely different PTX backends.
>

I don't know much about the target-independent code generator, but I
think we use distinct approaches which cannot be merged in a reasonable
way. Probably both approaches have their own pros and cons.

> Is there work to upstream this?  I've got a relatively unused NVIDIA
> card at home.   :)
>
>                                  -Dave

The PTXBackend probably needs more test cases. I'm currently covering a
lot of LLVM and PTX features, but the test suite is still not exhaustive.
I took the coding standards into account, and the license is now
compatible with LLVM. I don't know what else needs to be done?

Helge

David A. Greene

2010-Aug-10 19:02 UTC

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

Che-Liang Chiou <clchiou at gmail.com> writes:
> I surfed their code, and it seems that they didn't use code generator.
> That means there design should be similar to CBackend or CPPBackend.
> So I guess it can't generate some machine instructions like MAD,
> and there are some PTX instruction set features that are hard to exploit
> if not using code generator.
>
> But I didn't study their code thoroughly, so I might be wrong about this.

I haven't had a chance to look at it yet either.

>>> I have tested this backend to translate a work-efficient parallel scan
>>> kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
>>> ) into PTX code.  The generated PTX code was then executed on real
>>> hardware, and the result is correct.
>>
>> How much of the LLVM IR does this support?  What's missing?
> Have to add some intrinsics, calling conventions, and address spaces.
> I would say these are relatively small changes.

Are you generating masks at all?  If so, how are you doing that?
Similarly to how the ARM backend does predicates (handling all the
representation, etc. in the target-specific codegen)?

I have been wanting to see predicates (vector and scalar) in the
LLVM IR for a long time.  Perhaps the PTX backend is an opportunity
to explore that.

                           -Dave

David A. Greene

2010-Aug-10 19:05 UTC

[LLVMdev] PTX backend, BSD license

Helge Rhodin <helge.rhodin at alice-dsl.net> writes:
>> But I didn't study their code thoroughly, so I might be wrong about this.
>
> Yes, we don't use the target-independent code generator and the
> backend is based on the CBackend.  We decided to not use the code
> generator because PTX code is also an intermediate language. The
> graphics driver contains a compiler which compiles PTX code to machine
> code targeting a particular GPU architecture. It performs register
> allocation, instruction scheduling, dead-code elimination, and other
> late optimizations. Thus we don't need most of the target-independent
> code generator features in the PTXBackend.

Some of these could still be useful to aid the NVIDIA compiler.  But I
don't have any hard data to support that assertion.  :)

> We already support most of the PTX instruction set. Texture lookup,
> structs&arrays, function calls, vector types, different address spaces
> and many intrinsics.

Do you generate masked operations?  If so, are you managing
masks/predicates with your own target-specific representation _a_la_ the
current ARM backend?

>> If they are unrelated, can you do a comparison of the two?  Perhaps
>> there are holes in each that can be filled by the other.  It would be
>> a shame to have two completely different PTX backends.
>
> I don't know much about the target-independent code generator but I
> think we use distinct approaches which cannot
> be merged in a reasonable way. Probably both approaches have their own
> pros and cons.

Certainly.

>> Is there work to upstream this?  I've got a relatively unused NVIDIA
>> card at home.   :)
>>
>>                                  -Dave
>
> The PTXBackend probably needs more test cases. I'm currently covering a
> lot of  LLVM and PTX features but the test suite is still not exhaustive.
> I took the coding standards into account and the license is now
> compatible to LLVM. I don't know what else needs to be done?

Checking it in.  :) Really, we probably should do some sort of code
review, but Chris would have to indicate what he wants.

                            -Dave

Villmow, Micah

2010-Aug-10 19:25 UTC

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of David A. Greene
> Sent: Tuesday, August 10, 2010 12:02 PM
> To: Che-Liang Chiou
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Upstream PTX backend that uses target
> independent code generator if possible
> 
> Che-Liang Chiou <clchiou at gmail.com> writes:
> 
> > I surfed their code, and it seems that they didn't use code generator.
> > That means there design should be similar to CBackend or CPPBackend.
> > So I guess it can't generate some machine instructions like MAD,
> > and there are some PTX instruction set features that are hard to exploit
> > if not using code generator.
> >
> > But I didn't study their code thoroughly, so I might be wrong about this.
>
> I haven't had a chance to look at it yet either.
>
> >>> I have tested this backend to translate a work-efficient parallel scan
> >>> kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
> >>> ) into PTX code.  The generated PTX code was then executed on real
> >>> hardware, and the result is correct.
> >>
> >> How much of the LLVM IR does this support?  What's missing?
> > Have to add some intrinsics, calling conventions, and address spaces.
> > I would say these are relatively small changes.
> 
> Are you generating masks at all?  If so, how are you doing that?
> Similarly to how the ARM backend does predicates (handling all the
> representation, etc. in the target-specific codegen)?
> 
> I've have been wanting to see predicates (vector and scalar) in the
> LLVM IR for a long time.  Perhaps the PTX backend is an opportunity
> to explore that.[Villmow, Micah] From looking at the llvmptxbackend, it does not fully support
vector types.
This in my perspective is one of the greatest benefits of the backend
code-generator, automatic support
for vector types in LLVM-IR that are not natively supported by the target
machine via vector splitting.> 
>                            -Dave
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
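
The vector splitting mentioned in the reply above can be sketched roughly as follows: an operation on a vector wider than the target supports is legalized by breaking it into several operations on legal-width pieces. This is only an illustration in Python, not LLVM's legalizer; the function name is hypothetical, and the `[lo:hi]` lane-range notation in the emitted strings is made up for readability and is not real PTX syntax.

```python
def split_vector_op(opcode, dst, lhs, rhs, width, legal_width):
    """Split a `width`-lane binary op into ops of `legal_width` lanes each.

    Returns a list of pseudo-instruction strings, one per legal-width piece.
    """
    if legal_width <= 0 or width % legal_width != 0:
        raise ValueError("width must be a positive multiple of legal_width")

    insts = []
    for part in range(width // legal_width):
        lo = part * legal_width          # first lane of this piece
        hi = lo + legal_width - 1        # last lane of this piece
        lanes = f"[{lo}:{hi}]"
        insts.append(
            f"{opcode}.v{legal_width} {dst}{lanes}, {lhs}{lanes}, {rhs}{lanes}"
        )
    return insts
```

For example, an 8-lane add on a target whose widest legal vector has 4 lanes becomes two 4-lane adds, one for lanes 0 to 3 and one for lanes 4 to 7.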
