thr3ads.net - llvm dev - [LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm [Apr 2012]

Yabin Hu

2012-Apr-02 14:16 UTC

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

Hi all,

I am a PhD student at Huazhong University of Science and Technology, China.
The following is my GSoC 2012 proposal.
Comments are welcome!

*Title: Automatic GPGPU Code Generation for LLVM*

*Abstract*
Very often, manually developing an GPGPU application is a time-consuming,
complex, error-prone and iterative process. In this project, I propose to
build an automatic GPGPU code generation framework for LLVM, based on two
successful LLVM (sub-)projects - Polly and PTX backend. This can be very
useful to ease the burden of the long learning curve of various GPU
programming model.

*Motivation*
With the broad proliferation of GPU computing, it is very important to
provide an easy, automatic tool that lets ordinary developers develop
applications for, or port them to, the GPU, especially domain experts who
want to harness the huge computing power of GPUs. Polly already implements
many transformations, such as tiling, auto-vectorization and OpenMP code
generation. With the help of LLVM's PTX backend, I plan to extend Polly
with GPGPU code generation.


*Project Detail*
In this project, we target the various parallel loops that can be described
by Polly's polyhedral model. We first translate the selected SCoPs (Static
Control Parts) into 4-deep loop nests with Polly's schedule optimization.
Then we extract the loop body (or inner non-parallel loops) into an LLVM
sub-function, tagged with the PTX_Kernel or PTX_Device calling convention.
After that, we use the PTX backend to translate the sub-functions into a
string of the corresponding PTX code. Finally, we provide a runtime library
to generate the executable program.
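To make the tiling-and-extraction step above more concrete, here is a sequential Python emulation (not Polly code). It assumes that the 4-deep loop nest is a 2-D tiling in which the two outer loops enumerate tiles and the two inner loops enumerate points within a tile; the outer pair then becomes the CUDA grid and the inner pair the threads of one block. The sizes and the names `kernel_body` and `emulated_launch` are hypothetical.

```python
# Illustrative sizes only; TILE plays the role of the CUDA block edge length.
N, M, TILE = 64, 50, 16


def original(A, B):
    """The SCoP as written: a 2-D parallel loop nest."""
    return [[A[i][j] + B[i][j] for j in range(M)] for i in range(N)]


def kernel_body(C, A, B, bx, by, tx, ty):
    """Hypothetical extracted function: one instance of the loop body,
    addressed by (block, thread) coordinates instead of (i, j). The guard
    handles partial tiles when N or M is not a multiple of TILE."""
    i, j = by * TILE + ty, bx * TILE + tx
    if i < N and j < M:
        C[i][j] = A[i][j] + B[i][j]


def emulated_launch(A, B):
    """Sequential stand-in for a kernel launch: the two outer tile loops
    form the grid, the two inner point loops form the threads of a block."""
    C = [[None] * M for _ in range(N)]
    grid_x, grid_y = -(-M // TILE), -(-N // TILE)  # ceiling division
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(TILE):
                for tx in range(TILE):
                    kernel_body(C, A, B, bx, by, tx, ty)
    return C


A = [[i for j in range(M)] for i in range(N)]
B = [[2 * j for j in range(M)] for i in range(N)]
assert emulated_launch(A, B) == original(A, B)
```

In generated code, `bx`, `by`, `tx` and `ty` would come from the PTX special registers `%ctaid` and `%tid` rather than from host loops, and all instances would run concurrently; the sequential emulation is only equivalent because the loop nest is parallel.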

There are three key challenges in this project.
1. How to find the optimal execution configuration of the GPU code.
The execution configuration is essential to the performance of the GPU
code. It is limited by many factors, including the hardware, the source
code, register usage, local store (device) usage, the original memory
access patterns and so on. We must take all of these into consideration.

2. How to automatically insert the synchronization code.
This is very important to preserve the original semantics. We must
correctly detect where synchronization needs to be inserted.

3. How to automatically generate the memory copy operations between host
and device.
We must transfer the input data to the GPU and copy the results back.
Fortunately, Polly already provides a very expressive way to describe
memory accesses.
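Since Polly describes every memory access as an affine function of the loop iterators, the region that has to be transferred can be derived from those functions. The following Python sketch is a deliberate simplification, not Polly's implementation: it assumes a rectangular iteration domain and over-approximates each accessed region by a bounding box, whereas Polly itself works with exact integer sets. `footprint` and `plan_copies` are hypothetical names.

```python
from itertools import product


def footprint(coeffs, offsets, bounds):
    """Bounding box of the elements touched by one affine array access.

    coeffs[d] holds the iterator coefficients of subscript d, offsets[d]
    its constant term, and bounds the inclusive (lo, hi) range of each
    iterator. An affine function over a box reaches its extremes at the
    box corners, so evaluating the corners is sufficient."""
    corners = list(product(*bounds))
    box = []
    for row, const in zip(coeffs, offsets):
        values = [sum(a * x for a, x in zip(row, pt)) + const for pt in corners]
        box.append((min(values), max(values)))
    return box


def plan_copies(accesses, bounds):
    """Merge per-access boxes into one transfer region per (direction, array).

    Reads must be copied to the device before the kernel runs; writes must
    be copied back to the host afterwards."""
    plan = {}
    for array, kind, coeffs, offsets in accesses:
        key = ("to_device" if kind == "read" else "from_device", array)
        box = footprint(coeffs, offsets, bounds)
        if key in plan:
            box = [(min(lo1, lo2), max(hi1, hi2))
                   for (lo1, hi1), (lo2, hi2) in zip(plan[key], box)]
        plan[key] = box
    return plan


# B[i][j] = A[i-1][j] + A[i+1][j] for 1 <= i <= 8, 0 <= j <= 7
bounds = [(1, 8), (0, 7)]
ident = [[1, 0], [0, 1]]
accesses = [("A", "read", ident, [-1, 0]),
            ("A", "read", ident, [1, 0]),
            ("B", "write", ident, [0, 0])]
print(plan_copies(accesses, bounds))
# {('to_device', 'A'): [(0, 9), (0, 7)], ('from_device', 'B'): [(1, 8), (0, 7)]}
```

One caveat of the bounding-box shortcut: if a written box contains elements the kernel never writes, copying the whole box back would overwrite host data with whatever the device buffer held, so such regions would also have to be copied to the device first (or described exactly).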

*Timeline*
May 21 ~ June 3 preliminary code generation for 1-D and 2-D parallel loops.
June 4 ~ June 11 code generation for parallel loops with non-parallel inner
loops.
June 11 ~ June 24 automatic memory copy insertion.
June 25 ~ July 8 auto-tuning for GPU execution configuration.
July 9 ~ July 15 Midterm evaluation and writing documentation.
July 16 ~ July 22 automatic synchronization insertion.
July 23 ~ August 3 testing on the PolyBench benchmarks.
August 4 ~ August 12 summarize and complete the final documentation.

*Project experience*
I have participated in several projects related to binary translation
(optimization) and run-time systems. I also implemented a frontend for
numerical computing languages such as Octave/MATLAB, following the style of
Clang. Recently, I have been working closely with the Polly team,
contributing some patches and investigating many details of polyhedral
transformations.

*References*
1. Tobias Grosser, Ragesh A. *Polly - First Successful Optimizations - How
to proceed?* LLVM Developer Meeting 2011.
2. Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan. *Automatic
C-to-CUDA Code Generation for Affine Programs*. CC 2010.
3. Soufiane Baghdadi, Armin Größlinger, and Albert Cohen. *Putting
Automatic Polyhedral Compilation for GPGPU to Work*. In Proc. of Compilers
for Parallel Computers (CPC), 2010.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120402/f8655bec/attachment.html>

Tobias Grosser

2012-Apr-03 11:49 UTC

On 04/02/2012 04:16 PM, Yabin Hu wrote:
> Hi all,
>
> I am a phd student from Huazhong University of Sci&Tech, China. The
> following is my GSoC 2012 proposal.
Hi Yabin,
> Comments are welcome!
>
> *Title: Automatic GPGPU Code Generation for LLVM*
>
> *Abstract*
> Very often, manually developing an GPGPU application is a
> time-consuming, complex, error-prone and iterativeprocess. In this
> project, I propose to build an automatic GPGPU code generation framework
> for LLVM, based on two successful LLVM (sub-)projects - Polly and PTX
> backend. This can be very useful to ease the burden of the long learning
> curve of various GPU programming model.
(an GPGPU -> a GPGPU; iterativeprocess -> iterative process; programming model -> programming models)
I like the idea ;-)

Please submit a first version of this proposal to the Google SoC web
application. You can refine it later, but it is important that it is
officially registered. This way you are on the safe side in case
something unexpected happens in the last days.
> *Motivation*
> With the broad proliferation of GPU computing, it is very important to
> provide an easy and automatic tool to develop or port the applications
> to GPU for normal developers, especially for those domain experts who
> want to harness the huge computing power of GPU.
> Polly has implemented
> many transformations, such as tiling, auto-vectorization and openmp code
> generation. With the help of LLVM's PTX backend, I plan to extend Polly
> with the feature of GPGPU code generation.
> *Project Detail*
> In this project, we target various parallel loops which can be described
> by Polly's polyhedral model. We first translated the selected SCoPs
> (Static Control Parts) into 4-depth loops with Polly's schedule
> optimization.
> Then we extract the loop body (or inner non-parallel
> loops) into a LLVM sub-function, tagged with PTX_Kernel or PTX_Device
> call convention. After that, we use PTX backend to translate the
> subfunctions into a string of the corresponding PTX codes. Finally, we
> provide an runtime library to generate the executable program.
I would distinguish here between the infrastructure features that you 
add to Polly and the actual code generation/scheduling strategy you will 
follow. It should become clear that the infrastructure changes are 
independent of the actual code generation strategy you use.
This is especially important as automatic GPGPU code generation is a 
complex problem. I doubt it will be possible to implement a perfect 
solution within three months. Hence, I would target a (very) simple code
generation strategy that brings all the necessary infrastructure into 
Polly. When the infrastructure is ready and proven to work, you can start
to implement (and evaluate) more complex code generation strategies.
> There are three key challenges in this project here.
> 1. How to get the optimal execution configure of GPU codes.
> The execution configure is essential to the performance of the GPU
> codes. It is limited by many factors, including hardware, source codes,
> register usage, local store (device) usage, original memory access
> patterns and so on. We must take all the staff into consideration.
Yes and no. Don't try to solve everything within 3 months. Rather try
to limit yourself to some very simple but certainly achievable goals.
I would probably go either with a very simple
> 2. How to automatically insert the synchronization codes.
> This is very important to preserve the original semantics. We must
> detect where we need insert them correctly.
Again, distinguish here between the infrastructure of adding 
synchronizations and the algorithm to derive optimal synchronizations.
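As an illustration of what the synchronization infrastructure has to emit (this example is not from the thread): when threads of a block exchange data through shared memory, a barrier (`__syncthreads()` in CUDA, `bar.sync` in PTX) must separate the writes from the reads of other threads' values. The Python sketch below emulates one block with generators, where `yield` stands in for the barrier; `thread` and `run_block` are hypothetical names. Deciding where such barriers are required is the separate, algorithmic half of the problem.

```python
def thread(t, n, data, tile, out, with_barrier):
    """One GPU thread of the block, written as a generator."""
    tile[t] = data[t]              # phase 1: write this thread's slot of shared memory
    if with_barrier:
        yield                      # stands in for __syncthreads() / bar.sync
    out[t] = tile[(t + 1) % n]     # phase 2: read the slot written by thread t + 1


def run_block(data, with_barrier):
    """Execute one block of len(data) threads and return their results."""
    n = len(data)
    tile, out = [None] * n, [None] * n   # None marks "not written yet"
    threads = [thread(t, n, data, tile, out, with_barrier) for t in range(n)]
    if with_barrier:
        for th in threads:         # every thread runs up to the barrier ...
            next(th)
        for th in threads:         # ... before any thread continues past it
            next(th, None)
    else:
        for th in threads:         # one possible schedule: each thread runs to completion
            next(th, None)
    return out


print(run_block([10, 20, 30, 40], with_barrier=True))   # [20, 30, 40, 10]
print(run_block([10, 20, 30, 40], with_barrier=False))  # [None, None, None, 10]
```

Without the barrier, the sequential schedule is one valid execution of the unsynchronized code, and it returns stale values for every thread except the last, which is exactly the change in semantics the automatic insertion must prevent.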
> 3. How to automatically generate the memory copy operation between host
> and device.
> We must transport the input data to GPU and copy the
> results back. Fortunately, Polly has implemented a very expressive way
> to describe memory access.
In general, I think it may be helpful to have some examples that show
what you want to do.
> *Timeline*
> May 21 ~ June 3 preliminary code generation for 1-d and 2d parallel loops.
> June 4 ~ June 11 code generation for parallel loops with non-parallel
> inner loops.
> June 11 ~ June 24 automatic memory copy insertions.
> June 25 ~ July 8 auto-tuning for GPU execution configuration.
What do you mean by auto-tuning? What do you want to tune?

For me it does not seem to be essential.

Due to the short time of a GSoC I would suggest simply requiring the user
to define such values and giving a little bit more time to the other
features. You can put it on a nice-to-have list, where you put ideas
that can be implemented after having fulfilled the success criteria.
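If the user fixes the block shape, as suggested here, the host side only has to derive the grid from the loop extents. A minimal Python sketch, assuming the user passes the block shape as a string such as "16x16"; the option format, the function names and the 1024-thread default limit are assumptions, not part of Polly or the proposal.

```python
def parse_block(spec):
    """Parse a user-supplied block shape such as "16x16" (hypothetical format)."""
    return tuple(int(x) for x in spec.lower().split("x"))


def launch_config(extents, spec, max_threads_per_block=1024):
    """Return (grid, block) for a parallel loop nest with the given extents.

    The block shape is chosen by the user instead of an auto-tuner; the grid
    is then the ceiling division of each extent by the block size.
    max_threads_per_block is hardware dependent; 1024 is an assumed default."""
    block = parse_block(spec)
    if len(block) != len(extents):
        raise ValueError("block rank must match the number of parallel loops")
    threads = 1
    for b in block:
        threads *= b
    if threads > max_threads_per_block:
        raise ValueError(f"{threads} threads per block exceeds {max_threads_per_block}")
    grid = tuple(-(-e // b) for e, b in zip(extents, block))
    return grid, block


print(launch_config((1000, 500), "16x16"))  # ((63, 32), (16, 16))
```

Which loop is mapped to the block's x dimension is left to the caller; mapping the innermost loop to x is the usual choice, since consecutive threads then access consecutive elements and global-memory accesses coalesce.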
> July 9 ~ July 15 Midterm evaluation and writing documents.
> July 16 ~ July 22 automatic synchronization insertion.
> July 23 ~ August 3 test on polybench benchmarks.
> August 4 ~ August 12 summarize and complete the final documents.
An additional list with details for the individual steps would be good.

When are you planning to add which infrastructure? You may also add
example code.
> *Project experience*
> I participated in several projects related to binary translation
> (optimization) and run-time system. And I implemented a frontend for
> numerical computing languages like octave/matlab, following the style of
> clang. Recently, I work very close with Polly team to contribute some
> patches and investigate lots of details about polyhedral transformation.
You may add links to the corresponding commit messages.
> *References*
> 1. Tobias Grosser, Ragesh A. /Polly - First Successful Optimizations -
> How to proceed?/ LLVM Developer Meeting 2011.
> 2. Muthu Manikandan Baskaran, J. Ramanujam and P.
> Sadayappan. /Automatic C-to-CUDA Code Generation for Affine Programs/.
> CC 2010.
> 3. Soufiane Baghdadi, Armin Größlinger, and Albert Cohen. /Putting
> Automatic Polyhedral Compilation for GPGPU to Work/. In Proc. of
> Compilers for Parallel Computers (CPC), 2010.
You are adding references, but don't reference them in your text. Is 
this intentional?

Overall, this looks interesting. Looking forward to your final submission.

Tobi

P.S. Feel free to post again to get further comments.

Hongbin Zheng

2012-Apr-03 13:13 UTC

Hi Yabin,

Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
can also improve llc/lli or create new tools to support code generation
for heterogeneous platforms[1], i.e. generating code for more than one
target architecture at the same time. Something like this is not very
complicated and has been implemented[2,3] by some people, but it is not
available in mainstream LLVM. Implementing this could make your GPU
project more complete.

best regards
ether


[1]http://en.wikipedia.org/wiki/Heterogeneous_computing
[2]http://llvm.org/devmtg/2010-11/Villmow-OpenCL.pdf
[3]http://llvm.org/devmtg/2008-08/Sander_HW-SW-CoDesignflowWithLLVM.pdf

Justin Holewinski

2012-Apr-03 14:30 UTC

On Mon, Apr 2, 2012 at 7:16 AM, Yabin Hu <yabin.hwu at gmail.com> wrote:
> Hi all,
>
> I am a phd student from Huazhong University of Sci&Tech, China. The
> following is my GSoC 2012 proposal.
> Comments are welcome!
>
> *Title: Automatic GPGPU Code Generation for LLVM*
>
> *Abstract*
> Very often, manually developing an GPGPU application is a time-consuming,
> complex, error-prone and iterative process. In this project, I propose to
> build an automatic GPGPU code generation framework for LLVM, based on two
> successful LLVM (sub-)projects - Polly and PTX backend. This can be very
> useful to ease the burden of the long learning curve of various GPU
> programming model.
>
> *Motivation*
> With the broad proliferation of GPU computing, it is very important to
> provide an easy and automatic tool to develop or port the applications to
> GPU for normal developers, especially for those domain experts who want to
> harness the huge computing power of GPU. Polly has implemented many
> transformations, such as tiling, auto-vectorization and openmp code
> generation. With the help of LLVM's PTX backend, I plan to extend Polly
> with the feature of GPGPU code generation.
>
Very interesting!  I'm quite familiar with Muthu's work, and putting that
into LLVM would be great.  If done right, it could apply to any
heterogeneous system, including AMD GPUs.

As the maintainer and primary developer on the PTX back-end, please feel
free to contact me with any issues/suggestions you have regarding the PTX
back-end!

>
>
> *Project Detail*
> In this project, we target various parallel loops which can be described
> by Polly's polyhedral model. We first translated the selected SCoPs
> (Static Control Parts) into 4-depth loops with Polly's schedule optimization.
> Then we extract the loop body (or inner non-parallel loops) into a LLVM
> sub-function, tagged with PTX_Kernel or PTX_Device call convention. After
> that, we use PTX backend to translate the subfunctions into a string of the
> corresponding PTX codes. Finally, we provide an runtime library to generate
> the executable program.
>
I'm a bit confused by the wording here.  What do you mean by 'LLVM
sub-function?'  I'm assuming you mean extracting the relevant code into a
separate function, but I would just use the word 'function'.

And what do you mean by a run-time library to generate the executable
program?  Are you proposing to side-step the LLVM code generator LLC?  It
seems like a reasonable approach would be to write an LLVM pass (or set of
passes) that takes as input a single IR file, and produces two: (1) the GPU
kernel/device code, and (2) the non-translatable IR with GPU code replaced
by appropriate CUDA Driver API calls.  Then, both of these can pass through
the opt/llc tools with the appropriate selection for optimization passes
and target back-end.

This way, you could fairly easily create a GPGPU compiler by writing a
simple wrapper around Clang (or better yet, improve Clang to support
multiple targets simultaneously!)

-- 

Thanks,

Justin Holewinski

Yabin Hu

2012-Apr-03 22:44 UTC

Hi Hongbin,

2012/4/3 Hongbin Zheng <etherzhhb at gmail.com>
> Instead of compile the LLVM IR to PTX asm string in a ScopPass, you
> can also the improve llc/lli or create new tools to support the code
> generation for Heterogeneous platforms[1], i.e. generate code for more
> than one target architecture at the same time. Something like this is
> not very complicated and had been implemented[2,3] by some people, but
> not available in LLVM mainstream. Implement this could make your GPU
> project more complete.
>
>
> [1]http://en.wikipedia.org/wiki/Heterogeneous_computing
> [2]http://llvm.org/devmtg/2010-11/Villmow-OpenCL.pdf
> [3]http://llvm.org/devmtg/2008-08/Sander_HW-SW-CoDesignflowWithLLVM.pdf

Our original motivation is to provide a JIT compiler for our language
frontend (a subset of MATLAB/Octave). I've extended lli to implement a JIT
compiler (named gvm) that uses Polly dynamically. However, preliminary
results show that the overhead is heavy, so I chose to move the dynamic
optimization out of the JIT process. Also, putting the LLVM-to-PTX-asm-string
pass into Polly can give users a kind of one-touch experience.

Please imagine the following scenario. When a user opens a MATLAB source
file or a folder containing source files, we can start compiling the source
statically and use Polly and opt to optimize it into the best version of
the LLVM IR. Finally, when the user clicks run or presses Enter, we just
need to JIT the LLVM IR as usual, minimizing the dynamic overhead.

Thanks for recommending the references.

best regards,
Yabin.

Yabin Hu

2012-Apr-03 23:02 UTC

Hi Justin,

2012/4/3 Justin Holewinski <justin.holewinski at gmail.com>
>> *Motivation*
>> With the broad proliferation of GPU computing, it is very important to
>> provide an easy and automatic tool to develop or port the applications to
>> GPU for normal developers, especially for those domain experts who want to
>> harness the huge computing power of GPU. Polly has implemented many
>> transformations, such as tiling, auto-vectorization and openmp code
>> generation. With the help of LLVM's PTX backend, I plan to extend Polly
>> with the feature of GPGPU code generation.
>>
>
> Very interesting!  I'm quite familiar with Muthu's work, and putting that
> into LLVM would be great.  If done right, it could apply to any
> heterogeneous systems, including AMD GPUs.
>
> As the maintainer and primary developer on the PTX back-end, please feel
> free to contact me with any issues/suggestions you have regarding the PTX
> back-end!

Thanks for your interest and help.

> I'm a bit confused by the wording here.  What do you mean by 'LLVM
> sub-function?'  I'm assuming you mean extracting the relevant code into a
> separate function, but I would just use the word 'function'.

Yes, it is indeed a function. I used this word to follow the method naming
style of Polly's OpenMP code generation. I will fix this.

> And what do you mean by a run-time library to generate the executable
> program?

In my mind, the runtime library is just a wrapper around the CUDA driver
API. But we can add our own debug info and make changes in the CUDA APIs
transparent to users.


> Are you proposing to side-step the LLVM code generator LLC?  It seems like
> a reasonable approach would be to write an LLVM pass (or set of passes)
> that takes as input a single IR file, and produces two: (1) the GPU
> kernel/device code, and (2) the non-translatable IR with GPU code replaced
> by appropriate CUDA Driver API calls.  Then, both of these can pass through
> the opt/llc tools with the appropriate selection for optimization passes
> and target back-end.
>
> This way, you could fairly easily create a GPGPU compiler by writing a
> simple wrapper around Clang (or better yet, improve Clang to support
> multiple targets simultaneously!)
>
Ether gave a similar suggestion on this point. Here I copy my reply to him
to explain why I chose to embed the transformation pass in my
implementation.

Our original motivation is to provide a JIT compiler for our language
frontend (a subset of MATLAB/Octave). I've extended lli to implement a JIT
compiler (named gvm) that uses Polly dynamically. However, preliminary
results show that the overhead is heavy, so I chose to move the dynamic
optimization out of the JIT process. Also, putting the LLVM-to-PTX-asm-string
pass into Polly can give users a kind of one-touch experience. Please
imagine the following scenario. When a user opens a MATLAB source file or a
folder containing source files, we can start compiling the source statically
and use Polly and opt to optimize it into the best version of the LLVM IR.
Finally, when the user clicks run or presses Enter, we just need to JIT the
LLVM IR as usual, minimizing the dynamic overhead.


Thanks again!

best regards,
Yabin

Tobias Grosser

2012-Apr-04 11:49 UTC

On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> Hi Yabin,
>
> Instead of compile the LLVM IR to PTX asm string in a ScopPass, you
> can also the improve llc/lli or create new tools to support the code
> generation for Heterogeneous platforms[1], i.e. generate code for more
> than one target architecture at the same time. Something like this is
> not very complicated and had been implemented[2,3] by some people, but
> not available in LLVM mainstream. Implement this could make your GPU
> project more complete.
I agree with ether that we should ensure as much work as possible is
done within generic, not Polly-specific, code.

In terms of heterogeneous code generation, the approach Yabin proposed
seems to work, but we should discuss other approaches. For the moment,
I believe his proposal is very similar to the model of OpenCL and CUDA. He
splits the code into host and kernel code. The host code is directly
compiled to machine code by the existing tools (clang/llc). The kernel
code is stored as a string and is compiled to platform-specific code only
at execution time.

Are there any other approaches that could be taken? What specific
heterogeneous platform support would be needed? At the moment, it seems
to me we actually do not need much additional support.

Cheers
Tobi
