After reading about Intel's 'Shevlin Park' project to implement C++AMP in llvm/clang, and failing to find any code for it, I decided to try to implement something similar. I did it as an excuse to explore and hack on llvm/clang, which I hadn't done before, but it's now at the point where it will run the simplest matrix multiplication sample from MSDN, so I thought I might as well share it.

The source is in:

  https://github.com/corngood/llvm.git
  https://github.com/corngood/clang.git
  https://github.com/corngood/compiler-rt.git [unchanged]
  https://github.com/corngood/amp.git [simple test project]

It's fairly hacky, and very fragile, so don't expect anything that isn't used in the sample to work. I also haven't tested it on large datasets, and there are some things that definitely need fixing before I'd expect good performance (e.g. workgroup size). It currently works only on NVIDIA GPUs, and has only been tested on my shitty old 9600GT on amd64 linux with the stable binary drivers.

The compilation process currently works like this:

  .cpp -> [clang++ -fc++-amp] -> .ll
    - compile non-amp code

  .cpp -> [clang++ -fc++-amp -famp-is-kernel] -> .amp.ll
    - compile amp kernels only

  .amp.ll -> [opt -amp-to-opencl] -> .nvvm.ll
    - create kernel wrapper to deal with buffer/const inputs
    - add nvvm annotations

  .nvvm.ll -> [llc -march=nvptx] -> .ptx
    - compile kernels to NVPTX (unchanged)

  .ll + .ptx -> [opt -amp-create-stubs .ptx] -> .opt.ll
    - embed ptx as array data
    - create functions to get kernel info, load inputs, etc.

  .opt.ll -> [llc] -> .o
    - unchanged

The clang steps only differ in codegen, so eventually they should be combined into one clang call. NVPTX is meant to be replaced with SPIR at some point, to make it portable, which is why I didn't bother with text kernel generation.

I won't go into implementation details, but if anyone is interested, or working on something similar, feel free to get in touch.

Thanks,
Dave McFarland
----- Original Message -----
> From: corngood at gmail.com
> To: llvmdev at cs.uiuc.edu
> Sent: Saturday, April 13, 2013 9:13:57 PM
> Subject: [LLVMdev] C++AMP -> OpenCL (NVPTX) prototype
>
> [original message quoted in full; trimmed]
Dave,

[I've copied the cfe-dev list as well.]

Thanks for sharing this! I think this sounds very interesting. I don't know much about AMP, but I do have users who are also interested in accelerator targeting, and I'd like you to share your thoughts on:

1. Does your implementation share common functionality with the 'captured statement' work that Intel is currently doing (in order to support Cilk, OpenMP, etc.)? If you're not aware of it, see: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130408/077615.html -- this should end up in trunk soon. I ask because if the current captured statement patches would almost, but not quite, work for you, then it would be interesting to understand why.

2. What will be necessary to eliminate the two-clang-invocations problem? If we ever grow support for embedded accelerator targeting (through AMP, OpenACC, OpenMP 4+, etc.), it sounds like this will be a common requirement, and if I had to guess, there is common interest in putting the necessary infrastructure in place.

-Hal
On April 14, 2013 09:42:28 AM Hal Finkel wrote:

> 1. Does your implementation share common functionality with the 'captured
> statement' work that Intel is currently doing (in order to support Cilk,
> OpenMP, etc.)?

Kernels in AMP are represented by a lambda, so I haven't had to do anything special to capture variables. I do some work in the opt passes to marshal certain types (buffer references so far; also textures, etc. in the future), so maybe there's some overlap there. Thanks for the link, I'll have to read more about it.

> 2. What will be necessary to eliminate the two-clang-invocations problem?

The only reason I have two clang invocations right now is because of how I dealt with address spaces. In the Shevlin Park presentation, they mentioned doing analysis and assigning address spaces after codegen, but I just assign them using __attribute__((address_space)) for now, and zero them out for CPU codegen with a TargetOpt. It sort of piggybacks on the OpenCL -> NVPTX/SPIR/AMD/etc. address-space abstraction. The other differences are similar to how CodeGenOpts.CUDAIsDevice works.
Unfortunately it won't be sufficient for a full implementation of AMP, which doesn't specify (to my knowledge) any address-space declaration on pointer types, but still allows pointers into buffers in various address spaces.

To be honest, I'm not crazy about the AMP specification; I just like the idea of compiling a heterogeneous module for host/device code, which can be easily integrated into an existing C++ application. I'd be happy for it to drop the MS-specific syntax like properties, use C++ attributes wherever possible instead of keywords, and have explicit address spaces like CUDA/OpenCL.

I think the big problem is going to be making it robustly target two very different targets in one pass. Most obviously, supporting different bitness for host/device. My testing was all on 64/32 bit, but all other combinations are available in practice.

- Dave