Haidl, Michael via llvm-dev
2018-Feb-05 07:11 UTC
[llvm-dev] [RFC] Upstreaming PACXX (Programming Accelerators with C++)
Hi LLVM community,

after 3 years of development, various talks at LLVM-HPC, EuroLLVM, and other scientific conferences, I want to present my PhD research topic to the list.

The main goal of my research was to develop a single-source programming model, comparable to CUDA or SYCL, for accelerators supported by LLVM (e.g., Nvidia GPUs). PACXX uses Clang as its front-end for code generation and comes with a runtime library (PACXX-RT) to execute kernels on the available hardware. Currently, PACXX supports Nvidia GPUs through the NVPTX target and CUDA, CPUs through MCJIT (including whole-function vectorization thanks to RV [1]), and has an experimental back-end for AMD GPUs using the AMDGPU target and ROCm.

The main idea behind PACXX is the use of LLVM IR as the kernel code representation, which is embedded into the executable together with the PACXX-RT. At program runtime, the PACXX-RT compiles the IR down to the MC level and hands it over to the device (a minimal sketch of this runtime step appears after this message). Since PACXX currently does not enforce any major restrictions on the C++ code, we managed to run (almost) arbitrary C++ code on GPUs, including range-v3 [2, 3].

A short vector addition example using PACXX:

#include <future>  // std::promise
#include <vector>  // (the PACXX header itself is omitted in this example)

using namespace pacxx::v2;

int main(int argc, char *argv[]) {
  // get the default executor
  auto &exec = Executor::get();

  size_t size = 128;
  std::vector<int> a(size, 1);
  std::vector<int> b(size, 2);
  std::vector<int> c(size, 0);

  // allocate device side memory
  auto &da = exec.allocate<int>(a.size());
  auto &db = exec.allocate<int>(b.size());
  auto &dc = exec.allocate<int>(c.size());

  // copy data to the accelerator
  da.upload(a);
  db.upload(b);
  dc.upload(c);

  // get the raw pointers
  auto pa = da.get();
  auto pb = db.get();
  auto pc = dc.get();

  // define the computation
  auto vadd = [=](auto &config) {
    auto i = config.get_global(0);
    if (i < size)
      pc[i] = pa[i] + pb[i];
  };

  // launch (1 block of 128 threads) and synchronize
  std::promise<void> promise;
  auto future = exec.launch(vadd, {{1}, {128}}, promise);
  future.wait();

  // copy back the data
  dc.download(c);
}

Recently, I open-sourced PACXX on GitHub [4] under the same license LLVM is currently using. Since my PhD is now in its final stage, I wanted to ask if there is interest in having such an SPMD programming model upstreamed. PACXX is currently on par with release_60 and only requires minor modifications to Clang, e.g., a command line switch, C++ attributes, some diagnostics, and metadata generation during code gen. The PACXX-RT can be integrated into the LLVM build system and may remain a standalone project. (BTW, may I ask to add PACXX to the LLVM projects?)

Looking forward to your feedback.

Cheers,
Michael Haidl

[1] https://github.com/cdl-saarland/rv
[2] https://github.com/ericniebler/range-v3
[3] https://dl.acm.org/authorize?N20051
[4] https://github.com/pacxx/pacxx-llvm
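To make the runtime-compilation step above concrete, here is a minimal sketch of the CPU/MCJIT flavor of the idea: parse an IR module shipped inside the executable, JIT it at program runtime, and resolve the entry point by symbol. This uses only the generic LLVM C++ API, not PACXX-RT's actual interface; the embedded IR string and the function name "add1" are stand-ins for illustration.

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"  // forces MCJIT to be linked in
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"

// Stand-in for the kernel IR that would be embedded into the executable.
static const char *EmbeddedIR = R"(
define i32 @add1(i32 %x) {
entry:
  %r = add i32 %x, 1
  ret i32 %r
}
)";

int main() {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  // Parse the IR that was shipped inside the binary.
  llvm::LLVMContext Ctx;
  llvm::SMDiagnostic Err;
  std::unique_ptr<llvm::Module> M =
      llvm::parseIR(llvm::MemoryBufferRef(EmbeddedIR, "embedded"), Err, Ctx);
  if (!M)
    return 1;

  // Compile the module down to host machine code at program runtime.
  llvm::ExecutionEngine *EE = llvm::EngineBuilder(std::move(M))
                                  .setEngineKind(llvm::EngineKind::JIT)
                                  .create();
  if (!EE)
    return 1;
  EE->finalizeObject();

  // Resolve the function by symbol and call it.
  auto *Add1 = reinterpret_cast<int (*)(int)>(EE->getFunctionAddress("add1"));
  int Result = Add1(41); // 42
  delete EE;
  return Result == 42 ? 0 : 1;
}

For the GPU back-ends, the same embedded IR would be lowered through NVPTX or AMDGPU instead of the host target, but the embed/parse/compile/look-up-by-symbol flow is the same idea.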
Nicholas Wilson via llvm-dev
2018-Feb-05 13:35 UTC
[llvm-dev] [RFC] Upstreaming PACXX (Programming Accelerators with C++)
Interesting.

I do something similar for D targeting CUDA (via NVPTX) and OpenCL (via my forward-ported fork of Khronos’ SPIRV-LLVM) [1], except all the code generation is done at compile time. The runtime is aided by compile-time reflection, so that calling kernels is done by symbol (see the sketch below).

What kind of performance difference do you see running code that was not developed with the GPU in mind (e.g. range-v3) vs. code that was?

What restrictions do you apply? I assume virtual functions and recursion. What else?

How does PACXX's SPMD model differ from what one can do in LLVM at the moment?

Nic

[1]: http://github.com/libmir/dcompute/

> On 5 Feb 2018, at 7:11 am, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi LLVM community,
> [...]
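For context on "calling kernels by symbol": on the CUDA side this boils down to the driver API's module-load and function-lookup calls. Below is a minimal sketch using the plain CUDA driver API, not dcompute's or PACXX's runtime; the PTX string is a do-nothing placeholder kernel and error checking is omitted for brevity.

#include <cuda.h>

// Placeholder PTX module with an empty kernel named "vadd"; a real
// module would be produced by the NVPTX back-end and embedded in the binary.
static const char *ModulePTX = R"(
.version 6.0
.target sm_30
.address_size 64

.visible .entry vadd()
{
  ret;
}
)";

int main() {
  CUdevice Dev;
  CUcontext Ctx;
  CUmodule Mod;
  CUfunction Kernel;

  cuInit(0);
  cuDeviceGet(&Dev, 0);
  cuCtxCreate(&Ctx, 0, Dev);

  // Load the PTX and look the kernel up by its symbol name.
  cuModuleLoadData(&Mod, ModulePTX);
  cuModuleGetFunction(&Kernel, Mod, "vadd");

  // Launch a 1-block grid of 128 threads (this kernel takes no parameters).
  cuLaunchKernel(Kernel, 1, 1, 1, 128, 1, 1,
                 /*sharedMemBytes=*/0, /*hStream=*/nullptr,
                 /*kernelParams=*/nullptr, /*extra=*/nullptr);
  cuCtxSynchronize();

  cuModuleUnload(Mod);
  cuCtxDestroy(Ctx);
  return 0;
}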
Renato Golin via llvm-dev
2018-Feb-05 14:51 UTC
[llvm-dev] [RFC] Upstreaming PACXX (Programming Accelerators with C++)
I was going to say, this reminds me of Kai's presentation at FOSDEM yesterday:

https://fosdem.org/2018/schedule/event/heterogenousd/

It's always good to see the cross-architecture power of LLVM being used in creative ways! :)

cheers,
--renato

On 5 February 2018 at 13:35, Nicholas Wilson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Interesting.
> [...]
Ronan KERYELL via llvm-dev
2018-Feb-05 20:40 UTC
[llvm-dev] [RFC] Upstreaming PACXX (Programming Accelerators with C++)
>>>>> On Mon, 5 Feb 2018 07:11:29 +0000, "Haidl, Michael via llvm-dev" <llvm-dev at lists.llvm.org> said:

Michael> Hi LLVM community, after 3 years of development, various
Michael> talks at LLVM-HPC, EuroLLVM, and other scientific
Michael> conferences, I want to present my PhD research topic to the
Michael> lists.

[...]

Michael> Recently, I open-sourced PACXX on GitHub [4] under the same
Michael> license LLVM is currently using.

Amazing! :-)

Michael> Since my PhD is now in its final stage, I wanted to ask if
Michael> there is interest in having such an SPMD programming model
Michael> upstreamed.

There are probably a lot of things in your code that could be useful for many other projects related to heterogeneous computing.

It would be nice to have some common support upstreamed for all these heterogeneous C++ languages (CUDA/OpenMP/OpenACC/OpenCL C++/SYCL/C++AMP/HCC/...) to ease their implementation or upstreaming. For now, only CUDA & OpenMP are upstreamed, I think. Of course, it is not obvious, with all these heterogeneous dialects coming with some subtle syntax, feature, and semantics differences...

Are you relying on some of the upstreamed CUDA/OpenMP code for your implementation?

Thanks for your work.

--
Ronan KERYELL
Haidl, Michael via llvm-dev
2018-Feb-06 06:27 UTC
[llvm-dev] [RFC] Upstreaming PACXX (Programming Accelerators with C++)
> Interesting.
>
> I do something similar for D targeting CUDA (via NVPTX) and OpenCL (via my
> forward-ported fork of Khronos’ SPIRV-LLVM) [1], except all the code
> generation is done at compile time. The runtime is aided by compile-time
> reflection, so that calling kernels is done by symbol.
>
> What kind of performance difference do you see running code that was not
> developed with the GPU in mind (e.g. range-v3) vs. code that was?

[Haidl, Michael] We extended range-v3 with a few GPU-enabled algorithms, especially to exploit the views from range-v3 for execution on GPUs. While the kernels are clearly designed for GPUs, mixing them with code like range-v3's views showed no negative performance impact. We evaluated against Thrust in the linked paper and were able to get on par with Thrust. The views of range-v3 really come with zero-cost abstractions.

> What restrictions do you apply? I assume virtual functions, recursion. What
> else?

[Haidl, Michael] Virtual functions are still a problem. Recursion works up to a point (the stack frame size on the GPU is the limitation here). Since PACXX builds on CUDA and HIP, we can assume that recursion is possible (with minor intervention by the developer to set the stack size right; see the sketch below). Exception handling in kernels is currently not possible in PACXX.

> How does PACXX's SPMD model differ from what one can do in LLVM at the
> moment?

[Haidl, Michael] There is not much difference. I have a little experimental branch that accepts CUDA as input code and compiles it with PACXX. The only problem is device-specific stuff, like the NVPTX intrinsics Clang generates for CUDA, which currently makes portable execution impossible.

> Nic
>
> [1]: http://github.com/libmir/dcompute/
>
> > On 5 Feb 2018, at 7:11 am, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi LLVM community,
> > [...]
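For reference, the "minor intervention" for recursion mentioned above corresponds, on the CUDA side, to a single runtime-limit adjustment before the first kernel launch. Here is a minimal sketch using the CUDA runtime API; the 4096-byte value is an arbitrary example, not a PACXX default.

#include <cuda_runtime.h>
#include <cstdio>

int main() {
  // Raise the per-thread device stack so recursive kernels have room.
  // 4096 bytes is an arbitrary example; size it to the expected recursion depth.
  cudaDeviceSetLimit(cudaLimitStackSize, 4096);

  // Read the limit back to see what the driver actually granted.
  size_t Limit = 0;
  cudaDeviceGetLimit(&Limit, cudaLimitStackSize);
  std::printf("per-thread device stack: %zu bytes\n", Limit);
  return 0;
}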