search for: add_mystery_value

Displaying 4 results from an estimated 4 matches for "add_mystery_value".

2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...the kernel will be written.
//
// The kernel adds a fixed floating point value to the input and writes the
// result to the output location.
static constexpr const char *KERNEL_PTX = R"(
.version 3.1
.target sm_20
.address_size 64
.visible .entry add_mystery_value(
.param .f32 float_literal,
.param .u64 result_loc
) {
.reg .u64 %rl<2>;
.reg .f32 %f<2>;
ld.param.f32 %f1, [float_literal];
ld.param.u64 %rl1, [result_loc];
add.f32 %f1, %f1, 123.0;
st.f32 [...
2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...// The kernel adds a fixed floating point value to the input and
> writes the
> // result to the output location.
> static constexpr const char *KERNEL_PTX = R"(
> .version 3.1
> .target sm_20
> .address_size 64
> .visible .entry add_mystery_value(
> .param .f32 float_literal,
> .param .u64 result_loc
> ) {
> .reg .u64 %rl<2>;
> .reg .f32 %f<2>;
> ld.param.f32 %f1, [float_literal];
> ld.param.u64 %rl1, [result_loc];
> add...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...ed floating point value to the input and
>> writes the
>> // result to the output location.
>> static constexpr const char *KERNEL_PTX = R"(
>> .version 3.1
>> .target sm_20
>> .address_size 64
>> .visible .entry add_mystery_value(
>> .param .f32 float_literal,
>> .param .u64 result_loc
>> ) {
>> .reg .u64 %rl<2>;
>> .reg .f32 %f<2>;
>> ld.param.f32 %f1, [float_literal];
>> ld.param.u64 %rl1, [res...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...cal SPARK C++ API.
>
> Please correct me if I'm misunderstanding your proposal, but I think
> the essence of what you want from the compiler is type safety for
> accelerator kernel launches, i.e., you would like the frontend to
> parse, check, and codegen for the construct:
> add_mystery_value<<<1, 1>>>(kernel_input_argument, *result.ptr());
>
> Is that a correct understanding?
>
Without answering your question, I'll point out that, as I understand it, StreamExecutor completely replaces the CUDA userspace library runtime components and talks directly to...
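
For readers comparing the snippets above: the kernel comment says it adds a fixed floating point value to the input and writes the result to the output location, and the visible PTX uses the constant 123.0. A minimal host-side C++ sketch of that arithmetic follows. The function name `add_mystery_value_reference` is hypothetical and is not part of the thread or of StreamExecutor, and because the store instruction is truncated in the snippets, the sketch assumes it writes `%f1` to `result_loc`.

```cpp
// Host-side reference for the arithmetic the quoted PTX kernel appears to
// perform. Hypothetical helper, not code from the thread.
//
// float_literal: the input value (same name as the PTX .param).
// result_loc:    where the sum is written (same name as the PTX .param).
//
// Assumption: the truncated "st.f32 [..." stores %f1 to result_loc.
void add_mystery_value_reference(float float_literal, float *result_loc) {
  // Mirrors "add.f32 %f1, %f1, 123.0;" from the snippet.
  *result_loc = float_literal + 123.0f;
}
```

Calling it with an input of 42.0f would leave 165.0f in the output location, which is the value a host-side check of the device result could compare against under the assumption stated above.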