Displaying 4 results from an estimated 4 matches for "add_mystery_value".
2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...the kernel will be written.
//
// The kernel adds a fixed floating point value to the input and writes the
// result to the output location.
static constexpr const char *KERNEL_PTX = R"(
.version 3.1
.target sm_20
.address_size 64
.visible .entry add_mystery_value(
.param .f32 float_literal,
.param .u64 result_loc
) {
.reg .u64 %rl<2>;
.reg .f32 %f<2>;
ld.param.f32 %f1, [float_literal];
ld.param.u64 %rl1, [result_loc];
add.f32 %f1, %f1, 123.0;
st.f32 [...
2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...// The kernel adds a fixed floating point value to the input and writes the
> // result to the output location.
> static constexpr const char *KERNEL_PTX = R"(
> .version 3.1
> .target sm_20
> .address_size 64
> .visible .entry add_mystery_value(
> .param .f32 float_literal,
> .param .u64 result_loc
> ) {
> .reg .u64 %rl<2>;
> .reg .f32 %f<2>;
> ld.param.f32 %f1, [float_literal];
> ld.param.u64 %rl1, [result_loc];
> add...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...ed floating point value to the input and writes the
>> // result to the output location.
>> static constexpr const char *KERNEL_PTX = R"(
>> .version 3.1
>> .target sm_20
>> .address_size 64
>> .visible .entry add_mystery_value(
>> .param .f32 float_literal,
>> .param .u64 result_loc
>> ) {
>> .reg .u64 %rl<2>;
>> .reg .f32 %f<2>;
>> ld.param.f32 %f1, [float_literal];
>> ld.param.u64 %rl1, [res...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...cal SPARK C++ API.
>
> Please correct me if I'm misunderstanding your proposal, but I think
> the essence of what you want from the compiler is type safety for
> accelerator kernel launches, i.e., you would like the frontend to
> parse, check, and codegen for the construct:
> add_mystery_value<<<1, 1>>>(kernel_input_argument, *result.ptr());
>
> Is that a correct understanding?
>
Without answering your question, I'll point out that, as I understand it, StreamExecutor completely replaces the CUDA userspace library runtime components and talks directly to...
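
The "type safety for accelerator kernel launches" mentioned in the quoted question can be illustrated with a small host-only C++ sketch. It is not the StreamExecutor API and not the compiler support discussed in the thread: `TypedKernel` and `add_mystery_value_host` are hypothetical names introduced here, and the kernel body is inferred from the comment and the `add.f32 %f1, %f1, 123.0` instruction in the PTX excerpt, since the rest of the PTX is truncated.

```cpp
#include <cassert>
#include <utility>

// Hypothetical typed kernel handle (an assumption for illustration, not the
// StreamExecutor API): the template parameters record the kernel's parameter
// types so the compiler can check every launch.
template <typename... Params>
class TypedKernel {
public:
  using Fn = void (*)(Params...);
  explicit TypedKernel(Fn fn) : fn_(fn) {}

  // Rejects a wrong argument count at compile time; the call through fn_
  // then applies ordinary C++ type checking to each argument.
  template <typename... Args>
  void launch(Args &&...args) const {
    static_assert(sizeof...(Args) == sizeof...(Params),
                  "wrong number of kernel arguments");
    fn_(std::forward<Args>(args)...);
  }

private:
  Fn fn_;
};

// Host-side stand-in for the add_mystery_value kernel (hypothetical): adds
// the fixed value 123.0 seen in the PTX excerpt to the input and stores the
// sum at the output location.
void add_mystery_value_host(float float_literal, float *result_loc) {
  assert(result_loc != nullptr);
  *result_loc = float_literal + 123.0f;
}
```

With a handle declared as `TypedKernel<float, float *> kernel(add_mystery_value_host);`, a call such as `kernel.launch(1.0f, &result)` compiles, while passing a pointer where the float is expected, or omitting an argument, is rejected by the compiler. A real runtime would dispatch to the device rather than call a host function.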