Displaying 4 results from an estimated 4 matches for "kernel_input_argu".
2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...perations.
se::Stream stream(executor);
// Schedule a kernel launch on the new stream and block until the kernel
// completes. The kernel call executes asynchronously on the device, so we
// could do more work on the host before calling BlockHostUntilDone.
const float kernel_input_argument = 42.5f;
stream.Init()
    .ThenLaunch(se::ThreadDim(), se::BlockDim(), kernel,
                kernel_input_argument, result.ptr())
    .BlockHostUntilDone();
// Copy the result of the kernel call from device back to the host.
float host_result = 0.0f;...
2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...(executor);
>
> // Schedule a kernel launch on the new stream and block until the kernel
> // completes. The kernel call executes asynchronously on the device, so we
> // could do more work on the host before calling BlockHostUntilDone.
> const float kernel_input_argument = 42.5f;
> stream.Init()
>     .ThenLaunch(se::ThreadDim(), se::BlockDim(), kernel,
>                 kernel_input_argument, result.ptr())
>     .BlockHostUntilDone();
>
> // Copy the result of the kernel call from device back to the host.
>...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...; // Schedule a kernel launch on the new stream and block until the kernel
>> // completes. The kernel call executes asynchronously on the device, so we
>> // could do more work on the host before calling BlockHostUntilDone.
>> const float kernel_input_argument = 42.5f;
>> stream.Init()
>>     .ThenLaunch(se::ThreadDim(), se::BlockDim(), kernel,
>>                 kernel_input_argument, result.ptr())
>>     .BlockHostUntilDone();
>>
>> // Copy the result of the kernel call from device b...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...e if I'm misunderstanding your proposal, but I think
> the essence of what you want from the compiler is type safety for
> accelerator kernel launches, i.e., you would like the frontend to
> parse, check, and codegen for the construct:
> add_mystery_value<<<1, 1>>>(kernel_input_argument, *result.ptr());
>
> Is that a correct understanding?
>
Without answering your question, I'll point out that, as I understand it, StreamExecutor completely replaces the CUDA userspace library runtime components and talks directly to the drivers. Jason, please correct me if I'...
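
The chained Init / ThenLaunch / BlockHostUntilDone call quoted in the results above, and the type-safety question raised in the last result, can be illustrated with a small host-only sketch. Everything in the sketch namespace is a hypothetical stand-in written for this example and is not the real StreamExecutor API: the mock takes no ThreadDim or BlockDim, it runs the kernel synchronously on the host, and the value the kernel adds (1.0f) is an arbitrary placeholder, since the snippets do not show the actual kernel body. The point is only that, if the kernel's parameter types are part of its C++ type, a library-level launch call can probably reject mismatched arguments at compile time without special compiler syntax.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical host-only stand-ins; NOT the real StreamExecutor API.
namespace sketch {

// A kernel whose parameter types are part of its C++ type.
template <typename... Params>
struct TypedKernel {
  std::function<void(Params...)> body;
};

class Stream {
 public:
  // Marks the stream as usable and returns *this so calls can be chained.
  Stream& Init() {
    ok_ = true;
    return *this;
  }

  // Runs the kernel with the given arguments. A call whose arguments do not
  // match the kernel's declared parameters fails to compile.
  template <typename... Params, typename... Args>
  Stream& ThenLaunch(const TypedKernel<Params...>& kernel, Args&&... args) {
    static_assert(sizeof...(Params) == sizeof...(Args),
                  "wrong number of kernel arguments");
    assert(ok_ && "Init() must be called before ThenLaunch()");
    kernel.body(std::forward<Args>(args)...);
    return *this;
  }

  // The mock runs kernels synchronously, so there is nothing to wait for.
  Stream& BlockHostUntilDone() { return *this; }

 private:
  bool ok_ = false;
};

}  // namespace sketch

// Mirrors the quoted snippet: launch a kernel that adds a value to its input
// and writes the sum through the output pointer.
float RunAddMysteryValue(float kernel_input_argument) {
  // The added constant is an arbitrary placeholder chosen for this sketch.
  sketch::TypedKernel<float, float*> kernel{
      [](float in, float* out) { *out = in + 1.0f; }};
  float result = 0.0f;
  sketch::Stream stream;
  stream.Init()
      .ThenLaunch(kernel, kernel_input_argument, &result)
      .BlockHostUntilDone();
  return result;
}
```

In this sketch, passing a mismatched argument (for example a double* where the kernel expects a float*) is rejected by the compiler, which is the kind of check the last result asks about for the <<<...>>> launch syntax.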