search for: kernel_input_argu

Displaying 4 results from an estimated 4 matches for "kernel_input_argu".

2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...perations.
se::Stream stream(executor);
// Schedule a kernel launch on the new stream and block until the kernel
// completes. The kernel call executes asynchronously on the device, so we
// could do more work on the host before calling BlockHostUntilDone.
const float kernel_input_argument = 42.5f;
stream.Init()
    .ThenLaunch(se::ThreadDim(), se::BlockDim(), kernel,
                kernel_input_argument, result.ptr())
    .BlockHostUntilDone();
// Copy the result of the kernel call from device back to the host.
float host_result = 0.0f;...
2016 Mar 09
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...(executor);
>
> // Schedule a kernel launch on the new stream and block until the
> kernel
> // completes. The kernel call executes asynchronously on the device,
> so we
> // could do more work on the host before calling BlockHostUntilDone.
> const float kernel_input_argument = 42.5f;
> stream.Init()
> .ThenLaunch(se::ThreadDim(), se::BlockDim(), kernel,
> kernel_input_argument, result.ptr())
> .BlockHostUntilDone();
>
> // Copy the result of the kernel call from device back to the host.
>...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...; // Schedule a kernel launch on the new stream and block until the
>> kernel
>> // completes. The kernel call executes asynchronously on the
>> device, so we
>> // could do more work on the host before calling BlockHostUntilDone.
>> const float kernel_input_argument = 42.5f;
>> stream.Init()
>> .ThenLaunch(se::ThreadDim(), se::BlockDim(), kernel,
>> kernel_input_argument, result.ptr())
>> .BlockHostUntilDone();
>>
>> // Copy the result of the kernel call from device b...
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
...e if I'm misunderstanding your proposal, but I think
> the essence of what you want from the compiler is type safety for
> accelerator kernel launches, i.e., you would like the frontend to
> parse, check, and codegen for the construct:
> add_mystery_value<<<1, 1>>>(kernel_input_argument, *result.ptr());
>
> Is that a correct understanding?
>
Without answering your question, I'll point out that, as I understand it, StreamExecutor completely replaces the CUDA userspace library runtime components and talks directly to the drivers. Jason, please correct me if I'...