On Wed, Jul 8, 2015 at 6:58 AM, Ben Skeggs <skeggsb at gmail.com> wrote:
> On 8 July 2015 at 09:53, C Bergström <cbergstrom at pathscale.com> wrote:
>> regarding
>> --------
>> Fixed address allocations weren't going to be part of that, but I see
>> that it makes sense for a variety of use cases. One question I have
>> here is how this is intended to work where the RM needs to make some
>> of these allocations itself (for graphics context mapping, etc), how
>> should potential conflicts with user mappings be handled?
>> --------
>> As an initial implementation you can probably assume that the GPU
>> offloading is in "exclusive" mode. Basically that the CUDA or OpenACC
>> code has full ownership of the card. The Tesla cards don't even have a
>> video out on them. To complicate this even more - some offloading code
>> has very long running kernels and even worse - may critically depend
>> on using the full available GPU RAM. (Large matrix sizes and soon big
>> Fortran arrays or complex data types)
> This doesn't change that, to setup the graphics engine, the driver
> needs to map various system-use data structures into the channel's
> address space *somewhere* :)

I'm not sure I follow exactly what you mean, but I think the answer is
- don't setup the graphics engine if you're in "compute" mode. Doing
that, iiuc, will at least provide a start to support for compute.
Anyone who argues that graphics+compute is critical to have working at
the same time is probably a 1%.
On Tue, Jul 7, 2015 at 8:07 PM, C Bergström <cbergstrom at pathscale.com> wrote:
> I'm not sure I follow exactly what you mean, but I think the answer is
> - don't setup the graphics engine if you're in "compute" mode. Doing
> that, iiuc, will at least provide a start to support for compute.
> Anyone who argues that graphics+compute is critical to have working at
> the same time is probably a 1%.

On NVIDIA GPUs, compute _is_ part of the graphics engine... aka PGRAPH.
On Wed, Jul 8, 2015 at 7:08 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On NVIDIA GPUs, compute _is_ part of the graphics engine... aka PGRAPH.

You can afaik set up PGRAPH without mapping memory for graphics. You
just init the engine and get out of the way.