On Wed, Jul 8, 2015 at 6:58 AM, Ben Skeggs <skeggsb at gmail.com> wrote:
> On 8 July 2015 at 09:53, C Bergström <cbergstrom at pathscale.com> wrote:
>> regarding
>> --------
>> Fixed address allocations weren't going to be part of that, but I see
>> that it makes sense for a variety of use cases. One question I have
>> here is how this is intended to work where the RM needs to make some
>> of these allocations itself (for graphics context mapping, etc), how
>> should potential conflicts with user mappings be handled?
>> --------
>> As an initial implementation you can probably assume that the GPU
>> offloading is in "exclusive" mode. Basically that the CUDA or OpenACC
>> code has full ownership of the card. The Tesla cards don't even have a
>> video out on them. To complicate this even more - some offloading code
>> has very long running kernels and even worse - may critically depend
>> on using the full available GPU ram. (Large matrix sizes and soon big
>> Fortran arrays or complex data types)
> This doesn't change that, to set up the graphics engine, the driver
> needs to map various system-use data structures into the channel's
> address space *somewhere* :)

I'm not sure I follow exactly what you mean, but I think the answer is
- don't set up the graphics engine if you're in "compute" mode. Doing
that, iiuc, will at least provide a start to support for compute.
Anyone who argues that graphics+compute is critical to have working at
the same time is probably a 1%.

On Tue, Jul 7, 2015 at 8:07 PM, C Bergström <cbergstrom at pathscale.com> wrote:
> On Wed, Jul 8, 2015 at 6:58 AM, Ben Skeggs <skeggsb at gmail.com> wrote:
> [...]
>> This doesn't change that, to set up the graphics engine, the driver
>> needs to map various system-use data structures into the channel's
>> address space *somewhere* :)
>
> I'm not sure I follow exactly what you mean, but I think the answer is
> - don't set up the graphics engine if you're in "compute" mode. Doing
> that, iiuc, will at least provide a start to support for compute.
> Anyone who argues that graphics+compute is critical to have working at
> the same time is probably a 1%.

On NVIDIA GPUs, compute _is_ part of the graphics engine... aka PGRAPH.

On Wed, Jul 8, 2015 at 7:08 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Tue, Jul 7, 2015 at 8:07 PM, C Bergström <cbergstrom at pathscale.com> wrote:
> [...]
>> - don't set up the graphics engine if you're in "compute" mode. Doing
>> that, iiuc, will at least provide a start to support for compute.
>
> On NVIDIA GPUs, compute _is_ part of the graphics engine... aka PGRAPH.

You can afaik set up PGRAPH without mapping memory for graphics. You
just init the engine and get out of the way.
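
Editor's sketch: the open question in this thread is what happens when the driver (the "RM") and userspace both want to place mappings in the same channel address space, and userspace is allowed to pick fixed addresses. The toy model below is only an illustration of that conflict, not nouveau code. The class `AddressSpace`, its methods `map_fixed` and `map_anywhere`, and the owner labels are all hypothetical names. It assumes the simplest possible policy: whoever maps a range first keeps it, and a later fixed-address request that overlaps is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    """Toy model of one channel's GPU virtual address space (hypothetical)."""
    size: int
    # Each mapping is a tuple (start, end_exclusive, owner).
    mappings: list = field(default_factory=list)

    def _overlaps(self, start, end):
        # Two half-open ranges overlap when each starts before the other ends.
        return [m for m in self.mappings if start < m[1] and m[0] < end]

    def map_fixed(self, start, length, owner):
        """Map [start, start + length) at a caller-chosen address, or raise."""
        end = start + length
        if length <= 0 or start < 0 or end > self.size:
            raise ValueError("range outside address space")
        clash = self._overlaps(start, end)
        if clash:
            raise ValueError(
                f"conflict with {clash[0][2]} mapping at {clash[0][0]:#x}"
            )
        self.mappings.append((start, end, owner))
        return start

    def map_anywhere(self, length, owner):
        """First-fit placement for allocations that need no fixed address."""
        cursor = 0
        for s, e, _ in sorted(self.mappings):
            if s - cursor >= length:
                break  # the gap before this mapping is large enough
            cursor = max(cursor, e)
        return self.map_fixed(cursor, length, owner)


if __name__ == "__main__":
    vm = AddressSpace(size=0x10000)
    vm.map_fixed(0x1000, 0x2000, "user")          # user picks its address
    print(hex(vm.map_anywhere(0x1000, "driver")))  # driver is placed around it
    try:
        vm.map_fixed(0x800, 0x1000, "user")        # overlaps the driver range
    except ValueError as err:
        print("rejected:", err)
```

In this model a driver allocation that does not care about its address can always be steered around existing user mappings, which is roughly the "init the engine and get out of the way" position. The hard case raised earlier in the thread is the opposite ordering, where the driver has already placed its structures and a user later asks for that exact range; the sketch simply fails the request, and whether that is acceptable for CUDA or OpenACC workloads is not settled by anything said here.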