On Tue, Jul 07, 2015 at 08:13:28PM -0400, Ilia Mirkin wrote:
> On Tue, Jul 7, 2015 at 8:11 PM, C Bergström <cbergstrom at pathscale.com> wrote:
> > On Wed, Jul 8, 2015 at 7:08 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> >> On Tue, Jul 7, 2015 at 8:07 PM, C Bergström <cbergstrom at pathscale.com> wrote:
> >>> On Wed, Jul 8, 2015 at 6:58 AM, Ben Skeggs <skeggsb at gmail.com> wrote:
> >>>> On 8 July 2015 at 09:53, C Bergström <cbergstrom at pathscale.com> wrote:
> >>>>> regarding
> >>>>> --------
> >>>>> Fixed address allocations weren't going to be part of that, but I see
> >>>>> that it makes sense for a variety of use cases. One question I have
> >>>>> here is how this is intended to work where the RM needs to make some
> >>>>> of these allocations itself (for graphics context mapping, etc): how
> >>>>> should potential conflicts with user mappings be handled?
> >>>>> --------
> >>>>> As an initial implementation you can probably assume that the GPU
> >>>>> offloading is in "exclusive" mode — basically, that the CUDA or OpenACC
> >>>>> code has full ownership of the card. The Tesla cards don't even have a
> >>>>> video out on them. To complicate this even more, some offloading code
> >>>>> has very long-running kernels and, even worse, may critically depend
> >>>>> on using all of the available GPU RAM (large matrix sizes, and soon big
> >>>>> Fortran arrays or complex data types).
> >>>> This doesn't change the fact that, to set up the graphics engine, the
> >>>> driver needs to map various system-use data structures into the
> >>>> channel's address space *somewhere* :)
> >>>
> >>> I'm not sure I follow exactly what you mean, but I think the answer is:
> >>> don't set up the graphics engine if you're in "compute" mode. Doing
> >>> that, IIUC, will at least provide a start to support for compute.
> >>> Anyone who argues that graphics+compute is critical to have working at
> >>> the same time is probably in the 1%.
> >>
> >> On NVIDIA GPUs, compute _is_ part of the graphics engine... aka PGRAPH.
> >
> > You can, AFAIK, set up PGRAPH without mapping memory for graphics. You
> > just init the engine and get out of the way.
>
> But... you need to map memory to set up the engine. Not a lot, but
> it's gotta go *somewhere*.

There's some minimal state that needs to be mapped into GPU address space.
One thing that comes to mind is pushbuffers, which are needed to submit
work to any engine.
On 8 July 2015 at 10:15, Andrew Chew <achew at nvidia.com> wrote:
> On Tue, Jul 07, 2015 at 08:13:28PM -0400, Ilia Mirkin wrote:
>> [snip]
>> But... you need to map memory to set up the engine. Not a lot, but
>> it's gotta go *somewhere*.
>
> There's some minimal state that needs to be mapped into GPU address space.
> One thing that comes to mind is pushbuffers, which are needed to submit
> work to any engine.

I guess you can probably use the start of the kernel's address space
carveout for these kinds of mappings, actually? It's not like userspace
can ever have virtual addresses there?

Ben.

> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau
On Wed, Jul 08, 2015 at 10:18:36AM +1000, Ben Skeggs wrote:
> > There's some minimal state that needs to be mapped into GPU address space.
> > One thing that comes to mind is pushbuffers, which are needed to submit
> > work to any engine.
> I guess you can probably use the start of the kernel's address space
> carveout for these kinds of mappings, actually? It's not like userspace
> can ever have virtual addresses there?

Yeah. I'm looking into it further, but to answer your original question,
I believe there is essentially an address range that nouveau would know
about, which it uses for fixed address allocations (I'm referring to how
the nvgpu driver does things... we may or may not come up with something
different for nouveau). Although it's dangerous, AFAIK the allocator in
nouveau starts allocating addresses at page 1, and, as you suggested, one
wouldn't ever get a CPU address that low. But having a set of addresses
explicitly reserved would be much better, of course.
Responding to this bit of text from Ben below:

> "I guess you can probably use the start of the kernel's address space
> carveout for these kind of mappings actually? It's not like userspace
> can ever have virtual addresses there?"

One of the salient points of how we implement gr and compute setup is
that these buffer regions (shared, global — anything but a hole at
0-128MB) are allocated dynamically. An address space can be set up well
in advance, and as long as the gr/compute engine setup buffer allocator
plays along (i.e. honors the previously allocated regions), things work
out just fine. The term we use internally is "anonymous" address
spaces: unbound, unused as yet.

Now, as for how the GPU and CPU address ranges work (or don't): that's
up to the userspace code to work through. The CUDA guys have various
techniques to make the two unified (some work in 64-bit only, some in
both, and almost all require specific API conditions). But as long as we
can have them tell the kernel which GPU ranges to avoid (by allocating
them in advance), it's up to that code to fulfill the CPU portion.

--- ken

On 7/7/15, 8:18 PM, "Nouveau on behalf of Ben Skeggs"
<nouveau-bounces at lists.freedesktop.org on behalf of skeggsb at gmail.com> wrote:

>On 8 July 2015 at 10:15, Andrew Chew <achew at nvidia.com> wrote:
>> [snip]
>> There's some minimal state that needs to be mapped into GPU address
>> space. One thing that comes to mind is pushbuffers, which are needed
>> to submit work to any engine.
>I guess you can probably use the start of the kernel's address space
>carveout for these kind of mappings actually? It's not like userspace
>can ever have virtual addresses there?
>
>Ben.