Jerome Glisse
2015-Jul-08 19:40 UTC
[Nouveau] CUDA fixed VA allocations and sparse mappings
On Wed, Jul 08, 2015 at 10:51:55AM +1000, Ben Skeggs wrote:
> On 8 July 2015 at 10:47, Andrew Chew <achew at nvidia.com> wrote:
> > On Wed, Jul 08, 2015 at 10:37:34AM +1000, Ben Skeggs wrote:
> >> On 8 July 2015 at 10:31, Andrew Chew <achew at nvidia.com> wrote:
> >> > On Wed, Jul 08, 2015 at 10:18:36AM +1000, Ben Skeggs wrote:
> >> >> > There's some minimal state that needs to be mapped into GPU address space.
> >> >> > One thing that comes to mind are pushbuffers, which are needed to submit
> >> >> > stuff to any engine.
> >> >> I guess you can probably use the start of the kernel's address space
> >> >> carveout for these kind of mappings actually?  It's not like userspace
> >> >> can ever have virtual addresses there?
> >> >
> >> > Yeah.  I'm looking into it further, but to answer your original question,
> >> > I believe there is essentially an address range that nouveau would know
> >> > about, which it uses for fixed address allocations (I'm referring to how
> >> > the nvgpu driver does things...we may or may not come up with something
> >> > different for nouveau).
> >> >
> >> > Although it's dangerous, AFAIK the allocator in nouveau starts allocating
> >> > addresses at page 1, and as you suggested, one wouldn't ever get a CPU
> >> > address that low.  But having a set of addresses reserved would be much
> >> > better of course.
> >> I'm thinking more about the top of the address space.  As I understand
> >> it, the kernel already splits the CPU virtual address space into
> >> user/system areas (3GiB/1GiB for 32-bit IIUC), or something very
> >> similar to that.
> >>
> >> Perhaps, if we can get at that information, we can use those same
> >> definitions for GPU address space?
> >
> > Ah, I get what you're saying.  Sure, I think that might be okay.  Not sure
> > how we would get at that information, though, and it would be horrible to
> > just bake it in somewhere.  I'm looking into how nvgpu driver does it...
> > maybe they have good reasons to do it the way they do.  Sorry if I go
> > quiet for a little bit...
> After a very quick look, it looks like the kernel defines a
> PAGE_OFFSET macro which is the start of kernel virtual address space.

You need to be careful here. First, the hardware might not have as many bits
as the CPU. For instance, x86-64 uses 48 bits for virtual addresses, i.e.
only 48 bits of the address are meaningful, and older radeon (<CI, IIRC)
only has 40 bits for the address bus. With such a configuration you could
not move all private kernel allocations inside the kernel zone.

The second issue is things like a 32-bit process on a 64-bit kernel, in
which case you have the usual 3GB userspace / 1GB kernel space split. So
instead of using PAGE_OFFSET you might want to use TASK_SIZE, which is a
macro that looks up the limit using current (the process struct pointer).

I think the issue for nouveau is that the kernel side already handles some
allocation of virtual addresses, while for radeon the whole virtual address
space is fully under userspace control.

Given this, you might want to use a trick on both sides (kernel and user
space). For instance, you could mmap a region with PROT_NONE to reserve a
range of virtual addresses from userspace, then tell the driver about that
range and have the driver initialize the GPU and use that chunk for its
kernel-private structure allocations.

The issue is that this is kind of an API violation for the nouveau kernel
driver. Though I am not familiar enough with it, maybe you can do an ioctl
to nouveau before nouveau initializes and allocates the kernel-private
buffers (gr and other stuff). If so, then problem solved, I guess. Processes
that want to use CUDA will need to do the mmap dance and the early ioctl.

Hope this helps,
cheers,
Jérôme
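For illustration, the "mmap dance" above boils down to something like the
following minimal userspace sketch. The mmap()/munmap() calls are standard;
the early ioctl that would hand the reserved range to nouveau is left as a
comment, since no such interface exists yet and its shape is exactly what is
being discussed here.

  #include <stdio.h>
  #include <sys/mman.h>

  /*
   * Reserve a chunk of CPU virtual address space without backing it with
   * memory or access permissions.  Nothing else in the process can mmap
   * into this range until it is munmap'd, so the driver would be free to
   * place its kernel-private GPU allocations at the same virtual
   * addresses on the GPU side.
   */
  static void *reserve_va_range(size_t size)
  {
      void *addr = mmap(NULL, size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
      return addr == MAP_FAILED ? NULL : addr;
  }

  int main(void)
  {
      size_t size = 1UL << 30; /* example: reserve 1GiB of VA space */
      void *base = reserve_va_range(size);

      if (!base) {
          perror("mmap");
          return 1;
      }
      printf("reserved VA range: %p + 0x%zx\n", base, size);

      /*
       * Here the process would pass [base, base + size) to nouveau with
       * an early ioctl (interface to be defined), before the GPU context
       * is initialized, so the kernel can carve its private allocations
       * out of that range.
       */

      munmap(base, size);
      return 0;
  }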
> [...]
> The issue is that this is kind of an API violation for the nouveau kernel
> driver. Though I am not familiar enough with it, maybe you can do an ioctl
> to nouveau before nouveau initializes and allocates the kernel-private
> buffers (gr and other stuff). If so, then problem solved, I guess.
> Processes that want to use CUDA will need to do the mmap dance and the
> early ioctl.

I think we can have a nouveau ioctl to report the full address range that
the GPU supports. Userspace can use this information to know what range it
can reserve. The reservation part we can do with the AS_ALLOC and AS_FREE
nouveau ioctls that I originally proposed, and in the CUDA case, this
reservation should happen before any channel for a particular context gets
created.
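To make the proposed flow concrete, here is a rough userspace sketch.
Everything in it is an assumption: the struct layouts and ioctl numbers are
placeholders standing in for the AS_ALLOC proposal and the range-query ioctl
suggested above; none of this exists in the driver today.

  #include <stdint.h>
  #include <sys/ioctl.h>

  /* Hypothetical interfaces, following the proposal in this thread;
   * the ioctl numbers are placeholders. */
  struct drm_nouveau_va_limits {
      uint64_t start; /* lowest GPU VA usable by userspace */
      uint64_t end;   /* one past the highest GPU VA the GPU supports */
  };

  struct drm_nouveau_as_alloc {
      uint64_t address; /* in: requested fixed GPU VA */
      uint64_t length;  /* in: size of the region to reserve */
  };

  #define DRM_IOCTL_NOUVEAU_VA_LIMITS _IOR('d', 0x50, struct drm_nouveau_va_limits)
  #define DRM_IOCTL_NOUVEAU_AS_ALLOC  _IOWR('d', 0x51, struct drm_nouveau_as_alloc)

  /*
   * Reserve a fixed region of GPU address space.  In the CUDA case this
   * has to happen before any channel for the context is created, so no
   * normal allocation can land inside the reserved range.
   */
  int reserve_gpu_va(int fd, uint64_t addr, uint64_t len)
  {
      struct drm_nouveau_va_limits limits;
      struct drm_nouveau_as_alloc req = { .address = addr, .length = len };

      if (ioctl(fd, DRM_IOCTL_NOUVEAU_VA_LIMITS, &limits))
          return -1;
      if (addr < limits.start || addr + len > limits.end)
          return -1; /* range not representable on this GPU */

      return ioctl(fd, DRM_IOCTL_NOUVEAU_AS_ALLOC, &req);
  }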
I apologize for my ignorance. In digging through nouveau, I've become a bit
confused regarding the relationship between virtual address allocations and
nouveau bo's.

From my reading of the code, it seems that a nouveau_bo really encapsulates
a buffer (whether imported, or allocated within nouveau like, say,
pushbuffers). So I'm confused about an earlier statement that to allocate a
chunk of address space, I have to create a nouveau_bo for it.

What I really want to do is reserve some space in the address allocator (the
stuff in nvkm/subdev/mmu/base.c). Note that there are no buffers at this
time. This is just blocking out some chunk of the address space so that
normal address space allocations (for, say, bo's) avoid this region.

At some point after that, I'd like to import a buffer, and map it to certain
regions of my pre-allocated address space. This is why I can't go through
the normal path of importing a buffer...that path assumes there is no
address for this buffer, and tries to allocate one. In our case, we already
have an address in mind. Naively, at this point, I'd like to create a
nouveau_bo for this imported buffer, but not have it go through the address
allocator and instead just take a fixed address.

Can someone clear up some of my confusion?
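To illustrate the flow being asked about, here is a rough kernel-side
sketch. nvkm_vm_get()/nvkm_vm_map() are the existing allocator entry points
in nvkm/subdev/mmu/base.c; nvkm_vm_get_fixed() is hypothetical and is
precisely the missing piece, so treat this as the shape of a possible
answer, not working code.

  /*
   * Step 1 (hypothetical): a variant of nvkm_vm_get() that reserves a
   * caller-chosen range instead of letting the allocator pick one, so
   * normal allocations (bo's, pushbuffers, ...) avoid the region.
   */
  int nvkm_vm_get_fixed(struct nvkm_vm *vm, u64 addr, u64 size,
                        u32 page_shift, u32 access, struct nvkm_vma *vma);

  /*
   * Step 2: later, when a buffer is imported, map it at an address
   * inside the pre-reserved range rather than allocating a fresh one.
   */
  static int map_at_fixed_addr(struct nvkm_vm *vm, struct nvkm_mem *mem,
                               u64 addr, u64 size, u32 page_shift)
  {
      struct nvkm_vma vma = {};
      int ret;

      ret = nvkm_vm_get_fixed(vm, addr, size, page_shift,
                              NV_MEM_ACCESS_RW, &vma);
      if (ret)
          return ret;

      nvkm_vm_map(&vma, mem); /* write the PTEs for the imported pages */
      return 0;
  }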