thr3ads.net - Nouveau - [Nouveau] [RFC PATCH 00/13] SVM (share virtual memory) with HMM in nouveau [Mar 2018]

Daniel Vetter

2018-Mar-12 17:30 UTC

[Nouveau] [RFC PATCH 00/13] SVM (share virtual memory) with HMM in nouveau

On Sat, Mar 10, 2018 at 04:01:58PM +0100, Christian König wrote:
> Good to have an example how to use HMM with an upstream driver.
> 
> On 10.03.2018 at 04:21, jglisse at redhat.com wrote:
> > This patchset adds SVM (Shared Virtual Memory) using HMM (Heterogeneous
> > Memory Management) to the nouveau driver. SVM means that GPU threads
> > spawned by the GPU driver for a specific user process can access any valid
> > CPU address in that process. A valid pointer is a pointer inside an
> > area coming from an mmap of a private, shared or regular file. Pointers
> > to an mmap of a device file or special file are not supported.
> 
> BTW: The recent IOMMU patches which generalized the PASID handling calls
> this SVA for shared virtual address space.
> 
> We should probably sync up with those guys at some point on what naming
> to use.
> 
> > This is an RFC for a few technical reasons listed below and also
> > because we are still working on a proper open source userspace (namely
> > an OpenCL 2.0 for nouveau inside mesa). Open source userspace being a
> > requirement for the DRM subsystem. I pushed in [1] a simple standalone
> > program that can be used to test SVM through HMM with nouveau. I expect
> > we will have a somewhat working userspace in the coming weeks, work
> > being well underway and some patches have already been posted on the
> > mesa mailing list.
> 
> You could use the OpenGL extensions to import arbitrary user pointers as
> bringup use case for this.
> 
> I was hoping to do the same for my ATC/HMM work on radeonsi and as far as I
> know there are even piglit tests for that.

Yeah, userptr seems like a reasonable bring-up use case for stuff like
this; it makes it all a bit more manageable. I suggested the same for the
i915 efforts. Definitely has my ack for upstream HMM/SVM uapi extensions.

> > There is work underway to revamp nouveau channel creation with a new
> > userspace API. So we might want to delay upstreaming until this lands.
> > We can still discuss one aspect specific to HMM here, namely the issue
> > around GEM objects used for some specific parts of the GPU. Some engines
> > inside the GPU (an engine is a GPU block, like the display block, which
> > is responsible for scanning memory to send out a picture through some
> > connector, for instance HDMI or DisplayPort) can only access memory
> > with virtual addresses below (1 << 40). To accommodate those we need to
> > create a "hole" inside the process address space. This patchset has
> > a hack for that (patch 13 HACK FOR HMM AREA): it reserves a range of
> > device file offsets so that a process can mmap this range with PROT_NONE
> > to create a hole (the process must make sure the hole is below 1 << 40).
> > I feel uneasy about doing it this way, but maybe it is OK with other
> > folks.
> 
> Well we have essentially the same problem with pre gfx9 AMD hardware. Felix
> might have some advice on how it was solved for HSA.

Couldn't we do an in-kernel address space for those special GPU blocks? As
long as it's display the kernel needs to manage it anyway, and adding a
2nd mapping when you pin/unpin for scanout usage shouldn't really matter
(as long as you cache the mapping until the buffer gets thrown out of
vram). More-or-less what we do for i915 (where we have an entirely
separate address space for these things which is 4G on the latest chips).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Jerome Glisse

2018-Mar-12 17:50 UTC

On Mon, Mar 12, 2018 at 06:30:09PM +0100, Daniel Vetter wrote:
> On Sat, Mar 10, 2018 at 04:01:58PM +0100, Christian König wrote:
[...]
> Couldn't we do an in-kernel address space for those special gpu blocks? As
> long as it's display the kernel needs to manage it anyway, and adding a
> 2nd mapping when you pin/unpin for scanout usage shouldn't really matter
> (as long as you cache the mapping until the buffer gets thrown out of
> vram). More-or-less what we do for i915 (where we have an entirely
> separate address space for these things which is 4G on the latest chips).
> -Daniel

We cannot do an in-kernel address space for those. We already have an
in-kernel address space, but it does not apply to the objects considered
here.

For NVIDIA (I believe this is the same for AMD, AFAIK) the objects we
are talking about are objects that must be in the same address space
as the one against which the process's shader/dma/... get executed.

For instance, a command buffer submitted by userspace must be inside a
GEM object mapped inside the GPU's process address space against which
the commands are executed. My understanding is that the PFIFO (the engine
on nv GPUs that fetches commands) first context-switches to the address
space associated with the channel and then starts fetching commands, with
all addresses being interpreted against the channel address space.

Hence why we need to reserve some range in the process virtual address
space if we want to do SVM in a sane way. I mean, we could just map
buffers into the GPU page table and then cross fingers and toes, hoping
that the process will never get any of its mmaps overlapping those
mappings :)

Cheers,
Jérôme
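
The two constraints described in the message above (the buffer must sit below 1 << 40, and it must not collide with any CPU mapping of the process) can be illustrated with a small range check. This is only a sketch: the helper names and the GPU_LOW_VA_LIMIT macro are hypothetical and do not come from the patchset or from nouveau.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limit taken from the thread: some engines only see addresses below 1 << 40. */
#define GPU_LOW_VA_LIMIT (UINT64_C(1) << 40)

/* Hypothetical helper: true if [start, start + len) lies entirely below the limit. */
bool range_below_limit(uint64_t start, uint64_t len)
{
    return len != 0 && start < GPU_LOW_VA_LIMIT &&
           len <= GPU_LOW_VA_LIMIT - start;
}

/* Hypothetical helper: true if the half-open ranges [a, a + alen) and
 * [b, b + blen) share at least one byte. Written without a + alen so that
 * it cannot overflow. */
bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    if (alen == 0 || blen == 0)
        return false;
    return a >= b ? (a - b) < blen : (b - a) < alen;
}

/* Hypothetical check: a buffer meant for a 40-bit engine is acceptable only
 * if it is below the limit and does not collide with one given CPU mapping. */
bool gpu_buffer_placement_ok(uint64_t buf, uint64_t buf_len,
                             uint64_t cpu_map, uint64_t cpu_map_len)
{
    assert(buf_len != 0);
    return range_below_limit(buf, buf_len) &&
           !ranges_overlap(buf, buf_len, cpu_map, cpu_map_len);
}
```

With SVM the GPU and CPU share one set of virtual addresses, so the "cross fingers" approach amounts to never running a check like ranges_overlap() against later mmaps; reserving the range up front makes the check unnecessary.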

John Hubbard

2018-Mar-13 06:14 UTC

On 03/12/2018 10:50 AM, Jerome Glisse wrote:
> On Mon, Mar 12, 2018 at 06:30:09PM +0100, Daniel Vetter wrote:
>> On Sat, Mar 10, 2018 at 04:01:58PM +0100, Christian König wrote:
> 
> [...]
> 
> We cannot do an in-kernel address space for those. We already have an
> in-kernel address space, but it does not apply to the objects considered
> here.
> 
> For NVIDIA (I believe this is the same for AMD, AFAIK) the objects we
> are talking about are objects that must be in the same address space
> as the one against which the process's shader/dma/... get executed.
> 
> For instance, a command buffer submitted by userspace must be inside a
> GEM object mapped inside the GPU's process address space against which
> the commands are executed. My understanding is that the PFIFO (the engine
> on nv GPUs that fetches commands) first context-switches to the address
> space associated with the channel and then starts fetching commands, with
> all addresses being interpreted against the channel address space.
> 
> Hence why we need to reserve some range in the process virtual address
> space if we want to do SVM in a sane way. I mean, we could just map
> buffers into the GPU page table and then cross fingers and toes, hoping
> that the process will never get any of its mmaps overlapping those
> mappings :)
> 
> Cheers,
> Jérôme
> 
Hi Jerome and all,

Yes, on NVIDIA GPUs, the Host/FIFO unit is limited to 40-bit addresses, so
things such as the following need to be below (1 << 40), and also accessible
to both CPU (user space) and GPU hardware:
    -- command buffers (CPU user space driver fills them, GPU consumes them),
    -- semaphores (here, a GPU-centric term, rather than OS-type: these are
       memory locations that, for example, the GPU hardware might write to, in
       order to indicate work completion; there are other uses as well),
    -- a few other things most likely (this is not a complete list).

So what I'd tentatively expect that to translate into in the driver stack is,
approximately:

    -- User space driver code mmap's an area below (1 << 40). It's hard to
       avoid this, given that user space needs access to the area (for filling
       out command buffers and monitoring semaphores, that sort of thing). Then
       suballocate from there using mmap's MAP_FIXED or (future-ish)
       MAP_FIXED_SAFE flags.

       ...glancing at the other fork of this thread, I think that is exactly
       what Felix is saying, too. So that's good.

    -- The user space program sits above the user space driver, and although the
       program could, in theory, interfere with this mmap'd area, that would be
       wrong in the same way that mucking around with malloc'd areas (outside of
       malloc() itself) is wrong. So I don't see any particular need to do much
       more than the above.

thanks,
-- 
John Hubbard
NVIDIA
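
The reserve-then-suballocate scheme outlined in the message above can be sketched in user space roughly as follows. This is a minimal illustration under several assumptions: it targets 64-bit Linux, it uses anonymous memory as a stand-in for the device-file mapping that the patchset actually reserves, and the function names and GPU_LOW_VA_LIMIT macro are hypothetical. The MAP_FIXED_SAFE flag mentioned above was, as far as I know, later merged into Linux under the name MAP_FIXED_NOREPLACE; the sketch uses plain MAP_FIXED, which is only safe because it targets a range the caller reserved itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Limit taken from the thread: addresses must be below 1 << 40. */
#define GPU_LOW_VA_LIMIT (UINT64_C(1) << 40)

/* Reserve `size` bytes of address space with no access rights (a "hole").
 * `hint` is only a hint: the kernel may place the mapping elsewhere, so the
 * result is checked against the limit and released if it does not fit. */
void *reserve_low_hole(void *hint, size_t size)
{
    void *p = mmap(hint, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uint64_t)(uintptr_t)p + size > GPU_LOW_VA_LIMIT) {
        munmap(p, size);
        return NULL;
    }
    return p;
}

/* Carve a read/write chunk out of a previously reserved hole.
 * `offset` and `len` should be multiples of the page size. MAP_FIXED
 * replaces whatever is mapped at the target address, so the bounds check
 * keeps the chunk strictly inside the caller's own reservation. */
void *suballoc_from_hole(void *hole, size_t hole_size,
                         size_t offset, size_t len)
{
    assert(hole != NULL);
    if (offset > hole_size || len > hole_size - offset)
        return NULL;
    void *p = mmap((char *)hole + offset, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would pass a low hint such as (void *)0x200000000 to reserve_low_hole() and treat a NULL return as "no suitable range found"; a real driver would then hand the chosen range to the kernel so the GPU mappings land inside it.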

Daniel Vetter

2018-Mar-13 10:46 UTC

On Mon, Mar 12, 2018 at 01:50:58PM -0400, Jerome Glisse wrote:
> On Mon, Mar 12, 2018 at 06:30:09PM +0100, Daniel Vetter wrote:
> > On Sat, Mar 10, 2018 at 04:01:58PM +0100, Christian König wrote:
> 
> [...]
> 
> We cannot do an in-kernel address space for those. We already have an
> in-kernel address space, but it does not apply to the objects considered
> here.
> 
> For NVIDIA (I believe this is the same for AMD, AFAIK) the objects we
> are talking about are objects that must be in the same address space
> as the one against which the process's shader/dma/... get executed.
> 
> For instance, a command buffer submitted by userspace must be inside a
> GEM object mapped inside the GPU's process address space against which
> the commands are executed. My understanding is that the PFIFO (the engine
> on nv GPUs that fetches commands) first context-switches to the address
> space associated with the channel and then starts fetching commands, with
> all addresses being interpreted against the channel address space.
> 
> Hence why we need to reserve some range in the process virtual address
> space if we want to do SVM in a sane way. I mean, we could just map
> buffers into the GPU page table and then cross fingers and toes, hoping
> that the process will never get any of its mmaps overlapping those
> mappings :)

Ah, from the example I got the impression it's just the display engine
that has this restriction. CS/PFIFO having the same restriction is indeed
more fun.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
