thr3ads.net - Nouveau - [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM

If this information is useful, please help other people find it:
Share via:

Danilo Krummrich

2023-Jan-29 18:46 UTC

[Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

On 1/27/23 22:09, Danilo Krummrich wrote:> On 1/27/23 16:17, Christian K?nig wrote:
>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>>> [SNIP]
>>>>>>
>>>>>> What you want is one component for tracking the VA
allocations
>>>>>> (drm_mm based) and a different component/interface for
tracking
>>>>>> the VA mappings (probably rb tree based).
>>>>>
>>>>> That's what the GPUVA manager is doing. There are
gpuva_regions
>>>>> which correspond to VA allocations and gpuvas which
represent the
>>>>> mappings. Both are tracked separately (currently both with
a
>>>>> separate drm_mm, though). However, the GPUVA manager needs
to take
>>>>> regions into account when dealing with mappings to make
sure the
>>>>> GPUVA manager doesn't propose drivers to merge over
region
>>>>> boundaries. Speaking from userspace PoV, the kernel
wouldn't merge
>>>>> mappings from different VKBuffer objects even if
they're virtually
>>>>> and physically contiguous.
>>>>
>>>> That are two completely different things and shouldn't be
handled in
>>>> a single component.
>>>
>>> They are different things, but they're related in a way that
for
>>> handling the mappings (in particular merging and sparse) the GPUVA 
>>> manager needs to know the VA allocation (or region) boundaries.
>>>
>>> I have the feeling there might be a misunderstanding. Userspace is
in
>>> charge to actually allocate a portion of VA space and manage it.
The
>>> GPUVA manager just needs to know about those VA space allocations
and
>>> hence keeps track of them.
>>>
>>> The GPUVA manager is not meant to be an allocator in the sense of 
>>> finding and providing a hole for a given request.
>>>
>>> Maybe the non-ideal choice of using drm_mm was implying something
else.
>>
>> Uff, well long story short that doesn't even remotely match the 
>> requirements. This way the GPUVA manager won't be usable for a
whole
>> bunch of use cases.
>>
>> What we have are mappings which say X needs to point to Y with this 
>> and hw dependent flags.
>>
>> The whole idea of having ranges is not going to fly. Neither with AMD 
>> GPUs and I strongly think not with Intels XA either.
> 
> A range in the sense of the GPUVA manager simply represents a VA space 
> allocation (which in case of Nouveau is taken in userspace). Userspace 
> allocates the portion of VA space and lets the kernel know about it. The 
> current implementation needs that for the named reasons. So, I think 
> there is no reason why this would work with one GPU, but not with 
> another. It's just part of the design choice of the manager.
> 
> And I'm absolutely happy to discuss the details of the manager 
> implementation though.
> 
>>
>>>> We should probably talk about the design of the GPUVA manager
once
>>>> more when this should be applicable to all GPU drivers.
>>>
>>> That's what I try to figure out with this RFC, how to make it 
>>> appicable for all GPU drivers, so I'm happy to discuss this.
:-)
>>
>> Yeah, that was really good idea :) That proposal here is really far 
>> away from the actual requirements.
>>
> 
> And those are the ones I'm looking for. Do you mind sharing the 
> requirements for amdgpu in particular?
> 
>>>>> For sparse residency the kernel also needs to know the
region
>>>>> boundaries to make sure that it keeps sparse mappings
around.
>>>>
>>>> What?
>>>
>>> When userspace creates a new VKBuffer with the 
>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
>>> sparse mappings in order to ensure that using this buffer without
any
>>> memory backed mappings doesn't fault the GPU.
>>>
>>> Currently, the implementation does this the following way:
>>>
>>> 1. Userspace creates a new VKBuffer and hence allocates a portion
of
>>> the VA space for it. It calls into the kernel indicating the new VA
>>> space region and the fact that the region is sparse.
>>>
>>> 2. The kernel picks up the region and stores it in the GPUVA
manager,
>>> the driver creates the corresponding sparse mappings / page table 
>>> entries.
>>>
>>> 3. Userspace might ask the driver to create a couple of memory
backed
>>> mappings for this particular VA region. The GPUVA manager stores
the
>>> mapping parameters, the driver creates the corresponding page table
>>> entries.
>>>
>>> 4. Userspace might ask to unmap all the memory backed mappings from
>>> this particular VA region. The GPUVA manager removes the mapping 
>>> parameters, the driver cleans up the corresponding page table 
>>> entries. However, the driver also needs to re-create the sparse 
>>> mappings, since it's a sparse buffer, hence it needs to know
the
>>> boundaries of the region it needs to create the sparse mappings in.
>>
>> Again, this is not how things are working. First of all the kernel 
>> absolutely should *NOT* know about those regions.
>>
>> What we have inside the kernel is the information what happens if an 
>> address X is accessed. On AMD HW this can be:
>>
>> 1. Route to the PCIe bus because the mapped BO is stored in system 
>> memory.
>> 2. Route to the internal MC because the mapped BO is stored in local 
>> memory.
>> 3. Route to other GPUs in the same hive.
>> 4. Route to some doorbell to kick of other work.
>> ...
>> x. Ignore write, return 0 on reads (this is what is used for sparse 
>> mappings).
>> x+1. Trigger a recoverable page fault. This is used for things like
SVA.
>> x+2. Trigger a non-recoverable page fault. This is used for things 
>> like unmapped regions where access is illegal.
>>
>> All this is plus some hw specific caching flags.
>>
>> When Vulkan allocates a sparse VKBuffer what should happen is the 
>> following:
>>
>> 1. The Vulkan driver somehow figures out a VA region A..B for the 
>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), 
>> but essentially is currently driver specific.
> 
> Right, for Nouveau we have this in userspace as well.
> 
>>
>> 2. The kernel gets a request to map the VA range A..B as sparse, 
>> meaning that it updates the page tables from A..B with the sparse 
>> setting.
>>
>> 3. User space asks kernel to map a couple of memory backings at 
>> location A+1, A+10, A+15 etc....
>>
>> 4. The VKBuffer is de-allocated, userspace asks kernel to update 
>> region A..B to not map anything (usually triggers a non-recoverable 
>> fault).
> 
> Until here this seems to be identical to what I'm doing.
> 
> It'd be interesting to know how amdgpu handles everything that 
> potentially happens between your 3) and 4). More specifically, how are 
> the page tables changed when memory backed mappings are mapped on a 
> sparse range? What happens when the memory backed mappings are unmapped, 
> but the VKBuffer isn't de-allocated, and hence sparse mappings need to 
> be re-deployed?
> 
> Let's assume the sparse VKBuffer (and hence the VA space allocation) is
> pretty large. In Nouveau the corresponding PTEs would have a rather huge 
> page size to cover this. Now, if small memory backed mappings are mapped 
> to this huge sparse buffer, in Nouveau we'd allocate a new PT with a 
> corresponding smaller page size overlaying the sparse mappings PTEs.
> 
> How would this look like in amdgpu?
> 
>>
>> When you want to unify this between hw drivers I strongly suggest to 
>> completely start from scratch once more.
>>
I just took some time digging into amdgpu and, surprisingly, aside from 
the gpuva_regions it seems like amdgpu basically does exactly the same 
as I do in the GPU VA manager. As explained, those region boundaries are 
needed for merging only and, depending on the driver, might be useful 
for sparse mappings.

For drivers that don't intend to merge at all and (somehow) are capable 
of dealing with sparse regions without knowing the sparse region's 
boundaries, it'd be easy to make those gpuva_regions optional.
>> First of all don't think about those mappings as VMAs, that
won't work
>> because VMAs are usually something large. Think of this as individual 
>> PTEs controlled by the application. similar how COW mappings and 
>> struct pages are handled inside the kernel.
> 
> Why do you consider tracking single PTEs superior to tracking VMAs? All 
> the properties for a page you mentioned above should be equal for the 
> entirety of pages of a whole (memory backed) mapping, aren't they?
> 
>>
>> Then I would start with the VA allocation manager. You could probably 
>> base that on drm_mm. We handle it differently in amdgpu currently, but 
>> I think this is something we could change.
> 
> It was not my intention to come up with an actual allocator for the VA 
> space in the sense of actually finding a free and fitting hole in the VA 
> space.
> 
> For Nouveau (and XE, I think) we have this in userspace and from what 
> you've written previously I thought the same applies for amdgpu?
> 
>>
>> Then come up with something close to the amdgpu VM system. I'm
pretty
>> sure that should work for Nouveau and Intel XA as well. In other words 
>> you just have a bunch of very very small structures which represents 
>> mappings and a larger structure which combine all mappings of a 
>> specific type, e.g. all mappings of a BO or all sparse mappings etc...
> 
> Considering what you wrote above I assume that small structures / 
> mappings in this paragraph refer to PTEs.
> 
> Immediately, I don't really see how this fine grained resolution of 
> single PTEs would help implementing this in Nouveau. Actually, I think 
> it would even complicate the handling of PTs, but I would need to think 
> about this a bit more.
> 
>>
>> Merging of regions is actually not mandatory. We don't do it in
amdgpu
>> and can live with the additional mappings pretty well. But I think 
>> this can differ between drivers.
>>
>> Regards,
>> Christian.
>>

Christian König

2023-Jan-30 13:02 UTC

head link

[Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

Am 29.01.23 um 19:46 schrieb Danilo Krummrich:> On 1/27/23 22:09, Danilo Krummrich wrote:
>> On 1/27/23 16:17, Christian K?nig wrote:
>>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>>>> [SNIP]
>>>>>>>
>>>>>>> What you want is one component for tracking the VA
allocations
>>>>>>> (drm_mm based) and a different component/interface
for tracking
>>>>>>> the VA mappings (probably rb tree based).
>>>>>>
>>>>>> That's what the GPUVA manager is doing. There are
gpuva_regions
>>>>>> which correspond to VA allocations and gpuvas which
represent the
>>>>>> mappings. Both are tracked separately (currently both
with a
>>>>>> separate drm_mm, though). However, the GPUVA manager
needs to
>>>>>> take regions into account when dealing with mappings to
make sure
>>>>>> the GPUVA manager doesn't propose drivers to merge
over region
>>>>>> boundaries. Speaking from userspace PoV, the kernel
wouldn't
>>>>>> merge mappings from different VKBuffer objects even if
they're
>>>>>> virtually and physically contiguous.
>>>>>
>>>>> That are two completely different things and shouldn't
be handled
>>>>> in a single component.
>>>>
>>>> They are different things, but they're related in a way
that for
>>>> handling the mappings (in particular merging and sparse) the
GPUVA
>>>> manager needs to know the VA allocation (or region) boundaries.
>>>>
>>>> I have the feeling there might be a misunderstanding. Userspace
is
>>>> in charge to actually allocate a portion of VA space and manage
it.
>>>> The GPUVA manager just needs to know about those VA space 
>>>> allocations and hence keeps track of them.
>>>>
>>>> The GPUVA manager is not meant to be an allocator in the sense
of
>>>> finding and providing a hole for a given request.
>>>>
>>>> Maybe the non-ideal choice of using drm_mm was implying
something
>>>> else.
>>>
>>> Uff, well long story short that doesn't even remotely match the
>>> requirements. This way the GPUVA manager won't be usable for a
whole
>>> bunch of use cases.
>>>
>>> What we have are mappings which say X needs to point to Y with this
>>> and hw dependent flags.
>>>
>>> The whole idea of having ranges is not going to fly. Neither with 
>>> AMD GPUs and I strongly think not with Intels XA either.
>>
>> A range in the sense of the GPUVA manager simply represents a VA 
>> space allocation (which in case of Nouveau is taken in userspace). 
>> Userspace allocates the portion of VA space and lets the kernel know 
>> about it. The current implementation needs that for the named 
>> reasons. So, I think there is no reason why this would work with one 
>> GPU, but not with another. It's just part of the design choice of
the
>> manager.
>>
>> And I'm absolutely happy to discuss the details of the manager 
>> implementation though.
>>
>>>
>>>>> We should probably talk about the design of the GPUVA
manager once
>>>>> more when this should be applicable to all GPU drivers.
>>>>
>>>> That's what I try to figure out with this RFC, how to make
it
>>>> appicable for all GPU drivers, so I'm happy to discuss
this. :-)
>>>
>>> Yeah, that was really good idea :) That proposal here is really far
>>> away from the actual requirements.
>>>
>>
>> And those are the ones I'm looking for. Do you mind sharing the 
>> requirements for amdgpu in particular?
>>
>>>>>> For sparse residency the kernel also needs to know the
region
>>>>>> boundaries to make sure that it keeps sparse mappings
around.
>>>>>
>>>>> What?
>>>>
>>>> When userspace creates a new VKBuffer with the 
>>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to
create
>>>> sparse mappings in order to ensure that using this buffer
without
>>>> any memory backed mappings doesn't fault the GPU.
>>>>
>>>> Currently, the implementation does this the following way:
>>>>
>>>> 1. Userspace creates a new VKBuffer and hence allocates a
portion
>>>> of the VA space for it. It calls into the kernel indicating the
new
>>>> VA space region and the fact that the region is sparse.
>>>>
>>>> 2. The kernel picks up the region and stores it in the GPUVA 
>>>> manager, the driver creates the corresponding sparse mappings /
>>>> page table entries.
>>>>
>>>> 3. Userspace might ask the driver to create a couple of memory 
>>>> backed mappings for this particular VA region. The GPUVA
manager
>>>> stores the mapping parameters, the driver creates the
corresponding
>>>> page table entries.
>>>>
>>>> 4. Userspace might ask to unmap all the memory backed mappings
from
>>>> this particular VA region. The GPUVA manager removes the
mapping
>>>> parameters, the driver cleans up the corresponding page table 
>>>> entries. However, the driver also needs to re-create the sparse
>>>> mappings, since it's a sparse buffer, hence it needs to
know the
>>>> boundaries of the region it needs to create the sparse mappings
in.
>>>
>>> Again, this is not how things are working. First of all the kernel 
>>> absolutely should *NOT* know about those regions.
>>>
>>> What we have inside the kernel is the information what happens if
an
>>> address X is accessed. On AMD HW this can be:
>>>
>>> 1. Route to the PCIe bus because the mapped BO is stored in system 
>>> memory.
>>> 2. Route to the internal MC because the mapped BO is stored in
local
>>> memory.
>>> 3. Route to other GPUs in the same hive.
>>> 4. Route to some doorbell to kick of other work.
>>> ...
>>> x. Ignore write, return 0 on reads (this is what is used for sparse
>>> mappings).
>>> x+1. Trigger a recoverable page fault. This is used for things like
>>> SVA.
>>> x+2. Trigger a non-recoverable page fault. This is used for things 
>>> like unmapped regions where access is illegal.
>>>
>>> All this is plus some hw specific caching flags.
>>>
>>> When Vulkan allocates a sparse VKBuffer what should happen is the 
>>> following:
>>>
>>> 1. The Vulkan driver somehow figures out a VA region A..B for the 
>>> buffer. This can be in userspace (libdrm_amdgpu) or kernel
(drm_mm),
>>> but essentially is currently driver specific.
>>
>> Right, for Nouveau we have this in userspace as well.
>>
>>>
>>> 2. The kernel gets a request to map the VA range A..B as sparse, 
>>> meaning that it updates the page tables from A..B with the sparse 
>>> setting.
>>>
>>> 3. User space asks kernel to map a couple of memory backings at 
>>> location A+1, A+10, A+15 etc....
>>>
>>> 4. The VKBuffer is de-allocated, userspace asks kernel to update 
>>> region A..B to not map anything (usually triggers a non-recoverable
>>> fault).
>>
>> Until here this seems to be identical to what I'm doing.
>>
>> It'd be interesting to know how amdgpu handles everything that 
>> potentially happens between your 3) and 4). More specifically, how 
>> are the page tables changed when memory backed mappings are mapped on 
>> a sparse range? What happens when the memory backed mappings are 
>> unmapped, but the VKBuffer isn't de-allocated, and hence sparse 
>> mappings need to be re-deployed?
>>
>> Let's assume the sparse VKBuffer (and hence the VA space
allocation)
>> is pretty large. In Nouveau the corresponding PTEs would have a 
>> rather huge page size to cover this. Now, if small memory backed 
>> mappings are mapped to this huge sparse buffer, in Nouveau we'd 
>> allocate a new PT with a corresponding smaller page size overlaying 
>> the sparse mappings PTEs.
>>
>> How would this look like in amdgpu?
>>
>>>
>>> When you want to unify this between hw drivers I strongly suggest
to
>>> completely start from scratch once more.
>>>
>
> I just took some time digging into amdgpu and, surprisingly, aside 
> from the gpuva_regions it seems like amdgpu basically does exactly the 
> same as I do in the GPU VA manager. As explained, those region 
> boundaries are needed for merging only and, depending on the driver, 
> might be useful for sparse mappings.
>
> For drivers that don't intend to merge at all and (somehow) are 
> capable of dealing with sparse regions without knowing the sparse 
> region's boundaries, it'd be easy to make those gpuva_regions
optional.
Yeah, but this then defeats the approach of having the same hw 
independent interface/implementation for all drivers.

Let me ask the other way around how does the hw implementation of a 
sparse mapping looks like for NVidia based hardware?

For newer AMD hw its a flag in the page tables, for older hw its a 
register where you can specify ranges A..B. We don't really support the 
later with AMDGPU any more, but from this interface I would guess you 
have the second variant, right?

Christian.
>
>>> First of all don't think about those mappings as VMAs, that
won't
>>> work because VMAs are usually something large. Think of this as 
>>> individual PTEs controlled by the application. similar how COW 
>>> mappings and struct pages are handled inside the kernel.
>>
>> Why do you consider tracking single PTEs superior to tracking VMAs? 
>> All the properties for a page you mentioned above should be equal for 
>> the entirety of pages of a whole (memory backed) mapping, aren't
they?
>>
>>>
>>> Then I would start with the VA allocation manager. You could 
>>> probably base that on drm_mm. We handle it differently in amdgpu 
>>> currently, but I think this is something we could change.
>>
>> It was not my intention to come up with an actual allocator for the 
>> VA space in the sense of actually finding a free and fitting hole in 
>> the VA space.
>>
>> For Nouveau (and XE, I think) we have this in userspace and from what 
>> you've written previously I thought the same applies for amdgpu?
>>
>>>
>>> Then come up with something close to the amdgpu VM system. I'm 
>>> pretty sure that should work for Nouveau and Intel XA as well. In 
>>> other words you just have a bunch of very very small structures 
>>> which represents mappings and a larger structure which combine all 
>>> mappings of a specific type, e.g. all mappings of a BO or all
sparse
>>> mappings etc...
>>
>> Considering what you wrote above I assume that small structures / 
>> mappings in this paragraph refer to PTEs.
>>
>> Immediately, I don't really see how this fine grained resolution of
>> single PTEs would help implementing this in Nouveau. Actually, I 
>> think it would even complicate the handling of PTs, but I would need 
>> to think about this a bit more.
>>
>>>
>>> Merging of regions is actually not mandatory. We don't do it in
>>> amdgpu and can live with the additional mappings pretty well. But I
>>> think this can differ between drivers.
>>>
>>> Regards,
>>> Christian.
>>>
>

Apparently Analagous Threads

Search for more apparently analagous threads

Nouveau - Jan 2023 - [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

[Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

[Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

Apparently Analagous Threads