thr3ads.net - Nouveau - [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release [Sep 2022]

If this information is useful, please help other people find it:
Share via:

Lyude Paul

2022-Sep-26 21:35 UTC

[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release

On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple
wrote:> When the module is unloaded or a GPU is unbound from the module it is
> possible for device private pages to be left mapped in currently running
> processes. This leads to a kernel crash when the pages are either freed
> or accessed from the CPU because the GPU and associated data structures
> and callbacks have all been freed.
> 
> Fix this by migrating any mappings back to normal CPU memory prior to
> freeing the GPU memory chunks and associated device private pages.
> 
> Signed-off-by: Alistair Popple <apopple at nvidia.com>
> 
> ---
> 
> I assume the AMD driver might have a similar issue. However I can't see
> where device private (or coherent) pages actually get unmapped/freed
> during teardown as I couldn't find any relevant calls to
> devm_memunmap(), memunmap(), devm_release_mem_region() or
> release_mem_region(). So it appears that ZONE_DEVICE pages are not being
> properly freed during module unload, unless I'm missing something?
I've got no idea, will poke Ben to see if they know the answer to this
> ---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++++++++++++++++++++++++++-
>  1 file changed, 48 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 66ebbd4..3b247b8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>  	mutex_unlock(&drm->dmem->mutex);
>  }
>  
> +/*
> + * Evict all pages mapping a chunk.
> + */
> +void
> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
> +{
> +	unsigned long i, npages = range_len(&chunk->pagemap.range)
>> PAGE_SHIFT;
> +	unsigned long *src_pfns, *dst_pfns;
> +	dma_addr_t *dma_addrs;
> +	struct nouveau_fence *fence;
> +
> +	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
> +	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
> +	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
> +
> +	migrate_device_range(src_pfns, chunk->pagemap.range.start >>
PAGE_SHIFT,
> +			npages);
> +
> +	for (i = 0; i < npages; i++) {
> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> +			struct page *dpage;
> +
> +			/*
> +			 * _GFP_NOFAIL because the GPU is going away and there
> +			 * is nothing sensible we can do if we can't copy the
> +			 * data back.
> +			 */
You'll have to excuse me for a moment since this area of nouveau isn't
one of
my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
retry, in the case of a GPU hotplug event I would assume we would rather just
stop trying to migrate things to the GPU and just drop the data instead of
hanging on infinite retries.
> +			dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
> +			dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
> +			nouveau_dmem_copy_one(chunk->drm,
> +					migrate_pfn_to_page(src_pfns[i]), dpage,
> +					&dma_addrs[i]);
> +		}
> +	}
> +
> +	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false,
&fence);
> +	migrate_device_pages(src_pfns, dst_pfns, npages);
> +	nouveau_dmem_fence_done(&fence);
> +	migrate_device_finalize(src_pfns, dst_pfns, npages);
> +	kfree(src_pfns);
> +	kfree(dst_pfns);
> +	for (i = 0; i < npages; i++)
> +		dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE,
DMA_BIDIRECTIONAL);
> +	kfree(dma_addrs);
> +}
> +
>  void
>  nouveau_dmem_fini(struct nouveau_drm *drm)
>  {
> @@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
>  	mutex_lock(&drm->dmem->mutex);
>  
>  	list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list)
{
> +		nouveau_dmem_evict_chunk(chunk);
>  		nouveau_bo_unpin(chunk->bo);
>  		nouveau_bo_ref(NULL, &chunk->bo);
> +		WARN_ON(chunk->callocated);
>  		list_del(&chunk->list);
>  		memunmap_pages(&chunk->pagemap);
>  		release_mem_region(chunk->pagemap.range.start,
-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat

John Hubbard

2022-Sep-26 22:14 UTC

head link

[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release

On 9/26/22 14:35, Lyude Paul wrote:>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * _GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
> 
> You'll have to excuse me for a moment since this area of nouveau
isn't one of
> my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means
infinite
> retry, in the case of a GPU hotplug event I would assume we would rather
just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
> Hi Lyude!

Actually, I really think it's better in this case to keep trying
(presumably not necessarily infinitely, but only until memory becomes
available), rather than failing out and corrupting data.

That's because I'm not sure it's completely clear that this memory
is
discardable. And at some point, we're going to make this all work with
file-backed memory, which will *definitely* not be discardable--I
realize that we're not there yet, of course.

But here, it's reasonable to commit to just retrying indefinitely,
really. Memory should eventually show up. And if it doesn't, then
restarting the machine is better than corrupting data, generally.

thanks,

-- 
John Hubbard
NVIDIA

Felix Kuehling

2022-Sep-26 23:07 UTC

head link

[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release

On 2022-09-26 17:35, Lyude Paul wrote:> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
>> When the module is unloaded or a GPU is unbound from the module it is
>> possible for device private pages to be left mapped in currently
running
>> processes. This leads to a kernel crash when the pages are either freed
>> or accessed from the CPU because the GPU and associated data structures
>> and callbacks have all been freed.
>>
>> Fix this by migrating any mappings back to normal CPU memory prior to
>> freeing the GPU memory chunks and associated device private pages.
>>
>> Signed-off-by: Alistair Popple <apopple at nvidia.com>
>>
>> ---
>>
>> I assume the AMD driver might have a similar issue. However I can't
see
>> where device private (or coherent) pages actually get unmapped/freed
>> during teardown as I couldn't find any relevant calls to
>> devm_memunmap(), memunmap(), devm_release_mem_region() or
>> release_mem_region(). So it appears that ZONE_DEVICE pages are not
being
>> properly freed during module unload, unless I'm missing something?
> I've got no idea, will poke Ben to see if they know the answer to this
I guess we're relying on devm to release the region. Isn't the whole 
point of using devm_request_free_mem_region that we don't have to 
remember to explicitly release it when the device gets destroyed? I 
believe we had an explicit free call at some point by mistake, and that 
caused a double-free during module unload. See this commit for reference:

commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang <Philip.Yang at amd.com>
Date:   Mon Sep 20 17:25:52 2021 -0400

     drm/amdkfd: fix svm_migrate_fini warning
     
     Device manager releases device-specific resources when a driver
     disconnects from a device, devm_memunmap_pages and
     devm_release_mem_region calls in svm_migrate_fini are redundant.
     
     It causes below warning trace after patch "drm/amdgpu: Split
     amdgpu_device_fini into early and late", so remove function
     svm_migrate_fini.
     
     BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
     
     WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
     devm_release_action+0x51/0x60
     Call Trace:
         ? memunmap_pages+0x360/0x360
         svm_migrate_fini+0x2d/0x60 [amdgpu]
         kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
         amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
         amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
         amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
         drm_dev_release+0x20/0x40 [drm]
         release_nodes+0x196/0x1e0
         device_release_driver_internal+0x104/0x1d0
         driver_detach+0x47/0x90
         bus_remove_driver+0x7a/0xd0
         pci_unregister_driver+0x3d/0x90
         amdgpu_exit+0x11/0x20 [amdgpu]
     
     Signed-off-by: Philip Yang <Philip.Yang at amd.com>
     Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

Furthermore, I guess we are assuming that nobody is using the GPU when 
the module is unloaded. As long as any processes have /dev/kfd open, you 
won't be able to unload the module (except by force-unload). I suppose 
with ZONE_DEVICE memory, we can have references to device memory pages 
even when user mode has closed /dev/kfd. We do have a cleanup handler 
that runs in an MMU-free-notifier. In theory that should run after all 
the pages in the mm_struct have been freed. It releases all sorts of 
other device resources and needs the driver to still be there. I'm not 
sure if there is anything preventing a module unload before the 
free-notifier runs. I'll look into that.

Regards,
 ? Felix

>
>> ---
>>   drivers/gpu/drm/nouveau/nouveau_dmem.c | 48
+++++++++++++++++++++++++++-
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index 66ebbd4..3b247b8 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>>   	mutex_unlock(&drm->dmem->mutex);
>>   }
>>   
>> +/*
>> + * Evict all pages mapping a chunk.
>> + */
>> +void
>> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
>> +{
>> +	unsigned long i, npages = range_len(&chunk->pagemap.range)
>> PAGE_SHIFT;
>> +	unsigned long *src_pfns, *dst_pfns;
>> +	dma_addr_t *dma_addrs;
>> +	struct nouveau_fence *fence;
>> +
>> +	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
>> +	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
>> +	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
>> +
>> +	migrate_device_range(src_pfns, chunk->pagemap.range.start >>
PAGE_SHIFT,
>> +			npages);
>> +
>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * _GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
> You'll have to excuse me for a moment since this area of nouveau
isn't one of
> my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means
infinite
> retry, in the case of a GPU hotplug event I would assume we would rather
just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
>
>> +			dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
>> +			dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
>> +			nouveau_dmem_copy_one(chunk->drm,
>> +					migrate_pfn_to_page(src_pfns[i]), dpage,
>> +					&dma_addrs[i]);
>> +		}
>> +	}
>> +
>> +	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false,
&fence);
>> +	migrate_device_pages(src_pfns, dst_pfns, npages);
>> +	nouveau_dmem_fence_done(&fence);
>> +	migrate_device_finalize(src_pfns, dst_pfns, npages);
>> +	kfree(src_pfns);
>> +	kfree(dst_pfns);
>> +	for (i = 0; i < npages; i++)
>> +		dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i],
PAGE_SIZE, DMA_BIDIRECTIONAL);
>> +	kfree(dma_addrs);
>> +}
>> +
>>   void
>>   nouveau_dmem_fini(struct nouveau_drm *drm)
>>   {
>> @@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
>>   	mutex_lock(&drm->dmem->mutex);
>>   
>>   	list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks,
list) {
>> +		nouveau_dmem_evict_chunk(chunk);
>>   		nouveau_bo_unpin(chunk->bo);
>>   		nouveau_bo_ref(NULL, &chunk->bo);
>> +		WARN_ON(chunk->callocated);
>>   		list_del(&chunk->list);
>>   		memunmap_pages(&chunk->pagemap);
>>   		release_mem_region(chunk->pagemap.range.start,

Nouveau - Sep 2022 - [PATCH 6/7] nouveau/dmem: Evict device private memory during release

[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release

[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release

[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release