Ralph Campbell

2019-Jun-14 00:11 UTC

[Nouveau] [PATCH] drm/nouveau/dmem: missing mutex_lock in error path

In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
calling nouveau_dmem_chunk_alloc().
Reacquire the lock before continuing to the next page.

Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---

I found this while testing Jason Gunthorpe's hmm tree but this is
independent of those changes. I guess it could go through
David Airlie's tree for nouveau or Jason's tree.

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 27aa4e72abe9..00f7236af1b9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -379,9 +379,10 @@ nouveau_dmem_pages_alloc(struct nouveau_drm *drm,
 			ret = nouveau_dmem_chunk_alloc(drm);
 			if (ret) {
 				if (c)
-					break;
+					return 0;
 				return ret;
 			}
+			mutex_lock(&drm->dmem->mutex);
 			continue;
 		}
 
-- 
2.20.1
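
A minimal userspace sketch of the locking pattern the patch restores, offered only as an illustration and not as the nouveau code. Every name here (toy_lock, toy_chunk_alloc, toy_pages_alloc) is hypothetical, the "lock" is a plain flag that asserts on misuse, and the partial-allocation case (the "if (c)" branch) is deliberately left out. The point it tries to show: if the loop drops the lock before calling the slow-path allocator, it has to take the lock again before "continue", and the error return must not unlock a second time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mutex: it only tracks whether it is held. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* catches a double lock */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);	/* catches an unbalanced unlock */
	l->held = false;
}

/* Hypothetical slow path: refills the pool, must run without the lock. */
static int toy_chunk_alloc(struct toy_lock *l, int *free_pages, int fail)
{
	assert(!l->held);
	if (fail)
		return -12;	/* stands in for -ENOMEM */
	*free_pages += 4;
	return 0;
}

/*
 * Take npages pages from the pool. Returns 0 on success or a negative
 * value on failure. The lock is never held when this function returns.
 */
static int toy_pages_alloc(struct toy_lock *l, int *free_pages,
			   int npages, int fail)
{
	int c = 0;

	toy_lock_acquire(l);
	while (c < npages) {
		if (*free_pages == 0) {
			int ret;

			toy_lock_release(l);
			ret = toy_chunk_alloc(l, free_pages, fail);
			if (ret)
				return ret;	/* lock already dropped above */
			toy_lock_acquire(l);	/* the step the patch adds */
			continue;
		}
		(*free_pages)--;
		c++;
	}
	toy_lock_release(l);
	return 0;
}
```

Removing the second toy_lock_acquire() call makes the final toy_lock_release() trip its assertion, which is the userspace analogue of an unlock with no matching lock.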

Jason Gunthorpe

2019-Jun-14 00:24 UTC

On Thu, Jun 13, 2019 at 05:11:21PM -0700, Ralph Campbell wrote:
> In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
> calling nouveau_dmem_chunk_alloc().
> Reacquire the lock before continuing to the next page.
> 
> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
> ---
> 
> I found this while testing Jason Gunthorpe's hmm tree but this is
> independent of those changes. I guess it could go through
> David Airlie's tree for nouveau or Jason's tree.

This seems like a bad enough bug to send it into -rc?

It probably should go through the normal nouveau channels, thanks

Jason

John Hubbard

2019-Jun-14 00:49 UTC

On 6/13/19 5:11 PM, Ralph Campbell wrote:
> In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
> calling nouveau_dmem_chunk_alloc().
> Reacquire the lock before continuing to the next page.
> 
> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
> ---
> 
> I found this while testing Jason Gunthorpe's hmm tree but this is
> independent of those changes. I guess it could go through
> David Airlie's tree for nouveau or Jason's tree.
> 

Hi Ralph,

btw, was this the fix for the crash you were seeing? It might be nice to
mention in the commit description, if you are seeing real symptoms.

>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 27aa4e72abe9..00f7236af1b9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -379,9 +379,10 @@ nouveau_dmem_pages_alloc(struct nouveau_drm *drm,
>  			ret = nouveau_dmem_chunk_alloc(drm);
>  			if (ret) {
>  				if (c)
> -					break;

Actually, the pre-existing code is a little concerning. Your change preserves
the behavior, but it seems questionable to be doing a "return 0" (whether
via the above break, or your change) when it's in this partially allocated
state. It's reporting success when it only allocates part of what was requested,
and it doesn't fill in the pages array either.

> +					return 0;
>  				return ret;
>  			}
> +			mutex_lock(&drm->dmem->mutex);
>  			continue;
>  		}
>  
> 

The above comment is about pre-existing potential problems, but your patch itself
looks correct, so:

Reviewed-by: John Hubbard <jhubbard at nvidia.com> 


thanks,
-- 
John Hubbard
NVIDIA
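
One possible way to make the partial-allocation case visible to callers, sketched in plain C under loud assumptions: toy_pool and toy_alloc_pages are hypothetical names, the pool is just a counter, and nothing here reflects how nouveau actually tracks device pages. Instead of returning 0 for both "everything allocated" and "ran out part way", the function fills the pages array as it goes and returns how many entries it filled, so a short count cannot be mistaken for full success.

```c
#include <assert.h>

/* Hypothetical pool; not the nouveau data structures. */
struct toy_pool { int free_pages; };

/*
 * Fill pages[0..npages) with page ids taken from the pool.
 * Returns the number of entries filled, which may be fewer than
 * npages, or -12 (standing in for -ENOMEM) if none could be filled.
 */
static int toy_alloc_pages(struct toy_pool *pool, int *pages, int npages)
{
	int c = 0;

	while (c < npages && pool->free_pages > 0) {
		pool->free_pages--;
		pages[c] = pool->free_pages;	/* id of the page just taken */
		c++;
	}
	if (c == 0 && npages > 0)
		return -12;
	return c;
}
```

A caller would compare the return value with npages and either retry for the remainder or release the pages it did get.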

Ralph Campbell

2019-Jun-14 17:39 UTC

On 6/13/19 5:49 PM, John Hubbard wrote:
> On 6/13/19 5:11 PM, Ralph Campbell wrote:
>> In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
>> calling nouveau_dmem_chunk_alloc().
>> Reacquire the lock before continuing to the next page.
>>
>> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
>> ---
>>
>> I found this while testing Jason Gunthorpe's hmm tree but this is
>> independent of those changes. I guess it could go through
>> David Airlie's tree for nouveau or Jason's tree.
>>
> 
> Hi Ralph,
> 
> btw, was this the fix for the crash you were seeing? It might be nice to
> mention in the commit description, if you are seeing real symptoms.
> 
> 
>>   drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index 27aa4e72abe9..00f7236af1b9 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -379,9 +379,10 @@ nouveau_dmem_pages_alloc(struct nouveau_drm *drm,
>>   			ret = nouveau_dmem_chunk_alloc(drm);
>>   			if (ret) {
>>   				if (c)
>> -					break;
> 
> Actually, the pre-existing code is a little concerning. Your change preserves
> the behavior, but it seems questionable to be doing a "return 0" (whether
> via the above break, or your change) when it's in this partially allocated
> state. It's reporting success when it only allocates part of what was requested,
> and it doesn't fill in the pages array either.
> 
> 
> 
>> +					return 0;
>>   				return ret;
>>   			}
>> +			mutex_lock(&drm->dmem->mutex);
>>   			continue;
>>   		}
>>   
>>
> 
> The above comment is about pre-existing potential problems, but your patch itself
> looks correct, so:
> 
> Reviewed-by: John Hubbard <jhubbard at nvidia.com>
> 
> 
> thanks,

The crash was the NULL pointer bug in Christoph's patch #10.
I sent a separate reply for that.

Below is the console output I got, then I made the changes just based on
code inspection. Do you think I should include it in the change log?

As for the "return 0", if you follow the call chain,
nouveau_dmem_pages_alloc() is only ever called for one page so this
currently "works" but I agree it is a bit of a time bomb. There are a
number of other bugs that I can see that need fixing but I think those
should be separate patches.

[ 1294.871933] ====================================
[ 1294.876656] WARNING: bad unlock balance detected!
[ 1294.881375] 5.2.0-rc3+ #5 Not tainted
[ 1294.885048] -------------------------------------
[ 1294.889773] test-malloc-vra/6299 is trying to release lock (&drm->dmem->mutex) at:
[ 1294.897482] [<ffffffffa01a220f>] nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.905782] but there are no more locks to release!
[ 1294.910690]
[ 1294.910690] other info that might help us debug this:
[ 1294.917249] 1 lock held by test-malloc-vra/6299:
[ 1294.921881]  #0: 0000000016e10454 (&mm->mmap_sem#2){++++}, at: nouveau_svmm_bind+0x142/0x210 [nouveau]
[ 1294.931313]
[ 1294.931313] stack backtrace:
[ 1294.935702] CPU: 4 PID: 6299 Comm: test-malloc-vra Not tainted 5.2.0-rc3+ #5
[ 1294.942786] Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1401 05/21/2018
[ 1294.949590] Call Trace:
[ 1294.952059]  dump_stack+0x7c/0xc0
[ 1294.955469]  ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.962213]  print_unlock_imbalance_bug.cold.52+0xca/0xcf
[ 1294.967641]  lock_release+0x306/0x380
[ 1294.971383]  ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.978089]  ? lock_downgrade+0x2d0/0x2d0
[ 1294.982121]  ? find_held_lock+0xac/0xd0
[ 1294.985979]  __mutex_unlock_slowpath+0x8f/0x3f0
[ 1294.990540]  ? wait_for_completion+0x230/0x230
[ 1294.995002]  ? rwlock_bug.part.2+0x60/0x60
[ 1294.999197]  nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1295.005751]  ? page_mapping+0x98/0x110
[ 1295.009511]  migrate_vma+0xa74/0x1090
[ 1295.013186]  ? move_to_new_page+0x480/0x480
[ 1295.017400]  ? __kmalloc+0x153/0x300
[ 1295.021052]  ? nouveau_dmem_migrate_vma+0xd8/0x1e0 [nouveau]
[ 1295.026796]  nouveau_dmem_migrate_vma+0x157/0x1e0 [nouveau]
[ 1295.032466]  ? nouveau_dmem_init+0x490/0x490 [nouveau]
[ 1295.037612]  ? vmacache_find+0xc2/0x110
[ 1295.041537]  nouveau_svmm_bind+0x1b4/0x210 [nouveau]
[ 1295.046583]  ? nouveau_svm_fault+0x13e0/0x13e0 [nouveau]
[ 1295.051912]  drm_ioctl_kernel+0x14d/0x1a0
[ 1295.055930]  ? drm_setversion+0x330/0x330
[ 1295.059971]  drm_ioctl+0x308/0x530
[ 1295.063384]  ? drm_version+0x150/0x150
[ 1295.067153]  ? find_held_lock+0xac/0xd0
[ 1295.070996]  ? __pm_runtime_resume+0x3f/0xa0
[ 1295.075285]  ? mark_held_locks+0x29/0xa0
[ 1295.079230]  ? _raw_spin_unlock_irqrestore+0x3c/0x50
[ 1295.084232]  ? lockdep_hardirqs_on+0x17d/0x250
[ 1295.088768]  nouveau_drm_ioctl+0x9a/0x100 [nouveau]
[ 1295.093661]  do_vfs_ioctl+0x137/0x9a0
[ 1295.097341]  ? ioctl_preallocate+0x140/0x140
[ 1295.101623]  ? match_held_lock+0x1b/0x230
[ 1295.105646]  ? match_held_lock+0x1b/0x230
[ 1295.109660]  ? find_held_lock+0xac/0xd0
[ 1295.113512]  ? __do_page_fault+0x324/0x630
[ 1295.117617]  ? lock_downgrade+0x2d0/0x2d0
[ 1295.121648]  ? mark_held_locks+0x79/0xa0
[ 1295.125583]  ? handle_mm_fault+0x352/0x430
[ 1295.129687]  ksys_ioctl+0x60/0x90
[ 1295.133020]  ? mark_held_locks+0x29/0xa0
[ 1295.136964]  __x64_sys_ioctl+0x3d/0x50
[ 1295.140726]  do_syscall_64+0x68/0x250
[ 1295.144400]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1295.149465] RIP: 0033:0x7f1a3495809b
[ 1295.153053] Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
[ 1295.171850] RSP: 002b:00007ffef7ed1358 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1295.179451] RAX: ffffffffffffffda RBX: 00007ffef7ed1628 RCX: 00007f1a3495809b
[ 1295.186601] RDX: 00007ffef7ed13b0 RSI: 0000000040406449 RDI: 0000000000000004
[ 1295.193759] RBP: 00007ffef7ed13b0 R08: 0000000000000000 R09: 000000000157e770
[ 1295.200917] R10: 000000000151c010 R11: 0000000000000246 R12: 0000000040406449
[ 1295.208083] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
