Yeah I noticed this as well, I will try to bisect this the next chance that I
get

On Tue, 2022-05-17 at 13:10 +0200, Hans de Goede wrote:
> Hi All,
>
> I just noticed the below lockdep possible deadlock report with a 5.18-rc6
> kernel on a Dell Latitude E6430 laptop with the following nvidia GPU:
>
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108GLM [NVS
> 5200M] [10de:0dfc] (rev a1)
> 01:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio
> Controller [10de:0bea] (rev a1)
>
> This is with the laptop in Optimus mode, so with the Intel integrated
> gfx from the i5-3320M CPU driving the LCD panel and with nothing connected
> to the HDMI connector, which is always routed to the NVIDIA GPU on this
> laptop.
>
> The lockdep possible deadlock warning seems to happen when the NVIDIA GPU
> is runtime suspended shortly after gdm has loaded:
>
> [   24.859171] ======================================================
> [   24.859173] WARNING: possible circular locking dependency detected
> [   24.859175] 5.18.0-rc6+ #34 Tainted: G            E
> [   24.859178] ------------------------------------------------------
> [   24.859179] kworker/1:1/46 is trying to acquire lock:
> [   24.859181] ffff92b0c0ee0518 (&cli->mutex){+.+.}-{3:3}, at: nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859231]
>                but task is already holding lock:
> [   24.859233] ffff92b0c4bf35a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_bo_wait+0x7d/0x140 [ttm]
> [   24.859243]
>                which lock already depends on the new lock.
>
> [   24.859244]
>                the existing dependency chain (in reverse order) is:
> [   24.859246]
>                -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
> [   24.859249]        __ww_mutex_lock.constprop.0+0xb3/0xfb0
> [   24.859256]        ww_mutex_lock+0x38/0xa0
> [   24.859259]        nouveau_bo_pin+0x30/0x380 [nouveau]
> [   24.859297]        nouveau_channel_del+0x1d7/0x3e0 [nouveau]
> [   24.859328]        nouveau_channel_new+0x48/0x730 [nouveau]
> [   24.859358]        nouveau_abi16_ioctl_channel_alloc+0x113/0x360 [nouveau]
> [   24.859389]        drm_ioctl_kernel+0xa1/0x150
> [   24.859392]        drm_ioctl+0x21c/0x410
> [   24.859395]        nouveau_drm_ioctl+0x56/0x1820 [nouveau]
> [   24.859431]        __x64_sys_ioctl+0x8d/0xc0
> [   24.859436]        do_syscall_64+0x5b/0x80
> [   24.859440]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   24.859443]
>                -> #0 (&cli->mutex){+.+.}-{3:3}:
> [   24.859446]        __lock_acquire+0x12e2/0x1f90
> [   24.859450]        lock_acquire+0xad/0x290
> [   24.859453]        __mutex_lock+0x90/0x830
> [   24.859456]        nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859493]        ttm_bo_move_to_lru_tail+0x32c/0x980 [ttm]
> [   24.859498]        ttm_mem_evict_first+0x25c/0x4b0 [ttm]
> [   24.859503]        ttm_resource_manager_evict_all+0x93/0x1b0 [ttm]
> [   24.859509]        nouveau_debugfs_fini+0x161/0x260 [nouveau]
> [   24.859545]        nouveau_drm_ioctl+0xa4a/0x1820 [nouveau]
> [   24.859582]        pci_pm_runtime_suspend+0x5c/0x180
> [   24.859585]        __rpm_callback+0x48/0x1b0
> [   24.859589]        rpm_callback+0x5a/0x70
> [   24.859591]        rpm_suspend+0x10a/0x6f0
> [   24.859594]        pm_runtime_work+0xa0/0xb0
> [   24.859596]        process_one_work+0x254/0x560
> [   24.859601]        worker_thread+0x4f/0x390
> [   24.859604]        kthread+0xe6/0x110
> [   24.859607]        ret_from_fork+0x22/0x30
> [   24.859611]
>                other info that might help us debug this:
>
> [   24.859612]  Possible unsafe locking scenario:
>
> [   24.859613]        CPU0                    CPU1
> [   24.859615]        ----                    ----
> [   24.859616]   lock(reservation_ww_class_mutex);
> [   24.859618]                                lock(&cli->mutex);
> [   24.859620]                                lock(reservation_ww_class_mutex);
> [   24.859622]   lock(&cli->mutex);
> [   24.859624]
>                 *** DEADLOCK ***
>
> [   24.859625] 3 locks held by kworker/1:1/46:
> [   24.859627]  #0: ffff92b0c0bb4338 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x1d0/0x560
> [   24.859634]  #1: ffffa8ffc02dfe80 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1d0/0x560
> [   24.859641]  #2: ffff92b0c4bf35a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_bo_wait+0x7d/0x140 [ttm]
> [   24.859649]
>                stack backtrace:
> [   24.859651] CPU: 1 PID: 46 Comm: kworker/1:1 Tainted: G            E     5.18.0-rc6+ #34
> [   24.859654] Hardware name: Dell Inc. Latitude E6430/0H3MT5, BIOS A21 05/08/2017
> [   24.859656] Workqueue: pm pm_runtime_work
> [   24.859660] Call Trace:
> [   24.859662]  <TASK>
> [   24.859665]  dump_stack_lvl+0x5b/0x74
> [   24.859669]  check_noncircular+0xdf/0x100
> [   24.859672]  ? register_lock_class+0x38/0x470
> [   24.859678]  __lock_acquire+0x12e2/0x1f90
> [   24.859683]  lock_acquire+0xad/0x290
> [   24.859686]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859724]  ? lock_is_held_type+0xa6/0x120
> [   24.859730]  __mutex_lock+0x90/0x830
> [   24.859733]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859770]  ? nvif_vmm_map+0x114/0x130 [nouveau]
> [   24.859791]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859829]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859866]  nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859905]  ttm_bo_move_to_lru_tail+0x32c/0x980 [ttm]
> [   24.859912]  ttm_mem_evict_first+0x25c/0x4b0 [ttm]
> [   24.859919]  ? lock_release+0x20/0x2a0
> [   24.859923]  ttm_resource_manager_evict_all+0x93/0x1b0 [ttm]
> [   24.859930]  nouveau_debugfs_fini+0x161/0x260 [nouveau]
> [   24.859968]  nouveau_drm_ioctl+0xa4a/0x1820 [nouveau]
> [   24.860005]  pci_pm_runtime_suspend+0x5c/0x180
> [   24.860008]  ? pci_dev_put+0x20/0x20
> [   24.860011]  __rpm_callback+0x48/0x1b0
> [   24.860014]  ? pci_dev_put+0x20/0x20
> [   24.860018]  rpm_callback+0x5a/0x70
> [   24.860020]  ? pci_dev_put+0x20/0x20
> [   24.860023]  rpm_suspend+0x10a/0x6f0
> [   24.860025]  ? process_one_work+0x1d0/0x560
> [   24.860031]  pm_runtime_work+0xa0/0xb0
> [   24.860034]  process_one_work+0x254/0x560
> [   24.860039]  worker_thread+0x4f/0x390
> [   24.860043]  ? process_one_work+0x560/0x560
> [   24.860046]  kthread+0xe6/0x110
> [   24.860049]  ? kthread_complete_and_exit+0x20/0x20
> [   24.860053]  ret_from_fork+0x22/0x30
> [   24.860059]  </TASK>
>
> Regards,
>
> Hans
>
--
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
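[Editorial note: for readers unfamiliar with what lockdep is flagging here, the inversion can be modeled outside the kernel. The sketch below is illustrative only, not nouveau/ttm code: plain Python, with the lock names used purely as labels. It records which locks are taken while others are held, roughly the dependency graph lockdep builds, and flags the AB-BA cycle between &cli->mutex (taken first on the channel-alloc ioctl path, chain #1 above) and reservation_ww_class_mutex (held first on the runtime-suspend eviction path, chain #0).]

```python
class LockOrderTracker:
    """Toy model of lockdep's lock-order dependency graph."""

    def __init__(self):
        self.edges = {}  # lock -> set of locks acquired while it was held
        self.held = []   # stack of locks the current "task" holds

    def acquire(self, lock):
        warnings = []
        for h in self.held:
            # Record the dependency: 'lock' was taken while 'h' was held.
            self.edges.setdefault(h, set()).add(lock)
            # If the reverse order (lock -> ... -> h) is already in the
            # graph, taking 'lock' under 'h' closes a cycle: possible AB-BA.
            if self._reaches(lock, h):
                warnings.append(
                    f"possible circular locking dependency: "
                    f"{h} -> {lock} vs {lock} -> {h}")
        self.held.append(lock)
        return warnings

    def release(self, lock):
        self.held.remove(lock)

    def _reaches(self, src, dst):
        # Depth-first search over recorded dependencies.
        seen, stack = set(), [src]
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(self.edges.get(n, ()))
        return False


t = LockOrderTracker()

# Chain #1 (channel alloc ioctl): cli->mutex first, then the buffer's
# reservation ww_mutex via nouveau_bo_pin().
t.acquire("cli->mutex")
t.acquire("reservation_ww_class_mutex")
t.release("reservation_ww_class_mutex")
t.release("cli->mutex")

# Chain #0 (runtime suspend / eviction): the reservation ww_mutex is
# already held when cli->mutex is requested -- the reverse order.
t.acquire("reservation_ww_class_mutex")
print(t.acquire("cli->mutex"))  # reports the inversion
```

Like real lockdep, this flags the *possibility* of deadlock from lock ordering alone; no actual concurrent execution (or hang) is needed to detect it.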
Computer Enthusiastic
2022-May-20 11:46 UTC
[Nouveau] nouveau lockdep deadlock report with 5.18-rc6
Hello,

On Wed, May 18, 2022 at 7:42 PM Lyude Paul <lyude at redhat.com> wrote:
>
> Yeah I noticed this as well, I will try to bisect this the next chance that I
> get
>
> On Tue, 2022-05-17 at 13:10 +0200, Hans de Goede wrote:
> > Hi All,
> > I just noticed the below lockdep possible deadlock report with a 5.18-rc6
> > kernel on a Dell Latitude E6430 laptop with the following nvidia GPU:
> > [..]

I hope this is not off topic given the kernel version; if so, I
apologize in advance.

I would like to report that I am consistently seeing a similar, but
somewhat different, lockdep warning (see [1]) on kernels 5.16 and 5.17
(built with lockdep debugging enabled) every time I trigger
suspend-to-RAM, regardless of whether the suspend succeeds.

Thanks.

[1] https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/547#note_1361411