Yeah I noticed this as well, I will try to bisect this the next chance that I
get

On Tue, 2022-05-17 at 13:10 +0200, Hans de Goede wrote:
> Hi All,
>
> I just noticed the below lockdep possible deadlock report with a 5.18-rc6
> kernel on a Dell Latitude E6430 laptop with the following nvidia GPU:
>
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108GLM [NVS
> 5200M] [10de:0dfc] (rev a1)
> 01:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio
> Controller [10de:0bea] (rev a1)
>
> This is with the laptop in Optimus mode, so with the Intel integrated
> gfx from the i5-3320M CPU driving the LCD panel and with nothing connected
> to the HDMI connector, which is always routed to the NVIDIA GPU on this
> laptop.
>
> The lockdep possible deadlock warning seems to happen when the NVIDIA GPU
> is runtime suspended shortly after gdm has loaded:
>
> [   24.859171] ======================================================
> [   24.859173] WARNING: possible circular locking dependency detected
> [   24.859175] 5.18.0-rc6+ #34 Tainted: G            E
> [   24.859178] ------------------------------------------------------
> [   24.859179] kworker/1:1/46 is trying to acquire lock:
> [   24.859181] ffff92b0c0ee0518 (&cli->mutex){+.+.}-{3:3}, at: nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859231]
>                but task is already holding lock:
> [   24.859233] ffff92b0c4bf35a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_bo_wait+0x7d/0x140 [ttm]
> [   24.859243]
>                which lock already depends on the new lock.
>
> [   24.859244]
>                the existing dependency chain (in reverse order) is:
> [   24.859246]
>                -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
> [   24.859249]        __ww_mutex_lock.constprop.0+0xb3/0xfb0
> [   24.859256]        ww_mutex_lock+0x38/0xa0
> [   24.859259]        nouveau_bo_pin+0x30/0x380 [nouveau]
> [   24.859297]        nouveau_channel_del+0x1d7/0x3e0 [nouveau]
> [   24.859328]        nouveau_channel_new+0x48/0x730 [nouveau]
> [   24.859358]        nouveau_abi16_ioctl_channel_alloc+0x113/0x360 [nouveau]
> [   24.859389]        drm_ioctl_kernel+0xa1/0x150
> [   24.859392]        drm_ioctl+0x21c/0x410
> [   24.859395]        nouveau_drm_ioctl+0x56/0x1820 [nouveau]
> [   24.859431]        __x64_sys_ioctl+0x8d/0xc0
> [   24.859436]        do_syscall_64+0x5b/0x80
> [   24.859440]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   24.859443]
>                -> #0 (&cli->mutex){+.+.}-{3:3}:
> [   24.859446]        __lock_acquire+0x12e2/0x1f90
> [   24.859450]        lock_acquire+0xad/0x290
> [   24.859453]        __mutex_lock+0x90/0x830
> [   24.859456]        nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859493]        ttm_bo_move_to_lru_tail+0x32c/0x980 [ttm]
> [   24.859498]        ttm_mem_evict_first+0x25c/0x4b0 [ttm]
> [   24.859503]        ttm_resource_manager_evict_all+0x93/0x1b0 [ttm]
> [   24.859509]        nouveau_debugfs_fini+0x161/0x260 [nouveau]
> [   24.859545]        nouveau_drm_ioctl+0xa4a/0x1820 [nouveau]
> [   24.859582]        pci_pm_runtime_suspend+0x5c/0x180
> [   24.859585]        __rpm_callback+0x48/0x1b0
> [   24.859589]        rpm_callback+0x5a/0x70
> [   24.859591]        rpm_suspend+0x10a/0x6f0
> [   24.859594]        pm_runtime_work+0xa0/0xb0
> [   24.859596]        process_one_work+0x254/0x560
> [   24.859601]        worker_thread+0x4f/0x390
> [   24.859604]        kthread+0xe6/0x110
> [   24.859607]        ret_from_fork+0x22/0x30
> [   24.859611]
>                other info that might help us debug this:
>
> [   24.859612]  Possible unsafe locking scenario:
>
> [   24.859613]        CPU0                    CPU1
> [   24.859615]        ----                    ----
> [   24.859616]   lock(reservation_ww_class_mutex);
> [   24.859618]                                lock(&cli->mutex);
> [   24.859620]                                lock(reservation_ww_class_mutex);
> [   24.859622]   lock(&cli->mutex);
> [   24.859624]
>                 *** DEADLOCK ***
>
> [   24.859625] 3 locks held by kworker/1:1/46:
> [   24.859627]  #0: ffff92b0c0bb4338 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x1d0/0x560
> [   24.859634]  #1: ffffa8ffc02dfe80 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1d0/0x560
> [   24.859641]  #2: ffff92b0c4bf35a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_bo_wait+0x7d/0x140 [ttm]
> [   24.859649]
>                stack backtrace:
> [   24.859651] CPU: 1 PID: 46 Comm: kworker/1:1 Tainted: G            E     5.18.0-rc6+ #34
> [   24.859654] Hardware name: Dell Inc. Latitude E6430/0H3MT5, BIOS A21 05/08/2017
> [   24.859656] Workqueue: pm pm_runtime_work
> [   24.859660] Call Trace:
> [   24.859662]  <TASK>
> [   24.859665]  dump_stack_lvl+0x5b/0x74
> [   24.859669]  check_noncircular+0xdf/0x100
> [   24.859672]  ? register_lock_class+0x38/0x470
> [   24.859678]  __lock_acquire+0x12e2/0x1f90
> [   24.859683]  lock_acquire+0xad/0x290
> [   24.859686]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859724]  ? lock_is_held_type+0xa6/0x120
> [   24.859730]  __mutex_lock+0x90/0x830
> [   24.859733]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859770]  ? nvif_vmm_map+0x114/0x130 [nouveau]
> [   24.859791]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859829]  ? nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859866]  nouveau_vga_lastclose+0x910/0x1030 [nouveau]
> [   24.859905]  ttm_bo_move_to_lru_tail+0x32c/0x980 [ttm]
> [   24.859912]  ttm_mem_evict_first+0x25c/0x4b0 [ttm]
> [   24.859919]  ? lock_release+0x20/0x2a0
> [   24.859923]  ttm_resource_manager_evict_all+0x93/0x1b0 [ttm]
> [   24.859930]  nouveau_debugfs_fini+0x161/0x260 [nouveau]
> [   24.859968]  nouveau_drm_ioctl+0xa4a/0x1820 [nouveau]
> [   24.860005]  pci_pm_runtime_suspend+0x5c/0x180
> [   24.860008]  ? pci_dev_put+0x20/0x20
> [   24.860011]  __rpm_callback+0x48/0x1b0
> [   24.860014]  ? pci_dev_put+0x20/0x20
> [   24.860018]  rpm_callback+0x5a/0x70
> [   24.860020]  ? pci_dev_put+0x20/0x20
> [   24.860023]  rpm_suspend+0x10a/0x6f0
> [   24.860025]  ? process_one_work+0x1d0/0x560
> [   24.860031]  pm_runtime_work+0xa0/0xb0
> [   24.860034]  process_one_work+0x254/0x560
> [   24.860039]  worker_thread+0x4f/0x390
> [   24.860043]  ? process_one_work+0x560/0x560
> [   24.860046]  kthread+0xe6/0x110
> [   24.860049]  ? kthread_complete_and_exit+0x20/0x20
> [   24.860053]  ret_from_fork+0x22/0x30
> [   24.860059]  </TASK>
>
> Regards,
>
> Hans
>
--
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
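[Editorial note: for readers unfamiliar with what lockdep is flagging here, the inversion can be modeled outside the kernel. The sketch below is illustrative only, not nouveau/ttm code: plain Python, with the lock names used purely as labels. It records which locks are taken while others are held, roughly the dependency graph lockdep builds, and flags the AB-BA cycle between &cli->mutex (taken first on the channel-alloc ioctl path, chain #1 above) and reservation_ww_class_mutex (held first on the runtime-suspend eviction path, chain #0).]

```python
class LockOrderTracker:
    """Toy model of lockdep's lock-order dependency graph."""

    def __init__(self):
        self.edges = {}  # lock -> set of locks acquired while it was held
        self.held = []   # stack of locks the current "task" holds

    def acquire(self, lock):
        warnings = []
        for h in self.held:
            # Record the dependency: 'lock' was taken while 'h' was held.
            self.edges.setdefault(h, set()).add(lock)
            # If the reverse order (lock -> ... -> h) is already in the
            # graph, taking 'lock' under 'h' closes a cycle: possible AB-BA.
            if self._reaches(lock, h):
                warnings.append(
                    f"possible circular locking dependency: "
                    f"{h} -> {lock} vs {lock} -> {h}")
        self.held.append(lock)
        return warnings

    def release(self, lock):
        self.held.remove(lock)

    def _reaches(self, src, dst):
        # Depth-first search over recorded dependencies.
        seen, stack = set(), [src]
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(self.edges.get(n, ()))
        return False


t = LockOrderTracker()

# Chain #1 (channel alloc ioctl): cli->mutex first, then the buffer's
# reservation ww_mutex via nouveau_bo_pin().
t.acquire("cli->mutex")
t.acquire("reservation_ww_class_mutex")
t.release("reservation_ww_class_mutex")
t.release("cli->mutex")

# Chain #0 (runtime suspend / eviction): the reservation ww_mutex is
# already held when cli->mutex is requested -- the reverse order.
t.acquire("reservation_ww_class_mutex")
print(t.acquire("cli->mutex"))  # reports the inversion
```

Like real lockdep, this flags the *possibility* of deadlock from lock ordering alone; no actual concurrent execution (or hang) is needed to detect it.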
Computer Enthusiastic
2022-May-20 11:46 UTC
[Nouveau] nouveau lockdep deadlock report with 5.18-rc6
Hello,

On Wed, May 18, 2022 at 7:42 PM Lyude Paul <lyude at redhat.com> wrote:
>
> Yeah I noticed this as well, I will try to bisect this the next chance that I
> get
>
> On Tue, 2022-05-17 at 13:10 +0200, Hans de Goede wrote:
> > Hi All,
> > I just noticed the below lockdep possible deadlock report with a 5.18-rc6
> > kernel on a Dell Latitude E6430 laptop with the following nvidia GPU:
> > [..]

I hope this is not off topic given the kernel version; if so, I
apologize in advance.

I would like to report that I am consistently seeing a similar, but
somewhat different, lockdep warning (see [1]) on kernels 5.16 and 5.17
(built with lockdep debugging enabled) every time I trigger
suspend-to-RAM, regardless of whether the suspend succeeds.

Thanks.

[1] https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/547#note_1361411