On Fri, Aug 4, 2023 at 2:02 PM Thorsten Leemhuis <regressions at leemhuis.info> wrote:
>
> Hi!
>
> On 02.08.23 23:28, Olaf Skibbe wrote:
> > Dear Maintainers,
> >
> > Hereby I would like to report an apparent bug in the nouveau driver in
> > linux/6.1.38-2.
>
> Thx for your report. Maybe your problem is caused by a incomplete
> backport. I Cced the maintainers for the drivers (and the regressions
> and the stable list), maybe one of them has an idea, as they know the
> driver.
>
> If they don't reply in the next few days, please check if the problem is
> also present in mainline. If not, check if the latest 6.1.y. release
> already fixes this. If not, try to check which of the four patches you
> reverted to make things going is actually causing this (e.g. first only
> revert the one that was applied last; then the two last ones; ...).
>
> > Running a current debian stable on a Dell Latitude E6510 with a
> > "NVIDIA Corporation GT218M" graphic card, the monitor turns black
> > after the grub screen. Also switching to a console (Strg-Alt-F2) shows
> > just a black screen. Access via ssh is possible.
> > > > ~# uname -r > > 6.1.0-10-amd64 > > > > demesg shows the following error message: > > > > [ 3.560153] WARNING: CPU: 0 PID: 176 at > > drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 > > nvkm_dp_acquire+0x26a/0x490 [nouveau] > > [ 3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft > > cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi > > i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm > > scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci > > ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul > > crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core > > crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery > > video wmi button > > [ 3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted > > 6.1.0-10-amd64 #1 Debian 6.1.38-2 > > [ 3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 > > 05/12/2017 > > [ 3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau] > > [ 3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau] > > [ 3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 > > 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc > > cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26 > > [ 3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246 > > [ 3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX: > > 0000000000041eb0 > > [ 3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI: > > ffff9899c048bcf0 > > [ 3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09: > > 0000000000005b76 > > [ 3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12: > > 00000000ffffffea > > [ 3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15: > > 0000000000000000 > > [ 3.560549] FS: 0000000000000000(0000) GS:ffff88e123c00000(0000) > > knlGS:0000000000000000 > > [ 3.560551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 3.560552] CR2: 00007f57f4e90451 CR3: 
0000000181410000 CR4: > > 00000000000006f0 > > [ 3.560554] Call Trace: > > [ 3.560558] <TASK> > > [ 3.560560] ? __warn+0x7d/0xc0 > > [ 3.560566] ? nvkm_dp_acquire+0x26a/0x490 [nouveau] > > [ 3.560671] ? report_bug+0xe6/0x170 > > [ 3.560675] ? handle_bug+0x41/0x70 > > [ 3.560679] ? exc_invalid_op+0x13/0x60 > > [ 3.560681] ? asm_exc_invalid_op+0x16/0x20 > > [ 3.560685] ? init_reset_begun+0x20/0x20 [nouveau] > > [ 3.560769] ? nvkm_dp_acquire+0x26a/0x490 [nouveau] > > [ 3.560888] nv50_disp_super_2_2+0x70/0x430 [nouveau] > > [ 3.560997] nv50_disp_super+0x113/0x210 [nouveau] > > [ 3.561103] process_one_work+0x1c7/0x380 > > [ 3.561109] worker_thread+0x4d/0x380 > > [ 3.561113] ? rescuer_thread+0x3a0/0x3a0 > > [ 3.561116] kthread+0xe9/0x110 > > [ 3.561120] ? kthread_complete_and_exit+0x20/0x20 > > [ 3.561122] ret_from_fork+0x22/0x30 > > [ 3.561130] </TASK> > > > > Further information: > > > > $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }') > > 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] > > (rev a2) (prog-if 00 [VGA controller]) > > Subsystem: Dell Latitude E6510 > > Flags: bus master, fast devsel, latency 0, IRQ 27 > > Memory at e2000000 (32-bit, non-prefetchable) [size=16M] > > Memory at d0000000 (64-bit, prefetchable) [size=256M] > > Memory at e0000000 (64-bit, prefetchable) [size=32M] > > I/O ports at 7000 [size=128] > > Expansion ROM at 000c0000 [disabled] [size=128K] > > Capabilities: <access denied> > > Kernel driver in use: nouveau > > Kernel modules: nouveau > > > > I reported this bug to debian already, see > > https://bugs.debian.org/1042753 for context. > > > > With support (thanks Diederik!) I managed to figure out that the cause > > was a regression between upstream kernel version 6.1.27 and 6.1.38. 
> >
> > I build a new 6.1.38 kernel with these commits reverted:
> >
> > 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> > fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> > 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> > 5a144bad3e75 nouveau: fix client work fence deletion race
> >

mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would be
weird if the other two commits are causing it. If that's the case, it's
a bit worrying that reverting either of those causes issues, but maybe
there is a good reason for it. Anyway, mind figuring out which of the
two you need reverted to fix your issue? Thanks!

> > With that kernel the graphic works again.
> >
> > Please inform me if further tests are required.
>
> FWIW, to be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.1.27..v6.1.38
> #regzbot title drm/nouveau: display stays black
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>
On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:

> mind retrying with only fb725beca62d and 62aecf23f3d1 reverted?

I will do this later today (takes some time, it is a slow machine).

> Would be weird if the other two commits are causing it. If that's the
> case, it's a bit worrying that reverting either of those causes
> issues, but maybe there is a good reason for it. Anyway, mind figuring
> out which of the two you need reverted to fix your issue? Thanks!

I can do this. But if I build two kernels anyway, isn't it faster to
build each with only one of the patches applied? Or do you expect the
patches to interact (so that the bug would only be present when both are
applied)?

Cheers,
Olaf
On Fri, Aug 4, 2023 at 2:48 PM Olaf Skibbe <news at kravcenko.com> wrote:
>
> On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:
>
> > mind retrying with only fb725beca62d and 62aecf23f3d1 reverted?
>
> I will do this later today (takes some time, it is a slow machine).
>
> > Would be weird if the other two commits are causing it. If that's the
> > case, it's a bit worrying that reverting either of those causes
> > issues, but maybe there is a good reason for it. Anyway, mind figuring
> > out which of the two you need reverted to fix your issue? Thanks!
>
> I can do this. But if I build two kernels anyway, isn't it faster to
> build each with only one of the patches applied? Or do you expect the
> patches to interact (so that the bug would only be present when both are
> applied)?
>

How are you building the kernel? Normally, reverting one of those from
git shouldn't take long, because it doesn't recompile the entire kernel.
But yeah, you can just revert one at a time for now and it should be
fine.

> Cheers,
> Olaf
>
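
For reference, a minimal sketch of the "revert one commit from git and
rebuild" workflow mentioned above. It is a dry run: it only prints the
commands. Everything beyond the two commit IDs is an assumption, not
something stated in the thread: a git checkout of the linux-6.1.y stable
tree with a working .config, the v6.1.38 tag as the base, short hashes
that resolve in that tree, and the hypothetical branch names. A Debian
package build (as apparently used here) would likely take longer than a
plain in-tree make.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the steps for testing one revert.
# Assumptions, not from the thread: linux-6.1.y stable checkout, existing
# .config, v6.1.38 as base tag. Branch names are hypothetical.

revert_steps() {
    commit=$1
    # Start each test from the same base so the reverts do not stack.
    echo "git checkout -b test-revert-$commit v6.1.38"
    echo "git revert --no-edit $commit"
    # After a small driver change, make normally rebuilds only the
    # objects affected by the touched files, not the whole kernel.
    echo 'make -j"$(nproc)"'
    echo "sudo make modules_install install"
}

# The two candidate commits named in the thread, one kernel per revert.
for c in fb725beca62d 62aecf23f3d1; do
    revert_steps "$c"
    echo
done
```

Building one kernel per single revert, as sketched, answers the
question of which commit matters on its own; if neither single revert
helps, the two would have to be tested together.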
Dear all,

On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:

>>> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
>>> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
>>> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
>>> 5a144bad3e75 nouveau: fix client work fence deletion race
>
> mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would
> be weird if the other two commits are causing it. If that's the case,
> it's a bit worrying that reverting either of those causes issues,
> but maybe there is a good reason for it. Anyway, mind figuring out
> which of the two you need reverted to fix your issue? Thanks!

The result is:

Patch with commit fb725beca62d reverted: Graphics works. I attached the
respective patch again to this mail.

Patch with commit 62aecf23f3d1 reverted: Screen remains black, error
message:

# dmesg | grep -A 36 "cut here"
[ 2.921358] ------------[ cut here ]------------
[ 2.921361] WARNING: CPU: 1 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 2.921627] Modules linked in: sd_mod(E) t10_pi(E) crc64_rocksoft(E) sr_mod(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) cdrom(E) nouveau(E+) mxm_wmi(E) i2c_algo_bit(E) drm_display_helper(E) cec(E) ahci(E) rc_core(E) drm_ttm_helper(E) libahci(E) ttm(E) ehci_pci(E) crct10dif_pclmul(E) crct10dif_common(E) ehci_hcd(E) drm_kms_helper(E) crc32_pclmul(E) firewire_ohci(E) sdhci_pci(E) cqhci(E) libata(E) e1000e(E) sdhci(E) psmouse(E) crc32c_intel(E) lpc_ich(E) ptp(E) i2c_i801(E) scsi_mod(E) i2c_smbus(E) firewire_core(E) scsi_common(E) usbcore(E) crc_itu_t(E) mmc_core(E) drm(E) pps_core(E) usb_common(E) battery(E) video(E) wmi(E) button(E)
[ 2.921695] CPU: 1 PID: 176 Comm: kworker/u16:5 Tainted: G E 6.1.0-0.a.test-amd64 #1 Debian 6.1.38-2a~test
[ 2.921701] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 05/12/2017
[ 2.921705] Workqueue: nvkm-disp nv50_disp_super [nouveau]
[ 2.921948] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 2.922192] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
[ 2.922196] RSP: 0018:ffffc077c04dfd60 EFLAGS: 00010246
[ 2.922201] RAX: 0000000000041eb0 RBX: ffff9a8482624c00 RCX: 0000000000041eb0
[ 2.922204] RDX: ffffffffc0b47760 RSI: 0000000000000000 RDI: ffffc077c04dfcf0
[ 2.922206] RBP: 0000000000000001 R08: ffffc077c04dfc64 R09: 0000000000005b76
[ 2.922209] R10: 000000000000000d R11: ffffc077c04dfde0 R12: 00000000ffffffea
[ 2.922212] R13: ffff9a8517541e00 R14: 0000000000044d45 R15: 0000000000000000
[ 2.922215] FS: 0000000000000000(0000) GS:ffff9a85a3c40000(0000) knlGS:0000000000000000
[ 2.922219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.922222] CR2: 000055f660bcb3a8 CR3: 0000000197610000 CR4: 00000000000006e0
[ 2.922226] Call Trace:
[ 2.922231] <TASK>
[ 2.922235] ? __warn+0x7d/0xc0
[ 2.922244] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 2.922487] ? report_bug+0xe6/0x170
[ 2.922494] ? handle_bug+0x41/0x70
[ 2.922501] ? exc_invalid_op+0x13/0x60
[ 2.922505] ? asm_exc_invalid_op+0x16/0x20
[ 2.922512] ? init_reset_begun+0x20/0x20 [nouveau]
[ 2.922708] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 2.922954] nv50_disp_super_2_2+0x70/0x430 [nouveau]
[ 2.923200] nv50_disp_super+0x113/0x210 [nouveau]
[ 2.923445] process_one_work+0x1c7/0x380
[ 2.923456] worker_thread+0x4d/0x380
[ 2.923463] ? rescuer_thread+0x3a0/0x3a0
[ 2.923469] kthread+0xe9/0x110
[ 2.923476] ? kthread_complete_and_exit+0x20/0x20
[ 2.923482] ret_from_fork+0x22/0x30
[ 2.923493] </TASK>
[ 2.923494] ---[ end trace 0000000000000000 ]---

(Maybe it's worth mentioning that the LED backlight is on, while the
screen appears black.)

Cheers,
Olaf

P.S.: By the way: as a Linux user for more than 20 years, I am very
pleased to have the opportunity to contribute at least a little bit to
the improvement. I'd like to take this chance to thank you all very much
for building and developing this great operating system.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Revert-drm-nouveau-dp-check-for-NULL-nv_connector-na.patch
Type: text/x-diff
Size: 1612 bytes
Desc:
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20230804/25dae8b1/attachment-0001.patch>
On Fri, Aug 4, 2023 at 8:10 PM Olaf Skibbe <news at kravcenko.com> wrote:
>
> Dear all,
>
> On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:
>
> >>> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> >>> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> >>> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> >>> 5a144bad3e75 nouveau: fix client work fence deletion race
> >
> > mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would
> > be weird if the other two commits are causing it. If that's the case,
> > it's a bit worrying that reverting either of those causes issues,
> > but maybe there is a good reason for it. Anyway, mind figuring out
> > which of the two you need reverted to fix your issue? Thanks!
>
> The result is:
>
> Patch with commit fb725beca62d reverted: Graphics works. I attached the
> respective patch again to this mail.
>

Mind checking whether, instead of reverting the entire commit, this is
enough to fix it as well?
https://gitlab.freedesktop.org/karolherbst/nouveau/-/commit/f99ae069876f7ffeb6368da0381485e8c3adda43.patch

> Patch with commit 62aecf23f3d1 reverted: Screen remains black, error
> message:
>
> # dmesg | grep -A 36 "cut here"
> [ 2.921358] ------------[ cut here ]------------
> [ 2.921361] WARNING: CPU: 1 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 2.921627] Modules linked in: sd_mod(E) t10_pi(E) crc64_rocksoft(E) sr_mod(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) cdrom(E) nouveau(E+) mxm_wmi(E) i2c_algo_bit(E) drm_display_helper(E) cec(E) ahci(E) rc_core(E) drm_ttm_helper(E) libahci(E) ttm(E) ehci_pci(E) crct10dif_pclmul(E) crct10dif_common(E) ehci_hcd(E) drm_kms_helper(E) crc32_pclmul(E) firewire_ohci(E) sdhci_pci(E) cqhci(E) libata(E) e1000e(E) sdhci(E) psmouse(E) crc32c_intel(E) lpc_ich(E) ptp(E) i2c_i801(E) scsi_mod(E) i2c_smbus(E) firewire_core(E) scsi_common(E) usbcore(E) crc_itu_t(E) mmc_core(E) drm(E) pps_core(E) usb_common(E) battery(E) video(E) wmi(E) button(E)
> [ 2.921695] CPU: 1 PID: 176 Comm: kworker/u16:5 Tainted: G E 6.1.0-0.a.test-amd64 #1 Debian 6.1.38-2a~test
> [ 2.921701] Hardware name: Dell Inc.
Latitude E6510/0N5KHN, BIOS A17 05/12/2017 > [ 2.921705] Workqueue: nvkm-disp nv50_disp_super [nouveau] > [ 2.921948] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 2.922192] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26 > [ 2.922196] RSP: 0018:ffffc077c04dfd60 EFLAGS: 00010246 > [ 2.922201] RAX: 0000000000041eb0 RBX: ffff9a8482624c00 RCX: 0000000000041eb0 > [ 2.922204] RDX: ffffffffc0b47760 RSI: 0000000000000000 RDI: ffffc077c04dfcf0 > [ 2.922206] RBP: 0000000000000001 R08: ffffc077c04dfc64 R09: 0000000000005b76 > [ 2.922209] R10: 000000000000000d R11: ffffc077c04dfde0 R12: 00000000ffffffea > [ 2.922212] R13: ffff9a8517541e00 R14: 0000000000044d45 R15: 0000000000000000 > [ 2.922215] FS: 0000000000000000(0000) GS:ffff9a85a3c40000(0000) knlGS:0000000000000000 > [ 2.922219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 2.922222] CR2: 000055f660bcb3a8 CR3: 0000000197610000 CR4: 00000000000006e0 > [ 2.922226] Call Trace: > [ 2.922231] <TASK> > [ 2.922235] ? __warn+0x7d/0xc0 > [ 2.922244] ? nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 2.922487] ? report_bug+0xe6/0x170 > [ 2.922494] ? handle_bug+0x41/0x70 > [ 2.922501] ? exc_invalid_op+0x13/0x60 > [ 2.922505] ? asm_exc_invalid_op+0x16/0x20 > [ 2.922512] ? init_reset_begun+0x20/0x20 [nouveau] > [ 2.922708] ? nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 2.922954] nv50_disp_super_2_2+0x70/0x430 [nouveau] > [ 2.923200] nv50_disp_super+0x113/0x210 [nouveau] > [ 2.923445] process_one_work+0x1c7/0x380 > [ 2.923456] worker_thread+0x4d/0x380 > [ 2.923463] ? rescuer_thread+0x3a0/0x3a0 > [ 2.923469] kthread+0xe9/0x110 > [ 2.923476] ? kthread_complete_and_exit+0x20/0x20 > [ 2.923482] ret_from_fork+0x22/0x30 > [ 2.923493] </TASK> > [ 2.923494] ---[ end trace 0000000000000000 ]--- > > (Maybe it's worth to mention that the LED back-light is on, while the > screen appears black.) 
>
> Cheers,
> Olaf
>
> P.S.: By the way: as a Linux user for more than 20 years, I am very
> pleased to have the opportunity to contribute at least a little bit to
> the improvement. I'd like to take this chance to thank you all very much
> for building and developing this great operating system.