riveravaldez
2021-Oct-26 00:55 UTC
[Nouveau] System freeze: Debian Stable with C61 [GeForce 7025 / nForce 630a]
Hi, this may be an old issue. At the bottom I have copied the last message of an earlier thread about this same machine, which contained the solution at the time.

Essentially, I get random freezes: the display either looks like a snowstorm or simply freezes, nothing responds, and I am forced to hard-reset the machine.

Last time the solution was simply to remove nouveau_dri.so, and for what seems to be a couple of years the system was rock-solid. But a couple of days ago I did a system update (I'm on Debian Stable Bullseye right now) and the problem apparently reappeared, but worse/different: now it happens even without nouveau_dri.so present on the system. (Meaning: I remove nouveau_dri.so and the freezes still happen randomly.)

The hardware is the same, so I imagine it may be a kernel issue?

This is the hardware:

$ lspci
00:00.0 RAM memory: NVIDIA Corporation MCP61 Host Bridge (rev a1)
00:01.0 ISA bridge: NVIDIA Corporation MCP61 LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP61 SMBus (rev a2)
00:01.2 RAM memory: NVIDIA Corporation MCP61 Memory Controller (rev a2)
00:02.0 USB controller: NVIDIA Corporation MCP61 USB 1.1 Controller (rev a3)
00:02.1 USB controller: NVIDIA Corporation MCP61 USB 2.0 Controller (rev a3)
00:04.0 PCI bridge: NVIDIA Corporation MCP61 PCI bridge (rev a1)
00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2)
00:06.0 IDE interface: NVIDIA Corporation MCP61 IDE (rev a2)
00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2)
00:08.0 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:08.1 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 7025 / nForce 630a] (rev a2)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control

With:

$ uname -a
Linux debian 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux

And:

$ apt-cache policy xserver-xorg-video-nouveau
xserver-xorg-video-nouveau:
  Instalados: 1:1.0.17-1
  Candidato: 1:1.0.17-1
  Tabla de versión:
 *** 1:1.0.17-1 500
        500 https://deb.debian.org/debian bullseye/main amd64 Packages
        100 /var/lib/dpkg/status

And this seems to be all the info I have:

$ sudo journalctl -S 2021-10-21 -x -p 4 | grep nouveau
oct 21 13:16:43 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 13:16:43 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 21 13:17:00 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 21 17:12:05 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 17:12:05 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 21 17:12:18 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 21 17:22:21 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 014b0001 FAULT at 00b010
oct 21 17:22:21 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 014b0001 FAULT at 00b010
oct 21 21:32:55 debian kernel: autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper cec ttm ata_generic sata_nv drm libata scsi_mod psmouse serio_raw evdev button
oct 21 22:22:53 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01f20001 FAULT at 00b020
oct 21 22:23:14 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b020
oct 21 22:23:23 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01f20001 FAULT at 00b020
oct 21 22:25:32 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 22:25:32 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 21 22:26:03 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 21 22:46:40 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 22:46:40 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 21 22:46:51 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 21 22:59:55 debian kernel: crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper ata_generic sata_nv cec libata ttm drm scsi_mod psmouse evdev serio_raw button
oct 21 23:44:26 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01570001 FAULT at 00b010
oct 21 23:44:26 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01ef0001 FAULT at 00b020
oct 21 23:45:08 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02550001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01440001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02610001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 015e0001 FAULT at 00b030
oct 21 23:45:10 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02610001 FAULT at 00b030
oct 21 23:45:11 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01440001 FAULT at 00b030
oct 21 23:45:13 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02550001 FAULT at 00b030
oct 21 23:45:13 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01440001 FAULT at 00b030
oct 21 23:45:16 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02610001 FAULT at 00b040
oct 21 23:45:17 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02550001 FAULT at 00b040
oct 21 23:45:17 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b030
oct 21 23:45:24 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b040
oct 21 23:48:53 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 23:48:53 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 21 23:49:21 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 22 00:28:54 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 22 00:28:54 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 22 00:29:04 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 23 14:48:33 debian kernel: parport_pc ppdev lp parport fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper cec ttm drm sata_nv ata_generic psmouse libata serio_raw scsi_mod evdev button
oct 24 03:18:13 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 24 03:18:13 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 24 03:18:38 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 24 11:08:30 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 24 11:08:30 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1 has no encoders, removing
oct 24 11:09:04 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 24 11:12:01 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01740001 FAULT at 00b010
oct 24 11:12:02 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 018b0001 FAULT at 00b020
oct 24 11:12:18 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02150001 FAULT at 00b030
oct 24 11:12:23 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b030
oct 24 11:12:23 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02910001 FAULT at 00b030
oct 24 11:23:33 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b030
oct 24 12:33:35 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b010
oct 24 12:33:35 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b020
oct 24 12:33:35 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000
oct 24 12:33:45 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 014a0001 FAULT at 00b010
oct 24 12:33:45 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 018b0001 FAULT at 00b020
oct 24 13:09:37 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02290001 FAULT at 00b030
oct 24 13:09:38 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01710001 FAULT at 00b040
oct 24 13:09:38 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b030
oct 24 13:09:39 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b040
oct 24 13:09:39 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 02870001 FAULT at 00b030
oct 24 13:09:39 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 01570001 FAULT at 00b030
oct 24 13:09:42 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write of 00000000 FAULT at 00b030

Sorry for not trimming anything; I'm not sure what's useful and what's not.

Any hints? As before, thanks A LOT in advance.

Best regards!

On 1/29/20, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Wed, Jan 29, 2020 at 5:03 AM riveravaldez <riveravaldezmail at gmail.com>
> wrote:
>>
>> On 12/11/18, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> > On Tue, Dec 11, 2018 at 11:16 AM riveravaldez
>> > <riveravaldezmail at gmail.com> wrote:
>> >
>> >> The freezes appears randomly, in every situation, and not when I
>> >> launch some 3D applications or anything similar.
>> >
>> > Try removing nouveau_dri.so -- that will ensure no 3d accel is used,
>> > while keeping your 2d accel provided by the nouveau ddx.
>>
>> Sorry if it's wrong to continue this old thread, but after a good
>> amount of testing (+1 year) I can confirm that both the problem and
>> the solution where the mentioned ones.
>>
>> The problem (random full-system freezes) persists without change,
>> identical. And removing nouveau_dri.so from
>> /usr/lib/x86_64-linux-gnu/dri/ effectively fixes it completely
>> (leaving aside any lost of performance and some warning messages in
>> system upgrades and programs launching[1]).
>>
>> So, after a GREAT thank-you to Ilia, I consult:
>>
>> 1. Is this something that could be fixed? Can I do anything to help?
>>
>> 2. If the only possible/viable solution is the mentioned one (remove
>> nouveau_dri.so), which would be the proper way to make it permanent?
>>
>> 2'. In many dist-upgrades the nouveau_dri.so file is re-created in the
>> same folder, what would be a clean/neat way to handle this?
>>
>> Thanks A LOT again.
>>
>> [1] A lot of lines like these on some dist-upgrades:
>>
>> W: Possible missing firmware
>> /lib/firmware/nvidia/gp100/gr/sw_method_init.bin for module nouveau
>> W: Possible missing firmware
>> /lib/firmware/nvidia/gp100/gr/sw_bundle_init.bin for module nouveau
>> W: Possible missing firmware
>> /lib/firmware/nvidia/gp100/gr/sw_nonctx.bin for module nouveau
>> (...)
>
> Sounds like your initramfs builder tries to include these but they're
> not available on your filesystem. As long as you're not plugging a
> Pascal GPU into your system, you're fine.
>
>>
>> And a lot of programs producing messages like these on start:
>>
>> libGL error: unable to load driver: nouveau_dri.so
>> libGL error: driver pointer missing
>> libGL error: failed to load driver: nouveau
>
> Hmmmm annoying. I hadn't considered that. I could add an option to the
> DDX which makes the default driver "swrast" or something. I also
> wonder if just not loading the "glx" and "dri2" X modules would be
> sufficient to get rid of these.
>
> You can also stick LIBGL_ALWAYS_SOFTWARE=1 into your /etc/environment
> (or whatever location causes that env var to appear everywhere) which
> will force it to use swrast. (With the added benefit of being able to
> unset it for the programs where you really do want 3d accel.)
>
> As for a more permanent fix, one could invest developer attention to
> the nv30 gallium driver, but that one would first have to be located.
> I'd be happy to provide some limited mentoring in such a case.
>
> Cheers,
>
> -ilia
>
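
For reference, here is a minimal sketch of the LIBGL_ALWAYS_SOFTWARE approach suggested in the quoted reply, assuming libGL honors that variable as described there. The helper name with_hw_gl is hypothetical and not part of any package. The /etc/environment line appears only as a comment, since editing that file needs root and presumably a fresh login to take effect; the sketch simulates it with an export in the current shell.

```shell
# System-wide default (per the quoted suggestion): add this single line
# to /etc/environment:
#   LIBGL_ALWAYS_SOFTWARE=1

# Hypothetical helper: run one command with the variable removed from its
# environment, so only that program attempts hardware 3D acceleration.
with_hw_gl() {
    (
        unset LIBGL_ALWAYS_SOFTWARE
        exec "$@"
    )
}

# Simulate the system-wide setting in the current shell.
export LIBGL_ALWAYS_SOFTWARE=1

# Child processes inherit the software-rendering setting by default...
sh -c 'echo "default: ${LIBGL_ALWAYS_SOFTWARE:-unset}"'

# ...while a command launched through the helper does not see it.
with_hw_gl sh -c 'echo "with_hw_gl: ${LIBGL_ALWAYS_SOFTWARE:-unset}"'
```

Run as-is, this prints "default: 1" followed by "with_hw_gl: unset". The helper uses a subshell so the variable stays set for everything else in the session. On a machine where nouveau_dri.so has been removed, the helper would only be useful once that file is present again.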