On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul at gmail.com> wrote:
>
> Hello
>
> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
> (I have to reboot) even when the card is idle or is only showing the desktop.
>
> This issue happens even when the card is not connected to a monitor.
>
> My dmesg output from nouveau is included below, I think the last 2 lines are
> the relevant ones:
> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
>
>

That's kind of odd, because "nvidia-gpu" implies you might have multiple drivers here? Though .3 should be some USB/UCSI or related sub-device on the GPU, and Nvidia might have messed it up (adding the maintainer of the i2c-nvidia-gpu driver on CC).

Anyway, the fans are probably controlled by the laptop's firmware, and maybe something goes wrong with the runtime power management feature here. As far as I can tell it works on the Nouveau side, but i2c-nvidia-gpu might prevent the GPU from powering down and so cause more heat. It's also interesting that the GPU runs that hot, but given we don't support changing power states yet in Nouveau (still WIP wiring up the newly released firmware from Nvidia), there is not much we can do while the GPU is actually in use at this point.

>
>
> timothy at localhost:~> dmesg | grep -i -e nouveau -e nvidia
> [ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1)
> [ 6.594464] nouveau 0000:0b:00.0: bios: version 90.02.42.00.14
> [ 6.597756] nouveau 0000:0b:00.0: pmu: firmware unavailable
> [ 6.601947] nouveau 0000:0b:00.0: fb: 11264 MiB GDDR6
> [ 6.618463] nouveau 0000:0b:00.0: DRM: VRAM: 11264 MiB
> [ 6.618465] nouveau 0000:0b:00.0: DRM: GART: 536870912 MiB
> [ 6.618466] nouveau 0000:0b:00.0: DRM: BIT table 'A' not found
> [ 6.618468] nouveau 0000:0b:00.0: DRM: BIT table 'L' not found
> [ 6.618469] nouveau 0000:0b:00.0: DRM: TMDS table version 2.0
> [ 6.618470] nouveau 0000:0b:00.0: DRM: DCB version 4.1
> [ 6.618471] nouveau 0000:0b:00.0: DRM: DCB outp 00: 02800f66 04600020
> [ 6.618473] nouveau 0000:0b:00.0: DRM: DCB outp 01: 02000f62 00020020
> [ 6.618474] nouveau 0000:0b:00.0: DRM: DCB outp 03: 02011f52 00020010
> [ 6.618475] nouveau 0000:0b:00.0: DRM: DCB outp 04: 04822f76 04600010
> [ 6.618476] nouveau 0000:0b:00.0: DRM: DCB outp 05: 04022f72 00020010
> [ 6.618477] nouveau 0000:0b:00.0: DRM: DCB outp 08: 01844f36 04600010
> [ 6.618478] nouveau 0000:0b:00.0: DRM: DCB outp 09: 01044f32 00020010
> [ 6.618479] nouveau 0000:0b:00.0: DRM: DCB outp 10: 04833f86 04600020
> [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 00: 00020046
> [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 01: 00010161
> [ 6.618482] nouveau 0000:0b:00.0: DRM: DCB conn 02: 01000246
> [ 6.618483] nouveau 0000:0b:00.0: DRM: DCB conn 03: 02000371
> [ 6.618484] nouveau 0000:0b:00.0: DRM: DCB conn 04: 00001446
> [ 6.620448] nouveau 0000:0b:00.0: DRM: MM: using COPY for buffer copies
> [ 7.062338] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
> [ 7.065331] [drm] Initialized nouveau 1.3.1 20120801 for 0000:0b:00.0 on minor 1
> [ 7.254317] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
> [ 7.446318] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
> [ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device (0000 -> 0002)
> [ 8.696138] audit: type=1400 audit(1667665884.700:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=926 comm="apparmor_parser"
> [ 8.696141] audit: type=1400 audit(1667665884.700:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=926 comm="apparmor_parser"
> [ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops nv50_audio_component_bind_ops [nouveau])
> [ 8.708797] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input15
> [ 8.708903] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input16
> [ 8.708936] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input17
> [ 8.708965] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input18
> [ 8.708994] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input19
> [ 8.709032] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input20
> [ 8.709065] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input21
> [ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> [ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
> timothy at localhost:~>
>
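To make the "multiple drivers" observation concrete, here is a rough sketch that groups a few lines from the log above by PCI function. It relies on the prefix before the PCI address normally being the name of the driver that printed the message. The function name `drivers_by_function`, the `SAMPLE` string and the regular expression are made up for illustration and are not part of any existing tool.

```python
import re
from collections import defaultdict

# A few lines copied from the dmesg output quoted in this thread.
SAMPLE = """\
[ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1)
[ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device (0000 -> 0002)
[ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops nv50_audio_component_bind_ops [nouveau])
[ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
[ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
[ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
"""

# Expected shape: "[ <timestamp>] <driver> <domain:bus:device.function>: <message>"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>[\w-]+)\s+"
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)


def drivers_by_function(log_text):
    """Map each PCI function to the set of driver prefixes seen in the log."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines without a "<driver> <address>:" prefix are skipped
            seen[match.group("addr")].add(match.group("driver"))
    return dict(seen)


if __name__ == "__main__":
    for addr, names in sorted(drivers_by_function(SAMPLE).items()):
        print(addr, ", ".join(sorted(names)))
```

Read this way, each function of the card shows a single driver prefix: nouveau on .0, snd_hda_intel on .1 and nvidia-gpu on .3. That suggests the "nvidia-gpu" messages come from a separate sub-device rather than from a second driver on the GPU function itself.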
Hi

> -----Original Message-----
> From: Karol Herbst <kherbst at redhat.com>
> Sent: Monday, November 7, 2022 3:42 AM
> To: Timothy Madden <terminatorul at gmail.com>
> Cc: nouveau at lists.freedesktop.org; Ajay Gupta <ajayg at nvidia.com>
> Subject: Re: [Nouveau] Fans ramping up randomly when idle
>
> On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul at gmail.com>
> wrote:
> >
> > Hello
> >
> > My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to
> > recover (I have to reboot) even when the card is idle or is only showing the
> > desktop.
> >
> > This issue happens even when the card is not connected to a monitor.
> >
> > My dmesg output from nouveau is included below, I think the last 2
> > lines are the relevant ones:
> > [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> > [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff

This only implies that there is no USB/UCSI device on the card. It is expected from such cards and should be seen in dmesg even when the heating issue is not there.

Thanks

> >
> >
>
> that's kind of odd, because "nvidia-gpu" implies you might have multiple
> drivers here? Though .3 should be some USB/UCSI or something related sub
> device on the GPU and Nvidia might have messed it up (adding the
> maintainer of the i2c-nvidia-gpu driver on CC).
>
> Anyway, the fans are probably controlled by the laptop's firmware and
> maybe something goes wrong with the runtime power management feature
> here, which as far as I can tell works on the Nouveau side, but i2c-nvidia-gpu
> might prevent the GPU from powering down and so causing more heat.
> It's also interesting that the GPU runs that hot, but given we don't support
> changing power states yet in Nouveau (still WIP wiring up the new released
> firmware from nvidia), not much we can do while the GPU is actually in use at
> this point.
On 11/7/22 13:41, Karol Herbst wrote:
> On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul at gmail.com> wrote:
>>
>> Hello
>>
>> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
>> (I have to reboot) even when the card is idle or is only showing the desktop.
>>
>> This issue happens even when the card is not connected to a monitor.
>>
>> My dmesg output from nouveau is included below, I think the last 2 lines are
>> the relevant ones:
>> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
>> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
>>
>>
>
> that's kind of odd, because "nvidia-gpu" implies you might have
> multiple drivers here? Though .3 should be some USB/UCSI or something
> related sub device on the GPU and Nvidia might have messed it up
> (adding the maintainer of the i2c-nvidia-gpu driver on CC).

Is there a way to check for multiple drivers? I have openSUSE Tumbleweed at version 2022-11-08, and I did not install proprietary or other NVIDIA drivers.

>
> Anyway, the fans are probably controlled by the laptop's firmware and

I meant the fans on the graphics card. There is no laptop here; my desktop computer has a Gigabyte X470 Aorus Gaming 5 WiFi motherboard with the latest UEFI from the Gigabyte website.

> maybe something goes wrong with the runtime power management feature
> here, which as far as I can tell works on the Nouveau side, but
> i2c-nvidia-gpu might prevent the GPU from powering down and so causing
> more heat. It's also interesting that the GPU runs that hot, but given
> we don't support changing power states yet in Nouveau (still WIP
> wiring up the new released firmware from nvidia), not much we can do
> while the GPU is actually in use at this point.
>
>>
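On the question of how to check for multiple drivers, one possible approach is to ask sysfs which kernel driver is bound to each function of the card, and what the runtime power management state of each function is. The sketch below assumes a Linux system with sysfs mounted at /sys and a kernel built with runtime PM support; the helper name `bound_drivers` and its parameters are hypothetical, chosen only for this example.

```python
import os


def bound_drivers(slot_prefix, sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: (driver_or_None, runtime_status_or_None)}.

    slot_prefix selects every function of one card, e.g. "0000:0b:00.".
    """
    result = {}
    for addr in sorted(os.listdir(sysfs_root)):
        if not addr.startswith(slot_prefix):
            continue
        dev_dir = os.path.join(sysfs_root, addr)

        # The "driver" symlink exists only while a driver is bound;
        # its target's last path component is the driver name.
        driver_link = os.path.join(dev_dir, "driver")
        if os.path.islink(driver_link):
            driver = os.path.basename(os.readlink(driver_link))
        else:
            driver = None

        # power/runtime_status reports the runtime PM state as text
        # (for example "active" or "suspended") when it is available.
        status_path = os.path.join(dev_dir, "power", "runtime_status")
        try:
            with open(status_path) as handle:
                status = handle.read().strip()
        except OSError:
            status = None

        result[addr] = (driver, status)
    return result
```

Calling `bound_drivers("0000:0b:00.")` on the affected machine should list one entry per function of the card (.0 to .3 in the log above). The bound driver for each device is also shown by `lspci -k` under "Kernel driver in use".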