bugzilla-daemon at freedesktop.org
2018-Jul-21 20:36 UTC
[Nouveau] [Bug 107325] New: Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

Bug ID: 107325
Summary: Reported temperature of nvidia card with nouveau driver is wrong
Product: xorg
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Driver/nouveau
Assignee: nouveau at lists.freedesktop.org
Reporter: j.novak at netsystem.cz
QA Contact: xorg-team at lists.x.org

Hello,

I use Dell Precision 3530 with NVIDIA Corporation GP107GLM [Quadro P600 Mobile] (rev a1). I use Fedora Core 28 with 4.17.6 x86_64 kernel.

I found that sensors tool shows wrong temperature:

$ sensors
nouveau-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +105.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

Temperature is obviously wrong. I tried to troubleshoot it on sensors side and it looks that sensors tool receives this wrong value from driver.

I made one more observation - right after suspend/wakeup the value is completely different:

$ sensors
nouveau-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +105.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

$ sensors
nouveau-pci-0100
Adapter: PCI adapter
temp1:        +43.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +105.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

$ sensors
nouveau-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +105.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

I can provide more information when required.

-- 
You are receiving this mail because:
You are the assignee for the bug.
bugzilla-daemon at freedesktop.org
2018-Jul-22 15:08 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

Rhys Kidd <rhyskidd at gmail.com> changed:

What    |Removed    |Added
----------------------------------------------------------------------------
CC      |           |rhyskidd at gmail.com

--- Comment #1 from Rhys Kidd <rhyskidd at gmail.com> ---
Thanks for the bug report Jirka,

It would be helpful if you could download and build a nouveau-related debug toolkit, envytools [0], and run the following commands (inside the nva/ subfolder):

$ ./nvapeek 0x020460
$ ./nvapeek 0x020400

It would also be helpful to see a copy of your GPU's VBIOS attached here. This can be produced by running the below command:

$ cat /sys/kernel/debug/dri/0/vbios.rom > vbios.rom

[0] https://github.com/envytools/envytools
bugzilla-daemon at freedesktop.org
2018-Jul-22 15:46 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

--- Comment #2 from Ilia Mirkin <imirkin at alum.mit.edu> ---
Perhaps when it's runtime-suspended, the readings return all 1's, and we report 511 (0x1ff). Or some variation thereof.

Jirka - if you boot with nouveau.runpm=0, I suspect the temperature will be fine -- good to check though.
bugzilla-daemon at freedesktop.org
2018-Jul-22 19:12 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

--- Comment #3 from Jirka Novak <j.novak at netsystem.cz> ---
Hi,

> It would be helpful if you could download and build a nouveau-related debug
> toolkit, envytools [0], and run the following commands (inside the nva/
> subfolder):
>
> $ ./nvapeek 0x020460
>
> $ ./nvapeek 0x020400

Output is there, but I see different output for subsequent calls:

# nvapeek 0x020460
00020460: 20003170
# nvapeek 0x020460
00020460: 20003180
# nvapeek 0x020460
00020460: 200031a8
# nvapeek 0x020460
00020460: 200031a0
# nvapeek 0x020460
00020460: 200031e8

# nvapeek 0x020400
00020400: 00000030
# nvapeek 0x020400
00020400: 00000031
# nvapeek 0x020400
00020400: 00000031
# nvapeek 0x020400
00020400: 00000031
# nvapeek 0x020400
00020400: 00000031
# nvapeek 0x020400
00020400: 00000032

> It would also be helpful to see a copy of your GPU's VBIOS attached here. This
> can be produced by running the below command:
>
> $ cat /sys/kernel/debug/dri/0/vbios.rom > vbios.rom

File is attached.

Best regards,

Jirka Novak
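Editorial aside: the register dumps in comment #3 are easier to compare once converted from hex. The sketch below is Python, not driver code, and every helper name in it is hypothetical. It rests on two assumptions the thread does not state: that 0x020400 holds whole degrees Celsius in its low byte, and that 0x020460 holds whole degrees starting at bit 8 of its low 17 bits. Under those assumptions both registers land in the 48 to 50 range, which is consistent with the 49-50 degrees reported in comment #4 with nouveau.runpm=0.

```python
from typing import Tuple


def parse_nvapeek(line: str) -> Tuple[int, int]:
    """Split an 'ADDRESS: VALUE' line, as pasted in comment #3, into two ints."""
    addr, value = line.split(":")
    return int(addr, 16), int(value, 16)


def low_byte_celsius(value: int) -> int:
    # Assumption (not stated in the thread): 0x020400 carries whole
    # degrees Celsius in its low byte.
    return value & 0xFF


def field_celsius(value: int) -> int:
    # Assumption (not stated in the thread): 0x020460 carries whole
    # degrees Celsius from bit 8 upward within its low 17 bits.
    return (value & 0x1FFF8) >> 8


if __name__ == "__main__":
    for line in ("00020400: 00000030", "00020400: 00000032"):
        addr, value = parse_nvapeek(line)
        print(hex(addr), low_byte_celsius(value))
    for line in ("00020460: 20003170", "00020460: 200031e8"):
        addr, value = parse_nvapeek(line)
        print(hex(addr), field_celsius(value))
```

The changing low byte of the 0x020460 values between calls would then be sub-degree noise, which may explain why consecutive reads differ while the whole-degree value stays the same.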
bugzilla-daemon at freedesktop.org
2018-Jul-22 19:14 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

--- Comment #4 from Jirka Novak <j.novak at netsystem.cz> ---
Hi,

> Perhaps when it's runtime-suspended, the readings return all 1's, and we report
> 511 (0x1ff). Or some variation thereof.
>
> Jirka - if you boot with nouveau.runpm=0, I suspect the temperature will be
> fine -- good to check though.

yes, you are correct. It then returns 49-50 degrees.

Best regards,

Jirka Novak
bugzilla-daemon at freedesktop.org
2019-Mar-31 11:09 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

Pacho Ramos <pachoramos1 at gmail.com> changed:

What    |Removed    |Added
----------------------------------------------------------------------------
CC      |           |pachoramos1 at gmail.com

--- Comment #5 from Pacho Ramos <pachoramos1 at gmail.com> ---
I have the same issue with kernel 4.19.30 still
bugzilla-daemon at freedesktop.org
2019-Apr-10 14:31 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

--- Comment #6 from Karol Herbst <karolherbst at gmail.com> ---
this is a runtime suspend issue. While the GPU is suspended the temperature reading fails, but we don't actually check for that, so we return the error value (-1 & 0x1ff = 511).

I think I had a patch for that somewhere, let me see.
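Editorial aside: the arithmetic in comment #6 can be reproduced outside the driver. The sketch below is Python rather than the driver's C, and TEMP_MASK, masked_reading and checked_reading are hypothetical names, not nouveau functions. It assumes a failed read shows up either as -1 (comment #6) or as an all-ones 32-bit word (comment #2). Both collapse to 511 once masked with 0x1ff. The second helper shows one possible guard, which is not necessarily the patch Karol refers to.

```python
from typing import Optional

TEMP_MASK = 0x1FF        # 9-bit mask taken from "-1 & 0x1ff" in comment #6
ALL_ONES_32 = 0xFFFFFFFF  # what comment #2 suspects a suspended GPU returns


def masked_reading(raw: int) -> int:
    """Mask the raw value without checking whether the read failed."""
    return raw & TEMP_MASK


def checked_reading(raw: int) -> Optional[int]:
    """Return None for a failed read instead of a bogus temperature.

    Treating -1 and all-ones as failures is an assumption for this sketch.
    """
    if raw < 0 or raw == ALL_ONES_32:
        return None
    return raw & TEMP_MASK


if __name__ == "__main__":
    print(masked_reading(-1))           # error code leaks through as a temperature
    print(masked_reading(ALL_ONES_32))  # same result for an all-ones read
    print(checked_reading(-1))          # failure is reported as "no value"
    print(checked_reading(43))          # a valid reading passes through
```

With a check of this kind a tool such as sensors would be handed an error or no value while the GPU is runtime-suspended, rather than +511.0°C.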
bugzilla-daemon at freedesktop.org
2019-Apr-10 14:34 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

--- Comment #7 from Roy <nouveau at spliet.org> ---
Will this bug interact with Lyude's recent patch, "drm/nouveau/i2c: Disable i2c bus access after ->fini()"?
bugzilla-daemon at freedesktop.org
2019-Apr-11 00:46 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

--- Comment #8 from Karol Herbst <karolherbst at gmail.com> ---
no. This was mainly for displays afaik and we read out the temperature through MMIO.
bugzilla-daemon at freedesktop.org
2019-Dec-04 09:44 UTC
[Nouveau] [Bug 107325] Reported temperature of nvidia card with nouveau driver is wrong
https://bugs.freedesktop.org/show_bug.cgi?id=107325

Martin Peres <martin.peres at free.fr> changed:

What          |Removed    |Added
----------------------------------------------------------------------------
Resolution    |---        |MOVED
Status        |NEW        |RESOLVED

--- Comment #9 from Martin Peres <martin.peres at free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/445.