bugzilla-daemon at freedesktop.org
2013-Aug-26 12:53 UTC
[Nouveau] [Bug 68572] New: shutdown threshold temperature sometimes isn't restored properly after hibernate
https://bugs.freedesktop.org/show_bug.cgi?id=68572

Priority: medium
Bug ID: 68572
Assignee: nouveau at lists.freedesktop.org
Summary: shutdown threshold temperature sometimes isn't restored properly after hibernate
QA Contact: xorg-team at lists.x.org
Severity: normal
Classification: Unclassified
OS: All
Reporter: mr.dash.four at googlemail.com
Hardware: Other
Status: NEW
Version: unspecified
Component: Driver/nouveau
Product: xorg

This is what I had about an hour or so ago after restoring from hibernate:

Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x00021cfc put 0x0001dcc8 state 0x8002b8c8 (err: INVALID_CMD) push 0x00000000
Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x0002daa8 put 0x00042324 state 0x80000000 (err: INVALID_CMD) push 0x5f000000
Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x0004232c put 0x00008800 state 0x80000000 (err: INVALID_CMD) push 0xff010000
Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x00008810 put 0x0000dd88 state 0x80000000 (err: INVALID_CMD) push 0xff010000
Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x0000dd88 put 0x0000a0cc state 0x00000000 (err: NONE) push 0x4d011000
Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x0000a0d0 put 0x00008800 state 0x80000000 (err: INVALID_CMD) push 0xff010000
Aug 26 13:04:36 test1 kernel: nouveau [ PTHERM][0000:01:00.0] temperature (0 C) hit the 'shutdown' threshold
Aug 26 13:04:36 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 1 [Xorg[1928]] get 0x00008810 put 0x80002264 state 0x80000000 (err: INVALID_CMD) push 0xff011000
Aug 26 13:04:36 test1 kernel: nouveau W[ PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 3
Aug 26 13:04:36 test1 kernel: nouveau W[ PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 9
[...]
Aug 26 13:04:42 test1 kernel: nouveau E[Xorg[1928]] failed to idle channel 0xcccc0000 [Xorg[1928]]
Aug 26 13:04:43 test1 acpid: exiting
Aug 26 13:04:45 test1 kernel: nouveau E[Xorg[1928]] failed to idle channel 0xcccc0000 [Xorg[1928]]
Aug 26 13:04:45 test1 kernel: nouveau W[ PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 2
Aug 26 13:04:45 test1 kernel: nouveau W[ PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 2
Aug 26 13:04:45 test1 kernel: nouveau W[ PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 2
Aug 26 13:04:45 test1 kernel: nouveau W[ PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 2
[...]
Aug 26 13:04:45 test1 kernel: nouveau E[ PFIFO][0000:01:00.0] still angry after 101 spins, halt

As is evident from the logs above, either the shutdown threshold temperature or my current card temperature (or both) was 0C for some reason, which caused my video card to freeze for a couple of seconds and then shut itself down. My guess is that these values are not restored properly after hibernate. This doesn't happen very often, though: it is the first time I have seen this in 20+ hibernate/restore cycles. It's also worth pointing out that I have the nouveau patches for bug #66177 applied and I am in automatic fan management mode, which has been working properly.

--
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/nouveau/attachments/20130826/0146d4c7/attachment.html>
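The log line "temperature (0 C) hit the 'shutdown' threshold" can be reasoned about with a minimal model of a temperature threshold with hysteresis. The sketch below is hypothetical Python, not nouveau's code: it only assumes the usual semantics of such thresholds (the event fires when a reading reaches the trip point, and the threshold re-arms once the reading drops by the hysteresis amount), and it assumes the last pair in the "programmed thresholds" lines later in this thread, 135(5), is the shutdown trip point and its hysteresis. Under those assumptions a 0 C reading cannot reach a 135 C trip point, so an event firing at 0 C suggests the comparison saw a threshold at or below 0 C, for example one left zeroed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """A hypothetical trip point with hysteresis (all values in degrees C).

    The event fires ("rising") when a reading reaches temp_c while the
    threshold is armed; it re-arms ("falling") once a reading drops to
    temp_c - hysteresis_c or below.
    """
    name: str
    temp_c: int
    hysteresis_c: int
    tripped: bool = False

    def update(self, reading_c: int) -> Optional[str]:
        """Feed one sensor reading; return the direction on a state change."""
        if not self.tripped and reading_c >= self.temp_c:
            self.tripped = True
            return "rising"
        if self.tripped and reading_c <= self.temp_c - self.hysteresis_c:
            self.tripped = False
            return "falling"
        return None  # no state change


# Shutdown pair as logged, assuming 135(5) is the shutdown threshold.
healthy = Threshold("shutdown", temp_c=135, hysteresis_c=5)
# The same threshold if its values had been left zeroed (hypothetical).
zeroed = Threshold("shutdown", temp_c=0, hysteresis_c=0)

print(healthy.update(0))  # prints: None (0 C never reaches 135 C)
print(zeroed.update(0))   # prints: rising (0 >= 0, so the event fires)
```

The model deliberately ignores how the reading itself is obtained; it only shows which combinations of reading and threshold can produce the logged event.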
bugzilla-daemon at freedesktop.org
2013-Aug-26 14:09 UTC
[Nouveau] [Bug 68572] shutdown threshold temperature sometimes isn't restored properly after hibernate
https://bugs.freedesktop.org/show_bug.cgi?id=68572

--- Comment #1 from Ilia Mirkin <imirkin at alum.mit.edu> ---
You appear to have provided almost none of the information requested at http://nouveau.freedesktop.org/wiki/Bugs/

What hardware do you have
Full kernel log
VBIOS might make sense here too

The various temperature thresholds just take care of setting the fan (in this case it's saying that your card is at 0C so it can shut the fan down).

I think the real issue are the INVALID_CMD's that you see...
bugzilla-daemon at freedesktop.org
2013-Aug-26 22:16 UTC
[Nouveau] [Bug 68572] shutdown threshold temperature sometimes isn't restored properly after hibernate
https://bugs.freedesktop.org/show_bug.cgi?id=68572

--- Comment #2 from Mr-4 <mr.dash.four at googlemail.com> ---
(In reply to comment #1)
> What hardware do you have

NVidia 7800GS

> Full kernel log

I can't provide you with a "full kernel log" as I was coming out of restore after hibernate, but I can provide you with this:

Aug 26 13:00:05 test1 kernel: PM: Syncing filesystems ... done.
Aug 26 13:00:05 test1 kernel: Freezing user space processes ... (elapsed 0.01 seconds) done.
Aug 26 13:00:05 test1 kernel: PM: Preallocating image memory... done (allocated 194506 pages)
Aug 26 13:00:05 test1 kernel: PM: Allocated 778024 kbytes in 0.10 seconds (7780.24 MB/s)
Aug 26 13:00:05 test1 kernel: Freezing remaining freezable tasks ... (elapsed 5.31 seconds) done.
Aug 26 13:00:05 test1 kernel: Suspending console(s) (use no_console_suspend to debug)
Aug 26 13:00:05 test1 kernel: i8042 kbd 00:09: System wakeup enabled by ACPI
Aug 26 13:00:05 test1 kernel: mpu401 00:04: disabled
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] suspending fbcon...
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] suspending display...
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] unpinning framebuffer(s)...
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] evicting buffers...
Aug 26 13:00:05 test1 kernel: pci 0000:00:13.1: System wakeup enabled by ACPI
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] suspending client object trees...
Aug 26 13:00:05 test1 kernel: PM: freeze of devices complete after 313.451 msecs
Aug 26 13:00:05 test1 kernel: PM: late freeze of devices complete after 0.402 msecs
Aug 26 13:00:05 test1 kernel: PM: noirq freeze of devices complete after 0.542 msecs
Aug 26 13:00:05 test1 kernel: ACPI: Preparing to enter system sleep state S4
Aug 26 13:00:05 test1 kernel: PM: Saving platform NVS memory
Aug 26 13:00:05 test1 kernel: Disabling non-boot CPUs ...
Aug 26 13:00:05 test1 kernel: smpboot: CPU 1 is now offline
Aug 26 13:00:05 test1 kernel: PM: Creating hibernation image:
Aug 26 13:00:05 test1 kernel: PM: Need to copy 194329 pages
Aug 26 13:00:05 test1 kernel: PM: Restoring platform NVS memory
Aug 26 13:00:05 test1 kernel: Enabling non-boot CPUs ...
Aug 26 13:00:05 test1 kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1
Aug 26 13:00:05 test1 kernel: CPU1 is up
Aug 26 13:00:05 test1 kernel: ACPI: Waking up from system sleep state S4
Aug 26 13:00:05 test1 kernel: PM: noirq restore of devices complete after 33.248 msecs
Aug 26 13:00:05 test1 kernel: PM: early restore of devices complete after 0.127 msecs
Aug 26 13:00:05 test1 kernel: usb usb2: root hub lost power or was reset
Aug 26 13:00:05 test1 kernel: usb usb3: root hub lost power or was reset
Aug 26 13:00:05 test1 kernel: usb usb4: root hub lost power or was reset
Aug 26 13:00:05 test1 kernel: usb usb5: root hub lost power or was reset
Aug 26 13:00:05 test1 kernel: usb usb1: root hub lost power or was reset
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] re-enabling device...
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] resuming client object trees...
Aug 26 13:00:05 test1 kernel: nouveau [ VBIOS][0000:01:00.0] running init tables
Aug 26 13:00:05 test1 kernel: pci 0000:00:13.1: System wakeup disabled by ACPI
Aug 26 13:00:05 test1 kernel: mpu401 00:04: activated
Aug 26 13:00:05 test1 kernel: i8042 kbd 00:09: System wakeup disabled by ACPI
Aug 26 13:00:05 test1 kernel: nouveau [ PTHERM][0000:01:00.0] fan management: automatic
Aug 26 13:00:05 test1 kernel: nouveau [ PTHERM][0000:01:00.0] programmed thresholds [ 90(3), 95(3), 115(2), 135(5) ]
Aug 26 13:00:05 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge
Aug 26 13:00:05 test1 kernel: agpgart: kworker/u:0 tried to set rate=x12. Setting to AGP3 x8 mode.
Aug 26 13:00:05 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode
Aug 26 12:00:05 test1 rtkit-daemon[2090]: The canary thread is apparently starving. Taking action.
Aug 26 12:00:05 test1 rtkit-daemon[2090]: Demoting known real-time threads.
Aug 26 13:00:05 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] resuming display...
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] 0xD3FB: Parsing digital output script table
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] Setting dpms mode 3 on TV encoder (output 3)
Aug 26 13:00:05 test1 kernel: nouveau [ DRM] 0xD3FB: Parsing digital output script table
Aug 26 13:00:05 test1 kernel: ata4.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
Aug 26 13:00:05 test1 kernel: ata3.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Aug 26 13:00:05 test1 kernel: ata3.00: ACPI cmd ef/03:01:00:00:00:a0 (SET FEATURES) filtered out
Aug 26 13:00:05 test1 kernel: ata4.00: configured for UDMA/33
Aug 26 13:00:05 test1 kernel: ata3.00: configured for UDMA/100
Aug 26 13:00:05 test1 kernel: sd 2:0:0:0: [sda] Starting disk
Aug 26 13:00:05 test1 kernel: usb 1-2: reset high-speed USB device number 8 using ehci-pci
Aug 26 13:00:05 test1 kernel: usb 1-2.3: reset low-speed USB device number 9 using ehci-pci
Aug 26 13:00:05 test1 kernel: PM: restore of devices complete after 1157.251 msecs
Aug 26 12:00:05 test1 rtkit-daemon[2090]: Successfully demoted thread 2287 of process 2267 (/usr/bin/pulseaudio).
Aug 26 12:00:05 test1 rtkit-daemon[2090]: Successfully demoted thread 2285 of process 2267 (/usr/bin/pulseaudio).
Aug 26 12:00:05 test1 rtkit-daemon[2090]: Successfully demoted thread 2267 of process 2267 (/usr/bin/pulseaudio).
Aug 26 12:00:05 test1 rtkit-daemon[2090]: Demoted 3 threads.
Aug 26 13:00:05 test1 kernel: Restarting tasks ... done.
After which comes the log I included in the initial report.

> VBIOS might make sense here too

Here is the start-up log:

Aug 26 13:06:08 test1 kernel: [drm] Initialized drm 1.1.0 20060810
Aug 26 13:06:08 test1 kernel: nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x049200a2
Aug 26 13:06:08 test1 kernel: nouveau [ DEVICE][0000:01:00.0] Chipset: G71 (NV49)
Aug 26 13:06:08 test1 kernel: nouveau [ DEVICE][0000:01:00.0] Family : NV40
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] checking PRAMIN for image...
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] ... checksum invalid
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] checking PROM for image...
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] ... appears to be valid
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] using image from PROM
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] BIT signature found
Aug 26 13:06:08 test1 kernel: nouveau [ VBIOS][0000:01:00.0] version 05.71.22.21.0a
Aug 26 13:06:08 test1 kernel: nouveau [ PFB][0000:01:00.0] RAM type: GDDR3
Aug 26 13:06:08 test1 kernel: nouveau [ PFB][0000:01:00.0] RAM size: 256 MiB
Aug 26 13:06:08 test1 kernel: nouveau [ PFB][0000:01:00.0] ZCOMP: 294912 tags
Aug 26 13:06:08 test1 kernel: nouveau [ PTHERM][0000:01:00.0] FAN control: PWM
Aug 26 13:06:08 test1 kernel: nouveau [ PTHERM][0000:01:00.0] fan management: disabled
Aug 26 13:06:08 test1 kernel: nouveau [ PTHERM][0000:01:00.0] internal sensor: yes
Aug 26 13:06:08 test1 kernel: nouveau [ PTHERM][0000:01:00.0] programmed thresholds [ 90(3), 95(3), 115(2), 135(5) ]
Aug 26 13:06:08 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge
Aug 26 13:06:08 test1 kernel: agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
Aug 26 13:06:08 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode
Aug 26 13:06:08 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode
Aug 26 13:06:08 test1 kernel: [TTM] Zone kernel: Available graphics memory: 1026348 kiB
Aug 26 13:06:08 test1 kernel: [TTM] Initializing pool allocator
Aug 26 13:06:08 test1 kernel: [TTM] Initializing DMA pool allocator
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] VRAM: 251 MiB
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] GART: 256 MiB
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] TMDS table version 1.1
Aug 26 13:06:08 test1 kernel: nouveau W[ DRM] TMDS table script pointers not stubbed
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB version 3.0
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB outp 00: 04011310 00000028
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB outp 01: 0c011312 00000000
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB outp 02: 01000300 00000028
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB outp 03: 020223f1 00c0c083
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB conn 00: 0000
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB conn 01: 2130
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB conn 02: 0210
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB conn 03: 0211
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] DCB conn 04: 0213
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] Saving VGA fonts
Aug 26 13:06:08 test1 kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Aug 26 13:06:08 test1 kernel: [drm] No driver support for vblank timestamp query.
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 0xD3FB: Parsing digital output script table
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 4 available performance level(s)
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 0: core 275MHz shader 275MHz memory 600MHz voltage 1050mV fanspeed 40%
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 1: core 400MHz shader 400MHz memory 625MHz voltage 1100mV fanspeed 70%
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 2: core 440MHz shader 440MHz memory 650MHz voltage 1100mV fanspeed 79%
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 3: core 487MHz shader 487MHz memory 695MHz voltage 1200mV fanspeed 100%
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] c: core 275MHz shader 275MHz memory 600MHz voltage 1050mV fanspeed 100%
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] MM: using M2MF for buffer copies
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] Setting dpms mode 3 on TV encoder (output 3)
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] allocated 1600x1200 fb: 0x9000, bo ffff88003701f800
Aug 26 13:06:08 test1 kernel: fbcon: nouveaufb (fb0) is primary device
Aug 26 13:06:08 test1 kernel: nouveau [ DRM] 0xD3FB: Parsing digital output script table
Aug 26 13:06:08 test1 kernel: Console: switching to colour frame buffer device 200x75
Aug 26 13:06:08 test1 kernel: nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
Aug 26 13:06:08 test1 kernel: nouveau 0000:01:00.0: registered panic notifier
Aug 26 13:06:08 test1 kernel: [drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0

> The various temperature thresholds just take care of setting the fan (in
> this case it's saying that your card is at 0C so it can shut the fan down).

Well, that was really what caused this. The card temperature was NOT 0C, it was something like 30C+, so I suspect something went awry during restore.

> I think the real issue are the INVALID_CMD's that you see...

I am no expert in this, hence submitting this bug report. If there is anything else you'd like to know, just ask.
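Both the resume log and the start-up log above contain a "programmed thresholds" line, and comparing them is a natural first check. A small hypothetical Python helper can pull the pairs out; parse_programmed_thresholds and THRESHOLD_NAMES are illustrative names, not part of nouveau. It assumes each pair is printed as temperature(hysteresis) in degrees C and that the four pairs are, in order, the fan-boost, downclock, critical and shutdown thresholds; that ordering is an assumption about this driver version's output, not something stated in this report.

```python
import re
from typing import Dict, Tuple

# Assumed meaning of the four pairs, in the order they are printed.
THRESHOLD_NAMES = ("fan_boost", "downclock", "critical", "shutdown")

# One "temperature(hysteresis)" pair, e.g. "135(5)".
PAIR_RE = re.compile(r"(-?\d+)\((\d+)\)")


def parse_programmed_thresholds(line: str) -> Dict[str, Tuple[int, int]]:
    """Return {name: (temperature_C, hysteresis_C)} from one log line."""
    marker = "programmed thresholds ["
    start = line.find(marker)
    if start == -1:
        raise ValueError("not a 'programmed thresholds' line")
    end = line.find("]", start)
    if end == -1:
        raise ValueError("unterminated threshold list")
    body = line[start + len(marker):end]
    pairs = [(int(t), int(h)) for t, h in PAIR_RE.findall(body)]
    if len(pairs) != len(THRESHOLD_NAMES):
        raise ValueError(f"expected {len(THRESHOLD_NAMES)} pairs, got {len(pairs)}")
    return dict(zip(THRESHOLD_NAMES, pairs))


resume = ("Aug 26 13:00:05 test1 kernel: nouveau [ PTHERM][0000:01:00.0] "
          "programmed thresholds [ 90(3), 95(3), 115(2), 135(5) ]")
boot = ("Aug 26 13:06:08 test1 kernel: nouveau [ PTHERM][0000:01:00.0] "
        "programmed thresholds [ 90(3), 95(3), 115(2), 135(5) ]")

print(parse_programmed_thresholds(boot)["shutdown"])  # prints: (135, 5)
print(parse_programmed_thresholds(resume) == parse_programmed_thresholds(boot))  # prints: True
```

On these two lines the helper returns identical values, so at least as logged, the thresholds were reprogrammed with their boot-time values on resume, before the 13:04:36 event.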
bugzilla-daemon at freedesktop.org
2013-Aug-27 00:12 UTC
[Nouveau] [Bug 68572] shutdown threshold temperature sometimes isn't restored properly after hibernate
https://bugs.freedesktop.org/show_bug.cgi?id=68572

--- Comment #3 from Martin Peres <martin.peres at ensi-bourges.fr> ---
(In reply to comment #2)
> (In reply to comment #1)
> > The various temperature thresholds just take care of setting the fan (in
> > this case it's saying that your card is at 0C so it can shut the fan down).
> Well, that was really what caused this. The card temperature was NOT 0C, it
> was something like 30C+, so I suspect something went awry during restore.

Exactly, there is a real problem here. The card must not have been fully posted! I'll write a patch to check a little more the temperature before rebooting the computer though.

> > I think the real issue are the INVALID_CMD's that you see...
> I am no expert in this, hence submitting this bug report. If there is
> anything else you'd like to know just ask.

Yeah. Are you aware that Hibernate is not considered as being very stable? You may want to avoid using it, some people lost their data because of it.
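The extra temperature check proposed above could take several forms. One hypothetical shape, sketched in Python: refuse to act on a trip point that is itself implausibly low, and require several fresh readings at or above it, so that neither a zeroed threshold nor a single bogus reading can power the machine off. The function name confirm_shutdown and the defaults samples=3 and min_plausible_c=1 are assumptions for illustration, not the eventual patch.

```python
from typing import Callable


def confirm_shutdown(read_temp_c: Callable[[], int], threshold_c: int,
                     samples: int = 3, min_plausible_c: int = 1) -> bool:
    """Hypothetical guard run before acting on a 'shutdown' trip.

    Returns True only if the threshold itself is plausible and every one of
    `samples` fresh sensor readings is at or above it.
    """
    if samples < 1:
        raise ValueError("need at least one reading")
    if threshold_c < min_plausible_c:
        # Below min_plausible_c (by default: at or below 0 C) the trip point
        # is treated as corrupt, e.g. left zeroed.
        return False
    readings = [read_temp_c() for _ in range(samples)]
    return all(r >= threshold_c for r in readings)


# Fake sensors: lambdas stand in for reading the GPU's temperature.
print(confirm_shutdown(lambda: 0, threshold_c=0))      # prints: False
print(confirm_shutdown(lambda: 140, threshold_c=135))  # prints: True
glitch = iter([140, 31, 140])                          # one reading disagrees
print(confirm_shutdown(lambda: next(glitch), threshold_c=135))  # prints: False
```

Rejecting a suspect threshold fails open (the machine keeps running); whether that is acceptable, or whether a driver should fall back to a known-good threshold instead, is a policy decision this sketch does not settle.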
bugzilla-daemon at freedesktop.org
2013-Aug-27 22:58 UTC
[Nouveau] [Bug 68572] shutdown threshold temperature sometimes isn't restored properly after hibernate
https://bugs.freedesktop.org/show_bug.cgi?id=68572

--- Comment #4 from Mr-4 <mr.dash.four at googlemail.com> ---
(In reply to comment #3)
> I'll write a patch to check a little more the temperature before rebooting
> the computer though.

OK, let me know and I'll give it a go.

> Yeah. Are you aware that Hibernate is not considered as being very stable?

I have been using hibernate since kernel 2.6. It was disastrous in all versions up to 3.1 and had a few problems in various 3.x kernel versions, but since about 3.7 it has been rock solid!

> You may want to avoid using it, some people lost their data because of it.

No chance! I do a couple of hibernate/restore cycles a day, every day, and, as I pointed out above, have had absolutely no issues with it, particularly in recent kernels.
bugzilla-daemon at freedesktop.org
2019-Dec-04 08:35 UTC
[Nouveau] [Bug 68572] shutdown threshold temperature sometimes isn't restored properly after hibernate
https://bugs.freedesktop.org/show_bug.cgi?id=68572

Martin Peres <martin.peres at free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #5 from Martin Peres <martin.peres at free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/53.