bugzilla-daemon at freedesktop.org
2018-Nov-27 04:03 UTC
[Nouveau] [Bug 108873] New: nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
https://bugs.freedesktop.org/show_bug.cgi?id=108873
Bug ID: 108873
Summary: nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
Product: xorg
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: major
Priority: medium
Component: Driver/nouveau
Assignee: nouveau at lists.freedesktop.org
Reporter: mst at redhat.com
QA Contact: xorg-team at lists.x.org
Created attachment 142620
--> https://bugs.freedesktop.org/attachment.cgi?id=142620&action=edit
dmesg showing the errors and the lockup. using noaccel=1
I have a new ThinkPad with this card:
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev a1)
The machine hangs whenever I try to poke at the card. It starts happily enough with:
[ 3.971515] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66)
[ 3.971553] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66)
[ 3.971721] pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
[ 3.971726] VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
[ 3.971727] nouveau: detected PR support, will not use DSM
[ 3.971745] nouveau 0000:01:00.0: enabling device (0006 -> 0007)
[ 3.971923] nouveau 0000:01:00.0: NVIDIA GP107 (137000a1)
[ 4.009875] PM: Image not found (code -22)
[ 4.135752] nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
[ 4.135753] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[ 4.135754] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[ 4.135755] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[ 4.135756] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 4.135756] nouveau 0000:01:00.0: DRM: DCB version 4.1
[ 4.135757] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f76 04600020
[ 4.135758] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
[ 4.135759] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f46 04600010
[ 4.135760] nouveau 0000:01:00.0: DRM: DCB outp 03: 01033f56 04600020
[ 4.135761] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
[ 4.135761] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
[ 4.135762] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001246
[ 4.135763] nouveau 0000:01:00.0: DRM: DCB conn 03: 00002346
[ 4.508355] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 4.508355] [drm] Driver supports precise vblank timestamp query.
[ 4.509812] [drm] Cannot find any crtc or sizes
[ 4.510144] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 2
That type mismatch is a bit worrying, though, and I'm not sure what
prints "PM: Image not found".
But after a short while it gets pretty busy:
[ 52.917009] No Local Variables are initialized for Method [NVPO]
[ 52.917011] No Arguments are initialized for method [NVPO]
[ 52.917012] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[ 52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[ 52.917084] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PG00._ON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[ 52.917108] acpi device:00: Failed to change power state to D0
[ 52.969287] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
[ 52.969289] pci_raw_set_power_state: 2 callbacks suppressed
[ 52.969291] nouveau 0000:01:00.0: Refused to change power state, currently in D3
[ 53.029514] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
[ 53.041027] nouveau 0000:01:00.0: Refused to change power state, currently in D3
[ 53.041035] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
[ 53.053008] nouveau 0000:01:00.0: Refused to change power state, currently in D3
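The three "ACPI Error" lines in this log all carry the same status and name a different method each. A minimal Python sketch for pulling those pairs out of a saved dmesg is below. The function name `failed_methods` and the `DMESG` sample are illustrative only; the sample is copied from the log in this report with the mail line wraps rejoined. The order in which the lines appear (NVPO, then PGON, then _ON) presumably reflects the call chain unwinding from the innermost method outward, but that is an assumption and not something the log states.

```python
import re

# The three "ACPI Error" lines from this report, with mail line wraps rejoined.
DMESG = r"""
[ 52.917012] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[ 52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[ 52.917084] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PG00._ON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
"""

# \s+ also matches a newline, so lines that were wrapped after "failed" still match.
ACPI_FAILURE = re.compile(
    r"ACPI Error: Method parse/execution failed\s+(\\[\w.]+),\s+(AE_\w+)"
)


def failed_methods(dmesg_text):
    """Return (method path, ACPI status) pairs in the order they were logged."""
    return ACPI_FAILURE.findall(dmesg_text)


if __name__ == "__main__":
    for path, status in failed_methods(DMESG):
        print(f"{status}: {path}")
```

Running the same function over a full dmesg file should make it easy to see whether the set of failing methods is the same on every boot.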
And then the kernel proceeds to throw errors at seemingly random places, e.g.:
[ 67.021892] cfg80211: failed to load regulatory.db
[ 67.021895] cfg80211: failed to load regulatory.db
[ 67.021897] cfg80211: failed to load regulatory.db
[ 67.021900] cfg80211: failed to load regulatory.db
[ 67.021927] cfg80211: failed to load regulatory.db
[ 67.021928] cfg80211: failed to load regulatory.db
[ 67.021932] cfg80211: failed to load regulatory.db
[ 67.021934] cfg80211: failed to load regulatory.db
[ 67.024463] cfg80211: failed to load regulatory.db
[ 99.980625] iwlwifi 0000:00:14.3: Error sending STATISTICS_CMD: time out after 2000ms.
followed by soft lockups and sometimes hard lockups in places
like attempts to walk skb lists.
Adding runpm=0 makes this issue go away.
This specific test was run with noaccel=1; that setting does not seem to
change things for me.
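For anyone trying to reproduce this, it can help to confirm what the driver and the PCI core think the runtime PM state is. The following is a minimal Python sketch, assuming the usual sysfs layout: the nouveau `runpm` module parameter under `/sys/module/nouveau/parameters/` (normally readable by root only) and the per-device `power/runtime_status` and `power/control` files. The function names and the default PCI address (taken from the lspci line in this report) are illustrative; missing or unreadable files are reported as `None`.

```python
from pathlib import Path


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def runpm_state(sysfs_root="/sys", pci_addr="0000:01:00.0"):
    """Collect the runtime PM settings relevant to this report.

    sysfs_root is a parameter so the function can be pointed at a copy
    of the tree; on a live system the default should be correct.
    """
    root = Path(sysfs_root)
    device = root / "bus" / "pci" / "devices" / pci_addr / "power"
    return {
        "runpm": read_sysfs(root / "module" / "nouveau" / "parameters" / "runpm"),
        "runtime_status": read_sysfs(device / "runtime_status"),
        "control": read_sysfs(device / "control"),
    }


if __name__ == "__main__":
    for key, value in runpm_state().items():
        print(f"{key}: {value}")
```

With runpm=0 in effect, one would expect the device to stay out of the suspended state, though this sketch only reports what sysfs says and does not verify it.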
I poked at the ACPI method NVPO, and it does indeed seem to execute
a while loop waiting for some register to become 0.
I guess that never happens, perhaps because the card is in a
low-power state and so reads return ffffffff?
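To make that hypothesis concrete, here is a small Python model of such a polling loop. It is not the AML from this machine: `wait_for_zero`, `powered_down_read` and `make_settling_read` are hypothetical names, and the poll limit stands in for the interpreter's loop timeout, which is what surfaces as AE_AML_LOOP_TIMEOUT in the log. The point is only that a loop waiting for 0 can never finish if every read comes back as all ones, which is what reads from a PCI device that does not respond typically return.

```python
ALL_ONES = 0xFFFFFFFF  # typical value read from a PCI device that does not respond


def wait_for_zero(read_register, max_polls):
    """Poll read_register() until it returns 0.

    Returns the number of reads it took, or raises TimeoutError after
    max_polls reads (standing in for the interpreter's loop timeout).
    """
    for polls in range(1, max_polls + 1):
        if read_register() == 0:
            return polls
    raise TimeoutError(f"register still non-zero after {max_polls} polls")


def powered_down_read():
    """Hypothetical device in a low-power state: every read is all ones."""
    return ALL_ONES


def make_settling_read(busy_reads):
    """Hypothetical healthy device: reads 1 for busy_reads reads, then 0."""
    remaining = [busy_reads]

    def read():
        if remaining[0] > 0:
            remaining[0] -= 1
            return 1
        return 0

    return read


if __name__ == "__main__":
    print("healthy device settled after", wait_for_zero(make_settling_read(3), 10), "reads")
    try:
        wait_for_zero(powered_down_read, 10)
    except TimeoutError as exc:
        print("powered-down device:", exc)
```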
X isn't happy even with runpm=0, but that might be a different
issue. I thought the runpm problem might be an easier place to start debugging,
given there are logs of the failure.
Using kernel 4.20.0-rc3 right now.
Userspace bits are from Fedora 29:
xorg-x11-drv-nouveau-1.0.15-6.fc29.x86_64
The firmware is pretty recent:
linux-firmware-20181008-88.gitc6b6265d.fc29.noarch
--
You are receiving this mail because:
You are the assignee for the bug.
bugzilla-daemon at freedesktop.org
2018-Nov-28 14:20 UTC
[Nouveau] [Bug 108873] nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
https://bugs.freedesktop.org/show_bug.cgi?id=108873
--- Comment #1 from mst at redhat.com ---
Created attachment 142646
--> https://bugs.freedesktop.org/attachment.cgi?id=142646&action=edit
acpi dump from the machine
bugzilla-daemon at freedesktop.org
2018-Nov-28 15:04 UTC
[Nouveau] [Bug 108873] nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
https://bugs.freedesktop.org/show_bug.cgi?id=108873
--- Comment #2 from mst at redhat.com ---
Created attachment 142647
--> https://bugs.freedesktop.org/attachment.cgi?id=142647&action=edit
output of lspci -x
bugzilla-daemon at freedesktop.org
2019-Nov-18 01:09 UTC
[Nouveau] [Bug 108873] nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
https://bugs.freedesktop.org/show_bug.cgi?id=108873
--- Comment #3 from mst at redhat.com ---
As an update: with kernel-5.3.11-300.fc31.x86_64 and
xorg-x11-drv-nouveau-1.0.15-8.fc31.x86_64, just adding
nouveau.runpm=0 nouveau.noaccel=1 seems to be enough to make the system at
least boot and show the outputs connected to the Nvidia card. I have not
tested that the outputs actually work, though; I intend to try once I'm near
an external monitor. If there is interest in figuring out why noaccel is
required, please let me know.
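When comparing boots with and without these options, it is easy to lose track of which ones were actually in effect. A minimal Python sketch for listing the `nouveau.*` options on a kernel command line follows. The function name `nouveau_params` and the sample command line in the `__main__` block are hypothetical; only the two nouveau options come from this report.

```python
def nouveau_params(cmdline):
    """Return the nouveau.* options found on a kernel command line.

    Keys have the "nouveau." prefix stripped; an option given without
    "=value" maps to an empty string.
    """
    prefix = "nouveau."
    params = {}
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        name, _, value = token[len(prefix):].partition("=")
        params[name] = value
    return params


if __name__ == "__main__":
    # Hypothetical command line; on a running system, read /proc/cmdline instead.
    sample = "BOOT_IMAGE=/vmlinuz ro quiet nouveau.runpm=0 nouveau.noaccel=1"
    print(nouveau_params(sample))
```

This only covers options passed on the kernel command line; options set through a modprobe configuration file would not show up here.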
bugzilla-daemon at freedesktop.org
2019-Nov-18 01:14 UTC
[Nouveau] [Bug 108873] nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
https://bugs.freedesktop.org/show_bug.cgi?id=108873
--- Comment #4 from mst at redhat.com ---
Just to clarify: without noaccel=1, I see lots of timeout warnings, all around
this line:
drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:224
gf100_vmm_invalidate+0x1c9/0x1e0 [nouveau]
and then things like xrandr seem to hang.
bugzilla-daemon at freedesktop.org
2019-Dec-04 09:47 UTC
[Nouveau] [Bug 108873] nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
https://bugs.freedesktop.org/show_bug.cgi?id=108873
Martin Peres <martin.peres at free.fr> changed:
What       |Removed |Added
----------------------------------------------------------------------------
Resolution |---     |MOVED
Status     |NEW     |RESOLVED
--- Comment #5 from Martin Peres <martin.peres at free.fr> ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been
closed to further activity.
You can subscribe to and participate in the new bug through this link
to our GitLab instance:
https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/468