Nicolas Mercier
2017-Sep-11 19:52 UTC
[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Hi Tobias,

On Mon, Sep 11, 2017 at 8:49 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:
> Hi,
>
> I remember seeing the same error once in a while on boot with an
> earlier firmware version on a similar system (GP106), yet it does not
> happen with newer versions. Maybe you could try updating the firmware
> to the latest version from kernel-firmware.

Thanks for the suggestion, Tobias. I updated the firmware (pulled straight from Git), but unfortunately I got exactly the same result.

> As a small addition, as far as I remember: you should leave out the
> _OSI(Linux) query, as it may break the system in some ways. Unless
> adding _OSI(Linux) fixes a specific bug for you, removing it from the
> cmdline is a thing to test!

Actually I tried both with and without; I just forgot to remove it after many tests. I have also read that it is better not to add _OSI(Linux), so I completely agree and will take it out again.
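Editorial sketch: after many test boots it is easy to leave an _OSI override on the kernel command line, as described above. The helper below lists any such leftovers. It assumes the override was passed through the kernel's acpi_osi= parameter (the thread only says "_OSI(Linux)"), and the function name is hypothetical.

```python
import shlex
from typing import List


def find_acpi_osi_params(cmdline: str) -> List[str]:
    """Return every acpi_osi=... token found in a kernel command line string.

    shlex.split is used so that quoted values such as
    acpi_osi="Windows 2009" stay in one token.
    """
    return [tok for tok in shlex.split(cmdline) if tok.startswith("acpi_osi=")]
```

On a running Linux system the string to check is the content of /proc/cmdline; an empty result means no acpi_osi override is active for that boot.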
Nicolas Mercier
2017-Sep-13 09:37 UTC
[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
I am still looking for a solution. I have hacked around in the code and found out the following:

- Nouveau prefers PCIe power management over ACPI Optimus calls. I tried to force it to use the Optimus ACPI calls, but calling the ACPI method returned an error, so it bails out and uses PCIe PM anyway.
- I tried to debug the PCIe PM states, which internally use ACPI to turn power on and off, and printed various statuses along the way. When the power is switched off, the ACPI calls turn the power off and the kernel then successfully puts the device in state D3cold (also turning off power to the PCI Express port). When waking up, ACPI turns the power on, apparently successfully (Device [PEGP] transitioned to D0). But a read from the PCI bus to get the power state and other flags returns 65535 (~0), and the kernel fails to set the device to D0 (although ACPI claims it is in D0).

The call to pci_raw_set_power_state (in drivers/pci/pci.c) seems to fail because pci_read_config_word returns "~0" (and does not return any error code).

I have tried different things. If I use pcie_port_pm=off, the NVidia card goes to state D3hot (if I am not mistaken, its PCIe port is still powered), but that did not fix it. I tried turning various PCI/PCI Express features on and off, such as hotplug, PM and so on. The only thing that works is fully disabling PM, which means the device is never powered off; that is equivalent to nouveau.runpm=0, which does not help much. I have also tried to force PCIe ASPM by recompiling the ACPI table, still no luck.

I am still taking a look, but it seems the problem comes from the PCI Express PM functions and ACPI, not directly from Nouveau.

/n
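Editorial sketch: the 65535 (~0) value reported above is what a 16-bit PCI config read looks like when every bit comes back set, which typically means the device did not answer at all. The snippet decodes a power management control/status (PMCSR) word and treats the all-ones pattern separately. The function name is hypothetical; the two-bit state mask follows the PCI power management register layout (PCI_PM_CTRL_STATE_MASK in the kernel headers).

```python
from typing import Optional

# The low two bits of the PMCSR word encode the current power state.
PCI_PM_CTRL_STATE_MASK = 0x0003
POWER_STATES = {0: "D0", 1: "D1", 2: "D2", 3: "D3hot"}


def decode_pmcsr(pmcsr: int) -> Optional[str]:
    """Return the power state named by a 16-bit PMCSR value.

    Returns None when the value is all ones (0xFFFF), because that
    pattern is not a real register value: the read itself failed.
    """
    if pmcsr & 0xFFFF == 0xFFFF:
        return None
    return POWER_STATES[pmcsr & PCI_PM_CTRL_STATE_MASK]
```

The explicit check matters because applying only the mask to 0xFFFF yields 3, which would be misread as D3hot. D3cold never shows up in this field, since a device without power cannot report anything.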
Tobias Klausmann
2017-Sep-13 12:28 UTC
[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Hi,

the system fails to initialize your vbios using secure boot (I had a rare chance to witness it again on my system). For now I have traced it to acr_boot_falcon() in "linux/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0148cdec.c", where it throws -110, which is -ETIMEDOUT. You could try increasing the timeout and see whether it helps, similar to the following:

diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
index 77273b53672c..fc0cb187d80d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
@@ -326,7 +326,7 @@ nvkm_msgqueue_post(struct nvkm_msgqueue *priv, enum msgqueue_msg_priority prio,
 	int ret;
 
 	if (wait_init && !wait_for_completion_timeout(&priv->init_done,
-					 msecs_to_jiffies(1000)))
+					 msecs_to_jiffies(5000)))
 		return -ETIMEDOUT;
 
 	queue = priv->func->cmd_queue(priv, prio);
diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
index fec0273158f6..c2ae525a0780 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
@@ -279,6 +279,7 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon)
 		u32 flags;
 		u32 falcon_id;
 	} cmd;
+	const struct nvkm_subdev *subdev = priv->falcon->owner;
 
 	memset(&cmd, 0, sizeof(cmd));
 
@@ -290,7 +291,8 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon)
 	nvkm_msgqueue_post(priv, MSGQUEUE_MSG_PRIORITY_HIGH, &cmd.hdr,
 			   acr_boot_falcon_callback, &completed, true);
 
-	if (!wait_for_completion_timeout(&completed, msecs_to_jiffies(1000)))
+	nvkm_error(subdev, "waiting for timeout in acr_boot_falcon (msgqueue_0137bca5)\n");
+	if (!wait_for_completion_timeout(&completed, msecs_to_jiffies(5000)))
 		return -ETIMEDOUT;
 
 	return 0;

On 9/13/17 11:37 AM, Nicolas Mercier wrote:
> I am still looking for a solution.
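Editorial sketch: the patch above only lengthens a wait-with-timeout. The same pattern can be tried in user space, with threading.Event standing in for the kernel completion and a millisecond timeout in place of msecs_to_jiffies(). This is an analogy with a hypothetical helper name, not nouveau code; on Linux, errno.ETIMEDOUT is 110, which matches the -110 reported in the message.

```python
import errno
import threading


def wait_for_completion(done: threading.Event, timeout_ms: int) -> int:
    """Wait until `done` is set or timeout_ms milliseconds elapse.

    Returns 0 if the event was signalled in time and -errno.ETIMEDOUT
    otherwise, mirroring the return convention used in the patch.
    """
    if not done.wait(timeout_ms / 1000.0):
        return -errno.ETIMEDOUT
    return 0
```

A worker that signals after 50 ms makes a 10 ms wait fail and a 2000 ms wait succeed. That is the effect the larger timeout is meant to test: it helps only if the falcon does answer, just later than the original one second.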