Tobias Klausmann
2017-Sep-13 12:28 UTC
[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Hi,

the system fails to initialize your vbios using secure boot (I had a rare chance to witness it again on my system). For now I have traced it to acr_boot_falcon() in "linux/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0148cdec.c", where it returns -110, which is -ETIMEDOUT. You could try increasing the timeout to see if it helps, similar to the following:

diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
index 77273b53672c..fc0cb187d80d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
@@ -326,7 +326,7 @@ nvkm_msgqueue_post(struct nvkm_msgqueue *priv, enum msgqueue_msg_priority prio,
 	int ret;
 
 	if (wait_init && !wait_for_completion_timeout(&priv->init_done,
-					 msecs_to_jiffies(1000)))
+					 msecs_to_jiffies(5000)))
 		return -ETIMEDOUT;
 
 	queue = priv->func->cmd_queue(priv, prio);

diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
index fec0273158f6..c2ae525a0780 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
@@ -279,6 +279,7 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon)
 		u32 flags;
 		u32 falcon_id;
 	} cmd;
+	const struct nvkm_subdev *subdev = priv->falcon->owner;
 
 	memset(&cmd, 0, sizeof(cmd));
 
@@ -290,7 +291,8 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon)
 	nvkm_msgqueue_post(priv, MSGQUEUE_MSG_PRIORITY_HIGH, &cmd.hdr,
 			   acr_boot_falcon_callback, &completed, true);
 
-	if (!wait_for_completion_timeout(&completed, msecs_to_jiffies(1000)))
+	nvkm_error(subdev, "waiting for timeout in acr_boot_falcon (msgqueue_0137bca5)\n");
+	if (!wait_for_completion_timeout(&completed, msecs_to_jiffies(5000)))
 		return -ETIMEDOUT;
 
 	return 0;

On 9/13/17 11:37 AM, Nicolas Mercier wrote:
> I am still looking for a solution. I have hacked around in the code
> and found out the following:
> - Nouveau prefers using PCIE power management over ACPI Optimus calls.
> I tried to force it to use Optimus ACPI calls, but there was an error
> calling the ACPI method, so it bails out and uses PCIE PM anyway.
> - I tried to debug the PCIE PM states, which internally use ACPI to
> turn power on/off. I could print different statuses here and there.
> When the power is switched off, ACPI calls turn the power off, then the
> kernel successfully puts the device in state D3Cold (also turning off
> power to the PCI Express port). When waking up, ACPI turns the power
> on, apparently successfully (Device [PEGP] transitioned to D0). But a
> read from the PCI bus to get the power state & other flags returns
> 65535 (~0) and the kernel fails to set the device in D0 (although ACPI
> claims it is in D0).
> The call to pci_raw_set_power_state (in drivers/pci/pci.c) seems to
> fail because pci_read_config_word returns "~0" (and does not return
> any error code).
>
> I have tried different things; if I use pcie_port_pm=off, the NVidia
> card goes to state D3Hot (if I am not mistaken, its PCIE port is still
> powered), but that did not fix it. I tried to turn on or off different
> PCI/PCI Express features such as hotplug, PM and so on. The only thing
> that works is fully disabling PM, which amounts to the device
> not being powered off, so that would be equivalent to nouveau.runpm=0,
> which is not helping a lot. I have tried to force PCIe ASPM by
> recompiling the ACPI table, still no luck.
>
> I am still taking a look, but it seems like the problem comes from the
> PCI Express PM functions and ACPI, not directly from Nouveau.
>
> /n
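The "65535 (~0)" read described in the quoted message can be illustrated with a small user-space sketch. It is not kernel code: pmcsr_describe() is a hypothetical helper, and the only facts it relies on are that the power state lives in the low two bits of the PCI power management control/status register (PCI_PM_CTRL_STATE_MASK, 0x0003, in Linux's pci_regs.h) and that a config read returns all ones when no device answers on the bus.

```c
#include <stdint.h>

/* Low two bits of the PM control/status register hold the power state
 * (same value as PCI_PM_CTRL_STATE_MASK in include/uapi/linux/pci_regs.h). */
#define PM_CTRL_STATE_MASK 0x0003u

/* Hypothetical helper for illustration only, not a kernel function.
 * Takes the 16-bit value read from the PM control/status register. */
const char *pmcsr_describe(uint16_t pmcsr)
{
	static const char *const names[] = { "D0", "D1", "D2", "D3hot" };

	/* All ones means nothing answered the config read, so the
	 * state bits cannot be trusted at all. */
	if (pmcsr == 0xFFFFu)
		return "unreachable";

	return names[pmcsr & PM_CTRL_STATE_MASK];
}
```

Called with 0xFFFF, the helper reports the device as unreachable rather than decoding the low bits as D3hot, which is the distinction the quoted analysis turns on: ACPI reports D0, but the device does not respond to config reads.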
Nicolas Mercier
2017-Sep-13 22:13 UTC
[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Thanks Tobias,

I tried this, but unfortunately the only effect was that the boot was delayed by an additional 4 seconds :(
The original timeout is in drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c. I tried to increase that timeout as well, but it did not seem to make a difference either.

I think I get this error less often when I have a cable plugged into the output of that card at boot, whereas I always get this error when I boot without a monitor attached to the card.
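For readers matching numeric return codes such as the -110 in this thread against their symbolic names, a minimal lookup sketch follows. linux_errname() and its table are hypothetical names made up for this example; the numbers are the Linux asm-generic errno values, and only a handful of codes that commonly show up in driver logs are listed.

```c
#include <stddef.h>

/* Hypothetical table for illustration; values are the Linux
 * asm-generic errno numbers. */
struct errname {
	int num;
	const char *name;
};

static const struct errname linux_errnames[] = {
	{ 5, "EIO" },
	{ 12, "ENOMEM" },
	{ 16, "EBUSY" },
	{ 19, "ENODEV" },
	{ 22, "EINVAL" },
	{ 110, "ETIMEDOUT" },
};

/* Accepts either the kernel-style negative code or the positive errno. */
const char *linux_errname(int ret)
{
	int err = ret < 0 ? -ret : ret;
	size_t i;

	for (i = 0; i < sizeof(linux_errnames) / sizeof(linux_errnames[0]); i++)
		if (linux_errnames[i].num == err)
			return linux_errnames[i].name;

	return "unknown";
}
```

With this table, linux_errname(-110) yields "ETIMEDOUT", matching the value reported for acr_boot_falcon().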
Peter Wu
2017-Sep-17 23:37 UTC
[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Hi Nicolas,

Increasing the timeout won't help, as the dGPU is not powered/available. If you have an external monitor connected, the dGPU will stay powered on and not suspend (with a (temporary) effect similar to booting with nouveau.runpm=0).

The reason for the failure to restore power is still unknown; it seems related to the PCI core.

Kind regards,
Peter
https://lekensteyn.nl
(pardon my brevity, top-posting and formatting, sent from my phone)
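Whether the dGPU is actually runtime-suspended at a given moment can be checked from user space through the standard runtime PM sysfs attribute power/runtime_status, which reports values such as "active" or "suspended". The sketch below is only an illustration: read_sysfs_line() is a hypothetical helper, and the PCI address 0000:01:00.0 used in the note after it is an assumption that has to be replaced with the address lspci shows for the NVIDIA device.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysfs attribute into
 * buf, without the trailing newline. Returns 0 on success, -1 on error. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling it with "/sys/bus/pci/devices/0000:01:00.0/power/runtime_status" (address assumed) once with and once without an external monitor attached should show whether the card stays "active" in the first case, as described above.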