thr3ads.net - Nouveau - [Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m [Sep 2017]

Tobias Klausmann

2017-Sep-13 12:28 UTC

[Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m

Hi,

the system fails to initialize your VBIOS using secure boot (I had a rare
chance to witness it again on my own system). For now I have traced it to
acr_boot_falcon() in
"linux/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0148cdec.c", where it
returns -110, which is -ETIMEDOUT. You could try increasing the timeout to
see whether it helps, along the lines of the following:


diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
index 77273b53672c..fc0cb187d80d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
@@ -326,7 +326,7 @@ nvkm_msgqueue_post(struct nvkm_msgqueue *priv, enum msgqueue_msg_priority prio,
        int ret;
 
        if (wait_init && !wait_for_completion_timeout(&priv->init_done,
-                                        msecs_to_jiffies(1000)))
+                                        msecs_to_jiffies(5000)))
                return -ETIMEDOUT;
 
        queue = priv->func->cmd_queue(priv, prio);

diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
index fec0273158f6..c2ae525a0780 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
@@ -279,6 +279,7 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon)
                u32 flags;
                u32 falcon_id;
        } cmd;
+       const struct nvkm_subdev *subdev = priv->falcon->owner;
 
        memset(&cmd, 0, sizeof(cmd));
 
@@ -290,7 +291,8 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon)
        nvkm_msgqueue_post(priv, MSGQUEUE_MSG_PRIORITY_HIGH, &cmd.hdr,
                        acr_boot_falcon_callback, &completed, true);
 
-       if (!wait_for_completion_timeout(&completed, msecs_to_jiffies(1000)))
+       nvkm_error(subdev, "waiting for timeout in acr_boot_falcon (msgqueue_0137bca5)\n");
+       if (!wait_for_completion_timeout(&completed, msecs_to_jiffies(5000)))
                return -ETIMEDOUT;
 
        return 0;
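
For readers unfamiliar with the "-110 is -ETIMEDOUT" convention used above: kernel functions usually return 0 on success or a negated errno value on failure. The following is only a small userspace sketch of that mapping. The helper kernel_errname() and its table are hypothetical and not part of nouveau; the numbers are the asm-generic values used on x86 and ARM, and a few other architectures number them differently.

```c
#include <assert.h>
#include <stddef.h>

/* errno values as found in the asm-generic headers (x86, ARM, ...).
 * Assumption: some architectures use different numbers. */
struct errno_name {
    int code;
    const char *name;
};

static const struct errno_name known_errnos[] = {
    {   5, "EIO" },
    {  16, "EBUSY" },
    {  19, "ENODEV" },
    { 110, "ETIMEDOUT" },
};

/* Hypothetical helper: turn a kernel-style return value (0 or a
 * negated errno) into a symbolic name for log messages. */
static const char *kernel_errname(int ret)
{
    assert(ret <= 0); /* positive values are not error codes here */

    for (size_t i = 0; i < sizeof(known_errnos) / sizeof(known_errnos[0]); i++) {
        if (known_errnos[i].code == -ret)
            return known_errnos[i].name;
    }
    return ret == 0 ? "OK" : "unknown";
}
```

With this table, kernel_errname(-110) yields "ETIMEDOUT", which matches the value reported from acr_boot_falcon().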



On 9/13/17 11:37 AM, Nicolas Mercier wrote:
> I am still looking for a solution. I have hacked around in the code
> and found out the following:
> - Nouveau prefers PCIe power management over ACPI Optimus calls.
> I tried to force it to use the Optimus ACPI calls, but calling the
> ACPI method returned an error, so it bails out and uses PCIe PM anyway.
> - I tried to debug the PCIe PM states, which internally use ACPI to
> turn power on and off. I could print different statuses here and there.
> When the power is switched off, the ACPI calls turn the power off and
> the kernel successfully puts the device in state D3cold (also turning
> off power to the PCI Express port). When waking up, ACPI turns the
> power on, apparently successfully (Device [PEGP] transitioned to D0).
> But a read from the PCI bus to get the power state and other flags
> returns 65535 (~0), and the kernel fails to set the device to D0
> (although ACPI claims it is in D0).
> The call to pci_raw_set_power_state (in drivers/pci/pci.c) seems to
> fail because pci_read_config_word returns "~0" (and does not return
> any error code).
>
> I have tried different things. If I use pcie_port_pm=off, the NVidia
> card goes to state D3hot (if I am not mistaken, its PCIe port is still
> powered), but that did not fix it. I tried turning various PCI/PCI
> Express features on or off, such as hotplug, PM and so on. The only
> thing that works is disabling PM entirely, which means the device is
> never powered off. That is equivalent to nouveau.runpm=0, which does
> not help much. I have also tried to force PCIe ASPM by recompiling the
> ACPI table, still no luck.
>
> I am still taking a look, but it seems like the problem comes from the
> PCI Express PM functions and ACPI, not directly from Nouveau.
>
> /n
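
The 65535 (~0) value mentioned in the quoted message is what a 16-bit PCI config read returns when no device answers on the bus, so it carries no real power-state information. Below is a minimal userspace sketch of how such a value could be told apart from a valid power management control/status (PMCSR) word. The names decode_pmcsr() and PM_UNREACHABLE are made up for illustration; the 0x0003 mask has the same value as the kernel's PCI_PM_CTRL_STATE_MASK, and the D-state encodings follow the PCI power management capability layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 1:0 of PMCSR encode the power state (same value as the
 * kernel's PCI_PM_CTRL_STATE_MASK, redefined for userspace). */
#define PM_CTRL_STATE_MASK 0x0003u

/* Hypothetical result type for this sketch. */
enum pm_state {
    PM_UNREACHABLE = -1, /* read returned all ones: device did not answer */
    PM_D0 = 0,
    PM_D1 = 1,
    PM_D2 = 2,
    PM_D3HOT = 3,
};

/* A 16-bit config read of all ones means nothing responded. */
static bool config_read_failed(uint16_t value)
{
    return value == 0xFFFFu;
}

/* Decode a PMCSR word, treating all ones as "device unreachable"
 * instead of blindly masking out the state bits. */
static enum pm_state decode_pmcsr(uint16_t pmcsr)
{
    if (config_read_failed(pmcsr))
        return PM_UNREACHABLE;
    return (enum pm_state)(pmcsr & PM_CTRL_STATE_MASK);
}
```

Note that masking 0xFFFF without the check gives 3, so an unreachable device would look as if it were sitting in D3hot. Whether the kernel version discussed here performs such a check is not something this sketch claims.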

Nicolas Mercier

2017-Sep-13 22:13 UTC

Thanks Tobias, I tried this, but unfortunately the only effect was that the
boot was delayed by an additional 4 seconds :(
The original timeout is in
drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
I tried increasing that timeout as well, but it did not seem to make a
difference either.

I think I get this error less often when I have a cable plugged into the
output of that card at boot, whereas I always get this error when I boot
without a monitor attached to the card.


Peter Wu

2017-Sep-17 23:37 UTC

Hi Nicolas,

Increasing the timeout won't help, as the dGPU is not powered/available. If
you have an external monitor connected, the dGPU will stay powered on and not
suspend (with a temporary effect similar to booting with nouveau.runpm=0).

The reason for the failure to restore power is still unknown; it seems related
to the PCI core.

Kind regards,
Peter
https://lekensteyn.nl
(pardon my brevity, top-posting and formatting, sent from my phone)
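
To illustrate why a longer timeout only helps a slow device and not an unpowered one, here is a small simulation. It is a sketch under stated assumptions, not nouveau code: struct fake_falcon, wait_for_falcon() and the simulated millisecond clock are all invented for this example, and SKETCH_ETIMEDOUT simply reuses the value 110 mentioned earlier in the thread.

```c
#include <assert.h>

#define SKETCH_ETIMEDOUT 110 /* same number as ETIMEDOUT on x86/ARM Linux */

/* Hypothetical stand-in for the firmware: it signals completion
 * ready_after_ms after the command is posted, or never if negative. */
struct fake_falcon {
    long ready_after_ms;
};

/* Step a simulated clock 1 ms at a time until the fake device
 * reports completion or the timeout budget is used up.
 * Returns 0 on completion, -SKETCH_ETIMEDOUT otherwise. */
static int wait_for_falcon(const struct fake_falcon *f, long timeout_ms)
{
    assert(timeout_ms >= 0);

    for (long now = 0; now <= timeout_ms; now++) {
        if (f->ready_after_ms >= 0 && now >= f->ready_after_ms)
            return 0;
    }
    return -SKETCH_ETIMEDOUT;
}
```

A fake device with ready_after_ms = 3000 times out with a 1000 ms budget and succeeds with 5000 ms. One with ready_after_ms = -1, standing in for a dGPU that never regained power, times out either way, and the caller just waits longer before failing. That matches the extra 4 seconds of boot delay reported above.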


