thr3ads.net - Nouveau - [Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges [Nov 2019]


Mika Westerberg

2019-Nov-27 11:49 UTC

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

On Tue, Nov 26, 2019 at 06:10:36PM -0500, Lyude Paul wrote:
> Hey, this is almost certainly not the right place in this thread to respond,
> but this thread has gotten so deep Evolution can't push the subject further to
> the right, heh. So I'll just respond here.

:)

> I've been following this and helping out Karol with testing here and there.
> They had me test Bjorn's PCI branch on the X1 Extreme 2nd generation, which
> has a Turing GPU and 8086:1901 PCI bridge.
>
> I was about to say "the patch fixed things, hooray!" but it seems that after
> trying runtime suspend/resume a couple times things fall apart again:

You mean $subject patch, no?

> [  686.883247] nouveau 0000:01:00.0: DRM: suspending object tree...
> [  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> [  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> [  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)

This is probably the culprit. The same AML code fails to properly turn
on the device.

Is acpidump from this system available somewhere?
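Mika's reading of the log can be made explicit. The sketch below assumes that ACPICA prints one "Aborting method" line per AML method as the failed call stack unwinds, innermost method first; under that assumption, reversing the quoted lines gives the call chain from the power resource's `_ON` method down to `NVPO`, where the `AE_AML_LOOP_TIMEOUT` is reported first. The helper `abort_chain` is hypothetical and not part of any kernel or ACPICA tool.

```python
import re

# The three ACPI error lines quoted in this thread, verbatim.
LOG = r"""[  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
[  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
[  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
"""

# Captures the method path and the ACPICA status code of each abort.
ABORT_RE = re.compile(r"Aborting method (\S+) due to previous error \((\w+)\)")


def abort_chain(log: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (method, status) pairs, outermost caller first.

    Assumes the aborts are logged innermost-first while the AML call stack unwinds.
    """
    aborts = [(m.group(1), m.group(2)) for m in ABORT_RE.finditer(log)]
    return list(reversed(aborts))


if __name__ == "__main__":
    chain = abort_chain(LOG)
    print(" -> ".join(method for method, _ in chain))
    # prints: \_SB.PCI0.PEG0.PG00._ON -> \_SB.PCI0.PGON -> \_SB.PCI0.PEG0.PEGP.NVPO
```

Read this way, the loop timeout occurs inside `NVPO`, which is reached from `PGON` while the kernel is executing the `_ON` method of power resource `PG00` to power the GPU back up; the acpidump would show what that loop is waiting for.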

Karol Herbst

2019-Nov-27 11:51 UTC


On Wed, Nov 27, 2019 at 12:49 PM Mika Westerberg
<mika.westerberg at intel.com> wrote:
>
> On Tue, Nov 26, 2019 at 06:10:36PM -0500, Lyude Paul wrote:
> > Hey, this is almost certainly not the right place in this thread to respond,
> > but this thread has gotten so deep Evolution can't push the subject further to
> > the right, heh. So I'll just respond here.
>
> :)
>
> > I've been following this and helping out Karol with testing here and there.
> > They had me test Bjorn's PCI branch on the X1 Extreme 2nd generation, which
> > has a Turing GPU and 8086:1901 PCI bridge.
> >
> > I was about to say "the patch fixed things, hooray!" but it seems that after
> > trying runtime suspend/resume a couple times things fall apart again:
>
> You mean $subject patch, no?
>
No, I told Lyude to test the pci/pm branch because the runpm errors we saw
on that machine looked different: the GPU reported a BAR error after it got
resumed, so I was wondering whether the delays would help with that. But
after some cycles it still ran into the same issue: the GPU disappeared.
Later testing also showed that my patch sadly didn't seem to help with this
error either :/

> > [  686.883247] nouveau 0000:01:00.0: DRM: suspending object tree...
> > [  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > [  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > [  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
>
> This is probably the culprit. The same AML code fails to properly turn
> on the device.
>
> Is acpidump from this system available somewhere?
>
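Both Lyude and Karol describe the GPU failing only after several runtime suspend/resume cycles. A minimal reproduction loop might look like the sketch below; it is not the procedure used in this thread. It assumes root access and the standard sysfs runtime PM attributes `power/control` (writing "auto" allows runtime suspend, writing "on" forces a runtime resume) and `power/runtime_status`, and it treats an all-ones vendor ID in config space as the device having stopped responding. The helper names and the `settle_s` wait, which must exceed the driver's autosuspend delay, are hypothetical.

```python
import time
from pathlib import Path


def device_responds(dev: Path) -> bool:
    """Return False if config space is unreadable or the vendor ID reads as all ones."""
    try:
        raw = (dev / "config").read_bytes()[:2]
    except OSError:
        return False  # the sysfs node is gone, e.g. the device was removed
    # A device that no longer answers config reads returns 0xFF bytes.
    return len(raw) == 2 and int.from_bytes(raw, "little") != 0xFFFF


def cycle_runtime_pm(dev: Path, cycles: int, settle_s: float = 10.0) -> int:
    """Hypothetical helper: run `cycles` runtime suspend/resume rounds.

    Returns how many rounds completed before a resume failed or the device
    stopped responding (equal to `cycles` if every round succeeded).
    """
    control = dev / "power" / "control"
    status = dev / "power" / "runtime_status"
    for done in range(cycles):
        control.write_text("auto")  # let the PM core runtime-suspend the idle GPU
        time.sleep(settle_s)        # assumed wait, longer than the autosuspend delay
        control.write_text("on")    # forbid runtime PM, which forces a runtime resume
        time.sleep(settle_s)
        if not device_responds(dev) or status.read_text().strip() != "active":
            return done
    return cycles
```

On the machine in this thread it would be called as `cycle_runtime_pm(Path("/sys/bus/pci/devices/0000:01:00.0"), cycles=20)`, using the bus address from the nouveau log line; checking `dmesg` for the ACPI errors after a failed round would show whether the same `_ON` path is involved.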

Lyude Paul

2019-Nov-27 19:54 UTC


On Wed, 2019-11-27 at 12:51 +0100, Karol Herbst wrote:
> On Wed, Nov 27, 2019 at 12:49 PM Mika Westerberg
> <mika.westerberg at intel.com> wrote:
> > On Tue, Nov 26, 2019 at 06:10:36PM -0500, Lyude Paul wrote:
> > > Hey, this is almost certainly not the right place in this thread to respond,
> > > but this thread has gotten so deep Evolution can't push the subject further to
> > > the right, heh. So I'll just respond here.
> > 
> > :)
> > 
> > > I've been following this and helping out Karol with testing here and there.
> > > They had me test Bjorn's PCI branch on the X1 Extreme 2nd generation, which
> > > has a Turing GPU and 8086:1901 PCI bridge.
> > > 
> > > I was about to say "the patch fixed things, hooray!" but it seems that after
> > > trying runtime suspend/resume a couple times things fall apart again:
> > 
> > You mean $subject patch, no?
> > 
> 
> No, I told Lyude to test the pci/pm branch because the runpm errors we saw
> on that machine looked different: the GPU reported a BAR error after it got
> resumed, so I was wondering whether the delays would help with that. But
> after some cycles it still ran into the same issue: the GPU disappeared.
> Later testing also showed that my patch sadly didn't seem to help with this
> error either :/
> 
> > > [  686.883247] nouveau 0000:01:00.0: DRM: suspending object tree...
> > > [  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > [  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > [  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > 
> > This is probably the culprit. The same AML code fails to properly turn
> > on the device.
> > 
> > Is acpidump from this system available somewhere?
I've attached it to this email.

-- 
Cheers,
	Lyude Paul
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x1extremeg2-acpi.tar.xz
Type: application/x-xz-compressed-tar
Size: 181604 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20191127/61083044/attachment-0001.bin>
