thr3ads.net - Nouveau - [Nouveau] Progress on nv46 vblank bug [Jun 2015]

Hans de Goede

2015-Jun-16 09:34 UTC

[Nouveau] Progress on nv46 vblank bug

Hi All,

So I've been working~w banging my head against the nv46 vblank bug
again and I've been finding out some interesting things.

This is all using the latest kernel + ddx + mesa code.

All of this was tested with *cold* boots (power removed from the wall
outlet for 20 seconds) in between the scenarios, because this is
really weird. All boots were into text mode.

Scenario 1:
a) startx /usr/bin/glxgears -info
  glxgears works, is using the nouveau mesa driver and is synced at
  60 fps.
b) startx /usr/bin/xterm -title foo
  And then in the xterm:
  metacity&
  glxgears -info
  glxgears works, is using the nouveau mesa driver and is synced at
  60 fps.

Scenario 2:
a) startx /usr/bin/xterm -title foo
  And then in the xterm:
  metacity&
  glxgears -info
  glxgears does not work; /proc/interrupts shows that the interrupt
  for nvkm is not firing.
b) startx /usr/bin/glxgears -info
  glxgears does not work; /proc/interrupts shows that the interrupt
  for nvkm is not firing.
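
The /proc/interrupts check used in scenario 2 can be sketched as below. This is only an illustration, not part of the original test procedure: the helper names `irq_total` and `is_firing` are made up, and it assumes the usual /proc/interrupts layout (IRQ label, one count column per CPU, then controller and device names).

```python
def irq_total(proc_interrupts: str, device: str) -> int:
    """Sum the per-CPU counts of every /proc/interrupts line naming `device`."""
    total = 0
    for line in proc_interrupts.splitlines():
        if device not in line:
            continue
        # Skip the "16:" label, then add count columns until the first
        # non-numeric field (the interrupt controller name).
        for field in line.split()[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def is_firing(before: str, after: str, device: str = "nvkm") -> bool:
    """True if the device's interrupt count grew between two snapshots."""
    return irq_total(after, device) > irq_total(before, device)
```

To use it, read /proc/interrupts twice a second or so apart while glxgears is running and pass both snapshots to `is_firing`; with working vblank interrupts the count should grow between the two.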

Weird, huh? But it gets even weirder: if after scenario 2 I reload
the nouveau kernel module, using the exact same module as loaded
during boot, then I can run scenario 1 and it works the same as
after a cold boot (iow things work as they should).

Ok, so reloading the module sets things back to a pristine state?
Well, not quite, because after a module reload scenario 2 also
works, whereas after a cold boot scenario 2 does not work ...

Also once things are in a working state I can pretty much do
whatever I want and they stay working...

My theory so far is that plymouth does something which causes
problems when followed by starting X + xterm + metacity, whereas
firing X + glxgears directly after boot undoes the something
plymouth has done, and from there on everything is good.

So any hints on how to move forward with this are appreciated.

Regards,

Hans

p.s.

Possibly related, likely unrelated: during nouveau module (re)load I get
these 2 errors:

[  240.837471] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x6833c8
[  240.837945] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x4f4f5c4e FAULT at 0x6833c8

Whereby the addresses listed as being written to (0x00000000 and
0x4f4f5c4e) are different each module load, so they seem to be taken
from uninitialized memory.

Hints on how to debug this are welcome too.

Ilia Mirkin

2015-Jun-16 14:02 UTC

On Tue, Jun 16, 2015 at 5:34 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> So any hints on how to move forward with this are appreciated.

I can only say what I would do... forget about trying to quantify
which cases work and which don't, just take the case that you can
reliably reproduce the problem with. Start up a second xterm (or ssh
in, that might be simpler), and start poking at stuff with
nvapeek/nvapoke. You can look in the driver for what it does when
enabling/disabling vblanks, and you can verify the values of various
registers to see if they're what you expect or not. My bet is that the
vblanks are somehow masked off. The dispnv04 code is pretty
convoluted, and probably an odd call sequence causes it to mess things
up.

Also adding a drm.debug=0xf and comparing the successful and failure
cases may prove interesting. [and nouveau.debug=debug for good measure
as well, can't hurt]
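
One way to compare the two logs is sketched below, assuming each case's dmesg output has been saved to a string. The helper names are made up for illustration; the only thing taken from this thread is the `[  240.837471]` timestamp prefix, which is stripped so that timing differences do not show up as changes.

```python
import difflib
import re

# Kernel log timestamp prefix, e.g. "[  240.837471] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log: str) -> list[str]:
    """Split a dmesg capture into lines with the timestamp prefix removed."""
    return [_TIMESTAMP.sub("", line) for line in log.splitlines()]


def log_diff(good: str, bad: str) -> list[str]:
    """Return lines only in the good log ("- ") or only in the bad log ("+ ")."""
    a, b = strip_timestamps(good), strip_timestamps(bad)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes += ["- " + line for line in a[i1:i2]]
        changes += ["+ " + line for line in b[j1:j2]]
    return changes
```

Lines that contain changing values of their own (addresses, counters) will still show up as differences and have to be filtered by eye.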

> Possibly related, likely unrelated: during nouveau module (re)load I get
> these 2 errors:
>
> [  240.837471] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x6833c8
> [  240.837945] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x4f4f5c4e FAULT at 0x6833c8
>
> Whereby the addresses listed as being written to (0x00000000 and
> 0x4f4f5c4e) are different each module load, so they seem to be taken
> from uninitialized memory.

How different? This example appears to decode to:

$ ~/src/envytools/rnn/lookup -a 46 6833c8
PRMDIO2.PAL_INDEX => 0

Which is definitely video-related. Perhaps it's something executed by
a VBIOS script? (Or wait, you were thinking that 0x4f4f5c4e is the
address? No, that is the value being written. And it occurs to me that
that is 'N\OO' in fourcc. Probably irrelevant.) You may find 'nvbios'
a useful tool for decoding the BIOS scripts. Also the 3c8 is
suspiciously similar to the VGA I/O 0x3c4 (?) register? Probably
coincidence.
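
The fourcc reading of the written value can be reproduced with a few lines. This is a small sketch with a made-up helper name, assuming the 32-bit value is laid out little-endian (lowest byte first), which is how fourcc codes are normally packed:

```python
import struct


def fourcc(value: int) -> str:
    """Interpret a 32-bit value as four ASCII characters, lowest byte first."""
    return struct.pack("<I", value).decode("ascii", errors="replace")


print(fourcc(0x4F4F5C4E))  # prints N\OO
```

The bytes are 0x4e, 0x5c, 0x4f, 0x4f, which are 'N', '\', 'O', 'O' in ASCII.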

Good luck,

  -ilia

Hans de Goede

2015-Jun-19 10:25 UTC

Hi,

On 16-06-15 16:02, Ilia Mirkin wrote:
> On Tue, Jun 16, 2015 at 5:34 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> So any hints on how to move forward with this are appreciated.
>
> I can only say what I would do... forget about trying to quantify
> which cases work and which don't, just take the case that you can
> reliably reproduce the problem with. Start up a second xterm (or ssh
> in, that might be simpler), and start poking at stuff with
> nvapeek/nvapoke. You can look in the driver for what it does when
> enabling/disabling vblanks, and you can verify the values of various
> registers to see if they're what you expect or not. My bet is that the
> vblanks are somehow masked off. The dispnv04 code is pretty
> convoluted, and probably an odd call sequence causes it to mess things
> up.

Ok, so I've been poking at registers for a couple of hours yesterday
and today, but I've not gotten anywhere.

In the meantime I've learned something about my 2 scenarios: I was
wrong that one works and one does not. Both of them work some of the
time and fail some of the time ...

It seems that we're not initializing some register and sometimes this
comes out of reset with a good value and sometimes with a bad value ...

Running nvapeek on interesting register ranges also shows that quite
a few registers contain different values between boots. Some of these
seem to be counters for the current line / scanout address,
but others are not. Is this normal for nouveau hardware?

I'm used to most hardware having everything in a consistent state after
a reset.
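
A rough sketch of how register dumps from several boots could be compared is below. The helper names are made up, and the dump format is an assumption: one `address: value` pair in hex per line. Lines that do not match that shape are ignored.

```python
import re

# Assumed dump line format: "00600100: 00000001" (hex address, hex value).
_LINE = re.compile(r"^\s*([0-9a-fA-F]+):\s+([0-9a-fA-F]+)\s*$")


def parse_dump(text: str) -> dict[int, int]:
    """Map register address to value for every line matching the assumed format."""
    regs = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            regs[int(match.group(1), 16)] = int(match.group(2), 16)
    return regs


def unstable_registers(dumps: list[str]) -> dict[int, list[int]]:
    """Return registers present in every dump whose value differs between dumps."""
    parsed = [parse_dump(d) for d in dumps]
    if not parsed:
        return {}
    common = set(parsed[0]).intersection(*parsed[1:])
    return {
        addr: [p[addr] for p in parsed]
        for addr in sorted(common)
        if len({p[addr] for p in parsed}) > 1
    }
```

Registers that also change between two dumps taken within a single boot are probably the line / scanout counters and can be dropped from the list; what remains is the set that differs from one reset to the next.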

>
> Also adding a drm.debug=0xf and comparing the successful and failure
> cases may prove interesting. [and nouveau.debug=debug for good measure
> as well, can't hurt]
>
>> Possibly related, likely unrelated: during nouveau module (re)load I get
>> these 2 errors:
>>
>> [  240.837471] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x6833c8
>> [  240.837945] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x4f4f5c4e FAULT at 0x6833c8
>>
>> Whereby the addresses listed as being written to (0x00000000 and
>> 0x4f4f5c4e) are different each module load, so they seem to be taken
>> from uninitialized memory.
>
> How different? This example appears to decode to:
>
> $ ~/src/envytools/rnn/lookup -a 46 6833c8
> PRMDIO2.PAL_INDEX => 0
>
> Which is definitely video-related. Perhaps it's something executed by
> a VBIOS script? (Or wait, you were thinking that 0x4f4f5c4e is the
> address? No, that is the value being written. And it occurs to me that
> that is 'N\OO' in fourcc. Probably irrelevant.) You may find 'nvbios'
> a useful tool for decoding the BIOS scripts. Also the 3c8 is
> suspiciously similar to the VGA I/O 0x3c4 (?) register? Probably
> coincidence.

Ah yes, I had the value and address swapped. You're right, it is writing
to 0x6833c8, usually 2 times, but sometimes it writes to that register
many times in a row, usually on a nouveau module reload.

Regards,

Hans

Hans de Goede

2015-Jun-19 10:26 UTC

p.s.

I'm starting to think that it may be best to just disable
vblank for nv46 hardware in the ddx and be done with it.
Any opinions on this?
