----- Original Message -----
> From: "Ilia Mirkin" <imirkin at alum.mit.edu>
> To: "pierre morrow" <pierre.morrow at free.fr>
> Cc: nouveau at lists.freedesktop.org
> Sent: Friday, 31 January 2014 21:16:40
> Subject: Re: [Nouveau] Help needed for bug 58556

> Unfortunately this is a *massive* bug... and confused by the "other"
> very similar but apparently not identical bug in the system.

> What happens if you only enable acceleration on the NVAC card? (e.g.
> by hacking up nouveau to ignore the other one entirely). Wasn't there
> some thing where the NV96 card was effectively disabled but still
> appearing in PCI space? Or I might be thinking of a different mac
> situation...

Well, if I disable acceleration for the NV96 card, it doesn't hang after
initialising it, but I get spammed later on (I think it's PAGE_NOT_PRESENT
errors, like [1], but my screen goes garbage at that point, so I can't read
anything), and I don't get to log in.
BTW, what could I do to get boot logs even if the system did not make it
through (apart from recording with my phone...)?

> As you probably saw, this is a MASSIVE commit. What exactly was the
> problem with 20abd1634a?

The vblank structure was modified slightly, and psw->vblank was then
initialised only when acceleration is on (it was always initialised before),
even though it is used inside functions that are called when acceleration is
off. You can see it in comments 18 [2] and 20 [3].

> Can you go into some detail on what these tests were that yielded a
> successful outcome? IIRC nouveau_channel_new is called to create a
> new... channel, which is used by drm clients. If you don't have
> acceleration, that whole api is disabled, so it shouldn't come up. I
> guess accel_init also initializes drm->channel which is the kernel
> channel for doing stuff. [Although TBH I'm not entirely sure how
> things work without acceleration enabled... but I think there's a
> non-fifo way to show images on the screen.]

My tests were pretty brute-force:
* comment out all of nouveau_accel_init, and uncomment it block by block
  until it works;
* then comment out all of nouveau_channel_new, and uncomment it function by
  function until it works;
* and finally, do the same inside nouveau_channel_init (for this function,
  the vram creation, gart creation and dma variables initialisation alone
  were enough to get a clean screen).

To sum up, these pieces of nouveau_accel_init were needed to get a clean
screen:
* return if card is an NV96 one;
* init fence;
* run nouveau_channel_new:
  * nouveau_channel_ind
  * nouveau_channel_init, precisely these parts:
    * vram creation;
    * gart creation;
    * dma variables initialisation.

> -ilia

Pierre Moreau

[1]: nouveau E[PFB][0000:03:00:0] trapped write at 0x0000546000 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT
[2]: https://bugs.freedesktop.org/show_bug.cgi?id=58556#c18
[3]: https://bugs.freedesktop.org/show_bug.cgi?id=58556#c20
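For readers following the 20abd1634a discussion, here is a minimal sketch of the ordering problem described above: state that is set up only on the acceleration path but consumed on a path that runs either way. It is not the nouveau code. Every name in it (sketch_drm, sketch_vblank, sketch_init_buggy, sketch_init_fixed, sketch_vblank_event) is hypothetical and only mirrors the description of psw->vblank given in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vblank state mentioned in the thread. */
struct sketch_vblank {
    bool initialised;
    int  pending;          /* number of queued vblank events */
};

/* Hypothetical stand-in for the per-device driver state. */
struct sketch_drm {
    bool noaccel;          /* mirrors the noaccel option discussed above */
    struct sketch_vblank vblank;
};

/* Problematic ordering: vblank state is set up only on the accel path. */
static void sketch_init_buggy(struct sketch_drm *drm)
{
    if (drm->noaccel)
        return;            /* vblank is left uninitialised here */
    drm->vblank.initialised = true;
    drm->vblank.pending = 0;
}

/* Safer ordering: set up vblank state before the accel check. */
static void sketch_init_fixed(struct sketch_drm *drm)
{
    drm->vblank.initialised = true;
    drm->vblank.pending = 0;
    if (drm->noaccel)
        return;
    /* acceleration-only setup would go here */
}

/*
 * Runs on the display path whether or not acceleration is enabled.
 * Returns 0 on success, -1 if the vblank state was never initialised.
 */
static int sketch_vblank_event(struct sketch_drm *drm)
{
    if (!drm->vblank.initialised)
        return -1;
    drm->vblank.pending++;
    return 0;
}
```

With noaccel set, sketch_vblank_event fails after sketch_init_buggy and succeeds after sketch_init_fixed. In the real driver the failure would be a use of uninitialised data rather than a clean error code; the explicit flag is only there to make the sketch checkable.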
On Fri, Jan 31, 2014 at 5:39 PM, <pierre.morrow at free.fr> wrote:
> From: "Ilia Mirkin" <imirkin at alum.mit.edu>
> > Unfortunately this is a *massive* bug... and confused by the "other"
> > very similar but apparently not identical bug in the system.
> >
> > What happens if you only enable acceleration on the NVAC card? (e.g.
> > by hacking up nouveau to ignore the other one entirely). Wasn't there
> > some thing where the NV96 card was effectively disabled but still
> > appearing in PCI space? Or I might be thinking of a different mac
> > situation...
>
> Well, if I disable acceleration for the NV96 card, it doesn't hang after
> initialising it, but I get spammed (I think it's PAGE_NOT_PRESENT errors,
> like [1], but my screen goes garbage at that point, so I can't read
> anything) later on, and I don't get to login.

I meant disable it much harder -- like tell nouveau to just ignore it
as though modeset=0 was passed in for it. Also I seem to recall you
can do an outb (even from grub) that will just turn off the nv96 card
entirely.

> BTW, what could I do to get boot logs even if the system did not make it
> through (apart from recording with my phone...)?

pstore if you have efi, netconsole, blockconsole. And phone isn't so
bad either :)

> > As you probably saw, this is a MASSIVE commit. What exactly was the
> > problem with 20abd1634a?
>
> The vblank structure was a little bit modified, and psw->vblank would be
> initialised only when acceleration is on (it was always initialised before),
> though it would be used inside functions called even when acceleration is
> off. You can see it in comments 18 [2] and 20 [3].
>
> > Can you go into some detail on what these tests were that yielded a
> > successful outcome? IIRC nouveau_channel_new is called to create a
> > new... channel, which is used by drm clients. If you don't have
> > acceleration, that whole api is disabled, so it shouldn't come up. I
> > guess accel_init also initializes drm->channel which is the kernel
> > channel for doing stuff. [Although TBH I'm not entirely sure how
> > things work without acceleration enabled... but I think there's a
> > non-fifo way to show images on the screen.]
>
> My tests were pretty bruteforcing ones:
> * comment all nouveau_accel_init content, and uncomment block by block
>   until it works;
> * then comment all nouveau_channel_new content, and uncomment function by
>   function until it works;
> * and finally, I did the same inside nouveau_channel_init (for this
>   function, only the vram creation, gart creation and dma variables
>   initialisation were enough to get a clean screen).
>
> To sum up what pieces of nouveau_accel_init were needed to get a clean
> screen:
> * return if card is an NV96 one;
> * init fence;
> * run nouveau_channel_new:
>   * nouveau_channel_ind
>   * nouveau_channel_init, precisely these parts:
>     * vram creation;
>     * gart creation;
>     * dma variables initialisation.

Yeah, so all these things should only be necessary if you have
acceleration enabled. I wonder if the card comes up in a funny "I'm
still executing stuff" state and nouveau fails to "shut it down" when
noaccel is passed in.

-ilia
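The block-by-block procedure quoted above is, in effect, a linear search for the smallest set of initialisation steps that gives a clean screen. A rough sketch of that search follows, purely as an illustration. The step labels are copied from the summary in the thread; everything else (sketch_steps, sketch_probe_fn, sketch_min_steps, sketch_probe_all_required) is hypothetical. In practice each "probe" is a kernel rebuild and a reboot, not a function call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Step labels as listed in the thread, in the order given there. */
static const char *const sketch_steps[] = {
    "return if card is an NV96 one",
    "init fence",
    "nouveau_channel_new: nouveau_channel_ind",
    "nouveau_channel_init: vram creation",
    "nouveau_channel_init: gart creation",
    "nouveau_channel_init: dma variables initialisation",
};
#define SKETCH_N_STEPS (sizeof(sketch_steps) / sizeof(sketch_steps[0]))

/* Returns the label of step i, or NULL if i is out of range. */
static const char *sketch_step_name(size_t i)
{
    return i < SKETCH_N_STEPS ? sketch_steps[i] : NULL;
}

/*
 * Hypothetical probe: reports whether the screen is clean when only the
 * first `enabled` steps are compiled in.
 */
typedef bool (*sketch_probe_fn)(size_t enabled);

/*
 * Enable steps one at a time, in order, and return how many leading steps
 * are needed before the probe first succeeds, or -1 if it never does.
 */
static int sketch_min_steps(size_t n_steps, sketch_probe_fn probe)
{
    for (size_t enabled = 0; enabled <= n_steps; enabled++) {
        if (probe(enabled))
            return (int)enabled;
    }
    return -1;
}

/* Stand-in probe for demonstration only: pretends every step is required. */
static bool sketch_probe_all_required(size_t enabled)
{
    return enabled >= SKETCH_N_STEPS;
}
```

Calling sketch_min_steps(SKETCH_N_STEPS, sketch_probe_all_required) walks through every prefix and stops at the full list, which matches the reported outcome that all of the listed pieces were needed. A linear walk was chosen over a binary search because the steps depend on each other, so skipping ahead could produce configurations that do not build or run.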
----- Original Message -----
> From: "Ilia Mirkin" <imirkin at alum.mit.edu>
> To: "pierre morrow" <pierre.morrow at free.fr>
> Cc: nouveau at lists.freedesktop.org
> Sent: Friday, 31 January 2014 23:58:58
> Subject: Re: [Nouveau] Help needed for bug 58556

> I meant disable it much harder -- like tell nouveau to just ignore it
> as though modeset=0 was passed in for it. Also I seem to recall you
> can do an outb (even from grub) that will just turn off the nv96 card
> entirely.

So I prevented PCI from enabling the card, but nothing changed: with noaccel,
the screen is still corrupted, and without it, I still can't boot and get
lots of errors.
Oh, I forgot to mention: the screen gets scrambled just after the handover
from efifb to nouveaufb.

> pstore if you have efi, netconsole, blockconsole. And phone isn't so
> bad either :)

> Yeah, so all these things should only be necessary if you have
> acceleration enabled. I wonder if the card comes up in a funny "I'm
> still executing stuff" state and nouveau fails to "shut it down" when
> noaccel is passed in.

Well, from what I saw in nouveau_accel_init, it just enables things if
noaccel=0, and does nothing otherwise; noaccel isn't used anywhere apart
from accel_init.

> -ilia

Pierre
I think we had the wrong culprit: I just tried PCI-disabling the NVAC card
(and keeping the NV96 one), and it just works: no garbage screen, and
moreover, I get no hangs or errors when enabling acceleration!
I'll spend some time comparing both outputs (without NVAC, and without NV96)
to find out what the NVAC is doing wrong.

Pierre

> On 31 Jan 2014, at 23:58, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>
>> On Fri, Jan 31, 2014 at 5:39 PM, <pierre.morrow at free.fr> wrote:
>> From: "Ilia Mirkin" <imirkin at alum.mit.edu>
>>> Unfortunately this is a *massive* bug... and confused by the "other"
>>> very similar but apparently not identical bug in the system.
>>>
>>> What happens if you only enable acceleration on the NVAC card? (e.g.
>>> by hacking up nouveau to ignore the other one entirely). Wasn't there
>>> some thing where the NV96 card was effectively disabled but still
>>> appearing in PCI space? Or I might be thinking of a different mac
>>> situation...
>>
>> Well, if I disable acceleration for the NV96 card, it doesn't hang after
>> initialising it, but I get spammed (I think it's PAGE_NOT_PRESENT errors,
>> like [1], but my screen goes garbage at that point, so I can't read
>> anything) later on, and I don't get to login.
>
> I meant disable it much harder -- like tell nouveau to just ignore it
> as though modeset=0 was passed in for it. Also I seem to recall you
> can do an outb (even from grub) that will just turn off the nv96 card
> entirely.
>
>> BTW, what could I do to get boot logs even if the system did not make it
>> through (apart from recording with my phone...)?
>
> pstore if you have efi, netconsole, blockconsole. And phone isn't so
> bad either :)
>
>>> As you probably saw, this is a MASSIVE commit. What exactly was the
>>> problem with 20abd1634a?
>>
>> The vblank structure was a little bit modified, and psw->vblank would be
>> initialised only when acceleration is on (it was always initialised before),
>> though it would be used inside functions called even when acceleration is
>> off. You can see it in comments 18 [2] and 20 [3].
>>
>>> Can you go into some detail on what these tests were that yielded a
>>> successful outcome? IIRC nouveau_channel_new is called to create a
>>> new... channel, which is used by drm clients. If you don't have
>>> acceleration, that whole api is disabled, so it shouldn't come up. I
>>> guess accel_init also initializes drm->channel which is the kernel
>>> channel for doing stuff. [Although TBH I'm not entirely sure how
>>> things work without acceleration enabled... but I think there's a
>>> non-fifo way to show images on the screen.]
>>
>> My tests were pretty bruteforcing ones:
>> * comment all nouveau_accel_init content, and uncomment block by block
>>   until it works;
>> * then comment all nouveau_channel_new content, and uncomment function by
>>   function until it works;
>> * and finally, I did the same inside nouveau_channel_init (for this
>>   function, only the vram creation, gart creation and dma variables
>>   initialisation were enough to get a clean screen).
>>
>> To sum up what pieces of nouveau_accel_init were needed to get a clean
>> screen:
>> * return if card is an NV96 one;
>> * init fence;
>> * run nouveau_channel_new:
>>   * nouveau_channel_ind
>>   * nouveau_channel_init, precisely these parts:
>>     * vram creation;
>>     * gart creation;
>>     * dma variables initialisation.
>
> Yeah, so all these things should only be necessary if you have
> acceleration enabled. I wonder if the card comes up in a funny "I'm
> still executing stuff" state and nouveau fails to "shut it down" when
> noaccel is passed in.
>
> -ilia