Hans de Goede
2015-Aug-10 12:47 UTC
[Nouveau] "enable dri3 support without glamor" causes gnome-shell regression on nv4x
Hi, On 03-08-15 20:09, Ilia Mirkin wrote:> On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <hdegoede at redhat.com> wrote: >> Hi, >> >> >> On 03-08-15 17:36, Ilia Mirkin wrote: >>> >>> On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <hdegoede at redhat.com> wrote: >>>> >>>> Hi, >>>> >>>> On 30-07-15 16:09, Ilia Mirkin wrote: >>>>> >>>>> >>>>> FWIW this is a fail on nv50+ as well. See for example >>>>> https://bugs.freedesktop.org/show_bug.cgi?id=91445 >>>>> >>>>> My suspicion is that this is due to the lack of PUSH_KICK in the *Done >>>>> exa handlers -- works fine with DRI2, but DRI3 has no synchronization >>>>> and so the commands never get flushed out. Easily verified by sticking >>>>> PUSH_KICK's everywhere. >>>> >>>> >>>> >>>> I do not believe that that is the problem, in my case it clearly >>>> seems to be a pitch / swizzle problem rather then a synchronizarion >>>> problem, here is what my desktop with gnome shell looks like when >>>> using DRI2: >>>> >>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg >>>> >>>> And this is what it looks like when using DRI3: >>>> >>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg >>>> >>>> The DRI2 screenshot is made with Mario's 2 patches on top of >>>> current master: >>>> >>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html >>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html >>>> >>>> And then adding Option "DRI" "2" to xorg.conf. >>> >>> >>> His patches should have defaulted it to DRI 2 I think, so this is >>> unnecessary. In fact you should have had to say "DRI" "3" to get DRI3 >>> with his patches. >>> -- >>>> >>>> >>>> I've also tried disabling EXA using Option "AccelMethod" "none", >>>> but that seems to also automatically disable all DRI, leading to >>>> software rendering. >>>> >>>> I discussed this with Ben this morning and he suggested that this >>>> is likely a Mesa issue since with DRI3 mesa rather then the ddx >>>> allocs the surfaces. I've tried disabling swizzling in the >>>> mesa code by forcing nv30_miptree_create() to always take >>>> the code path for linear textures, but that leads to the exact >>>> same result as before that change. >>> >>> >>> Ah yes. Very different problem indeed. I actually suspect it has to do >>> with swizzling. Look at the white pattern of the moon -- it's all in a >>> line. That means that it expected some locality and instead it got >>> drawn all on a line. If it were merely a stride problem, I'd expect to >>> see strips of the moon below and offset from one another. >>> >>> So... take a look at nv30_miptree_from_handle -- I wonder if it can >>> now receive swizzled textures where it couldn't before. >> >> >> Ok, that does go in the direction I am expecting the problem to be, >> but I'm afraid I'm going to need a bit more guidance, what exactly >> am I looking for in that function / which "knobs" should I try to >> vary / play with to maybe fix this ? > > Unfortunately this is playing near (or past) the limits of my > knowledge as well. My understanding is that DRI3 passes pixmaps around > with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism > which is not that (I think it just copies stuff around). > > Now on nv50+, bo's have "tile flags" (and memtype and probably other > annoyances). The tile flags indicate the specific tiling mechanism > used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look > at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a > "bo config" argument. This bo config can then later be retrieved using > some other syscall. > > However on nv30 there appears to not be any such thing. The > nouveau_bo_new call just passes in NULL for creating the bo, which > means that there's no way to recover the "are you swizzled" > information after-the-fact. > > Presumably you should create a "nv04" bo config section in the union,That already exists, and indeed gets set by the nouveau_allocate_surface function from src/nv_accel_common.c from the ddx,> and just pass the single "swizzled" bit through. I'm not sure what, if > anything, is required on the kernel side for that. I don't think > there's any optionality in how the swizzling is done for pre-nv50. > > Note that in the nv30_miptree logic, mt->swizzled implies that > mt->uniform_pitch = 0, but the level pitch is set "properly" (again, > see nv30_miptree_create). > > Hope this sheds some light and doesn't cause you to go in the wrong > direction -- please take everything I say with a grain of salt -- I'm > often a bit off on some of the details.Thanks this was helpful, I do feel we are getting somewhere, but I do need a bit more help. I've added some debug printf's to nv30_miptree.c, nv30_miptree_create and nv30_miptree_from_handle, where the latter is only used when using dri2 (e.g. in the working case). Doing a diff between a log of starting gnome-shell with dri vs dri3 results in this: --- mesa.log.dri2 2015-08-10 14:18:03.182712022 +0200 +++ mesa.log.dri3 2015-08-10 14:18:33.233336338 +0200 @@ -1,8 +1,8 @@ nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 @@ -12,6 +12,7 @@ nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 +nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 @@ -20,29 +21,24 @@ nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 -nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0 +nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 -nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 -nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 +nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 I've also added logging of the surface creation in the ddx, here is one such log line: Surface alloc 1920x1080 at 32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags 2 What stands out to me is: 1) In the dri3 case there are no nv30_miptree_from_handle calls, these are replaced by nv30_miptree_create calls 2) These replacement calls use a different pitch for the 1920x... surfaces, 8192 vs 7680 3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling nouveau_bo_new, where as nv30_miptree_create passes in NULL 4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree, iow it always assumes non-swizzled ? The thing I would like to investigate first is 2), it seems to me that nv30_miptree_create should not be doing pitch alignment itself, so if we want to have the same pitch for these surfaces as in the dri2 case, we need to fix this in the caller. Which leads to the question where is this code being called from in the dri3 case. And this is where I'm stuck and could use your or Benjamin's input. So do you know where the caller is and/or how to figure out where the caller is? Regards, Hans
Ilia Mirkin
2015-Aug-11 12:21 UTC
[Nouveau] "enable dri3 support without glamor" causes gnome-shell regression on nv4x
On Mon, Aug 10, 2015 at 8:47 AM, Hans de Goede <hdegoede at redhat.com> wrote:> Hi, > > > On 03-08-15 20:09, Ilia Mirkin wrote: >> >> On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <hdegoede at redhat.com> wrote: >>> >>> Hi, >>> >>> >>> On 03-08-15 17:36, Ilia Mirkin wrote: >>>> >>>> >>>> On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <hdegoede at redhat.com> >>>> wrote: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> On 30-07-15 16:09, Ilia Mirkin wrote: >>>>>> >>>>>> >>>>>> >>>>>> FWIW this is a fail on nv50+ as well. See for example >>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=91445 >>>>>> >>>>>> My suspicion is that this is due to the lack of PUSH_KICK in the *Done >>>>>> exa handlers -- works fine with DRI2, but DRI3 has no synchronization >>>>>> and so the commands never get flushed out. Easily verified by sticking >>>>>> PUSH_KICK's everywhere. >>>>> >>>>> >>>>> >>>>> >>>>> I do not believe that that is the problem, in my case it clearly >>>>> seems to be a pitch / swizzle problem rather then a synchronizarion >>>>> problem, here is what my desktop with gnome shell looks like when >>>>> using DRI2: >>>>> >>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg >>>>> >>>>> And this is what it looks like when using DRI3: >>>>> >>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg >>>>> >>>>> The DRI2 screenshot is made with Mario's 2 patches on top of >>>>> current master: >>>>> >>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html >>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html >>>>> >>>>> And then adding Option "DRI" "2" to xorg.conf. >>>> >>>> >>>> >>>> His patches should have defaulted it to DRI 2 I think, so this is >>>> unnecessary. In fact you should have had to say "DRI" "3" to get DRI3 >>>> with his patches. >>>> -- >>>>> >>>>> >>>>> >>>>> I've also tried disabling EXA using Option "AccelMethod" "none", >>>>> but that seems to also automatically disable all DRI, leading to >>>>> software rendering. >>>>> >>>>> I discussed this with Ben this morning and he suggested that this >>>>> is likely a Mesa issue since with DRI3 mesa rather then the ddx >>>>> allocs the surfaces. I've tried disabling swizzling in the >>>>> mesa code by forcing nv30_miptree_create() to always take >>>>> the code path for linear textures, but that leads to the exact >>>>> same result as before that change. >>>> >>>> >>>> >>>> Ah yes. Very different problem indeed. I actually suspect it has to do >>>> with swizzling. Look at the white pattern of the moon -- it's all in a >>>> line. That means that it expected some locality and instead it got >>>> drawn all on a line. If it were merely a stride problem, I'd expect to >>>> see strips of the moon below and offset from one another. >>>> >>>> So... take a look at nv30_miptree_from_handle -- I wonder if it can >>>> now receive swizzled textures where it couldn't before. >>> >>> >>> >>> Ok, that does go in the direction I am expecting the problem to be, >>> but I'm afraid I'm going to need a bit more guidance, what exactly >>> am I looking for in that function / which "knobs" should I try to >>> vary / play with to maybe fix this ? >> >> >> Unfortunately this is playing near (or past) the limits of my >> knowledge as well. My understanding is that DRI3 passes pixmaps around >> with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism >> which is not that (I think it just copies stuff around). >> >> Now on nv50+, bo's have "tile flags" (and memtype and probably other >> annoyances). The tile flags indicate the specific tiling mechanism >> used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look >> at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a >> "bo config" argument. This bo config can then later be retrieved using >> some other syscall. >> >> However on nv30 there appears to not be any such thing. The >> nouveau_bo_new call just passes in NULL for creating the bo, which >> means that there's no way to recover the "are you swizzled" >> information after-the-fact. >> >> Presumably you should create a "nv04" bo config section in the union, > > > That already exists, and indeed gets set by the nouveau_allocate_surface > function from src/nv_accel_common.c from the ddx, > >> and just pass the single "swizzled" bit through. I'm not sure what, if >> anything, is required on the kernel side for that. I don't think >> there's any optionality in how the swizzling is done for pre-nv50. >> >> Note that in the nv30_miptree logic, mt->swizzled implies that >> mt->uniform_pitch = 0, but the level pitch is set "properly" (again, >> see nv30_miptree_create). >> >> Hope this sheds some light and doesn't cause you to go in the wrong >> direction -- please take everything I say with a grain of salt -- I'm >> often a bit off on some of the details. > > > Thanks this was helpful, I do feel we are getting somewhere, but I do > need a bit more help. > > I've added some debug printf's to nv30_miptree.c, nv30_miptree_create > and nv30_miptree_from_handle, where the latter is only used when using > dri2 (e.g. in the working case). > > Doing a diff between a log of starting gnome-shell with dri vs dri3 > results in this: > > --- mesa.log.dri2 2015-08-10 14:18:03.182712022 +0200 > +++ mesa.log.dri3 2015-08-10 14:18:33.233336338 +0200 > @@ -1,8 +1,8 @@ > nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 > +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 > +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > @@ -12,6 +12,7 @@ > nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 > +nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > @@ -20,29 +21,24 @@ > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > -nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0 > +nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 > nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > -nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > +nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 > +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 > nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > > I've also added logging of the surface creation in the ddx, here > is one such log line: > > Surface alloc 1920x1080 at 32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags > 2 > > What stands out to me is: > > 1) In the dri3 case there are no nv30_miptree_from_handle calls, these are > replaced by nv30_miptree_create callsI bet it becomes the DDX that consumes these now though? And perhaps it's not ready to take swizzled textures (i.e. pixmaps)?> > 2) These replacement calls use a different pitch for the 1920x... surfaces, > 8192 vs 7680Looks like it gets upgraded to a POT value (i.e. 2048 x whatever). nv30_miptree will create an unswizzled surface for a non-POT-sized texture, but I guess something somewhere is aligning it up in the DDX? And *still* creating it as a linear texture? Weird.> > 3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling > nouveau_bo_new, where as nv30_miptree_create passes in NULLRight. That's the bit I was talking about earlier -- you need to pass the information that it will be swizzled. Otherwise it's lost in the nv30_miptree should probably be setting those surf_flags as well on BO creation. Apparently there are performance implications from setting these.> > 4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree, > iow > it always assumes non-swizzled ?That's right.> > The thing I would like to investigate first is 2), it seems to me that > nv30_miptree_create should not be doing pitch alignment itself, so if we > want to have the same pitch for these surfaces as in the dri2 case, we > need to fix this in the caller. Which leads to the question where is > this code being called from in the dri3 case. And this is where I'm > stuck and could use your or Benjamin's input. So do you know where the > caller is and/or how to figure out where the caller is?Well, a quick look at the DDX, looks like it never swizzles its textures. See NV40EXAPictTexture for example -- it always puts TEX_FORMAT_LINEAR in. Why, you ask? No clue, that seems incredibly inefficient and silly. Instead of doing that, you could "guess" whether it's swizzled or not based on whether the w/h are POT. If both are, then don't put that linear thing in. It's obviously possible to create POT-sized non-swizzled textures, but you can see the nv30_miptree logic for when it does and when it doesn't. I'm only suggesting this for debugging, not for "real". The real solution to this will be to include the swizzled-ness in the bo_info ioctl, i.e. stick it somewhere in the bo_config and have the kernel hold on to it. -ilia
Hans de Goede
2015-Aug-11 13:13 UTC
[Nouveau] "enable dri3 support without glamor" causes gnome-shell regression on nv4x
Hi, On 11-08-15 14:21, Ilia Mirkin wrote:> On Mon, Aug 10, 2015 at 8:47 AM, Hans de Goede <hdegoede at redhat.com> wrote: >> Hi, >> >> >> On 03-08-15 20:09, Ilia Mirkin wrote: >>> >>> On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <hdegoede at redhat.com> wrote: >>>> >>>> Hi, >>>> >>>> >>>> On 03-08-15 17:36, Ilia Mirkin wrote: >>>>> >>>>> >>>>> On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <hdegoede at redhat.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 30-07-15 16:09, Ilia Mirkin wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> FWIW this is a fail on nv50+ as well. See for example >>>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=91445 >>>>>>> >>>>>>> My suspicion is that this is due to the lack of PUSH_KICK in the *Done >>>>>>> exa handlers -- works fine with DRI2, but DRI3 has no synchronization >>>>>>> and so the commands never get flushed out. Easily verified by sticking >>>>>>> PUSH_KICK's everywhere. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I do not believe that that is the problem, in my case it clearly >>>>>> seems to be a pitch / swizzle problem rather then a synchronizarion >>>>>> problem, here is what my desktop with gnome shell looks like when >>>>>> using DRI2: >>>>>> >>>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg >>>>>> >>>>>> And this is what it looks like when using DRI3: >>>>>> >>>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg >>>>>> >>>>>> The DRI2 screenshot is made with Mario's 2 patches on top of >>>>>> current master: >>>>>> >>>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html >>>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html >>>>>> >>>>>> And then adding Option "DRI" "2" to xorg.conf. >>>>> >>>>> >>>>> >>>>> His patches should have defaulted it to DRI 2 I think, so this is >>>>> unnecessary. In fact you should have had to say "DRI" "3" to get DRI3 >>>>> with his patches. >>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> I've also tried disabling EXA using Option "AccelMethod" "none", >>>>>> but that seems to also automatically disable all DRI, leading to >>>>>> software rendering. >>>>>> >>>>>> I discussed this with Ben this morning and he suggested that this >>>>>> is likely a Mesa issue since with DRI3 mesa rather then the ddx >>>>>> allocs the surfaces. I've tried disabling swizzling in the >>>>>> mesa code by forcing nv30_miptree_create() to always take >>>>>> the code path for linear textures, but that leads to the exact >>>>>> same result as before that change. >>>>> >>>>> >>>>> >>>>> Ah yes. Very different problem indeed. I actually suspect it has to do >>>>> with swizzling. Look at the white pattern of the moon -- it's all in a >>>>> line. That means that it expected some locality and instead it got >>>>> drawn all on a line. If it were merely a stride problem, I'd expect to >>>>> see strips of the moon below and offset from one another. >>>>> >>>>> So... take a look at nv30_miptree_from_handle -- I wonder if it can >>>>> now receive swizzled textures where it couldn't before. >>>> >>>> >>>> >>>> Ok, that does go in the direction I am expecting the problem to be, >>>> but I'm afraid I'm going to need a bit more guidance, what exactly >>>> am I looking for in that function / which "knobs" should I try to >>>> vary / play with to maybe fix this ? >>> >>> >>> Unfortunately this is playing near (or past) the limits of my >>> knowledge as well. My understanding is that DRI3 passes pixmaps around >>> with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism >>> which is not that (I think it just copies stuff around). >>> >>> Now on nv50+, bo's have "tile flags" (and memtype and probably other >>> annoyances). The tile flags indicate the specific tiling mechanism >>> used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look >>> at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a >>> "bo config" argument. This bo config can then later be retrieved using >>> some other syscall. >>> >>> However on nv30 there appears to not be any such thing. The >>> nouveau_bo_new call just passes in NULL for creating the bo, which >>> means that there's no way to recover the "are you swizzled" >>> information after-the-fact. >>> >>> Presumably you should create a "nv04" bo config section in the union, >> >> >> That already exists, and indeed gets set by the nouveau_allocate_surface >> function from src/nv_accel_common.c from the ddx, >> >>> and just pass the single "swizzled" bit through. I'm not sure what, if >>> anything, is required on the kernel side for that. I don't think >>> there's any optionality in how the swizzling is done for pre-nv50. >>> >>> Note that in the nv30_miptree logic, mt->swizzled implies that >>> mt->uniform_pitch = 0, but the level pitch is set "properly" (again, >>> see nv30_miptree_create). >>> >>> Hope this sheds some light and doesn't cause you to go in the wrong >>> direction -- please take everything I say with a grain of salt -- I'm >>> often a bit off on some of the details. >> >> >> Thanks this was helpful, I do feel we are getting somewhere, but I do >> need a bit more help. >> >> I've added some debug printf's to nv30_miptree.c, nv30_miptree_create >> and nv30_miptree_from_handle, where the latter is only used when using >> dri2 (e.g. in the working case). >> >> Doing a diff between a log of starting gnome-shell with dri vs dri3 >> results in this: >> >> --- mesa.log.dri2 2015-08-10 14:18:03.182712022 +0200 >> +++ mesa.log.dri3 2015-08-10 14:18:33.233336338 +0200 >> @@ -1,8 +1,8 @@ >> nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 >> -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 >> +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 >> -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 >> +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> @@ -12,6 +12,7 @@ >> nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 >> +nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> @@ -20,29 +21,24 @@ >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> -nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0 >> +nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 >> nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 >> -nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> -nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 >> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 >> +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 >> nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 >> nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 >> +nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 >> +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 >> nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 >> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 >> nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 >> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 >> nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 >> nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 >> nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 >> nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 >> nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 >> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 >> nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 >> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 >> >> I've also added logging of the surface creation in the ddx, here >> is one such log line: >> >> Surface alloc 1920x1080 at 32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags >> 2 >> >> What stands out to me is: >> >> 1) In the dri3 case there are no nv30_miptree_from_handle calls, these are >> replaced by nv30_miptree_create calls > > I bet it becomes the DDX that consumes these now though? And perhaps > it's not ready to take swizzled textures (i.e. pixmaps)?Correct, it turns out these are scanout buffers, see below for a fix.>> 2) These replacement calls use a different pitch for the 1920x... surfaces, >> 8192 vs 7680 > > Looks like it gets upgraded to a POT value (i.e. 2048 x whatever). > nv30_miptree will create an unswizzled surface for a non-POT-sized > texture, but I guess something somewhere is aligning it up in the DDX? > And *still* creating it as a linear texture? Weird.Correct, both on the aligning and on them still being linear, maybe the scanout buffers always need to be linear on nv3x / nv4x ?>> 3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling >> nouveau_bo_new, where as nv30_miptree_create passes in NULL > > Right. That's the bit I was talking about earlier -- you need to pass > the information that it will be swizzled. Otherwise it's lost in the > nv30_miptree should probably be setting those surf_flags as well on BO > creation. Apparently there are performance implications from setting > these.OK, things do seem to work fine with both dri2 / dri3 without the nv30_miptree code passing these in. But if you think it is useful I can write a patch setting these in the non-linear case, and see what happens.>> 4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree, >> iow >> it always assumes non-swizzled ? > > That's right.Which works because the ddx is always creating linear buffers, right ?>> The thing I would like to investigate first is 2), it seems to me that >> nv30_miptree_create should not be doing pitch alignment itself, so if we >> want to have the same pitch for these surfaces as in the dri2 case, we >> need to fix this in the caller. Which leads to the question where is >> this code being called from in the dri3 case. And this is where I'm >> stuck and could use your or Benjamin's input. So do you know where the >> caller is and/or how to figure out where the caller is? > > Well, a quick look at the DDX, looks like it never swizzles its > textures. See NV40EXAPictTexture for example -- it always puts > TEX_FORMAT_LINEAR in. Why, you ask? No clue, that seems incredibly > inefficient and silly. > > Instead of doing that, you could "guess" whether it's swizzled or not > based on whether the w/h are POT. If both are, then don't put that > linear thing in. It's obviously possible to create POT-sized > non-swizzled textures, but you can see the nv30_miptree logic for when > it does and when it doesn't. I'm only suggesting this for debugging, > not for "real". The real solution to this will be to include the > swizzled-ness in the bo_info ioctl, i.e. stick it somewhere in the > bo_config and have the kernel hold on to it.This assumes that it is useful / possible to use swizzled buffers in combination with the scanout engine on nv3x/nv4x ... Also given the age of the hardware I wonder whether it is worth the trouble to do this. ### Thanks for all the info, I've been working on this myself today, and I've come up with the following fix for the immediate problem at hand: --- a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c @@ -28,6 +28,7 @@ #include "util/u_surface.h" #include "nv_m2mf.xml.h" +#include "nv_object.xml.h" #include "nv30/nv30_screen.h" #include "nv30/nv30_context.h" #include "nv30/nv30_resource.h" @@ -362,6 +363,7 @@ nv30_miptree_create(struct pipe_screen *pscreen, blocksz = util_format_get_blocksize(pt->format); if ((pt->target == PIPE_TEXTURE_RECT) || + (pt->bind & PIPE_BIND_SCANOUT) || !util_is_power_of_two(pt->width0) || !util_is_power_of_two(pt->height0) || !util_is_power_of_two(pt->depth0) || @@ -369,6 +371,14 @@ nv30_miptree_create(struct pipe_screen *pscreen, util_format_is_float(pt->format) || mt->ms_mode) { mt->uniform_pitch = util_format_get_nblocksx(pt->format, w) * blocksz; mt->uniform_pitch = align(mt->uniform_pitch, 64); + if (pt->bind & PIPE_BIND_SCANOUT) { + struct nv30_screen *screen = nv30_screen(pscreen); + /* This assumes a tiled framebuffer */ + int pitch_align = MAX2( + screen->eng3d->oclass >= NV40_3D_CLASS ? 1024 : 256, + /* round_down_pow2(mt->uniform_pitch / 4) */ + 1 << (util_last_bit(mt->uniform_pitch / 4) - 1)); + mt->uniform_pitch = align(mt->uniform_pitch, pitch_align); + } } if (!mt->uniform_pitch) What I believe is happening is that with dri3 the buffer which becomes the buffer scanned-out by the hardware is allocated by a dri3 client, rather then by the x-server/ddx, but the actual video-mode set is still done by the ddx, so these scanout buffers must have a layout matching the one expected by the video-mode as setup by the ddx. This patch seems to fix the immediate problem at hand, it has one problem though, that it assumes that pNv->tiled_layout in the ddx always is true, I see no way to get to pNv->tiled_layout from the mesa code, so I think that it may be best to fix this problem by simply always making pNv->tiled_layout true when dri3 is active in the ddx. Regards, Hans