Stéphane Marchesin
2019-Mar-18 23:31 UTC
[Nouveau] Request for info on a big problems with nouveau driver
On Sat, Mar 16, 2019 at 12:33 PM Mauro Rossi <issor.oruam at gmail.com> wrote:

> Hi Stéphane,
>
> the good news is that Karol Herbst's patches are effectively mitigating
> the GPU lockup.
> It would really be a pity to lose and abandon the nouveau driver in
> android-x86, while intel, radeon and amdgpu are working perfectly.
>
> The Android GUI always reboots the same way when bringing back the main
> screen, with the home button or the square menu button.
>
> I've collected a log with drm.debug level 63
> to see what is happening prior to EGL-MAIN: DRI2: failed to create
> screen/ EGL_NOT_INITIALIZED
>
> The full log and tombstone are in the attachment;
> could someone on the nouveau team decipher the errors?
>

That's a question more appropriate for the nouveau list; I am CCing the list.

Stéphane

> In the logs there are also the DRM ioctl commands happening before the
> DRI screen error
>
> Mauro
>
> 03-16 18:57:03.615 0 0 E : 00a0 2 base507c_ntfy_set
> 03-16 18:57:03.615 0 0 E : 00000060
> 03-16 18:57:03.615 0 0 E : f0000000
> 03-16 18:57:03.615 0 0 E : 0084 1 base907c_image_set
> 03-16 18:57:03.615 0 0 E : 00000010
> 03-16 18:57:03.615 0 0 E : 00c0 1 base907c_image_set
> 03-16 18:57:03.615 0 0 E : fb0000fe
> 03-16 18:57:03.615 0 0 E : 0400 5 base907c_image_set
> 03-16 18:57:03.615 0 0 E : 00010000
> 03-16 18:57:03.615 0 0 E : 00000000
> 03-16 18:57:03.615 0 0 E : 04000500
> 03-16 18:57:03.615 0 0 E : 00005004
> 03-16 18:57:03.615 0 0 E : 0000cf00
> 03-16 18:57:03.615 0 0 E : 0080 1 base507c_update
> 03-16 18:57:03.615 0 0 E : 00000000
> 03-16 18:57:03.616 2729 4165 W EGL-MAIN: DRI2: failed to create dri screen
> 03-16 18:57:03.616 2729 4165 W EGL-MAIN: DRI2: failed to create screen
> 03-16 18:57:03.617 2729 4165 W libEGL : eglInitialize(0xad3ab800) failed (EGL_NOT_INITIALIZED)
> 03-16 18:57:03.617 2729 4165 I system_server: android::hardware::configstore::V1_0::ISurfaceFlingerConfigs::hasWideColorDisplay retrieved: 0
> 03-16 18:57:03.617 2729 4165 I OpenGLRenderer: Initialized EGL, version 1.4
> 03-16 18:57:03.617 2729 4165 D OpenGLRenderer: Swap behavior 2
> 03-16 18:57:03.617 2729 4165 F OpenGLRenderer: Failed to choose config, error = EGL_NOT_INITIALIZED
> --------- beginning of crash
> 03-16 18:57:03.617 2729 4165 F libc : Fatal signal 6 (SIGABRT), code -6 in tid 4165 (RenderThread), pid 2729 (system_server)
>
> On Tue, Mar 5, 2019 at 8:55 AM Mauro Rossi <issor.oruam at gmail.com> wrote:
> >
> > Hi,
> > one of the problems (the Play Store crash) was resolved with the following
> > commit:
> >
> > http://git.osdn.net/view?p=android-x86/frameworks-base.git;a=commit;h=d488a6c2bbedc06fc22942555d0157e7bf09f135
> >
> > Now the remaining one, affecting the dEQP-EGL multithreading tests and
> > RenderThread in general, has been traced in the attached logs.
> >
> > It seems to be a problem similar to "a second libEGL call failing" when
> > RenderThread is trying to create the DRI screen; RenderThread is then
> > killed because Android's attempt to load an EGL config fails and this
> > is treated as fatal.
> > We just need to find the root cause of the failure.
> >
> > In the logcat there is a clue of what is happening:
> >
> > --------- beginning of crash
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: FATAL EXCEPTION: main
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: Process: com.android.systemui, PID: 1440
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: java.lang.NullPointerException: Attempt to invoke virtual method 'android.graphics.GraphicBuffer android.graphics.Bitmap.createGraphicBufferHandle()' on a null object reference
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: at com.android.systemui.recents.views.RecentsTransitionHelper.drawViewIntoGraphicBuffer(RecentsTransitionHelper.java:436)
> >
> > Mauro
> >
> > On Tue, Mar 5, 2019 at 1:29 AM Stéphane Marchesin <marcheu at chromium.org> wrote:
> > >
> > > On Sat, Mar 2, 2019 at 12:08 AM Mauro Rossi <issor.oruam at gmail.com> wrote:
> > >>
> > >> Hi Stéphane,
> > >>
> > >> On Fri, Mar 1, 2019 at 11:24 PM Stéphane Marchesin <marcheu at chromium.org> wrote:
> > >> >
> > >> > On Fri, Mar 1, 2019 at 4:30 AM Mauro Rossi <issor.oruam at gmail.com> wrote:
> > >> >>
> > >> >> Hi Stéphane,
> > >> >>
> > >> >> thanks for responding
> > >> >>
> > >> >> On Thu, Feb 28, 2019 at 9:56 PM Stéphane Marchesin <marcheu at chromium.org> wrote:
> > >> >> >
> > >> >> > On Tue, Feb 19, 2019 at 6:54 PM Tomasz Figa <tfiga at chromium.org> wrote:
> > >> >> >>
> > >> >> >> Hi Mauro,
> > >> >> >>
> > >> >> >> Thanks for your query. I'm not very active in the graphics area
> > >> >> >> anymore, but let me add +Stéphane Marchesin, who should know the
> > >> >> >> best.
> > >> >> >>
> > >> >> >> Best regards,
> > >> >> >> Tomasz
> > >> >> >>
> > >> >> >> On Wed, Feb 20, 2019 at 3:00 AM Mauro Rossi <issor.oruam at gmail.com> wrote:
> > >> >> >> >
> > >> >> >> > Hi Tomasz,
> > >> >> >> >
> > >> >> >> > I wanted to ask for some help, even just some information about how
> > >> >> >> > nouveau works with the Chrome OS minigbm stack, because we have big
> > >> >> >> > issues with drm_gralloc and gbm_gralloc.
> > >> >> >> >
> > >> >> >> > The nouveau gallium driver does not support multithreading, and oreo-x86
> > >> >> >> > has introduced additional RenderThread scenarios which cause
> > >> >> >> > instability.
> > >> >> >> >
> > >> >> >> > dEQP-EGL multithreading tests are causing GUI restarts, even with
> > >> >> >> > the latest Karol Herbst patches with per-GL-context mutex locking and
> > >> >> >> > per-fence mutex locking.
> > >> >> >> > He said there is an additional race condition that may require another
> > >> >> >> > major rewrite, but he did not mention which additional race condition.
> > >> >> >> >
> > >> >> >> > I wanted to ask you for some info, in case you have any, or
> > >> >> >> > suggestions on how to avoid the problem.
> > >> >> >> >
> > >> >> >> > 1) Are you aware of problems on Chrome OS with nouveau MT, and how
> > >> >> >> > they were avoided?
> > >> >> >> > At the moment I can boot with minigbm, but the navigation bar and menu
> > >> >> >> > bar are transparent and invisible, so I was not able to check whether
> > >> >> >> > minigbm has the same problems we have.
> > >> >> >> >
> > >> >> >> > 2) We are so stuck with nouveau support that I was thinking of exploring
> > >> >> >> > another angle:
> > >> >> >> > is it possible to disable the additional threads in the android-x86 code base for Oreo?
> > >> >> >> > Do you have colleagues who could give some indication on how to do it?
> > >> >> >> >
> > >> >> >
> > >> >> > Hi Mauro,
> > >> >> >
> > >> >> > We don't officially support nouveau on Chrome OS (there are no devices which use it). The nouveau minigbm driver was written to be able to develop Chrome for Chrome OS on top of a Linux workstation with an nvidia GPU. In particular, we have never started Android with that configuration.
> > >> >> >
> > >> >> > Can you give more details on issue 1, i.e. what is invisible? Last I looked Chrome was working. Are you certain this is related to threading?
> > >> >> >
> > >> >> > Stéphane
> > >> >>
> > >> >> [minigbm issue]
> > >> >>
> > >> >> The problem with minigbm came up after trying to use minigbm
> > >> >> as it is in the Chrome OS stack (which supports running Android
> > >> >> applications, AFAIK).
> > >> >>
> > >> >> The stock minigbm was not ready to boot in android-x86; lambdadroid
> > >> >> added dma fb support and I added some required formats (RGBA, RGBX,
> > >> >> RGB565) to be able to boot:
> > >> >> https://github.com/maurossi/minigbm/commits/minigbm_fb
> > >> >>
> > >> >> Using that version of minigbm with android-x86 (oreo-x86), what I see is
> > >> >> that the Android GUI top bar, bottom menu bar, icons and cursor are
> > >> >> invisible/not rendered, even though blind interaction is possible.
> > >> >> Maybe I've done something wrong, because the DRM format selection in
> > >> >> minigbm is not as easy to understand as the drm_gralloc and gbm_gralloc
> > >> >> ones.
> > >> >
> > >> > Yeah as I said, we never ran any Android with the nouveau minigbm driver, not ARC++, even less Android, so I don't know.
> > >> >
> > >> >>
> > >> >> The GUI transparency (or missing rendering) with minigbm does not seem
> > >> >> related to multiple threads using the same GL context;
> > >> >> however, the GPU lockups and the dEQP-EGL multithreading test failures,
> > >> >> which also happen with drm_gralloc and gbm_gralloc, are certainly related.
> > >> >>
> > >> >> [MT issues]
> > >> >>
> > >> >> Since it is already assessed and known that nouveau lacks the MT support
> > >> >> that other Mesa drivers (i965, radeon, amdgpu) have,
> > >> >> and Karol Herbst submitted patches to mesa-dev to bring "per gl
> > >> >> context mutex" and "per fence mutex locking" to nouveau,
> > >> >> I tried to run CTS dEQP-EGL with Mesa GLES/EGL built with those patches;
> > >> >> the result was that the dEQP-EGL multithreading tests failed, causing GUI
> > >> >> reboots or PC restarts.
> > >> >>
> > >> >
> > >> > I am surprised by that; we have no problem with android on radeon which uses gallium which would have the same issues.
> > >>
> > >> We have no problem with radeon either,
> > >> but for nouveau there is a history of GPU lockups with android-x86 as we speak;
> > >> Ilia Mirkin confirmed in several different bugzilla tickets that
> > >> nouveau does not react well to multiple worker threads on the same GL
> > >> context.
> > >
> > > Hmm if you get GPU lockups, yes that's a different problem.
> > >
> > >>
> > >> In fact, with some prototype mutex locking patches we had a mitigation
> > >> for android-x86 releases from lollipop-x86 to nougat-x86.
> > >>
> > >> Karol Herbst submitted patches to mesa-dev last December for that
> > >> exact same problem;
> > >> the patches are not yet upstreamed, so technically the problem is still there.
> > >>
> > >> The current use case is android-x86, but the next GUI using
> > >> multiple threads will have problems too.
> > >>
> > >> >
> > >> >> Having contacted Karol Herbst, he told me that there may be one additional
> > >> >> race condition, but he did not clarify which one.
> > >> >>
> > >> >> What about launching dEQP-EGL on a platform other than Android, e.g.
> > >> >> EGL Wayland? Is that possible, to see if the tests also fail on the Linux
> > >> >> platform?
> > >> >
> > >> > We use the surfaceless/null backend for deqp.
We have upstreamed > it, you should be able to use that also. Otherwise I have used the glx > backend successfully as well on my desktop. > > >> > > >> Could it be that in your scenario there is only one thread per gl > > >> context at a time? > > >> > > > > > > In general, most of deqp is one GL context at a time, unless you run > the parallel deqp stuff. So yes it would probably help. Similarly Chrome OS > is running pretty much in a single GPU process, so we wouldn't see that > problem either when running nouveau. > > > > > > > > >> > > >> > > > >> > > > >> >> > > >> >> Are there similar tests in piglit? > > >> > > > >> > > > >> > I'm not aware of any, but I stopped using piglit years ago. > > >> > > > >> >> > > >> >> > > >> >> [Other issue appeared with Android 8 Oreo hardware bitmaps] > > >> >> > > >> >> System UI and Play Store crashes, are happening after successful > > >> >> android-x86 boot with drm_gralloc and gbm_gralloc, > > >> >> these crashes seem to be very much related to this path: > > >> >> CreateHardwareBitmap -> CreateBitmap -> Null Pointer Exception. > > >> >> CreateHardwareBitmap (introduced in Android Oreo), > > >> > > > >> > > > >> > Seems like you are missing dri extensions? > > >> > > >> Checking in the logcat the boot with nouveau has all extensions as per > > >> other drivers, > > >> but it has DRI_IMAGE twice, is that bad? > > >> > > >> 02-02 10:35:37.176 2489 2489 D vndksupport: Loading > > >> /vendor/lib/egl/libGLES_mesa.so from current namespace instead of > > >> sphal namespace. > > >> 02-02 10:35:37.188 2489 2489 D libEGL : loaded > > >> /vendor/lib/egl/libGLES_mesa.so > > >> 02-02 10:35:37.251 2489 2489 D vndksupport: Loading > > >> /vendor/lib/hw/gralloc.gbm.so from current namespace instead of sphal > > >> namespace. 
> > >> 02-02 10:35:37.253 2489 2489 I EGL-MAIN: found extension DRI_Core version 2
> > >> 02-02 10:35:37.253 2489 2489 I EGL-MAIN: found extension DRI_IMAGE_DRIVER version 1
> > >> 02-02 10:35:37.253 2489 2489 I EGL-MAIN: found extension DRI_ConfigOptions version 2
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_TexBuffer version 2
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI2_Flush version 4
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_IMAGE version 17
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_IMAGE version 17
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_RENDERER_QUERY version 1
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_CONFIG_QUERY version 1
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI2_Fence version 2
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI2_Interop version 1
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_NoError version 1
> > >>
> > >
> > > Can you put it in gdb and see where the NULL crash is? One can only intuit about what's going on otherwise.
> > >
> > > Stéphane
> > >
> > >>
> > >> >
> > >> > Stéphane
> > >> >
> > >> >>
> > >> >> uses only one copy
> > >> >> of the bitmap instead of two. Are there some restrictions in nouveau with
> > >> >> RGBA/RGBX, BGRA hardware bitmaps?
> > >> >>
> > >> >> Thanks in advance for any info and suggestions.
> > >> >> I am available and ready to support testing/verification to see the
> > >> >> MT and HardwareBitmap issues solved.
> > >> >>
> > >> >> Mauro
> > >> >>
> > >> >> >
> > >> >> >>
> > >> >> >> > Mauro