Fernando Ramos
2021-Oct-02 07:13 UTC
[Nouveau] [Intel-gfx] [PATCH v2 00/17] drm: cleanup: Use DRM_MODESET_LOCK_ALL_* helpers where possible
On 21/10/02 05:30AM, Ville Syrj?l? wrote:> On Sat, Oct 02, 2021 at 01:05:47AM +0300, Ville Syrj?l? wrote: > > On Fri, Oct 01, 2021 at 04:48:15PM -0400, Sean Paul wrote: > > > On Fri, Oct 01, 2021 at 10:00:50PM +0300, Ville Syrj?l? wrote: > > > > On Fri, Oct 01, 2021 at 02:36:55PM -0400, Sean Paul wrote: > > > > > > > > > > Thank you for revising, Fernando! I've pushed the set to drm-misc-next (along > > > > > with the necessary drm-tip conflict resolutions). > > > > > > > > Ugh. Did anyone actually review the locking changes this does? > > > > I shot the previous i915 stuff down because the commit messages > > > > did not address any of it. > > > > > > I reviewed the set on 9/17, I didn't see your feedback on that thread. > > > > It was much earlir than that. > > https://lists.freedesktop.org/archives/dri-devel/2021-June/313193.html > > > > And I think I might have also shot down a similar thing earlier. > > > > I was actually half considering sending a patch to nuke that > > misleading TODO item. I don't think anything which changes > > which locks are taken should be considred a starter level task. > > And the commit messages here don't seem to address any of it. > > And i915 is now broken :( > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10680/fi-bwr-2160/boot.htmlI completely overlooked the side effects of not having a global context anymore. Sorry for all the trouble. Sean, could you revert the whole patch series? I'll have a deeper look into the patch set and come up with a v3 where all these issues will be addressed. Thanks and sorry once again for the extra overhead this might have caused.
Fernando Ramos
2021-Oct-02 17:28 UTC
[Nouveau] [Intel-gfx] [PATCH v2 00/17] drm: cleanup: Use DRM_MODESET_LOCK_ALL_* helpers where possible
On 21/10/02 09:13AM, Fernando Ramos wrote:> On 21/10/02 05:30AM, Ville Syrj?l? wrote: > > On Sat, Oct 02, 2021 at 01:05:47AM +0300, Ville Syrj?l? wrote: > > > On Fri, Oct 01, 2021 at 04:48:15PM -0400, Sean Paul wrote: > > > > On Fri, Oct 01, 2021 at 10:00:50PM +0300, Ville Syrj?l? wrote: > > > > > On Fri, Oct 01, 2021 at 02:36:55PM -0400, Sean Paul wrote: > > > > > > > > > > > > Thank you for revising, Fernando! I've pushed the set to drm-misc-next (along > > > > > > with the necessary drm-tip conflict resolutions). > > > > > > > > > > Ugh. Did anyone actually review the locking changes this does? > > > > > I shot the previous i915 stuff down because the commit messages > > > > > did not address any of it. > > > > > > > > I reviewed the set on 9/17, I didn't see your feedback on that thread. > > > > > > It was much earlir than that. > > > https://lists.freedesktop.org/archives/dri-devel/2021-June/313193.htmlSorry, I'm new to this and it did not occur to me to search for similar patches in the mailing list archives in case there were additional comments that applied to my change set. In case I had done that I would have found that, as you mentioned, you had already raised two issues back in June: On Tue, Jun 29, 2021, Ville Syrj?l? wrote: > > That looks wrong. You're using a private ctx here, but still > passing dev->mode_config.acquire_ctx to the lower level stuff. > > Also DRM_MODESET_LOCK_ALL_{BEGIN,END}() do not seem to be > equivalent to drm_modeset_{lock,unlock}_all() when it comes to > mode_config.mutex. So would need a proper review whether we > actually need that lock or not. The first one was pointing out the same error I would later repeat in my patch series (ups). After further inspection of the code it looks to me that changing this: intel_modeset_setup_hw_state(dev, dev->mode_config.acquire_ctx); ...into this: intel_modeset_setup_hw_state(dev, &ctx); ...would be enough. Why? The only difference between the old drm_modeset_{lock,unlock}_all() functions and the new DRM_MODESET_LOCK_ALL_{BEGIN,END}() macros is that the former use a global context stored in dev->mode_config.acquire_ctx while the latter depend on a user provided one (typically in the stack). In the old (working) code the global context structure is freed in drm_modeset_unlock_all() thus we are sure no one is holding a reference to it at that point. This means that as long as no one accesses the global dev->mode_config.acquire_ctx context in the block that runs between lock/BEGIN and unlock/END, the code should be equivalent before and after my changes. In fact, now that my patch series removes the drm_modeset_{lock,unlock}_all() functions, the acquire_ctx field of the drm_mode_config structure should be deleted: /** * @acquire_ctx: * * Global implicit acquire context used by atomic drivers for legacy * IOCTLs. Deprecated, since implicit locking contexts make it * impossible to use driver-private &struct drm_modeset_lock. Users of * this must hold @mutex. */ struct drm_modeset_acquire_ctx *acquire_ctx; If I had done that (ie. removing this field) I would have detected the problem when compiling. There is another place (in the amdgpu driver) where this field is still being referenced, but before I investigate that I would like to know if you agree that this is a good path to follow. Regarding the second issue you raised... > Also DRM_MODESET_LOCK_ALL_{BEGIN,END}() do not seem to be > equivalent to drm_modeset_{lock,unlock}_all() when it comes to > mode_config.mutex. So would need a proper review whether we > actually need that lock or not. ...the only difference regarding mode_config.mutex I see is that in the new macros the mutex is locked only under this condition: if (!drm_drv_uses_atomic_modeset(dev)) ...which seems reasonable, right? Is this what you were referring to or is it something else? Please let me know what you think. Thanks!
Fernando Ramos
2021-Oct-02 22:32 UTC
[Nouveau] [Intel-gfx] [PATCH v2 00/17] drm: cleanup: Use DRM_MODESET_LOCK_ALL_* helpers where possible
On 21/10/02 09:13AM, Fernando Ramos wrote:> > Sean, could you revert the whole patch series? I'll have a deeper look into the > patch set and come up with a v3 where all these issues will be addressed. >Hi Sean, I now understand the nature of the issue that caused the problem with i915 and have proceed to remove the global context structure (which revealed a similar issue in the amdgpu driver). I have prepared a V3 version of the patch set where these issues should hopefully be fixed for both the i915 and amdgpu drivers. In order to prevent causing more disruption, could you tell me what the proper way to proceed would be? In particular: 1. Is there any place where I can push my changes so that they are tested on a i915 machine? (Some type of automated pool) 2. I can test the amdgpu driver on my machine but, what about all the other architectures? What is the standard procedure? Should I simply publish V3 and wait for feedback from the different vendors? (I would hate to cause a simular situation again) 3. Should I post V3 on top of drm-next or drm-misc-next? Thanks for your patience :)