Daniel Vetter
2022-Apr-27 14:50 UTC
[PATCH v4 10/15] drm/shmem-helper: Take reservation lock instead of drm_gem_shmem locks
On Mon, Apr 18, 2022 at 10:18:54PM +0300, Dmitry Osipenko wrote:
> Hello,
> 
> On 4/18/22 21:38, Thomas Zimmermann wrote:
> > Hi
> > 
> > Am 18.04.22 um 00:37 schrieb Dmitry Osipenko:
> >> Replace drm_gem_shmem locks with the reservation lock to make GEM
> >> lockings more consistent.
> >>
> >> Previously drm_gem_shmem_vmap() and drm_gem_shmem_get_pages() were
> >> protected by separate locks, now it's the same lock, but it doesn't
> >> make any difference for the current GEM SHMEM users. Only Panfrost
> >> and Lima drivers use vmap() and they do it in the slow code paths,
> >> hence there was no practical justification for the usage of separate
> >> lock in the vmap().
> >>
> >> Suggested-by: Daniel Vetter <daniel at ffwll.ch>
> >> Signed-off-by: Dmitry Osipenko <dmitry.osipenko at collabora.com>
> >> ---
> ...
> >> @@ -310,7 +306,7 @@ static int drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem,
> >>  	} else {
> >>  		pgprot_t prot = PAGE_KERNEL;
> >>  
> >> -		ret = drm_gem_shmem_get_pages(shmem);
> >> +		ret = drm_gem_shmem_get_pages_locked(shmem);
> >>  		if (ret)
> >>  			goto err_zero_use;
> >>  
> >> @@ -360,11 +356,11 @@ int drm_gem_shmem_vmap(struct drm_gem_shmem_object *shmem,
> >>  {
> >>  	int ret;
> >>  
> >> -	ret = mutex_lock_interruptible(&shmem->vmap_lock);
> >> +	ret = dma_resv_lock_interruptible(shmem->base.resv, NULL);
> >>  	if (ret)
> >>  		return ret;
> >>  	ret = drm_gem_shmem_vmap_locked(shmem, map);
> > 
> > Within drm_gem_shmem_vmap_locked(), there's a call to dma_buf_vmap() for
> > imported pages. If the exporter side also holds/acquires the same
> > reservation lock as our object, the whole thing can deadlock. We cannot
> > move dma_buf_vmap() out of the CS, because we still need to increment
> > the reference counter. I honestly don't know how to easily fix this
> > problem. There's a TODO item about replacing these locks at [1]. As
> > Daniel suggested this patch, we should talk to him about the issue.
> > 
> > Best regards
> > Thomas
> > 
> > [1] https://www.kernel.org/doc/html/latest/gpu/todo.html#move-buffer-object-locking-to-dma-resv-lock
> 
> Indeed, good catch! Perhaps we could simply use a separate lock for the
> vmapping of the *imported* GEMs? The vmap_use_count is used only by
> vmap/vunmap, so it doesn't matter which lock is used by these functions
> in the case of imported GEMs since we only need to protect the
> vmap_use_count.

Apologies for the late reply, I'm flooded.

I discussed this with Daniel Stone last week in a chat, and roughly what
we need to do is:

1. Pick a function from shmem helpers.

2. Go through all drivers that call this, and make sure that we acquire
   dma_resv_lock in the top-level driver entry point for this.

3. Once all driver code paths are converted, add a dma_resv_assert_held()
   call to that function to make sure you have it all correctly (rough
   sketch below).

4. Repeat 1-3 until all shmem helper functions are converted over.

5. Ditch the 3 different shmem helper locks.

The trouble is that I forgot that vmap is a thing, so that needs more
work. I think there are two approaches here:

- Do the vmap at import time. This is the trick we used to untangle the
  dma_resv_lock issues around dma_buf_attachment_map().

- Change the dma_buf_vmap rules so that callers must hold the
  dma_resv_lock (sketch below).

- Maybe also do what you suggest and keep a separate lock for this, but
  the fundamental issue is that this doesn't really work - if you share
  buffers both ways with two drivers using shmem helpers, then the
  ordering of this vmap_count_mutex vs dma_resv_lock is inconsistent and
  you can get some nice deadlocks. So not a great approach (and also the
  reason why we really need to get everyone to move towards
  dma_resv_lock as _the_ buffer object lock, since otherwise we'll never
  get a consistent lock nesting hierarchy).
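To make steps 2/3 above a bit more concrete, here's a rough sketch of the
pattern (not an actual patch - the driver-side entry point mydrv_pin_pages()
is a made-up name and the helper body is elided), just to show where
dma_resv_lock() and dma_resv_assert_held() end up:

#include <linux/dma-resv.h>

#include <drm/drm_gem_shmem_helper.h>

/* Step 2: the top-level driver entry point takes the reservation lock. */
static int mydrv_pin_pages(struct drm_gem_shmem_object *shmem)
{
	int ret;

	ret = dma_resv_lock_interruptible(shmem->base.resv, NULL);
	if (ret)
		return ret;

	/* The helper is now called with the dma_resv lock already held. */
	ret = drm_gem_shmem_get_pages(shmem);

	dma_resv_unlock(shmem->base.resv);

	return ret;
}

/* Step 3: once every caller looks like the above, the helper only asserts. */
int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
{
	dma_resv_assert_held(shmem->base.resv);

	/* ... page allocation as before, minus pages_lock ... */

	return 0;
}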
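And a minimal sketch of the dma_buf_vmap rule change on the importer side,
assuming the iosys_map based prototype; importer_vmap() is just an
illustrative name, and this is the proposed calling convention, not what
dma-buf enforces today (an exporter that grabs its own resv lock in its vmap
callback would deadlock against this, which is exactly why the rule needs to
change):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/iosys-map.h>

static int importer_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
{
	int ret;

	ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
	if (ret)
		return ret;

	/*
	 * Proposed rule: dma_buf_vmap() is called with the exporter's
	 * reservation lock already held, so the exporter's vmap callback
	 * must not take it again.
	 */
	ret = dma_buf_vmap(dmabuf, map);

	dma_resv_unlock(dmabuf->resv);

	return ret;
}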
The trouble here is that trying to be clever and doing the conversion
just in shmem helpers won't work, because there are a lot of cases where
the drivers are all kinds of inconsistent with their locking.

Adding Daniel S, also maybe for questions it'd be fastest to chat on irc?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch