Danilo Krummrich
2023-Oct-04 12:57 UTC
[Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 10/3/23 11:11, Thomas Hellstr?m wrote: <snip>>>> + >>> +/** >>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the &drm_gpuvms >>> + * evicted list >>> + * @vm_bo: the &drm_gpuvm_bo to add or remove >>> + * @evict: indicates whether the object is evicted >>> + * >>> + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms evicted list. >>> + */ >>> +void >>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict) >>> +{ >>> +??? struct drm_gem_object *obj = vm_bo->obj; >>> + >>> +??? dma_resv_assert_held(obj->resv); >>> + >>> +??? /* Always lock list transactions, even if DRM_GPUVM_RESV_PROTECTED is >>> +???? * set. This is required to protect multiple concurrent calls to >>> +???? * drm_gpuvm_bo_evict() with BOs with different dma_resv. >>> +???? */ >> >> This doesn't work. The RESV_PROTECTED case requires the evicted flag we discussed before. The list is either protected by the spinlock or the resv. Otherwise a list add could race with a list removal elsewhere.I think it does unless I miss something, but it might be a bit subtle though. Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. Additionally, when drm_gpuvm_bo_evict() is called we hold the dma-resv of the corresponding GEM object. In drm_gpuvm_validate() I assert that we hold *all* dma-resv, which implies that no one can call drm_gpuvm_bo_evict() on any of the VM's objects and no one can add a new one and directly call drm_gpuvm_bo_evict() on it either.>> >> Thanks, >> >> Thomas >> >> >
Thomas Hellström
2023-Oct-04 15:29 UTC
[Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:> On 10/3/23 11:11, Thomas Hellstr?m wrote: > > <snip> > > > > > + > > > > +/** > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / > > > > from the &drm_gpuvms > > > > + * evicted list > > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove > > > > + * @evict: indicates whether the object is evicted > > > > + * > > > > + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms > > > > evicted list. > > > > + */ > > > > +void > > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict) > > > > +{ > > > > +??? struct drm_gem_object *obj = vm_bo->obj; > > > > + > > > > +??? dma_resv_assert_held(obj->resv); > > > > + > > > > +??? /* Always lock list transactions, even if > > > > DRM_GPUVM_RESV_PROTECTED is > > > > +???? * set. This is required to protect multiple concurrent > > > > calls to > > > > +???? * drm_gpuvm_bo_evict() with BOs with different dma_resv. > > > > +???? */ > > > > > > This doesn't work. The RESV_PROTECTED case requires the evicted > > > flag we discussed before. The list is either protected by the > > > spinlock or the resv. Otherwise a list add could race with a list > > > removal elsewhere. > > I think it does unless I miss something, but it might be a bit subtle > though. > > Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. > Additionally, when > drm_gpuvm_bo_evict() is called we hold the dma-resv of the > corresponding GEM object. > > In drm_gpuvm_validate() I assert that we hold *all* dma-resv, which > implies that no > one can call drm_gpuvm_bo_evict() on any of the VM's objects and no > one can add a new > one and directly call drm_gpuvm_bo_evict() on it either.But translated into how the data (the list in this case) is protected it becomes "Either the spinlock and the bo resv of a single list item OR the bo resvs of all bos that can potentially be on the list", while this is certainly possible to assert, any new / future code that manipulates the evict list will probably get this wrong and as a result the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() already gets it wrong in that it, while holding a single resv, doesn't take the spinlock. So I think that needs fixing, and if keeping that protection I think it needs to be documented with the list member and ideally an assert. But also note that lockdep_assert_held will typically give false true for dma_resv locks; as long as the first dma_resv lock locked in a drm_exec sequence remains locked, lockdep thinks *all* dma_resv locks are held. (or something along those lines), so the resv lockdep asserts are currently pretty useless. /Thomas> > > > > > > Thanks, > > > > > > Thomas > > > > > > > > >