Danilo Krummrich
2023-Oct-04 17:17 UTC
[Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 10/4/23 17:29, Thomas Hellstr?m wrote:> > On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote: >> On 10/3/23 11:11, Thomas Hellstr?m wrote: >> >> <snip> >> >>>>> + >>>>> +/** >>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / >>>>> from the &drm_gpuvms >>>>> + * evicted list >>>>> + * @vm_bo: the &drm_gpuvm_bo to add or remove >>>>> + * @evict: indicates whether the object is evicted >>>>> + * >>>>> + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms >>>>> evicted list. >>>>> + */ >>>>> +void >>>>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict) >>>>> +{ >>>>> +??? struct drm_gem_object *obj = vm_bo->obj; >>>>> + >>>>> +??? dma_resv_assert_held(obj->resv); >>>>> + >>>>> +??? /* Always lock list transactions, even if >>>>> DRM_GPUVM_RESV_PROTECTED is >>>>> +???? * set. This is required to protect multiple concurrent >>>>> calls to >>>>> +???? * drm_gpuvm_bo_evict() with BOs with different dma_resv. >>>>> +???? */ >>>> >>>> This doesn't work. The RESV_PROTECTED case requires the evicted >>>> flag we discussed before. The list is either protected by the >>>> spinlock or the resv. Otherwise a list add could race with a list >>>> removal elsewhere. >> >> I think it does unless I miss something, but it might be a bit subtle >> though. >> >> Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. >> Additionally, when >> drm_gpuvm_bo_evict() is called we hold the dma-resv of the >> corresponding GEM object. >> >> In drm_gpuvm_validate() I assert that we hold *all* dma-resv, which >> implies that no >> one can call drm_gpuvm_bo_evict() on any of the VM's objects and no >> one can add a new >> one and directly call drm_gpuvm_bo_evict() on it either. > > But translated into how the data (the list in this case) is protected > it becomes > > "Either the spinlock and the bo resv of a single list item OR the bo > resvs of all bos that can potentially be on the list", > > while this is certainly possible to assert, any new / future code that > manipulates the evict list will probably get this wrong and as a result > the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() already > gets it wrong in that it, while holding a single resv, doesn't take the > spinlock.That's true and I don't like it either. Unfortunately, with the dma-resv locking scheme we can't really protect the evict list without the drm_gpuvm_bo::evicted trick properly. But as pointed out in my other reply, I'm a bit worried about the drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's fine doing it in the RESV_PROTECTED case.> > So I think that needs fixing, and if keeping that protection I think it > needs to be documented with the list member and ideally an assert. But > also note that lockdep_assert_held will typically give false true for > dma_resv locks; as long as the first dma_resv lock locked in a drm_exec > sequence remains locked, lockdep thinks *all* dma_resv locks are held. > (or something along those lines), so the resv lockdep asserts are > currently pretty useless. > > /Thomas > > > >> >>>> >>>> Thanks, >>>> >>>> Thomas >>>> >>>> >>> >> >
Thomas Hellström
2023-Oct-04 17:57 UTC
[Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On Wed, 2023-10-04 at 19:17 +0200, Danilo Krummrich wrote:> On 10/4/23 17:29, Thomas Hellstr?m wrote: > > > > On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote: > > > On 10/3/23 11:11, Thomas Hellstr?m wrote: > > > > > > <snip> > > > > > > > > > + > > > > > > +/** > > > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to > > > > > > / > > > > > > from the &drm_gpuvms > > > > > > + * evicted list > > > > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove > > > > > > + * @evict: indicates whether the object is evicted > > > > > > + * > > > > > > + * Adds a &drm_gpuvm_bo to or removes it from the > > > > > > &drm_gpuvms > > > > > > evicted list. > > > > > > + */ > > > > > > +void > > > > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict) > > > > > > +{ > > > > > > +??? struct drm_gem_object *obj = vm_bo->obj; > > > > > > + > > > > > > +??? dma_resv_assert_held(obj->resv); > > > > > > + > > > > > > +??? /* Always lock list transactions, even if > > > > > > DRM_GPUVM_RESV_PROTECTED is > > > > > > +???? * set. This is required to protect multiple > > > > > > concurrent > > > > > > calls to > > > > > > +???? * drm_gpuvm_bo_evict() with BOs with different > > > > > > dma_resv. > > > > > > +???? */ > > > > > > > > > > This doesn't work. The RESV_PROTECTED case requires the > > > > > evicted > > > > > flag we discussed before. The list is either protected by the > > > > > spinlock or the resv. Otherwise a list add could race with a > > > > > list > > > > > removal elsewhere. > > > > > > I think it does unless I miss something, but it might be a bit > > > subtle > > > though. > > > > > > Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. > > > Additionally, when > > > drm_gpuvm_bo_evict() is called we hold the dma-resv of the > > > corresponding GEM object. > > > > > > In drm_gpuvm_validate() I assert that we hold *all* dma-resv, > > > which > > > implies that no > > > one can call drm_gpuvm_bo_evict() on any of the VM's objects and > > > no > > > one can add a new > > > one and directly call drm_gpuvm_bo_evict() on it either. > > > > But translated into how the data (the list in this case) is > > protected > > it becomes > > > > "Either the spinlock and the bo resv of a single list item OR the > > bo > > resvs of all bos that can potentially be on the list", > > > > while this is certainly possible to assert, any new / future code > > that > > manipulates the evict list will probably get this wrong and as a > > result > > the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() > > already > > gets it wrong in that it, while holding a single resv, doesn't take > > the > > spinlock. > > That's true and I don't like it either. Unfortunately, with the dma- > resv > locking scheme we can't really protect the evict list without the > drm_gpuvm_bo::evicted trick properly. > > But as pointed out in my other reply, I'm a bit worried about the > drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's > fine > doing it in the RESV_PROTECTED case.Ah, indeed. I misread that as discussing the current code rather than the drm_gpuvm_bo::evicted trick. If validating only a subset, or a range, then with the drm_gpuvm_bo::evicted trick would be valid only for that subset. But the current code would break because the condition of locking "the resvs of all bos that can potentially be on the list" doesn't hold anymore, and you'd get list corruption. What *would* work, though, is the solution currently in xe, The original evict list, and a staging evict list whose items are copied over on validation. The staging evict list being protected by the spinlock, the original evict list by the resv, and they'd use separate list heads in the drm_gpuvm_bo, but that is yet another complication. But I think if this becomes an issue, those VMs (perhaps OpenGL UMD VMs) only wanting to validate a subset, would simply initially rely on the current non-RESV solution. It looks like it's only a matter of flipping the flag on a per-vm basis. /Thomas> > > > > So I think that needs fixing, and if keeping that protection I > > think it > > needs to be documented with the list member and ideally an assert. > > But > > also note that lockdep_assert_held will typically give false true > > for > > dma_resv locks; as long as the first dma_resv lock locked in a > > drm_exec > > sequence? remains locked, lockdep thinks *all* dma_resv locks are > > held. > > (or something along those lines), so the resv lockdep asserts are > > currently pretty useless. > > > > /Thomas > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Thomas > > > > > > > > > > > > > > > > > > > >