thr3ads.net - Nouveau - [Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation [Aug 2023]

If this information is useful, please help other people find it:
Share via:

Danilo Krummrich

2023-Aug-31 11:18 UTC

[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellstr?m (Intel)
wrote:> Hi!
> 
> On 8/30/23 17:00, Danilo Krummrich wrote:
> > On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellstr?m (Intel)
wrote:
> > > On 8/30/23 14:49, Danilo Krummrich wrote:
> > > > Hi Thomas,
> > > > 
> > > > thanks for having a look!
> > > > 
> > > > On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellstr?m
(Intel) wrote:
> > > > > Hi, Danilo.
> > > > > 
> > > > > Some quick comments since I'm doing some Xe work in
this area. Will probably
> > > > > get back with more.
> > > > > 
> > > > > On 8/20/23 23:53, Danilo Krummrich wrote:
<snip>
> > > > > > diff --git a/include/drm/drm_gpuva_mgr.h
b/include/drm/drm_gpuva_mgr.h
> > > > > > index ed8d50200cc3..693e2da3f425 100644
> > > > > > --- a/include/drm/drm_gpuva_mgr.h
> > > > > > +++ b/include/drm/drm_gpuva_mgr.h
> > > > > > @@ -26,12 +26,16 @@
> > > > > >      */
> > > > > >     #include <linux/list.h>
> > > > > > +#include <linux/dma-resv.h>
> > > > > > +#include <linux/maple_tree.h>
> > > > > >     #include <linux/rbtree.h>
> > > > > >     #include <linux/types.h>
> > > > > >     #include <drm/drm_gem.h>
> > > > > > +#include <drm/drm_exec.h>
> > > > > >     struct drm_gpuva_manager;
> > > > > > +struct drm_gpuva_gem;
> > > > > >     struct drm_gpuva_fn_ops;
> > > > > >     /**
> > > > > > @@ -140,7 +144,7 @@ struct drm_gpuva {
> > > > > >     int drm_gpuva_insert(struct drm_gpuva_manager
*mgr, struct drm_gpuva *va);
> > > > > >     void drm_gpuva_remove(struct drm_gpuva *va);
> > > > > > -void drm_gpuva_link(struct drm_gpuva *va);
> > > > > > +void drm_gpuva_link(struct drm_gpuva *va, struct
drm_gpuva_gem *vm_bo);
> > > > > >     void drm_gpuva_unlink(struct drm_gpuva *va);
> > > > > >     struct drm_gpuva *drm_gpuva_find(struct
drm_gpuva_manager *mgr,
> > > > > > @@ -240,15 +244,137 @@ struct drm_gpuva_manager {
> > > > > >     	 * @ops: &drm_gpuva_fn_ops providing the
split/merge steps to drivers
> > > > > >     	 */
> > > > > >     	const struct drm_gpuva_fn_ops *ops;
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @d_obj: Dummy GEM object; used internally to
pass the GPU VMs
> > > > > > +	 * dma-resv to &drm_exec.
> > > > > > +	 */
> > > > > > +	struct drm_gem_object d_obj;
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @resv: the &dma_resv for
&drm_gem_objects mapped in this GPU VA
> > > > > > +	 * space
> > > > > > +	 */
> > > > > > +	struct dma_resv *resv;
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @exec: the &drm_exec helper to lock
external &drm_gem_objects
> > > > > > +	 */
> > > > > > +	struct drm_exec exec;
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @mt_ext: &maple_tree storing external
&drm_gem_objects
> > > > > > +	 */
> > > > > > +	struct maple_tree mt_ext;
> > > > > Why are you using a maple tree here? Insertion and
removal is O(log(n))
> > > > > instead of O(1) for a list?
> > > > > 
> > > > Having a list of drm_gem_objects directly wouldn't work,
as multiple GPU-VMs
> > > > could have mappings of the same extobj.
> > > > 
> > > > I considered using the VM_BO abstraction (struct
drm_gpuva_gem) as list entry
> > > > instead, which also seems to be the obvious choice. However,
there is a locking
> > > > conflict.
> > > > 
> > > > A drm_gem_object keeps a list of drm_gpuva_gems, while each
drm_gpuva_gem keeps
> > > > a list of drm_gpuvas. Both lists are either protected with
the dma-resv lock of
> > > > the corresponding drm_gem_object, or with an external lock
provided by the
> > > > driver (see drm_gem_gpuva_set_lock()). The latter is used by
drivers performing
> > > > changes on the GPUVA space directly from the fence
signalling path.
> > > > 
> > > > Now, similar to what drm_gpuva_link() and drm_gpuva_unlink()
are doing already,
> > > > we'd want to add a drm_gpuva_gem to the extobj list for
the first mapping being
> > > > linked and we'd want to remove it for the last one being
unlinked.
> > > > 
> > > > (Actually we'd want to add the drm_gpuva_gem object to
the extobj list even
> > > > before, because otherwise we'd not acquire it's
dma-resv lock of this GEM object
> > > > through drm_gpuva_manager_lock(). But that's trival, we
could do that when we
> > > > create the drm_gpuva_gem, which we need to do anyways.)
> > > > 
> > > > Anyway, we'd probably want to keep removing the
drm_gpuva_gem from the extobj
> > > > list from drm_gpuva_unlink() when the last mapping of this
BO is unlinked. In
> > > > order to do so, we'd (as discussed above) either need to
hold the outer GPU-VM
> > > > lock or the GPU-VMs dma-resv lock. Both would be illegal in
the case
> > > > drm_gpuva_unlink() is called from within the fence
signalling path. For drivers
> > > > like XE or Nouveau, we'd at least need to make sure to
not mess up the locking
> > > > hierarchy of GPU-VM lock and dma-resv lock of the
corresponding BO.
> > > > 
> > > > Considering all that, I thought it's probably better to
track extobjs separate
> > > > from the drm_gpuva_gem, hence the maple tree choice.
> > > Hm. OK, in Xe we're having a list of the xe_vmas (drm_gpuvas)
that point to
> > > external objects, or in the case of multiple mappings to the same
gem
> > > object, only one of the drm_gpuvas is in the list. These are
protected by
> > > the GPU-VM lock. I don't see a problem with removing those
from the fence
> > > signalling path, though?
> > I intentionally tried to avoid keeping a list of drm_gpuvas to track
extobjs,
> > since this is generic code I don't know how much mappings of an
external object
> > the corresponding driver potentially creates. This could become a
pretty large
> > list to iterate. Another reason was, that I want to keep the drm_gpuva
structure
> > as small as possible, hence avoiding another list_head.
> 
> Yes, the list might be pretty large, but OTOH you never iterate to access a
> single list element. When you need to iterate the whole list you need to do
> that regardless of the data structure used. As for the list head, it might
> perhaps be aliased (union) with an upcoming userptr list head?
> 
Oh, I did not mean that I'm concerned about the size of a list of extobjs in
general, that would indeed be the same for every data structure chosen. But I
would be concerned about keeping a list of *all* mappings being backed by an
extobj.
> > 
> > Now, it sounds like in XE you're doing some kind of optimization
just keeping a
> > single mapping of an extobj in the list? How do you know when to
remove it? What
> > if the mapping from the extobj list gets unmapped, but there is still
another
> > one left in the GPU-VM being backed by the same BO?
> When removing from the lists, we iterate through the object's list of
vmas,
> and if there is one matching the same vm, we replace the old one with the
> new one. A similar iteration is done when adding to avoid adding one that
is
> already on the list.
I see, but wouldn't this be O(n) on insertion and O(m) on removal of an
extobj,
while using the maple tree is O(log(n))?
> > Although assuming that's a no-go for GPUVA wouldn't an XArray
be a better
> > choice, keeping O(1)?
> > When tracking extobjs, the address of the drm_gem_object is the key
while the
> > reference count is the value. I was thinking of an XArray as well, but
I was
> > worried that the corresponding indices could be too much distributed
for an
> > XArray to still be efficient. Now that I think about it, it's
probably not that
> > bad.
> > 
> > Btw., while I agree trying to make things as efficient as possible,
what is the
> > magnitue for extobjs to be tracked, do we need to worry about the
O(log(n))?
> 
> Not sure yet, TBH, but I think one of our UMDs can only use external
object,
> because they don't know at creation time which ones need exporting.
However
> if this turns out to be too bad, there are various flavours of "clever
but
> complicated" optimizations that we could think of to reduce the list
size.
> Still in our case, we opted for the vma list head for now.
Considering the above, I would guess that if your current approach is good
enough, a maple tree will work as well.

Otherwise, if you want, I could do some experiments with Xarray and see how
that works out compared to using a maple tree.

Btw. another nice thing about using Xarray or maple tree for that is that
drivers updating the VA space from the fence signalling path don't need to
hold a GPU-VM lock to update the extobj list. Actually, they might not need
a GPU-VM lock at all.
> 
> /Thomas
> 
> 
> > 
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @evict: structure holding the evict list and
evict list lock
> > > > > > +	 */
> > > > > > +	struct {
> > > > > > +		/**
> > > > > > +		 * @list: &list_head storing
&drm_gem_objects currently being
> > > > > > +		 * evicted
> > > > > > +		 */
> > > > > > +		struct list_head list;
> > > > > > +
> > > > > > +		/**
> > > > > > +		 * @lock: spinlock to protect the evict list
against concurrent
> > > > > > +		 * insertion / removal of different
&drm_gpuva_gems
> > > > > > +		 */
> > > > > > +		spinlock_t lock;
> > > > > > +	} evict;
> > > > > >     };
> > > > > >     void drm_gpuva_manager_init(struct
drm_gpuva_manager *mgr,
> > > > > > +			    struct drm_device *drm,
> > > > > >     			    const char *name,
> > > > > >     			    u64 start_offset, u64 range,
> > > > > >     			    u64 reserve_offset, u64 reserve_range,
> > > > > >     			    const struct drm_gpuva_fn_ops *ops);
> > > > > >     void drm_gpuva_manager_destroy(struct
drm_gpuva_manager *mgr);
> > > > > > +/**
> > > > > > + * DRM_GPUVA_EXEC - returns the
&drm_gpuva_managers &drm_exec instance
> > > > > > + * @mgr: the &drm_gpuva_managers to return
the &drm_exec instance for
> > > > > > + */
> > > > > > +#define DRM_GPUVA_EXEC(mgr)	&(mgr)->exec
> > > > > A struct ww_acquire_ctx and thus a drm_exec is
fundamentally per task and
> > > > > should typically be allocated on the stack. Otherwise
you'd need to protect
> > > > > the mgr->exec member with an exclusive lock
throughout the locking process,
> > > > > and that's not what we want.
> > > > Oh, good point. I think it works in Nouveau, because there
it's implicitly
> > > > protected with the job submission lock.
> > > > 
> > > > > Did you consider subclassing a drm_exec for drm_gpuva
purposes and add
> > > > > needed ops to it: Like so:
> > > > That's a good idea, will take this into V2.
> > > Actually, I'm not fully sure that was a good idea: I've
now have a working
> > > version of Xe ported over to drm_exec, having these helpers in
mind and with
> > > the intention to start using them as they mature. What I found,
though is
> > > that open-coding the drm_exec loop is not all that bad, but that
building
> > > blocks that can be called from within the loop are useful:
> > > 
> > > Like the drm_gpuva_prepare_objects() and an imaginary
> > > drm_gpuva_prepare_gpuva() that locks the vm resv and the resv of
the object
> > > (if different and the gpuva points to the object. And
> > > drm_gpuva_prepare_array() although we don't use it within Xe.
That means you
> > > can use these building blocks like helpers and avoid the fn()
callback by
> > > instead open-coding.
> > > 
> > > But I guess YMMV.
> > That's exactly why those building blocks are exported, I already
had in mind
> > that there might be drivers which still want to open-code the drm_exec
loop,
> > while others might just want a simple interface to lock everything.
> > 
> > I still think it is a good idea, but I'd keep that as simple as
possible. And
> > for everything else just let the driver open-code it and use the
"building
> > blocks" - will also expand the bulding blocks to what you
mentioned above.
> > 
> > > > > struct drm_gpuva_exec_ops {
> > > > >   ??? int (*fn) (struct drm_gpuva_exec *exec, int
num_fences);
> > > > Is this the fn argument from drm_gpuva_manager_lock_extra()?
> > > > 
> > > > >   ??? int (*bo_validate) (struct drm_gpuva_exec *exec,
struct drm_gem_object
> > > > > *obj);
> > > > I guess we could also keep that within the drm_gpuva_fn_ops?
This should always
> > > > be the same callback, right?
> > > > 
> > > > > };
> > > > > 
> > > > > struct drm_gpuva_exec {
> > > > >   ??? const struct drm_gpuva_exec_ops *ops;
> > > > >   ??? struct drm_exec exec;
> > > > >   ??? struct drm_gpuva_manager *mgr;
> > > > > };
> > > > > 
> > > > > Although I'd actually expect bo_validate to be part
of fn in the typical
> > > > > case. The drm_gpuva_exec would then be allocated by the
caller on the stack.
> > > > This doesn't sound like my assumption about fn() above
is correct.
> > > Well one important thing in our conversion is that
ttm_bo_validate () needs
> > > to be in the until_all_locked() loop. We want to be able soon to
use
> > > sleeping locks for eviction, so a xe_bo_validate() would, at
least
> > > temporarily, add locked objects to the drm_exec list of locked
objects. That
> > > means everything that may end up calling validate deep within the
call chain
> > > needs to be part of the until_all_locked() loop, so our
> > > drm_gpuva_manager_lock_extra() fn callback would include those
validates and
> > > look different all the time. Hence that's why open-coding
isn't all that
> > > bad...
> > Oh, I see. You indeed want to call validate() from within
until_all_locked().
> > 
> > > /Thomas
> > > 
> > > 
<snip>

Thomas Hellström (Intel)

2023-Aug-31 16:53 UTC

head link

[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

Hi,

On 8/31/23 13:18, Danilo Krummrich wrote:> On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellstr?m (Intel) wrote:
>> Hi!
>>
>> On 8/30/23 17:00, Danilo Krummrich wrote:
>>> On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellstr?m (Intel)
wrote:
>>>> On 8/30/23 14:49, Danilo Krummrich wrote:
>>>>> Hi Thomas,
>>>>>
>>>>> thanks for having a look!
>>>>>
>>>>> On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellstr?m
(Intel) wrote:
>>>>>> Hi, Danilo.
>>>>>>
>>>>>> Some quick comments since I'm doing some Xe work in
this area. Will probably
>>>>>> get back with more.
>>>>>>
>>>>>> On 8/20/23 23:53, Danilo Krummrich wrote:
> <snip>
>
>>>>>>> diff --git a/include/drm/drm_gpuva_mgr.h
b/include/drm/drm_gpuva_mgr.h
>>>>>>> index ed8d50200cc3..693e2da3f425 100644
>>>>>>> --- a/include/drm/drm_gpuva_mgr.h
>>>>>>> +++ b/include/drm/drm_gpuva_mgr.h
>>>>>>> @@ -26,12 +26,16 @@
>>>>>>>       */
>>>>>>>      #include <linux/list.h>
>>>>>>> +#include <linux/dma-resv.h>
>>>>>>> +#include <linux/maple_tree.h>
>>>>>>>      #include <linux/rbtree.h>
>>>>>>>      #include <linux/types.h>
>>>>>>>      #include <drm/drm_gem.h>
>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>      struct drm_gpuva_manager;
>>>>>>> +struct drm_gpuva_gem;
>>>>>>>      struct drm_gpuva_fn_ops;
>>>>>>>      /**
>>>>>>> @@ -140,7 +144,7 @@ struct drm_gpuva {
>>>>>>>      int drm_gpuva_insert(struct drm_gpuva_manager
*mgr, struct drm_gpuva *va);
>>>>>>>      void drm_gpuva_remove(struct drm_gpuva *va);
>>>>>>> -void drm_gpuva_link(struct drm_gpuva *va);
>>>>>>> +void drm_gpuva_link(struct drm_gpuva *va, struct
drm_gpuva_gem *vm_bo);
>>>>>>>      void drm_gpuva_unlink(struct drm_gpuva *va);
>>>>>>>      struct drm_gpuva *drm_gpuva_find(struct
drm_gpuva_manager *mgr,
>>>>>>> @@ -240,15 +244,137 @@ struct drm_gpuva_manager {
>>>>>>>      	 * @ops: &drm_gpuva_fn_ops providing the
split/merge steps to drivers
>>>>>>>      	 */
>>>>>>>      	const struct drm_gpuva_fn_ops *ops;
>>>>>>> +
>>>>>>> +	/**
>>>>>>> +	 * @d_obj: Dummy GEM object; used internally to
pass the GPU VMs
>>>>>>> +	 * dma-resv to &drm_exec.
>>>>>>> +	 */
>>>>>>> +	struct drm_gem_object d_obj;
>>>>>>> +
>>>>>>> +	/**
>>>>>>> +	 * @resv: the &dma_resv for
&drm_gem_objects mapped in this GPU VA
>>>>>>> +	 * space
>>>>>>> +	 */
>>>>>>> +	struct dma_resv *resv;
>>>>>>> +
>>>>>>> +	/**
>>>>>>> +	 * @exec: the &drm_exec helper to lock
external &drm_gem_objects
>>>>>>> +	 */
>>>>>>> +	struct drm_exec exec;
>>>>>>> +
>>>>>>> +	/**
>>>>>>> +	 * @mt_ext: &maple_tree storing external
&drm_gem_objects
>>>>>>> +	 */
>>>>>>> +	struct maple_tree mt_ext;
>>>>>> Why are you using a maple tree here? Insertion and
removal is O(log(n))
>>>>>> instead of O(1) for a list?
>>>>>>
>>>>> Having a list of drm_gem_objects directly wouldn't
work, as multiple GPU-VMs
>>>>> could have mappings of the same extobj.
>>>>>
>>>>> I considered using the VM_BO abstraction (struct
drm_gpuva_gem) as list entry
>>>>> instead, which also seems to be the obvious choice.
However, there is a locking
>>>>> conflict.
>>>>>
>>>>> A drm_gem_object keeps a list of drm_gpuva_gems, while each
drm_gpuva_gem keeps
>>>>> a list of drm_gpuvas. Both lists are either protected with
the dma-resv lock of
>>>>> the corresponding drm_gem_object, or with an external lock
provided by the
>>>>> driver (see drm_gem_gpuva_set_lock()). The latter is used
by drivers performing
>>>>> changes on the GPUVA space directly from the fence
signalling path.
>>>>>
>>>>> Now, similar to what drm_gpuva_link() and
drm_gpuva_unlink() are doing already,
>>>>> we'd want to add a drm_gpuva_gem to the extobj list for
the first mapping being
>>>>> linked and we'd want to remove it for the last one
being unlinked.
>>>>>
>>>>> (Actually we'd want to add the drm_gpuva_gem object to
the extobj list even
>>>>> before, because otherwise we'd not acquire it's
dma-resv lock of this GEM object
>>>>> through drm_gpuva_manager_lock(). But that's trival, we
could do that when we
>>>>> create the drm_gpuva_gem, which we need to do anyways.)
>>>>>
>>>>> Anyway, we'd probably want to keep removing the
drm_gpuva_gem from the extobj
>>>>> list from drm_gpuva_unlink() when the last mapping of this
BO is unlinked. In
>>>>> order to do so, we'd (as discussed above) either need
to hold the outer GPU-VM
>>>>> lock or the GPU-VMs dma-resv lock. Both would be illegal in
the case
>>>>> drm_gpuva_unlink() is called from within the fence
signalling path. For drivers
>>>>> like XE or Nouveau, we'd at least need to make sure to
not mess up the locking
>>>>> hierarchy of GPU-VM lock and dma-resv lock of the
corresponding BO.
>>>>>
>>>>> Considering all that, I thought it's probably better to
track extobjs separate
>>>>> from the drm_gpuva_gem, hence the maple tree choice.
>>>> Hm. OK, in Xe we're having a list of the xe_vmas
(drm_gpuvas) that point to
>>>> external objects, or in the case of multiple mappings to the
same gem
>>>> object, only one of the drm_gpuvas is in the list. These are
protected by
>>>> the GPU-VM lock. I don't see a problem with removing those
from the fence
>>>> signalling path, though?
>>> I intentionally tried to avoid keeping a list of drm_gpuvas to
track extobjs,
>>> since this is generic code I don't know how much mappings of an
external object
>>> the corresponding driver potentially creates. This could become a
pretty large
>>> list to iterate. Another reason was, that I want to keep the
drm_gpuva structure
>>> as small as possible, hence avoiding another list_head.
>> Yes, the list might be pretty large, but OTOH you never iterate to
access a
>> single list element. When you need to iterate the whole list you need
to do
>> that regardless of the data structure used. As for the list head, it
might
>> perhaps be aliased (union) with an upcoming userptr list head?
>>
> Oh, I did not mean that I'm concerned about the size of a list of
extobjs in
> general, that would indeed be the same for every data structure chosen. But
I
> would be concerned about keeping a list of *all* mappings being backed by
an
> extobj.
>
>>> Now, it sounds like in XE you're doing some kind of
optimization just keeping a
>>> single mapping of an extobj in the list? How do you know when to
remove it? What
>>> if the mapping from the extobj list gets unmapped, but there is
still another
>>> one left in the GPU-VM being backed by the same BO?
>> When removing from the lists, we iterate through the object's list
of vmas,
>> and if there is one matching the same vm, we replace the old one with
the
>> new one. A similar iteration is done when adding to avoid adding one
that is
>> already on the list.
> I see, but wouldn't this be O(n) on insertion and O(m) on removal of an
extobj,
> while using the maple tree is O(log(n))?
No, insertion and removal is O(m) where m is the number of vms the 
object is currently bound to. Typically a very small number.
>
>>> Although assuming that's a no-go for GPUVA wouldn't an
XArray be a better
>>> choice, keeping O(1)?
>>> When tracking extobjs, the address of the drm_gem_object is the key
while the
>>> reference count is the value. I was thinking of an XArray as well,
but I was
>>> worried that the corresponding indices could be too much
distributed for an
>>> XArray to still be efficient. Now that I think about it, it's
probably not that
>>> bad.
>>>
>>> Btw., while I agree trying to make things as efficient as possible,
what is the
>>> magnitue for extobjs to be tracked, do we need to worry about the
O(log(n))?
>> Not sure yet, TBH, but I think one of our UMDs can only use external
object,
>> because they don't know at creation time which ones need exporting.
However
>> if this turns out to be too bad, there are various flavours of
"clever but
>> complicated" optimizations that we could think of to reduce the
list size.
>> Still in our case, we opted for the vma list head for now.
> Considering the above, I would guess that if your current approach is good
> enough, a maple tree will work as well.
Hmm, Yeah it's probably a bikeshed since each drm_exec builds a 
realloced array of all external objects on each exec.
>
> Otherwise, if you want, I could do some experiments with Xarray and see how
> that works out compared to using a maple tree.
>
> Btw. another nice thing about using Xarray or maple tree for that is that
> drivers updating the VA space from the fence signalling path don't need
to
> hold a GPU-VM lock to update the extobj list. Actually, they might not need
> a GPU-VM lock at all.
I still don't follow why drivers would want to do that. Isn't the VA 
space / fence object list always updated sync from the IOCTL?

/Thomas

>
>> /Thomas
>>
>>
>>>>>>> +
>>>>>>> +	/**
>>>>>>> +	 * @evict: structure holding the evict list and
evict list lock
>>>>>>> +	 */
>>>>>>> +	struct {
>>>>>>> +		/**
>>>>>>> +		 * @list: &list_head storing
&drm_gem_objects currently being
>>>>>>> +		 * evicted
>>>>>>> +		 */
>>>>>>> +		struct list_head list;
>>>>>>> +
>>>>>>> +		/**
>>>>>>> +		 * @lock: spinlock to protect the evict list
against concurrent
>>>>>>> +		 * insertion / removal of different
&drm_gpuva_gems
>>>>>>> +		 */
>>>>>>> +		spinlock_t lock;
>>>>>>> +	} evict;
>>>>>>>      };
>>>>>>>      void drm_gpuva_manager_init(struct
drm_gpuva_manager *mgr,
>>>>>>> +			    struct drm_device *drm,
>>>>>>>      			    const char *name,
>>>>>>>      			    u64 start_offset, u64 range,
>>>>>>>      			    u64 reserve_offset, u64 reserve_range,
>>>>>>>      			    const struct drm_gpuva_fn_ops *ops);
>>>>>>>      void drm_gpuva_manager_destroy(struct
drm_gpuva_manager *mgr);
>>>>>>> +/**
>>>>>>> + * DRM_GPUVA_EXEC - returns the
&drm_gpuva_managers &drm_exec instance
>>>>>>> + * @mgr: the &drm_gpuva_managers to return the
&drm_exec instance for
>>>>>>> + */
>>>>>>> +#define DRM_GPUVA_EXEC(mgr)	&(mgr)->exec
>>>>>> A struct ww_acquire_ctx and thus a drm_exec is
fundamentally per task and
>>>>>> should typically be allocated on the stack. Otherwise
you'd need to protect
>>>>>> the mgr->exec member with an exclusive lock
throughout the locking process,
>>>>>> and that's not what we want.
>>>>> Oh, good point. I think it works in Nouveau, because there
it's implicitly
>>>>> protected with the job submission lock.
>>>>>
>>>>>> Did you consider subclassing a drm_exec for drm_gpuva
purposes and add
>>>>>> needed ops to it: Like so:
>>>>> That's a good idea, will take this into V2.
>>>> Actually, I'm not fully sure that was a good idea: I've
now have a working
>>>> version of Xe ported over to drm_exec, having these helpers in
mind and with
>>>> the intention to start using them as they mature. What I found,
though is
>>>> that open-coding the drm_exec loop is not all that bad, but
that building
>>>> blocks that can be called from within the loop are useful:
>>>>
>>>> Like the drm_gpuva_prepare_objects() and an imaginary
>>>> drm_gpuva_prepare_gpuva() that locks the vm resv and the resv
of the object
>>>> (if different and the gpuva points to the object. And
>>>> drm_gpuva_prepare_array() although we don't use it within
Xe. That means you
>>>> can use these building blocks like helpers and avoid the fn()
callback by
>>>> instead open-coding.
>>>>
>>>> But I guess YMMV.
>>> That's exactly why those building blocks are exported, I
already had in mind
>>> that there might be drivers which still want to open-code the
drm_exec loop,
>>> while others might just want a simple interface to lock everything.
>>>
>>> I still think it is a good idea, but I'd keep that as simple as
possible. And
>>> for everything else just let the driver open-code it and use the
"building
>>> blocks" - will also expand the bulding blocks to what you
mentioned above.
>>>
>>>>>> struct drm_gpuva_exec_ops {
>>>>>>    ??? int (*fn) (struct drm_gpuva_exec *exec, int
num_fences);
>>>>> Is this the fn argument from
drm_gpuva_manager_lock_extra()?
>>>>>
>>>>>>    ??? int (*bo_validate) (struct drm_gpuva_exec *exec,
struct drm_gem_object
>>>>>> *obj);
>>>>> I guess we could also keep that within the
drm_gpuva_fn_ops? This should always
>>>>> be the same callback, right?
>>>>>
>>>>>> };
>>>>>>
>>>>>> struct drm_gpuva_exec {
>>>>>>    ??? const struct drm_gpuva_exec_ops *ops;
>>>>>>    ??? struct drm_exec exec;
>>>>>>    ??? struct drm_gpuva_manager *mgr;
>>>>>> };
>>>>>>
>>>>>> Although I'd actually expect bo_validate to be part
of fn in the typical
>>>>>> case. The drm_gpuva_exec would then be allocated by the
caller on the stack.
>>>>> This doesn't sound like my assumption about fn() above
is correct.
>>>> Well one important thing in our conversion is that
ttm_bo_validate () needs
>>>> to be in the until_all_locked() loop. We want to be able soon
to use
>>>> sleeping locks for eviction, so a xe_bo_validate() would, at
least
>>>> temporarily, add locked objects to the drm_exec list of locked
objects. That
>>>> means everything that may end up calling validate deep within
the call chain
>>>> needs to be part of the until_all_locked() loop, so our
>>>> drm_gpuva_manager_lock_extra() fn callback would include those
validates and
>>>> look different all the time. Hence that's why open-coding
isn't all that
>>>> bad...
>>> Oh, I see. You indeed want to call validate() from within
until_all_locked().
>>>
>>>> /Thomas
>>>>
>>>>
> <snip>

Reasonably Related Threads

Search for more possibly parallel threads

Nouveau - Aug 2023 - [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

Reasonably Related Threads