Dave Airlie
2023-Oct-10 20:23 UTC
[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
> I think we're then optimizing for different scenarios. Our compute
> driver will use mostly external objects only, and if shared, I don't
> foresee them bound to many VMs. What saves us currently here is that in
> compute mode we only really traverse the extobj list after a preempt
> fence wait, or when a vm is using a new context for the first time. So
> a vm's extobj list is pretty large. Each bo's vma list will typically
> be pretty small.

Can I ask why we are optimising for this userspace? This seems
incredibly broken.

We've had this sort of problem in the past with Intel letting the tail
wag the dog; does anyone remember optimising relocations for a
userspace that didn't actually need to use relocations?

We need to ask why this userspace is doing this. Can we get some
pointers to it? A compute driver should have no reason to use mostly
external objects; the OpenCL and Level Zero APIs should be good enough
to figure this out.

Dave.
Christian König
2023-Oct-11 07:07 UTC
[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 10.10.23 at 22:23, Dave Airlie wrote:
>> I think we're then optimizing for different scenarios. Our compute
>> driver will use mostly external objects only, and if shared, I don't
>> foresee them bound to many VMs. What saves us currently here is that
>> in compute mode we only really traverse the extobj list after a
>> preempt fence wait, or when a vm is using a new context for the first
>> time. So a vm's extobj list is pretty large. Each bo's vma list will
>> typically be pretty small.
>
> Can I ask why we are optimising for this userspace? This seems
> incredibly broken.
>
> We've had this sort of problem in the past with Intel letting the tail
> wag the dog; does anyone remember optimising relocations for a
> userspace that didn't actually need to use relocations?
>
> We need to ask why this userspace is doing this. Can we get some
> pointers to it? A compute driver should have no reason to use mostly
> external objects; the OpenCL and Level Zero APIs should be good enough
> to figure this out.

Well, that is a pretty normal use case; AMD works the same way. In a
multi-GPU compute stack you have mostly all the data shared between the
different hardware devices.

As I said before, looking at just the Vulkan use case is not a good
idea at all.

Christian.
Thomas Hellström
2023-Oct-11 08:22 UTC
[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On Wed, 2023-10-11 at 06:23 +1000, Dave Airlie wrote:
> > I think we're then optimizing for different scenarios. Our compute
> > driver will use mostly external objects only, and if shared, I don't
> > foresee them bound to many VMs. What saves us currently here is that
> > in compute mode we only really traverse the extobj list after a
> > preempt fence wait, or when a vm is using a new context for the
> > first time. So a vm's extobj list is pretty large. Each bo's vma
> > list will typically be pretty small.
>
> Can I ask why we are optimising for this userspace? This seems
> incredibly broken.

First, judging from the discussion with Christian, this is not really
uncommon. There *are* tricks of assorted cleverness we could play in
the KMD to reduce the extobj list size, but doing that in the KMD
wouldn't be much different from accepting a large extobj list size and
doing what we can to reduce the overhead of iterating over it.

Second, the discussion here was really about whether we should be using
a lower-level lock to allow for async state updates, with a rather
complex mechanism involving weak reference counting and a requirement
to drop the locks within the loop to avoid locking inversion. If that
were a simplification with little or no overhead, all fine, but IMO
it's not a simplification.

> We've had this sort of problem in the past with Intel letting the
> tail wag the dog; does anyone remember optimising relocations for a
> userspace that didn't actually need to use relocations?
>
> We need to ask why this userspace is doing this. Can we get some
> pointers to it? A compute driver should have no reason to use mostly
> external objects; the OpenCL and Level Zero APIs should be good
> enough to figure this out.

TBH, for the compute UMD case I'd be prepared to drop the *performance*
argument for fine-grained locking of the extobj list, since it's really
only traversed on new contexts and preemption. But as Christian
mentions, there might be other cases.
We should perhaps figure those out and document them?

/Thomas