Dave Airlie
2023-Oct-10 20:23 UTC
[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
> I think we're then optimizing for different scenarios. Our compute
> driver will use mostly external objects only, and if shared, I don't
> foresee them bound to many VMs. What saves us currently here is that in
> compute mode we only really traverse the extobj list after a preempt
> fence wait, or when a vm is using a new context for the first time. So
> a vm's extobj list is pretty large. Each bo's vma list will typically
> be pretty small.

Can I ask why we are optimising for this userspace? This seems
incredibly broken.

We've had this sort of problem in the past with Intel letting the tail
wag the dog; does anyone remember optimising relocations for a
userspace that didn't actually need to use relocations?

We need to ask why this userspace is doing this. Can we get some
pointers to it? A compute driver should have no reason to use mostly
external objects; the OpenCL and Level Zero APIs should be good enough
to figure this out.

Dave.
Christian König
2023-Oct-11 07:07 UTC
[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 10.10.23 at 22:23, Dave Airlie wrote:
>> I think we're then optimizing for different scenarios. Our compute
>> driver will use mostly external objects only, and if shared, I don't
>> foresee them bound to many VMs. What saves us currently here is that
>> in compute mode we only really traverse the extobj list after a
>> preempt fence wait, or when a vm is using a new context for the first
>> time. So a vm's extobj list is pretty large. Each bo's vma list will
>> typically be pretty small.
>
> Can I ask why we are optimising for this userspace? This seems
> incredibly broken.
>
> We've had this sort of problem in the past with Intel letting the tail
> wag the dog; does anyone remember optimising relocations for a
> userspace that didn't actually need to use relocations?
>
> We need to ask why this userspace is doing this. Can we get some
> pointers to it? A compute driver should have no reason to use mostly
> external objects; the OpenCL and Level Zero APIs should be good enough
> to figure this out.

Well, that is a pretty normal use case; AMD works the same way. In a
multi-GPU compute stack you have mostly all the data shared between the
different hardware devices.

As I said before, looking at just the Vulkan use case is not a good
idea at all.

Christian.
Thomas Hellström
2023-Oct-11 08:22 UTC
[Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On Wed, 2023-10-11 at 06:23 +1000, Dave Airlie wrote:
> > I think we're then optimizing for different scenarios. Our compute
> > driver will use mostly external objects only, and if shared, I don't
> > foresee them bound to many VMs. What saves us currently here is that
> > in compute mode we only really traverse the extobj list after a
> > preempt fence wait, or when a vm is using a new context for the
> > first time. So a vm's extobj list is pretty large. Each bo's vma
> > list will typically be pretty small.
>
> Can I ask why we are optimising for this userspace? This seems
> incredibly broken.

First, judging from the discussion with Christian, this is not really
uncommon. There *are* tricks of assorted cleverness we could play in
the KMD to reduce the extobj list size, but doing that in the KMD
wouldn't be much different from accepting a large extobj list size and
doing what we can to reduce the overhead of iterating over it.

Second, the discussion here was really about whether we should be using
a lower-level lock to allow for async state updates, with a rather
complex mechanism involving weak reference counting and a requirement
to drop the locks within the loop to avoid locking inversion. If that
were a simplification with little or no overhead, all fine, but IMO
it's not a simplification.

> We've had this sort of problem in the past with Intel letting the
> tail wag the dog; does anyone remember optimising relocations for a
> userspace that didn't actually need to use relocations?
>
> We need to ask why this userspace is doing this. Can we get some
> pointers to it? A compute driver should have no reason to use mostly
> external objects; the OpenCL and Level Zero APIs should be good
> enough to figure this out.

TBH, for the compute UMD case I'd be prepared to drop the *performance*
argument for fine-grained locking of the extobj list, since it's really
only traversed on new contexts and preemption. But as Christian
mentions, there might be other cases.
We should perhaps figure those out and document them?

/Thomas