Thomas Hellström
2023-Sep-13 10:39 UTC
[Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi,

On 9/13/23 09:19, Boris Brezillon wrote:
> On Wed, 13 Sep 2023 17:05:42 +1000
> Dave Airlie <airlied at gmail.com> wrote:
>
>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
>> <boris.brezillon at collabora.com> wrote:
>>> On Tue, 12 Sep 2023 18:20:32 +0200
>>> Thomas Hellström <thomas.hellstrom at linux.intel.com> wrote:
>>>
>>>>> +/**
>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>> + * @__gpuvm: The GPU VM
>>>>> + * @__list_name: The name of the list we're iterating on
>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>> + *
>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>> + * the list, so list insertion/deletion can happen concurrently.
>>>> Are the list spinlocks needed for that async state update from within
>>>> the dma-fence critical section we've discussed previously?
>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
>>> get that Xe and Nouveau don't need that because they update the VM
>>> state early (in the ioctl path), but I keep thinking this will hurt us
>>> if we don't think it through from the beginning, because once you've
>>> set this logic to depend only on resv locks, it will be pretty hard to
>>> get back to a solution which lets synchronous VM_BINDs take precedence
>>> over asynchronous requests, and, with vkQueueBindSparse() passing external
>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
>>> take a long time to get your synchronous VM_BIND executed...

So this would boil down to either (possibly opt-in) keeping the spinlock
approach or pushing the unlink out to a wq then?

BTW, as also asked in a reply to Danilo, how do you call unlink from
run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?

>>>
>> btw what is the use case for this? do we have actual vulkan
>> applications we know will have problems here?
> I don't, but I think that's a concern Faith raised at some point (dates
> back from when I was reading threads describing how VM_BIND on i915
> should work, and I was clearly discovering this whole VM_BIND thing at
> that time, so maybe I misunderstood).
>
>> it feels like a bit of premature optimisation, but maybe we have use cases.
> Might be, but that's the sort of thing that would put us in a corner if
> we don't have a plan for when the needs arise. Besides, if we don't
> want to support that case because it's too complicated, I'd recommend
> dropping all the drm_gpuvm APIs that let people think this mode is
> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> confusion.

Xe allows bypassing the bind-queue with another bind-queue, but to
completely avoid dependencies between queues the operations may not
overlap. (And the definition of overlap is currently that page-table
structure updates may not overlap.) But no guarantees are made about
priority.

/Thomas
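
For reference, the iteration pattern described by the quoted
get_next_vm_bo_from_list() kernel-doc is roughly the one sketched below.
This is a minimal, illustrative sketch only; the struct and function names
(my_gpuvm, my_vm_bo, pick_next_vm_bo) are made up and do not match the
actual drm_gpuvm code:

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Illustrative only; these structs do not match the real drm_gpuvm ones. */
struct my_vm_bo {
	struct kref refcount;
	struct list_head list_entry;
};

struct my_gpuvm {
	spinlock_t list_lock;
	struct list_head evict_list;
};

/*
 * Pick one element under the lock, move it to a caller-provided local list
 * so it is not visited twice, then drop the lock again so concurrent
 * insertions and removals on evict_list remain possible while the caller
 * processes the returned element.
 */
static struct my_vm_bo *
pick_next_vm_bo(struct my_gpuvm *gpuvm, struct list_head *already_iterated)
{
	struct my_vm_bo *vm_bo = NULL;

	spin_lock(&gpuvm->list_lock);
	if (!list_empty(&gpuvm->evict_list)) {
		vm_bo = list_first_entry(&gpuvm->evict_list,
					 struct my_vm_bo, list_entry);
		list_move_tail(&vm_bo->list_entry, already_iterated);
		kref_get(&vm_bo->refcount);
	}
	spin_unlock(&gpuvm->list_lock);

	return vm_bo;
}

Once the caller has walked the whole list this way, the local list is
spliced back onto the original one under the same spinlock.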
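
The "push the unlink out to a wq" option mentioned above boils down to
something like the following sketch: run_job() only moves the vm_bo onto a
deferred list under a spinlock, and a worker running in process context,
where reservation locks may be taken, does the actual unlink later. Again,
all names here are hypothetical and not the real drm_gpuvm API:

#include <linux/dma-resv.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>
#include <drm/drm_gem.h>

/* Illustrative only; field and function names are made up. */
struct my_vm_bo {
	struct drm_gem_object *obj;
	struct list_head list_entry;
};

struct my_gpuvm {
	spinlock_t list_lock;
	struct list_head deferred_unlink;
	struct work_struct unlink_work;
};

/* Called from drm_gpu_scheduler::run_job(); must not take dma_resv locks. */
static void my_gpuvm_defer_unlink(struct my_gpuvm *gpuvm,
				  struct my_vm_bo *vm_bo)
{
	spin_lock(&gpuvm->list_lock);
	list_add_tail(&vm_bo->list_entry, &gpuvm->deferred_unlink);
	spin_unlock(&gpuvm->list_lock);

	queue_work(system_wq, &gpuvm->unlink_work);
}

/* Runs in process context, where taking reservation locks is fine. */
static void my_gpuvm_unlink_worker(struct work_struct *work)
{
	struct my_gpuvm *gpuvm = container_of(work, struct my_gpuvm,
					      unlink_work);
	struct my_vm_bo *vm_bo, *next;
	LIST_HEAD(unlink);

	spin_lock(&gpuvm->list_lock);
	list_splice_init(&gpuvm->deferred_unlink, &unlink);
	spin_unlock(&gpuvm->list_lock);

	list_for_each_entry_safe(vm_bo, next, &unlink, list_entry) {
		dma_resv_lock(vm_bo->obj->resv, NULL);
		/* The actual unlink (e.g. drm_gpuva_unlink()) would go here. */
		list_del(&vm_bo->list_entry);
		dma_resv_unlock(vm_bo->obj->resv);
	}
}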