Tvrtko Ursulin
2025-May-16 10:34 UTC
[PATCH v2 2/6] drm/sched: Prevent teardown waitque from blocking too long
On 16/05/2025 10:54, Philipp Stanner wrote:
> On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote:
>>
>> On 24/04/2025 10:55, Philipp Stanner wrote:
>>> The waitqueue that ensures that drm_sched_fini() blocks until the
>>> pending_list has become empty could theoretically cause that function
>>> to block for a very long time. That, ultimately, could block userspace
>>> processes and prevent them from being killable through SIGKILL.
>>>
>>> When a driver calls drm_sched_fini(), it is safe to assume that all
>>> still pending jobs are not needed anymore anyways. Thus, they can be
>>> cancelled and thereby it can be ensured that drm_sched_fini() will
>>> return relatively quickly.
>>>
>>> Implement a new helper to stop all work items / submission except for
>>> the drm_sched_backend_ops.run_job().
>>>
>>> Implement a driver callback, kill_fence_context(), that instructs the
>>> driver to kill the fence context associated with this scheduler,
>>> thereby causing all pending hardware fences to be signalled.
>>>
>>> Call those new routines in drm_sched_fini() and ensure backwards
>>> compatibility if the new callback is not implemented.
>>>
>>> Suggested-by: Danilo Krummrich <dakr at redhat.com>
>>> Signed-off-by: Philipp Stanner <phasta at kernel.org>
>>> ---
>>>  drivers/gpu/drm/scheduler/sched_main.c | 47 +++++++++++++++++---------
>>>  include/drm/gpu_scheduler.h            | 11 ++++++
>>>  2 files changed, 42 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index ea82e69a72a8..c2ad6c70bfb6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1390,31 +1390,46 @@ drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
>>>  	return empty;
>>>  }
>>>
>>> +/**
>>> + * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
>>> + * @sched: scheduler instance
>>> + *
>>> + * Must only be called if &struct drm_sched_backend_ops.kill_fence_context is
>>> + * implemented.
>>> + *
>>> + * Instructs the driver to kill the fence context associated with this
>>> + * scheduler, thereby signalling all pending fences. This, in turn, will
>>> + * trigger &struct drm_sched_backend_ops.free_job to be called for all
>>> + * pending jobs. The function then blocks until all pending jobs have been
>>> + * freed.
>>> + */
>>> +static inline void
>>> +drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
>>> +{
>>> +	sched->ops->kill_fence_context(sched);
>>> +	wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
>>> +}
>>> +
>>>  /**
>>>   * drm_sched_fini - Destroy a gpu scheduler
>>>   *
>>>   * @sched: scheduler instance
>>>   *
>>> - * Tears down and cleans up the scheduler.
>>> - *
>>> - * Note that this function blocks until all the fences returned by
>>> - * &struct drm_sched_backend_ops.run_job have been signalled.
>>> + * Tears down and cleans up the scheduler. Might leak memory if
>>> + * &struct drm_sched_backend_ops.kill_fence_context is not implemented.
>>>   */
>>>  void drm_sched_fini(struct drm_gpu_scheduler *sched)
>>>  {
>>>  	struct drm_sched_entity *s_entity;
>>>  	int i;
>>>
>>> -	/*
>>> -	 * Jobs that have neither been scheduled or which have timed out are
>>> -	 * gone by now, but jobs that have been submitted through
>>> -	 * backend_ops.run_job() and have not yet terminated are still pending.
>>> -	 *
>>> -	 * Wait for the pending_list to become empty to avoid leaking those jobs.
>>> -	 */
>>> -	drm_sched_submission_and_timeout_stop(sched);
>>> -	wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
>>> -	drm_sched_free_stop(sched);
>>> +	if (sched->ops->kill_fence_context) {
>>> +		drm_sched_submission_and_timeout_stop(sched);
>>> +		drm_sched_cancel_jobs_and_wait(sched);
>>> +		drm_sched_free_stop(sched);
>>> +	} else {
>>> +		/* We're in "legacy free-mode" and ignore potential mem leaks */
>>> +		drm_sched_wqueue_stop(sched);
>>> +	}
>>>
>>>  	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
>>>  		struct drm_sched_rq *rq = sched->sched_rq[i];
>>> @@ -1502,7 +1517,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
>>>  EXPORT_SYMBOL(drm_sched_wqueue_ready);
>>>
>>>  /**
>>> - * drm_sched_wqueue_stop - stop scheduler submission
>>> + * drm_sched_wqueue_stop - stop scheduler submission and freeing
>>
>> Looks like the kerneldoc corrections (below too) belong to the previous
>> patch. Irrelevant if you decide to squash them though.
>>
>>>   * @sched: scheduler instance
>>>   *
>>>   * Stops the scheduler from pulling new jobs from entities. It also stops
>>> @@ -1518,7 +1533,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
>>>  EXPORT_SYMBOL(drm_sched_wqueue_stop);
>>>
>>>  /**
>>> - * drm_sched_wqueue_start - start scheduler submission
>>> + * drm_sched_wqueue_start - start scheduler submission and freeing
>>>   * @sched: scheduler instance
>>>   *
>>>   * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index d0b1f416b4d9..8630b4a26f10 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -509,6 +509,17 @@ struct drm_sched_backend_ops {
>>>  	 * and it's time to clean it up.
>>>  	 */
>>>  	void (*free_job)(struct drm_sched_job *sched_job);
>>> +
>>> +	/**
>>> +	 * @kill_fence_context: kill the fence context belonging to this scheduler
>>
>> Which fence context would that be? ;)
>>
>> Also, "fence context" would be new terminology at the gpu_scheduler.h
>> API level. You could call it ->sched_fini() or similar to signify at
>> which point in the API it gets called, and then the fact that it takes
>> sched as a parameter would be natural.
>>
>> We also probably want some commentary on the topic of indefinite (or at
>> least very long) blocking at thread exit / SIGINT/TERM/KILL time. The
>> cover letter touches upon that problem but I don't see you address it.
>> Is the idea to let drivers shoot themselves in the foot or what?
>
> You didn't seriously just write that, did you?
>
> Yes, my intention clearly has been since September to make things
> worse, more ambiguous and destroy drivers. That's why I'm probably the
> only individual after Sima (who added some of the FIXMEs) who ever
> bothered documenting all this mess here, and warning people about the
> five hundred pitfalls, like three different start and stop functions,
> implicit refcount rules and God knows which other insane hacks that
> simply move driver problems into common infrastructure.
>
> Reconsider your attitude.

I don't know what kind of attitude you think I was expressing. If the
last sentence was one hyperbole too many for you, then sorry.

The point was in the paragraph ending with that sentence: to document
that the callback must not block, or, if it has to, to discuss for how
long. And perhaps to discuss whether the scheduler code should impose a
limit. Otherwise the indefinite block at thread exit from the cover
letter can still happen. I don't see how raising those points is a
criticism of your overall work. (The fact that we don't see eye to eye
regarding more state machine vs the ->cancel_job() alternative aside.)

Regards,

Tvrtko

>
> P.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> +	 *
>>> +	 * Needed to cleanly tear the scheduler down in drm_sched_fini(). This
>>> +	 * callback will cause all hardware fences to be signalled by the
>>> +	 * driver, which, ultimately, ensures that all jobs get freed before
>>> +	 * teardown.
>>> +	 *
>>> +	 * This callback is optional, but it is highly recommended to
>>> +	 * implement it.
>>> +	 */
>>> +	void (*kill_fence_context)(struct drm_gpu_scheduler *sched);
>>>  };
>>>
>>>  /**
>>
>