Tvrtko Ursulin
2025-Apr-22 13:25 UTC
[PATCH 3/5] drm/sched: Warn if pending list is not empty
On 22/04/2025 13:00, Philipp Stanner wrote:> On Tue, 2025-04-22 at 13:13 +0200, Danilo Krummrich wrote: >> On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote: >>> Question I raised is if there are other drivers which manage to >>> clean up >>> everything correctly (like the mock scheduler does), but trigger >>> that >>> warning. Maybe there are not and maybe mock scheduler is the only >>> false >>> positive. > > For clarification: > > I messed up the comment from the cover letter. > > What I did was run the *old* unit tests (v5 IIRC) from Tvrtko that > still had the memory leaks. Those then trigger the warning print, as is > expected, since they don't provide fence_context_kill(). > > The current unit tests are fine memory-leak-wise. > > IOW, both with Nouveau and the unit tests, everything behaves as > expected, without issues.Hmm how? Pending list can be non-empty so it should be possible to hit that warn. Regards, Tvrtko>> So far the scheduler simply does not give any guideline on how to >> address the >> problem, hence every driver simply does something (or nothing, >> effectively >> ignoring the problem). This is what we want to fix. >> >> The mock scheduler keeps it's own list of pending jobs and on tear >> down stops >> the scheduler's workqueues, traverses it's own list and eventually >> frees the >> pending jobs without updating the scheduler's internal pending list. >> >> So yes, it does avoid memory leaks, but it also leaves the schedulers >> internal >> structures with an invalid state, i.e. the pending list of the >> scheduler has >> pointers to already freed memory. >> >> What if the drm_sched_fini() starts touching the pending list? Then >> you'd end up >> with UAF bugs with this implementation. We cannot invalidate the >> schedulers >> internal structures and yet call scheduler functions - e.g. >> drm_sched_fini() - >> subsequently. >> >> Hence, the current implementation of the mock scheduler is >> fundamentally flawed. >> And so would be *every* driver that still has entries within the >> scheduler's >> pending list. >> >> This is not a false positive, it already caught a real bug -- in the >> mock >> scheduler. >> >> - Danilo >