Danilo Krummrich
2023-Sep-27  11:45 UTC
[Nouveau] [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow control
On 9/27/23 09:25, Boris Brezillon wrote:
> On Wed, 27 Sep 2023 02:13:59 +0200
> Danilo Krummrich <dakr at redhat.com> wrote:
>
>> On 9/26/23 22:43, Luben Tuikov wrote:
>>> Hi,
>>>
>>> On 2023-09-24 18:43, Danilo Krummrich wrote:
>>>> Currently, job flow control is implemented simply by limiting the amount
>>>> of jobs in flight. Therefore, a scheduler is initialized with a
>>>> submission limit that corresponds to a certain amount of jobs.
>>>
>>> "certain"? How about this instead:
>>> " ... that corresponds to the number of jobs which can be sent
>>> to the hardware."?
>>>
>>>>
>>>> This implies that for each job drivers need to account for the maximum
>>>                                ^,
>>> Please add a comma after "job".
>>>
>>>> job size possible in order to not overflow the ring buffer.
>>>
>>> Well, different hardware designs would implement this differently.
>>> Ideally, you only want pointers into the ring buffer, and then
>>> the hardware consumes as much as it can. But this is a moot point
>>> and it's always a good idea to have a "job size" hint from the client.
>>> So this is a good patch.
>>>
>>> Ideally, you want to say that the hardware needs to be able to
>>> accommodate the number of jobs which can fit in the hardware
>>> queue times the largest job. This is a waste of resources
>>> however, and it is better to give a hint as to the size of a job,
>>> by the client. If the hardware can peek and understand dependencies,
>>> on top of knowing the "size of the job", it can be an extremely
>>> efficient scheduler.
>>>
>>>>
>>>> However, there are drivers, such as Nouveau, where the job size has a
>>>> rather large range. For such drivers it can easily happen that job
>>>> submissions not even filling the ring by 1% can block subsequent
>>>> submissions, which, in the worst case, can lead to the ring run dry.
>>>>
>>>> In order to overcome this issue, allow for tracking the actual job size
>>>> instead of the amount job jobs. Therefore, add a field to track a job's
>>>
>>> "the amount job jobs." --> "the number of jobs."
>>
>> Yeah, I somehow manage to always get this wrong, which I guess you noticed
>> below already.
>>
>> That's all good points below - gonna address them.
>>
>> Did you see Boris' response regarding a separate callback in order to fetch
>> the job's submission units dynamically? Since this is needed by PowerVR, I'd
>> like to include this in V2. What's your take on that?
>>
>> My only concern with that would be that if I got what Boris was saying
>> correctly calling
>>
>> WARN_ON(s_job->submission_units > sched->submission_limit);
>>
>> from drm_sched_can_queue() wouldn't work anymore, since this could indeed happen
>> temporarily. I think this was also Christian's concern.
>
> Actually, I think that's fine to account for the max job size in the
> first check, we're unlikely to have so many native fence waits that our
> job can't fit in an empty ring buffer.
>

But it can happen, right? Hence, we can't have this check, do we?
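
For reference, the unit-based accounting under discussion could be sketched roughly as below. This is only an illustration, not the code from the patch: the sketch_* types and helpers are hypothetical stand-ins for the scheduler and job structures, submission_count is an assumed name for the in-flight counter, and only submission_units and submission_limit are taken from the thread. The sketch also assumes an oversized job is simply refused rather than triggering the WARN_ON() quoted above, which is exactly the point still being debated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scheduler; not the real DRM structure. */
struct sketch_sched {
	uint32_t submission_limit;	/* total units the ring can hold */
	uint32_t submission_count;	/* units currently in flight (assumed name) */
};

/* Hypothetical stand-in for a job. */
struct sketch_job {
	uint32_t submission_units;	/* units this job occupies in the ring */
};

/* Units still free in the ring. */
static uint32_t sketch_available(const struct sketch_sched *sched)
{
	return sched->submission_limit - sched->submission_count;
}

/*
 * Flow-control check based on job size instead of job count.
 * Assumption: a job larger than the whole ring is refused here
 * instead of being reported through WARN_ON().
 */
static bool sketch_can_queue(const struct sketch_sched *sched,
			     const struct sketch_job *job)
{
	if (job->submission_units > sched->submission_limit)
		return false;

	return job->submission_units <= sketch_available(sched);
}

/* Account for a job being pushed to the ring; false if it does not fit. */
static bool sketch_submit(struct sketch_sched *sched,
			  const struct sketch_job *job)
{
	if (!sketch_can_queue(sched, job))
		return false;

	sched->submission_count += job->submission_units;
	return true;
}

/* Give the units back once the hardware has finished the job. */
static void sketch_complete(struct sketch_sched *sched,
			    const struct sketch_job *job)
{
	assert(sched->submission_count >= job->submission_units);
	sched->submission_count -= job->submission_units;
}
```

With a limit of 100 units, a 1-unit job in flight leaves room for 99 more units, so small jobs no longer block the ring the way a per-job limit sized for the largest possible job would.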