Dave Airlie
2024-Feb-05 21:08 UTC
[PATCH] nouveau: offload fence uevents work to workqueue
On Tue, 6 Feb 2024 at 02:22, Danilo Krummrich <dakr at redhat.com> wrote:
>
> On 1/29/24 02:50, Dave Airlie wrote:
> > From: Dave Airlie <airlied at redhat.com>
> >
> > This should break the deadlock between the fctx lock and the irq lock.
> >
> > This offloads the processing of the work from the irq into a workqueue.
> >
> > Signed-off-by: Dave Airlie <airlied at redhat.com>
>
> Nouveau's scheduler uses a dedicated wq, hence from this perspective it's
> safe deferring fence signalling to the kernel global wq. However, I wonder
> if we could create deadlocks by building dependency chains into other
> drivers / kernel code that, by chance, makes use of the kernel global wq as
> well.
>
> Admittedly, even if, it's gonna be extremely unlikely given that
> WQ_MAX_ACTIVE == 512. But maybe it'd be safer to use a dedicated wq.
>
> Also, do we need to CC stable?

I pushed this to Linus at the end of last week, since the hangs in 6.7
take out the complete system and I wanted it in stable.

It might be safer to use a dedicated wq; is the concern that someone is
waiting on a fence in a workqueue somewhere else, so we would never
signal it?

Dave.
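For readers following along: the shape of the fix being discussed is to stop signalling fences directly from the interrupt handler (where the irq lock is held) and instead kick a work item that signals them in process context. A minimal sketch of that pattern, with hypothetical structure and function names (not nouveau's actual ones), might look like:

```c
#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/interrupt.h>

/* illustrative stand-in for nouveau's fence context */
struct fence_ctx {
	spinlock_t lock;               /* the "fctx lock" from the thread */
	struct work_struct uevent_work;
};

/* Runs in process context on a workqueue, so it can take the fctx
 * lock without ever nesting inside the irq lock. */
static void fence_uevent_work(struct work_struct *w)
{
	struct fence_ctx *fctx =
		container_of(w, struct fence_ctx, uevent_work);

	spin_lock_irq(&fctx->lock);
	/* walk pending fences and signal the completed ones */
	spin_unlock_irq(&fctx->lock);
}

/* IRQ handler: instead of taking the fctx lock here (which created
 * the fctx-lock/irq-lock deadlock), just queue the work and return. */
static irqreturn_t fence_irq_handler(int irq, void *data)
{
	struct fence_ctx *fctx = data;

	/* schedule_work() queues on the kernel global wq -- exactly the
	 * point Danilo raises below about dependency chains */
	schedule_work(&fctx->uevent_work);
	return IRQ_HANDLED;
}
```

This is only a sketch of the offload pattern under discussion, not the actual patch contents.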
Danilo Krummrich
2024-Feb-05 22:00 UTC
[PATCH] nouveau: offload fence uevents work to workqueue
On 2/5/24 22:08, Dave Airlie wrote:
> On Tue, 6 Feb 2024 at 02:22, Danilo Krummrich <dakr at redhat.com> wrote:
>>
>> On 1/29/24 02:50, Dave Airlie wrote:
>>> From: Dave Airlie <airlied at redhat.com>
>>>
>>> This should break the deadlock between the fctx lock and the irq lock.
>>>
>>> This offloads the processing of the work from the irq into a workqueue.
>>>
>>> Signed-off-by: Dave Airlie <airlied at redhat.com>
>>
>> Nouveau's scheduler uses a dedicated wq, hence from this perspective it's
>> safe deferring fence signalling to the kernel global wq. However, I wonder
>> if we could create deadlocks by building dependency chains into other
>> drivers / kernel code that, by chance, makes use of the kernel global wq as
>> well.
>>
>> Admittedly, even if, it's gonna be extremely unlikely given that
>> WQ_MAX_ACTIVE == 512. But maybe it'd be safer to use a dedicated wq.
>>
>> Also, do we need to CC stable?
>
> I pushed this to Linus at the end of last week, since the hangs in 6.7
> take out the complete system and I wanted it in stable.
>
> It might be safer to use a dedicated wq, is the concern someone is
> waiting on a fence in a workqueue somewhere else so we will never
> signal it?

Yes, if some other work item is waiting for this fence (or something else
in the same dependency chain) to signal, it can prevent the work that
signals this fence from executing, in case both are scheduled on the same
wq.

As mentioned, with the kernel global wq this would be rather unlikely to
lead to an actual stall given WQ_MAX_ACTIVE == 512, but formally the race
condition exists. I guess a malicious attacker could try to intentionally
push jobs directly or indirectly depending on this fence to a driver
which queues them up on a scheduler using the kernel global wq.

>
> Dave.
>
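The dedicated-wq alternative Danilo suggests would avoid sharing the global wq's concurrency budget with unrelated work. A hedged sketch of what that could look like, with illustrative names ("nouveau_fence" as the wq name is an assumption, not the actual patch):

```c
#include <linux/workqueue.h>
#include <linux/errno.h>

/* A driver-private workqueue for fence signalling. Work items queued
 * by other drivers on the kernel global wq can then no longer starve
 * the fence-signalling work, because the two never compete for the
 * same queue's WQ_MAX_ACTIVE slots. */
static struct workqueue_struct *fence_wq;

static int fence_wq_init(void)
{
	/* WQ_UNBOUND: no per-CPU affinity needed for this work;
	 * max_active = 0 means use the default limit */
	fence_wq = alloc_workqueue("nouveau_fence", WQ_UNBOUND, 0);
	if (!fence_wq)
		return -ENOMEM;
	return 0;
}

/* Queue on the private wq instead of calling schedule_work(), which
 * would use the kernel global wq. */
static void fence_uevent_kick(struct work_struct *work)
{
	queue_work(fence_wq, work);
}

static void fence_wq_fini(void)
{
	destroy_workqueue(fence_wq);
}
```

Again, this only illustrates the dedicated-wq idea from the thread; it is not the code that was merged.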