Christian König
2025-May-22 12:34 UTC
[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context
On 5/22/25 14:20, Philipp Stanner wrote:
> On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
>> On 5/22/25 13:25, Philipp Stanner wrote:
>>> dma_fence_is_signaled_locked(), which is used in
>>> nouveau_fence_context_kill(), can signal fences below the surface
>>> through a callback.
>>>
>>> There is neither need for nor use in doing that when killing a
>>> fence context.
>>>
>>> Replace dma_fence_is_signaled_locked() with
>>> __dma_fence_is_signaled(), a function which only checks, never
>>> signals.
>>
>> That is not a good approach.
>>
>> Having the __dma_fence_is_signaled() means that others would be
>> allowed to call it as well.
>>
>> But nouveau can do that here only because it knows that the fence was
>> issued by nouveau.
>>
>> What nouveau can do is to test the signaled flag directly, but that's
>> what you try to avoid as well.
>
> There's many parties who check the bit already.
>
> And if Nouveau is allowed to do that, one can just as well provide a
> wrapper for it.

No, exactly that's what is usually avoided in cases like this here.

See all the functions inside include/linux/dma-fence.h can be used by everybody. It's basically the public interface of the dma_fence object.

So testing if a fence is signaled without calling the callback is only allowed by whoever implemented the fence.

In other words nouveau can test nouveau fences, i915 can test i915 fences, amdgpu can test amdgpu fences etc... But if you have the wrapper that makes it officially allowed that nouveau starts testing i915 fences and that would be problematic.

Regards,
Christian.

>
> That has the advantage of centralizing the responsibility and
> documenting it.
>
> P.
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Philipp Stanner <phasta at kernel.org>
>>> ---
>>>  drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> index d5654e26d5bc..993b3dcb5db0 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> @@ -88,7 +88,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
>>>  
>>>  	spin_lock_irqsave(&fctx->lock, flags);
>>>  	list_for_each_entry_safe(fence, tmp, &fctx->pending, head) {
>>> -		if (error && !dma_fence_is_signaled_locked(&fence->base))
>>> +		if (error && !__dma_fence_is_signaled(&fence->base))
>>>  			dma_fence_set_error(&fence->base, error);
>>>  
>>>  		if (nouveau_fence_signal(fence))
Philipp Stanner
2025-May-22 12:42 UTC
[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context
On Thu, 2025-05-22 at 14:34 +0200, Christian König wrote:
> On 5/22/25 14:20, Philipp Stanner wrote:
> > On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
> > > On 5/22/25 13:25, Philipp Stanner wrote:
> > > > dma_fence_is_signaled_locked(), which is used in
> > > > nouveau_fence_context_kill(), can signal fences below the
> > > > surface through a callback.
> > > >
> > > > There is neither need for nor use in doing that when killing a
> > > > fence context.
> > > >
> > > > Replace dma_fence_is_signaled_locked() with
> > > > __dma_fence_is_signaled(), a function which only checks, never
> > > > signals.
> > >
> > > That is not a good approach.
> > >
> > > Having the __dma_fence_is_signaled() means that others would be
> > > allowed to call it as well.
> > >
> > > But nouveau can do that here only because it knows that the fence
> > > was issued by nouveau.
> > >
> > > What nouveau can do is to test the signaled flag directly, but
> > > that's what you try to avoid as well.
> >
> > There's many parties who check the bit already.
> >
> > And if Nouveau is allowed to do that, one can just as well provide
> > a wrapper for it.
>
> No, exactly that's what is usually avoided in cases like this here.
>
> See all the functions inside include/linux/dma-fence.h can be used by
> everybody. It's basically the public interface of the dma_fence
> object.
>
> So testing if a fence is signaled without calling the callback is
> only allowed by whoever implemented the fence.

Why?

See, who owns the callback? -> the driver which emitted the fence. If
the driver doesn't guarantee that all fences will be signaled, the
callback (always returning false) doesn't help you in any way.

I think the issue you're seeing is more that a party that only ever
checks a fence's state through callbacks (and doesn't signal them
through interrupts for example) would run danger of fences never
getting signaled. But that's already the case if someone doesn't
implement the callback.

The fundamental basis is always the same: The driver must guarantee
that all fences get signaled. Independently from other users checking
the fence this or that way, independently from the callback being
implemented.

>
> In other words nouveau can test nouveau fences, i915 can test i915
> fences, amdgpu can test amdgpu fences etc... But if you have the
> wrapper that makes it officially allowed that nouveau starts testing
> i915 fences and that would be problematic.

I don't see the context here. That applies to the other functions as
well.

P.

>
> Regards,
> Christian.
>
> >
> > That has the advantage of centralizing the responsibility and
> > documenting it.
> >
> > P.
> >
> > >
> > > Regards,
> > > Christian.
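The disagreement in the messages above hinges on the difference between a check that may signal and a check that only reads the flag. The following is a minimal userspace sketch of that difference, not kernel code: `struct model_fence`, `model_is_signaled()` and `model_test_signaled_flag()` are hypothetical names invented for illustration. `model_is_signaled()` follows the behaviour the patch description attributes to dma_fence_is_signaled_locked() (it can signal through the callback), while `model_test_signaled_flag()` stands in for the proposed check-only helper.

```c
/* Userspace model only: all types and names are illustrative, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fence;

struct model_fence_ops {
	/* Optional: asks the implementation whether the work has completed. */
	bool (*signaled)(struct model_fence *f);
};

struct model_fence {
	const struct model_fence_ops *ops;
	bool signaled_flag;           /* stands in for the signaled bit */
	unsigned int seqno;           /* sequence number this fence waits for */
	const unsigned int *hw_seqno; /* last number the simulated hardware finished */
};

/* Implementation callback: compare against the simulated hardware counter. */
static bool model_hw_signaled(struct model_fence *f)
{
	return *f->hw_seqno >= f->seqno;
}

static const struct model_fence_ops model_ops = { .signaled = model_hw_signaled };

/* Check that may signal: flag first, then the callback, signaling on success. */
static bool model_is_signaled(struct model_fence *f)
{
	if (f->signaled_flag)
		return true;
	if (f->ops->signaled && f->ops->signaled(f)) {
		f->signaled_flag = true; /* side effect: the fence is now signaled */
		return true;
	}
	return false;
}

/* Check only: reads the flag and never consults the callback. */
static bool model_test_signaled_flag(const struct model_fence *f)
{
	return f->signaled_flag;
}

/* Same starting state, two different answers. */
static void model_demo(void)
{
	unsigned int hw = 5; /* hardware already passed seqno 3 */
	struct model_fence a = { .ops = &model_ops, .seqno = 3, .hw_seqno = &hw };
	struct model_fence b = a;

	assert(!model_test_signaled_flag(&a)); /* flag not set yet */
	assert(!a.signaled_flag);              /* and nothing was changed */

	assert(model_is_signaled(&b));         /* callback reports completion */
	assert(b.signaled_flag);               /* fence was signaled on the way */
}
```

Calling `model_demo()` from a `main()` exercises both paths. In this model the two checks disagree exactly when the work has finished but nobody has set the flag yet, which is the window the thread is arguing about.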
Tvrtko Ursulin
2025-May-22 12:57 UTC
[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context
On 22/05/2025 13:34, Christian König wrote:
> On 5/22/25 14:20, Philipp Stanner wrote:
>> On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
>>> On 5/22/25 13:25, Philipp Stanner wrote:
>>>> dma_fence_is_signaled_locked(), which is used in
>>>> nouveau_fence_context_kill(), can signal fences below the surface
>>>> through a callback.
>>>>
>>>> There is neither need for nor use in doing that when killing a
>>>> fence context.
>>>>
>>>> Replace dma_fence_is_signaled_locked() with
>>>> __dma_fence_is_signaled(), a function which only checks, never
>>>> signals.
>>>
>>> That is not a good approach.
>>>
>>> Having the __dma_fence_is_signaled() means that others would be
>>> allowed to call it as well.
>>>
>>> But nouveau can do that here only because it knows that the fence was
>>> issued by nouveau.
>>>
>>> What nouveau can do is to test the signaled flag directly, but that's
>>> what you try to avoid as well.
>>
>> There's many parties who check the bit already.
>>
>> And if Nouveau is allowed to do that, one can just as well provide a
>> wrapper for it.
>
> No, exactly that's what is usually avoided in cases like this here.
>
> See all the functions inside include/linux/dma-fence.h can be used by
> everybody. It's basically the public interface of the dma_fence object.
>
> So testing if a fence is signaled without calling the callback is only
> allowed by whoever implemented the fence.
>
> In other words nouveau can test nouveau fences, i915 can test i915
> fences, amdgpu can test amdgpu fences etc... But if you have the
> wrapper that makes it officially allowed that nouveau starts testing
> i915 fences and that would be problematic.

But why? Say for example scheduler dependencies - why the scheduler couldn't ignore them at add time, but it can before trying to install a callback on them, and instead has to opportunistically signal someone else's fences?

I don't see it. But even if there is a reason, the advantage of the helper is that it can document this at a centralised place.

Regards,

Tvrtko

>> That has the advantage of centralizing the responsibility and
>> documenting it.
>>
>> P.
>>
>>>
>>> Regards,
>>> Christian.
Danilo Krummrich
2025-May-22 12:59 UTC
[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context
On Thu, May 22, 2025 at 02:34:33PM +0200, Christian König wrote:
> See all the functions inside include/linux/dma-fence.h can be used by
> everybody. It's basically the public interface of the dma_fence object.

As you write below, in certain cases it is valid to call this from drivers, so it's not unreasonable to have it as part of the public API.

> So testing if a fence is signaled without calling the callback is only
> allowed by whoever implemented the fence.
>
> In other words nouveau can test nouveau fences, i915 can test i915
> fences, amdgpu can test amdgpu fences etc... But if you have the
> wrapper that makes it officially allowed that nouveau starts testing
> i915 fences and that would be problematic.

In general, I like the __dma_fence_is_signaled() helper, because this way we can document in which cases it is allowed to be used, i.e. the ones you describe above.

test_bit() can be called by anyone and there is no documentation comment explaining that it is only allowed under certain conditions.

Having the __dma_fence_is_signaled() helper properly documented could spare you from having to explain over and over again in which cases the test_bit() dance is allowed. :-)

I also think the name is good, since the '__' prefix already implies that there are some restrictions on the use of this helper.
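One way to picture the "documented helper" argument is the userspace sketch below. It is an illustration under stated assumptions, not the helper proposed in the patch: every name (`sketch_fence`, `sketch_fence_test_flag()`, `mydrv_fence_done()`, the `mydrv` and `other` ops) is hypothetical, and the wording of the documentation comment is only a paraphrase of the restriction described in the thread. The check-only helper carries the restriction in its comment, and the driver-side wrapper takes the shortcut only for fences whose ops pointer shows they are its own.

```c
/* Userspace model only; every type and function name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fence;

struct sketch_fence_ops {
	/* Optional: lets the implementation report completion when asked. */
	bool (*signaled)(struct sketch_fence *f);
};

struct sketch_fence {
	const struct sketch_fence_ops *ops;
	bool signaled_flag; /* stands in for the signaled bit */
};

/* Public path: check the flag, then ask the implementation and signal. */
static bool sketch_fence_is_signaled(struct sketch_fence *f)
{
	if (f->signaled_flag)
		return true;
	if (f->ops->signaled && f->ops->signaled(f)) {
		f->signaled_flag = true;
		return true;
	}
	return false;
}

/**
 * sketch_fence_test_flag - test the signaled flag, never signal
 * @f: fence to test
 *
 * Restricted use: meant for the code that implements @f. Callers that do
 * not know the implementation use sketch_fence_is_signaled() instead.
 */
static bool sketch_fence_test_flag(const struct sketch_fence *f)
{
	return f->signaled_flag;
}

/* "mydrv" is assumed to signal from its interrupt handler: no callback. */
static const struct sketch_fence_ops mydrv_ops = { .signaled = NULL };

/* A foreign implementation that only notices completion when it is asked. */
static bool other_signaled(struct sketch_fence *f)
{
	(void)f;
	return true; /* pretend the work has finished */
}

static const struct sketch_fence_ops other_ops = { .signaled = other_signaled };

/* Driver-side check: take the shortcut for own fences only. */
static bool mydrv_fence_done(struct sketch_fence *f)
{
	if (f->ops == &mydrv_ops)
		return sketch_fence_test_flag(f);
	return sketch_fence_is_signaled(f);
}

static void sketch_demo(void)
{
	struct sketch_fence own = { .ops = &mydrv_ops };
	struct sketch_fence foreign = { .ops = &other_ops };

	assert(!mydrv_fence_done(&own));    /* flag clear, nothing else consulted */
	own.signaled_flag = true;           /* what the interrupt handler would do */
	assert(mydrv_fence_done(&own));

	assert(mydrv_fence_done(&foreign)); /* public path asked the callback */
	assert(foreign.signaled_flag);      /* and signaled as a side effect */
}
```

The ops-pointer comparison is just one possible way to express "I implemented this fence"; whether a real helper should enforce the restriction or only document it is exactly what the thread leaves open.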
Christian König
2025-May-22 13:05 UTC
[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context
On 5/22/25 14:42, Philipp Stanner wrote:
> On Thu, 2025-05-22 at 14:34 +0200, Christian König wrote:
>> On 5/22/25 14:20, Philipp Stanner wrote:
>>> On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
>>>> On 5/22/25 13:25, Philipp Stanner wrote:
>>>>> dma_fence_is_signaled_locked(), which is used in
>>>>> nouveau_fence_context_kill(), can signal fences below the
>>>>> surface through a callback.
>>>>>
>>>>> There is neither need for nor use in doing that when killing a
>>>>> fence context.
>>>>>
>>>>> Replace dma_fence_is_signaled_locked() with
>>>>> __dma_fence_is_signaled(), a function which only checks, never
>>>>> signals.
>>>>
>>>> That is not a good approach.
>>>>
>>>> Having the __dma_fence_is_signaled() means that others would be
>>>> allowed to call it as well.
>>>>
>>>> But nouveau can do that here only because it knows that the fence
>>>> was issued by nouveau.
>>>>
>>>> What nouveau can do is to test the signaled flag directly, but
>>>> that's what you try to avoid as well.
>>>
>>> There's many parties who check the bit already.
>>>
>>> And if Nouveau is allowed to do that, one can just as well provide
>>> a wrapper for it.
>>
>> No, exactly that's what is usually avoided in cases like this here.
>>
>> See all the functions inside include/linux/dma-fence.h can be used by
>> everybody. It's basically the public interface of the dma_fence
>> object.
>>
>> So testing if a fence is signaled without calling the callback is
>> only allowed by whoever implemented the fence.
>
> Why?
>
> See, who owns the callback? -> the driver which emitted the fence. If
> the driver doesn't guarantee that all fences will be signaled, the
> callback (always returning false) doesn't help you in any way.
>
> I think the issue you're seeing is more that a party that only ever
> checks a fence's state through callbacks (and doesn't signal them
> through interrupts for example) would run danger of fences never
> getting signaled.

Partially correct, yes.

> But that's already the case if someone doesn't implement the callback.

But then this implementation must always signal in some other way.

> The fundamental basis is always the same: The driver must guarantee
> that all fences get signaled. Independently from other users checking
> the fence this or that way, independently from the callback being
> implemented.

Yeah, but it is invalid for a caller to not ask the implementation if it's signaled or not.

See the rationale behind that is to avoid abuse of the interface. E.g. when you don't know the implementation side use the defined API and don't mess with the internals. If you do know the implementation side then it's valid that you check the internals.

Regards,
Christian.

>
>>
>> In other words nouveau can test nouveau fences, i915 can test i915
>> fences, amdgpu can test amdgpu fences etc... But if you have the
>> wrapper that makes it officially allowed that nouveau starts testing
>> i915 fences and that would be problematic.
>
> I don't see the context here. That applies to the other functions as
> well.
>
>
> P.
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> That has the advantage of centralizing the responsibility and
>>> documenting it.
>>>
>>> P.
>>>
>>>>
>>>> Regards,
>>>> Christian.
Christian König
2025-May-22 13:15 UTC
[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context
On 5/22/25 14:57, Tvrtko Ursulin wrote:
>
> On 22/05/2025 13:34, Christian König wrote:
>> On 5/22/25 14:20, Philipp Stanner wrote:
>>> On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
>>>> On 5/22/25 13:25, Philipp Stanner wrote:
>>>>> dma_fence_is_signaled_locked(), which is used in
>>>>> nouveau_fence_context_kill(), can signal fences below the surface
>>>>> through a callback.
>>>>>
>>>>> There is neither need for nor use in doing that when killing a
>>>>> fence context.
>>>>>
>>>>> Replace dma_fence_is_signaled_locked() with
>>>>> __dma_fence_is_signaled(), a function which only checks, never
>>>>> signals.
>>>>
>>>> That is not a good approach.
>>>>
>>>> Having the __dma_fence_is_signaled() means that others would be
>>>> allowed to call it as well.
>>>>
>>>> But nouveau can do that here only because it knows that the fence was
>>>> issued by nouveau.
>>>>
>>>> What nouveau can do is to test the signaled flag directly, but that's
>>>> what you try to avoid as well.
>>>
>>> There's many parties who check the bit already.
>>>
>>> And if Nouveau is allowed to do that, one can just as well provide a
>>> wrapper for it.
>>
>> No, exactly that's what is usually avoided in cases like this here.
>>
>> See all the functions inside include/linux/dma-fence.h can be used by
>> everybody. It's basically the public interface of the dma_fence object.
>>
>> So testing if a fence is signaled without calling the callback is only
>> allowed by whoever implemented the fence.
>>
>> In other words nouveau can test nouveau fences, i915 can test i915
>> fences, amdgpu can test amdgpu fences etc... But if you have the
>> wrapper that makes it officially allowed that nouveau starts testing
>> i915 fences and that would be problematic.
>
> But why? Say for example scheduler dependencies - why the scheduler
> couldn't ignore them at add time, but it can before trying to install a
> callback on them, and instead has to opportunistically signal someone
> else's fences?

We had cases where people tested the signaling status from time to time and were then surprised that the fence never signaled.

> I don't see it. But even if there is a reason, advantage of the helper
> is that it can document this at a centralised place.

Yeah, that is basically the only argument I can see which speaks in favor of that approach.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>> That has the advantage of centralizing the responsibility and
>>> documenting it.
>>>
>>> P.
>>>
>>>>
>>>> Regards,
>>>> Christian.
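The failure mode mentioned in this reply, polling the status from time to time and never seeing the fence signal, can be reproduced in a small userspace model. This is a sketch under assumptions, not a description of any real driver: `struct poll_fence`, `poll_fence_is_signaled()` and `poll_until_done()` are hypothetical, and the modelled implementation is assumed to notice completion only when it is asked, with no interrupt setting the flag.

```c
/* Userspace model only; names are illustrative and not kernel API. */
#include <assert.h>
#include <stdbool.h>

struct poll_fence {
	bool signaled_flag;    /* stands in for the signaled bit */
	unsigned int seqno;    /* value the fence waits for */
	unsigned int hw_seqno; /* progress of the simulated hardware */
};

/* Implementation side: completion is only noticed when somebody asks. */
static bool poll_fence_hw_done(const struct poll_fence *f)
{
	return f->hw_seqno >= f->seqno;
}

/* Check that asks the implementation and signals on success. */
static bool poll_fence_is_signaled(struct poll_fence *f)
{
	if (f->signaled_flag)
		return true;
	if (poll_fence_hw_done(f)) {
		f->signaled_flag = true;
		return true;
	}
	return false;
}

/*
 * Poll for at most max_rounds rounds. Returns the round in which the fence
 * was seen as signaled, or -1 if that never happened.
 */
static int poll_until_done(struct poll_fence *f, bool ask_implementation,
			   int max_rounds)
{
	for (int round = 1; round <= max_rounds; round++) {
		bool done;

		f->hw_seqno++; /* the hardware makes progress every round */
		done = ask_implementation ? poll_fence_is_signaled(f)
					  : f->signaled_flag;
		if (done)
			return round;
	}
	return -1;
}

static void poll_demo(void)
{
	struct poll_fence a = { .seqno = 3 };
	struct poll_fence b = { .seqno = 3 };

	/* Asking the implementation: seen as signaled in round 3. */
	assert(poll_until_done(&a, true, 10) == 3);

	/* Reading only the flag: the work finished, yet it is never seen. */
	assert(poll_until_done(&b, false, 10) == -1);
	assert(b.hw_seqno == 10 && !b.signaled_flag);
}
```

In the model the flag-only poller gives up although the simulated hardware finished long before, because nothing else ever sets the flag. A driver that signals from an interrupt handler would not show this behaviour, which is why the thread ties the flag-only check to knowing the implementation.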