Christian König
2025-Apr-09 14:10 UTC
[PATCH 1/2] dma-fence: Rename dma_fence_is_signaled()
Am 09.04.25 um 16:01 schrieb Philipp Stanner:> On Wed, 2025-04-09 at 15:14 +0200, Christian K?nig wrote: >> Am 09.04.25 um 14:56 schrieb Philipp Stanner: >>> On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote: >>>> On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote: >>>>> Hi Philipp, >>>>> >>>>> On Wed,? 9 Apr 2025 14:06:37 +0200 >>>>> Philipp Stanner <phasta at kernel.org> wrote: >>>>> >>>>>> dma_fence_is_signaled()'s name strongly reads as if this >>>>>> function >>>>>> were >>>>>> intended for checking whether a fence is already signaled. >>>>>> Also >>>>>> the >>>>>> boolean it returns hints at that. >>>>>> >>>>>> The function's behavior, however, is more complex: it can >>>>>> check >>>>>> with a >>>>>> driver callback whether the hardware's sequence number >>>>>> indicates >>>>>> that >>>>>> the fence can already be treated as signaled, although the >>>>>> hardware's / >>>>>> driver's interrupt handler has not signaled it yet. If that's >>>>>> the >>>>>> case, >>>>>> the function also signals the fence. >>>>>> >>>>>> (Presumably) this has caused a bug in Nouveau (unknown >>>>>> commit), >>>>>> where >>>>>> nouveau_fence_done() uses the function to check a fence, >>>>>> which >>>>>> causes a >>>>>> race. >>>>>> >>>>>> Give the function a more obvious name. >>>>> This is just my personal view on this, but I find the new name >>>>> just >>>>> as >>>>> confusing as the old one. It sounds like something is checked, >>>>> but >>>>> it's >>>>> clear what, and then the fence is forcibly signaled like it >>>>> would >>>>> be >>>>> if >>>>> you call drm_fence_signal(). Of course, this clarified by the >>>>> doc, >>>>> but >>>>> given the goal was to make the function name clearly reflect >>>>> what >>>>> it >>>>> does, I'm not convinced it's significantly better. >>>>> >>>>> Maybe dma_fence_check_hw_state_and_propagate(), though it might >>>>> be >>>>> too long of name. Oh well, feel free to ignore this comments if >>>>> a >>>>> majority is fine with the new name. >>>> Yoa, the name isn't perfect (the perfect name describing the >>>> whole >>>> behavior would be >>>> dma_fence_check_if_already_signaled_then_check_hardware_state_and >>>> _pro >>>> pa >>>> gate() ^^' >>>> >>>> My intention here is to have the reader realize "watch out, the >>>> fence >>>> might get signaled here!", which is probably the most important >>>> event >>>> regarding fences, which can race, invoke the callbacks and so on. >>>> >>>> For details readers will then check the documentation. >>>> >>>> But I'm of course open to see if there's a majority for this or >>>> that >>>> name. >>> how about: >>> >>> dma_fence_check_hw_and_signal() ? >> I don't think that renaming the function is a good idea in the first >> place. >> >> What the function does internally is an implementation detail of the >> framework. >> >> For the code using this function it's completely irrelevant if the >> function might also signal the fence, what matters for the caller is >> the returned status of the fence. I think this also counts for the >> dma_fence_is_signaled() documentation. > It does obviously matter. As it's currently implemented, a lot of > important things happen implicitly.Yeah, but that's ok. The code who calls this is the consumer of the interface and so shouldn't need to know this. That's why we have created the DMA fence framework in the first place. For the provider side when a driver or similar implements the interface the relevant documentation is the dma_fence_ops structure.> I only see improvement by making things more obvious. > > In any case, how would you call a wrapper that just does > test_bit(IS_SIGNALED, ?) ?Broken, that was very intentionally removed quite shortly after we created the framework. We have a few cases were implementations do check that for their fences, but consumers should never be allowed to touch such internals. Regards, Christian.> > P. > >> What we should improve is the documentation of the dma_fence_ops- >>> enable_signaling and dma_fence_ops->signaled callbacks. >> Especially see the comment about reference counts on enable_signaling >> which is missing on the signaled callback. That is most likely the >> root cause why nouveau implemented enable_signaling correctly but not >> the other one. >> >> But putting that aside I think we should make nails with heads and >> let the framework guarantee that the fences stay alive until they are >> signaled (one way or another). This completely removes the burden to >> keep a reference on unsignaled fences from the drivers / >> implementations and make things more over all more defensive. >> >> Regards, >> Christian. >> >>> P. >>> >>>> P. >>>> >>>> >>>>> Regards, >>>>> >>>>> Boris
Philipp Stanner
2025-Apr-09 15:04 UTC
[PATCH 1/2] dma-fence: Rename dma_fence_is_signaled()
On Wed, 2025-04-09 at 16:10 +0200, Christian K?nig wrote:> Am 09.04.25 um 16:01 schrieb Philipp Stanner: > > On Wed, 2025-04-09 at 15:14 +0200, Christian K?nig wrote: > > > Am 09.04.25 um 14:56 schrieb Philipp Stanner: > > > > On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote: > > > > > On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote: > > > > > > Hi Philipp, > > > > > > > > > > > > On Wed,? 9 Apr 2025 14:06:37 +0200 > > > > > > Philipp Stanner <phasta at kernel.org> wrote: > > > > > > > > > > > > > dma_fence_is_signaled()'s name strongly reads as if this > > > > > > > function > > > > > > > were > > > > > > > intended for checking whether a fence is already > > > > > > > signaled. > > > > > > > Also > > > > > > > the > > > > > > > boolean it returns hints at that. > > > > > > > > > > > > > > The function's behavior, however, is more complex: it can > > > > > > > check > > > > > > > with a > > > > > > > driver callback whether the hardware's sequence number > > > > > > > indicates > > > > > > > that > > > > > > > the fence can already be treated as signaled, although > > > > > > > the > > > > > > > hardware's / > > > > > > > driver's interrupt handler has not signaled it yet. If > > > > > > > that's > > > > > > > the > > > > > > > case, > > > > > > > the function also signals the fence. > > > > > > > > > > > > > > (Presumably) this has caused a bug in Nouveau (unknown > > > > > > > commit), > > > > > > > where > > > > > > > nouveau_fence_done() uses the function to check a fence, > > > > > > > which > > > > > > > causes a > > > > > > > race. > > > > > > > > > > > > > > Give the function a more obvious name. > > > > > > This is just my personal view on this, but I find the new > > > > > > name > > > > > > just > > > > > > as > > > > > > confusing as the old one. It sounds like something is > > > > > > checked, > > > > > > but > > > > > > it's > > > > > > clear what, and then the fence is forcibly signaled like it > > > > > > would > > > > > > be > > > > > > if > > > > > > you call drm_fence_signal(). Of course, this clarified by > > > > > > the > > > > > > doc, > > > > > > but > > > > > > given the goal was to make the function name clearly > > > > > > reflect > > > > > > what > > > > > > it > > > > > > does, I'm not convinced it's significantly better. > > > > > > > > > > > > Maybe dma_fence_check_hw_state_and_propagate(), though it > > > > > > might > > > > > > be > > > > > > too long of name. Oh well, feel free to ignore this > > > > > > comments if > > > > > > a > > > > > > majority is fine with the new name. > > > > > Yoa, the name isn't perfect (the perfect name describing the > > > > > whole > > > > > behavior would be > > > > > dma_fence_check_if_already_signaled_then_check_hardware_state > > > > > _and > > > > > _pro > > > > > pa > > > > > gate() ^^' > > > > > > > > > > My intention here is to have the reader realize "watch out, > > > > > the > > > > > fence > > > > > might get signaled here!", which is probably the most > > > > > important > > > > > event > > > > > regarding fences, which can race, invoke the callbacks and so > > > > > on. > > > > > > > > > > For details readers will then check the documentation. > > > > > > > > > > But I'm of course open to see if there's a majority for this > > > > > or > > > > > that > > > > > name. > > > > how about: > > > > > > > > dma_fence_check_hw_and_signal() ? > > > I don't think that renaming the function is a good idea in the > > > first > > > place. > > > > > > What the function does internally is an implementation detail of > > > the > > > framework. > > > > > > For the code using this function it's completely irrelevant if > > > the > > > function might also signal the fence, what matters for the caller > > > is > > > the returned status of the fence. I think this also counts for > > > the > > > dma_fence_is_signaled() documentation. > > It does obviously matter. As it's currently implemented, a lot of > > important things happen implicitly. > > Yeah, but that's ok. > > The code who calls this is the consumer of the interface and so > shouldn't need to know this. That's why we have created the DMA fence > framework in the first place. > > For the provider side when a driver or similar implements the > interface the relevant documentation is the dma_fence_ops structure. > > > I only see improvement by making things more obvious. > > > > In any case, how would you call a wrapper that just does > > test_bit(IS_SIGNALED, ?) ? > > Broken, that was very intentionally removed quite shortly after we > created the framework. > > We have a few cases were implementations do check that for their > fences, but consumers should never be allowed to touch such > internals.There is theory and there is practice. In practice, those internals are being used by Nouveau, i915, Xe, vmgfx and radeon. So it seems that we failed quite a bit at communicating clearly how the interface should be used. And, to repeat myself, with both name and docu of that function, I think it is very easy to misunderstand what it's doing. You say that it shouldn't matter ? and maybe that's true, in theory. In practice, it does matter. In practice, APIs get misused and have side-effects. And making that harder is desirable. In any case, I might have to add another such call to Nouveau, because the solution preferred by you over the callback causes another race. Certainly one could solve this in a clean way, but someone has to do the work, and we're talking about more than a few hours here. In any case, be so kind and look at patch 2 and tell me there if you're at least OK with making the documentation more detailed. P.> > Regards, > Christian. > > > > > P. > > > > > What we should improve is the documentation of the dma_fence_ops- > > > > enable_signaling and dma_fence_ops->signaled callbacks. > > > Especially see the comment about reference counts on > > > enable_signaling > > > which is missing on the signaled callback. That is most likely > > > the > > > root cause why nouveau implemented enable_signaling correctly but > > > not > > > the other one. > > > > > > But putting that aside I think we should make nails with heads > > > and > > > let the framework guarantee that the fences stay alive until they > > > are > > > signaled (one way or another). This completely removes the burden > > > to > > > keep a reference on unsignaled fences from the drivers / > > > implementations and make things more over all more defensive. > > > > > > Regards, > > > Christian. > > > > > > > P. > > > > > > > > > P. > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > Boris >
Christian König
2025-Apr-10 08:37 UTC
[PATCH 1/2] dma-fence: Rename dma_fence_is_signaled()
Am 09.04.25 um 17:04 schrieb Philipp Stanner:> On Wed, 2025-04-09 at 16:10 +0200, Christian K?nig wrote: >>> I only see improvement by making things more obvious. >>> >>> In any case, how would you call a wrapper that just does >>> test_bit(IS_SIGNALED, ?) ? >> Broken, that was very intentionally removed quite shortly after we >> created the framework. >> >> We have a few cases were implementations do check that for their >> fences, but consumers should never be allowed to touch such >> internals. > There is theory and there is practice. In practice, those internals are > being used by Nouveau, i915, Xe, vmgfx and radeon.What do you mean? I only skimmed over the use cases, but as far as I can see those are all valid. You can test the flag if you know what the fence means to you, that is not a problem at all.> So it seems that we failed quite a bit at communicating clearly how the > interface should be used. > > And, to repeat myself, with both name and docu of that function, I > think it is very easy to misunderstand what it's doing. You say that it > shouldn't matter ? and maybe that's true, in theory. In practice, it > does matter. In practice, APIs get misused and have side-effects. And > making that harder is desirable.That sounds like I didn't used the right wording. It *must* not matter to the consumer. See the purpose of the DMA-fence framework is to make it irrelevant for the consumer how the provider has implemented it's fences. This means that things like if polling or interrupt driven signaling is used, 32bit vs 64bit seq numbers, etc... should all be hidden by the framework from the consumer of the fences. BTW I'm actually not sure if nouveau has a bug here. As far as I can see nouveau_fence_signal() will be called later eventually and do the necessary cleanup. But on the other hand it wouldn't surprise me if nouveau has a bug with that. The driver has been basically only barely maintained for quite a while.> In any case, I might have to add another such call to Nouveau, because > the solution preferred by you over the callback causes another race. > Certainly one could solve this in a clean way, but someone has to do > the work, and we're talking about more than a few hours here.Well this is not my preferred solution, it's just the technical correct solution as far as I can see.> In any case, be so kind and look at patch 2 and tell me there if you're > at least OK with making the documentation more detailed.As far as I can see that is clearly the wrong place to document that stuff. Regards, Christian.> > P.-------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20250410/9e74a761/attachment-0001.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-drm-nouveau-fix-and-cleanup-fence-handling.patch Type: text/x-patch Size: 2670 bytes Desc: not available URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20250410/9e74a761/attachment-0001.bin>