Michael Kruse via llvm-dev
2021-Feb-19 06:56 UTC
[llvm-dev] !llvm.loop ID metadata clarification
I wrote the lines in the LangRef and can answer the questions. When LoopIDs were first conceived, the idea was that each loop has its own identifier (so you could, e.g. build a DenseMap to information for a loop). To avoid that the MDNode of two different would be collapsed into one, the design made the first item of the MDNode reference itself. This worked because the uniquifier did not do a GVN or similar, just compared the content (IMHO still a bad idea, the uniquing algorithm could be improved later). Later, 'distinct' MDNodes were introduced, but by then code exists that depend on this. Either because they check for valid LoopID MDNodes eg. in an assert, skip the first element in a LoopID MDNode, or, in case of llvm.access.group (which allows either be a single LoopID or a list of LoopIDs) only handle the list of LoopIDs case because if it;s only a single LoopID, the MDNode again contains the self-reference LoopID. There is a problem with using a LoopID as "identifier": 1. The MDNode changes when changing the loop properties, e.g. LoopVectorize adds llvm.loop.isvectorized to loops it vectorize and loops that it determined to be not vectorizable to avoid processing it again. Hence, LoopIDs are not persistent. 2. There is some code that makes copies of code, such as LoopUnroll, inlining or LoopVersioning. These copy all the instructions including the MDNodes and you get multiple loops with the same LoopID. There is even a LoopUnroll test case that checks that. Hence, a LoopID is not unique. Therefore, a LoopID does not fulfill the basic requirements to identify a loop. Instead, we went to interpret the MDNode as a bag of loop properties which can be changed. With this reinterpretation, I did not change the name 'LoopID' nor removed the self-reference to avoid breaking things. Am Do., 18. Feb. 2021 um 19:12 Uhr schrieb Mehdi AMINI via llvm-dev <llvm-dev at lists.llvm.org>:> The last sentence isn't clear to me, I don't see the relationship between "the loop-metadata node can be attached to multiple nodes" and "the ‘distinct’ keyword has become unnecessary"?See above. The distinct/self-reference was meant to stop the same LoopID to be assigned to multiple loops by the MDNode uniquifier. However, there are other mechanisms that assign the same LoopID to two different loops.> Also It isn't clear why we don't use a distinct metadata instead of a self-referencing one?To not break existing code that assumes the self-reference.> > Then LangRef mentions: > > > Loop metadata nodes cannot be used as unique identifiers. They are neither persistent for the same loop through transformations nor necessarily unique to just one loop. > > So the metadata itself can't be used as a unique identifier for the loop, however in a loop all latches must have the same MDNode for LoopInfo to recognize it as valid.A single loop can have multiple latches. These should be consistent, otherwise we don't know which one is the correct information for this loop. We could instead choose the first one encountered, but I don't think its a solution for robustness. In contrast, metadata is defined to always be able to be dropped without losing semantic correctes.> It isn't clear to me why it prevents using unique metadata here without the self-referencing trick, and rely on content-based uniquing for equality?To not break existing code that assumes the self-reference.> Loop info could still rely on pointer-equality to ensure that the metadata is consistent on every backedge, two loops with the same metadata would share the same MDNode, but it shouldn't be an issue since these aren't used as unique identifiers per LangRef.Correct. However, it requires work to fix all the places that assumes the self-reference, makes existing IR incompatible and may cause trouble downstream. Hal Finkel and me found that this such a change is not worth the risk. Michael
Mehdi AMINI via llvm-dev
2021-Feb-19 17:44 UTC
[llvm-dev] !llvm.loop ID metadata clarification
On Thu, Feb 18, 2021 at 10:57 PM Michael Kruse <llvmdev at meinersbur.de> wrote:> I wrote the lines in the LangRef and can answer the questions. > > When LoopIDs were first conceived, the idea was that each loop has its > own identifier (so you could, e.g. build a DenseMap to information for > a loop). To avoid that the MDNode of two different would be collapsed > into one, the design made the first item of the MDNode reference > itself. This worked because the uniquifier did not do a GVN or > similar, just compared the content (IMHO still a bad idea, the > uniquing algorithm could be improved later). Later, 'distinct' MDNodes > were introduced, but by then code exists that depend on this. Either > because they check for valid LoopID MDNodes eg. in an assert, skip the > first element in a LoopID MDNode, or, in case of llvm.access.group > (which allows either be a single LoopID or a list of LoopIDs) only > handle the list of LoopIDs case because if it;s only a single LoopID, > the MDNode again contains the self-reference LoopID. > > > There is a problem with using a LoopID as "identifier": > > 1. The MDNode changes when changing the loop properties, e.g. > LoopVectorize adds llvm.loop.isvectorized to loops it vectorize and > loops that it determined to be not vectorizable to avoid processing it > again. Hence, LoopIDs are not persistent. > > 2. There is some code that makes copies of code, such as LoopUnroll, > inlining or LoopVersioning. These copy all the instructions including > the MDNodes and you get multiple loops with the same LoopID. There is > even a LoopUnroll test case that checks that. Hence, a LoopID is not > unique. > > Therefore, a LoopID does not fulfill the basic requirements to > identify a loop. Instead, we went to interpret the MDNode as a bag of > loop properties which can be changed. > > > With this reinterpretation, I did not change the name 'LoopID' nor > removed the self-reference to avoid breaking things. > > > > Am Do., 18. Feb. 2021 um 19:12 Uhr schrieb Mehdi AMINI via llvm-dev > <llvm-dev at lists.llvm.org>: > > The last sentence isn't clear to me, I don't see the relationship > between "the loop-metadata node can be attached to multiple nodes" and "the > ‘distinct’ keyword has become unnecessary"? > > See above. The distinct/self-reference was meant to stop the same > LoopID to be assigned to multiple loops by the MDNode uniquifier.However, there are other mechanisms that assign the same LoopID to two> different loops. >OK thanks, I get the sentence now: it isn't only the distinct keyword that is unnecessary but also the self-reference. Basically the metadata does not need to be distinct, I think that is what wasn't clear in the sentence unclear for me.> > Also It isn't clear why we don't use a distinct metadata instead of a > self-referencing one? > > To not break existing code that assumes the self-reference. >So this is just tech debt that can be cleaned up then? We could sweep through the code base and make it work with unique MDs?> > > > Then LangRef mentions: > > > > > Loop metadata nodes cannot be used as unique identifiers. They are > neither persistent for the same loop through transformations nor > necessarily unique to just one loop. > > > > So the metadata itself can't be used as a unique identifier for the > loop, however in a loop all latches must have the same MDNode for LoopInfo > to recognize it as valid. > > A single loop can have multiple latches. These should be consistent, > otherwise we don't know which one is the correct information for this > loop. We could instead choose the first one encountered, but I don't > think its a solution for robustness. In contrast, metadata is defined > to always be able to be dropped without losing semantic correctes. > > > > It isn't clear to me why it prevents using unique metadata here without > the self-referencing trick, and rely on content-based uniquing for equality? > > To not break existing code that assumes the self-reference.> > > Loop info could still rely on pointer-equality to ensure that the > metadata is consistent on every backedge, two loops with the same metadata > would share the same MDNode, but it shouldn't be an issue since these > aren't used as unique identifiers per LangRef. > > Correct. > > However, it requires work to fix all the places that assumes the > self-reference, makes existing IR incompatible and may cause trouble > downstream. Hal Finkel and me found that this such a change is not > worth the risk. >I'm happy to give it a try to clean-up the codebase and remove all this! Thanks for the clarification :) -- Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210219/09fad631/attachment.html>