Arthur Eubanks via llvm-dev
2021-Sep-02 22:48 UTC
[llvm-dev] LoopAccessAnalysis cache problem with new PM
A bug and a repro would be good, even if it's not consistent. I can take a look, having worked with new PM caches/proxies a bit. llvm-reduce'ing the IR with a test that runs the pipeline multiple times would be nice.

Perhaps wherever the loop is being deleted, it's not calling LPMUpdater::markLoopAsDeleted()? LPMUpdater::markLoopAsDeleted() deletes analysis cache entries for the deleted loop.

On Thu, Sep 2, 2021 at 3:29 PM Björn Pettersson A via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi!
>
> I spent a lot of time today trying to debug a problem where I end up with
> sporadic crashes, asserts, etc.
> It happens occasionally if I run the same opt test case over and over
> again, and I see the failures when LoopDistribution is running.
>
> In the IR there are 2 functions, "f" and "g". Function "f" has several
> loops, and function "g" has one loop.
>
> My command line is basically "opt -O3 -enable-loop-distribute
> -debug-pass-manager=verbose -debug-only=loop-distribute,loop-accesses", but
> it seems to fail more often (or more seldom) when adding some other
> parameters as well.
>
> I've added some debug printouts in LoopDistributeForLoop::processLoop to
> print the Loop* L variable, and in the LoopBase c'tor/d'tor, as that seemed
> to be relevant.
>
> What I've found out is that when it goes bad the output is like this:
>
> ......................
> <cut>
> Running pass: LoopSimplifyPass on f
> Running analysis: LoopAnalysis on f
> *ADDED-DEBUG* Loop is created: 0x891cc60
>
> <cut>
> Running pass: LoopDistributePass on f
>
> LDist: In "f" checking Loop at depth 3 containing: %for.body10.us.us<header><latch><exiting>
> *ADDED-DEBUG* L used when requesting LAA: 0x891cd08
> Running analysis: LoopAccessAnalysis on Loop at depth 3 containing: %for.body10.us.us<header><latch><exiting>
>
> LDist: In "f" checking Loop at depth 3 containing: %for.body89<header><latch><exiting>
> *ADDED-DEBUG* L used when requesting LAA: 0x891cc60
> Running analysis: LoopAccessAnalysis on Loop at depth 3 containing: %for.body89<header><latch><exiting>
>
> <cut>
> Running pass: SimpleLoopUnswitchPass on Loop at depth 2 containing: %for.cond1.preheader<header><exiting>,%for.body10.us.us,%cleanup.thread.split.us.us,%for.body89,%crit_edge,%for.inc100<latch><exiting>,%for.body89.preheader,%for.body10.us.us.preheader
> *ADDED-DEBUG* Loop is deleted: 0x891cc60
>
> <cut>
> Running pass: LoopSimplifyPass on g
> Running analysis: LoopAnalysis on g
> *ADDED-DEBUG* Loop is created: 0x891cc60
>
> <cut>
> Running pass: LoopDistributePass on g
>
> LDist: In "g" checking Loop at depth 1 containing: %for.cond<header><latch><exiting>
> *ADDED-DEBUG* L used when requesting LAA: 0x891cc60
>
> <cut>
> ......................
>
> And then the pass manager will return the LoopAccessInfo cached for the
> Loop in function "f".
> (Notice that the new Loop object happened to reuse the same address as the
> Loop object used earlier.)
>
> I don't know if using object pointers as keys to the cached analyses
> like this is safe in general.
> But I guess the problem here is that the cache should have been cleared
> earlier (when the original Loop was deleted), right?
>
> Tracking when LoopBase objects are created and deleted points to the
> first Loop object at 0x891cc60 being deleted when doing loop unswitch on a
> loop at lower depth in function "f". But I'm not sure exactly how the
> LoopAnalysisManagerFunctionProxy holding the cached LoopAccessAnalysis
> based on that pointer is supposed to be informed about that. But I suspect
> that maybe SimpleLoopUnswitchPass is to blame here? (or something in the
> pass manager framework)
>
> I guess I'll need to write a PR for this, but it is kind of messy since the
> crashes are sporadic. So I might need to find a better way to detect that
> there is a stale analysis (but a few hours ago I did not know much about
> how these analyses are cached in the new PM at all, so I haven't really
> known what to look for).
>
> If anyone has ideas on how to debug this further, or what to look for,
> I'd appreciate some guidance on how to continue debugging this (or I'll
> just have to keep banging my head on the wall trying to understand analysis
> caches and proxies in the new PM tomorrow).
>
> Regards,
> Björn
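The failure mode in the quoted report, where a cache keyed on an object's address returns a result computed for an earlier object that lived at the same address, can be illustrated with a small standalone model. This is only a sketch and not LLVM code: ToyLoop, ToyResult, ToyCache, markDeleted and runScenario are hypothetical names, and placement new into a fixed buffer is used purely to force the address reuse that happened by chance in the log above.

```cpp
#include <new>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for llvm::Loop; only its address matters here.
struct ToyLoop {
  std::string Header;
};

// Hypothetical stand-in for a per-loop result such as LoopAccessInfo.
struct ToyResult {
  std::string ComputedFor;
};

// Toy cache keyed on the loop's address (an assumption that mirrors the
// pointer-keyed cache discussed in the thread, not the real data structure).
class ToyCache {
  std::unordered_map<const ToyLoop *, ToyResult> Results;

public:
  // Returns the cached result, computing it on first request.
  const ToyResult &get(const ToyLoop &L) {
    auto It = Results.find(&L);
    if (It == Results.end())
      It = Results.emplace(&L, ToyResult{L.Header}).first;
    return It->second;
  }

  // Plays the role the thread attributes to markLoopAsDeleted():
  // drop the entry for a loop that is about to go away.
  void markDeleted(const ToyLoop &L) { Results.erase(&L); }
};

// Creates a loop for "f", destroys it, then creates a loop for "g" in the
// same storage. Returns which loop the cache claims the result belongs to.
std::string runScenario(bool NotifyCache) {
  alignas(ToyLoop) unsigned char Storage[sizeof(ToyLoop)];
  ToyCache Cache;

  ToyLoop *InF = new (Storage) ToyLoop{"f:for.body89"};
  Cache.get(*InF); // result for the loop in "f" is now cached
  if (NotifyCache)
    Cache.markDeleted(*InF);
  InF->~ToyLoop();

  // Same address as before, but a different loop.
  ToyLoop *InG = new (Storage) ToyLoop{"g:for.cond"};
  std::string Seen = Cache.get(*InG).ComputedFor;
  InG->~ToyLoop();
  return Seen;
}
```

In this model, runScenario(false) returns "f:for.body89" (the stale entry is served for the loop in "g"), while runScenario(true) returns "g:for.cond" because the entry was dropped before the address was reused.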
Björn Pettersson A via llvm-dev
2021-Sep-02 23:07 UTC
[llvm-dev] LoopAccessAnalysis cache problem with new PM
A quick look at SimpleLoopUnswitch tells me that it does make some markLoopAsDeleted calls for the base loop. But when unswitching it may clone/destroy other sub loops (?) as well (there are some calls to LI.destroy(…), but I can't really see any LPMUpdater calls related to those).

I'll see if I can reduce this and find a better reproducer tomorrow. Given the knowledge that I should probably be able to run LoopDistribute -> SimpleLoopUnswitch and then somehow detect that there is a stale analysis, it might be a lot easier compared to my earlier attempts.

/Björn
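One way to make a stale entry detectable without waiting for an address to be reused is to check, at the moment a loop is destroyed, whether a cached result still exists for it. The sketch below models that idea in isolation. It is an assumption about how such a check could look, not something LLVM provides: ToyLoop, ToyCachedKeys and unswitchScenario are hypothetical names, and the scenario only mimics the suspicion above that the loop being processed is reported while a destroyed sub loop is not.

```cpp
#include <initializer_list>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm::Loop.
struct ToyLoop {
  std::string Name;
};

// Hypothetical bookkeeping, not an LLVM API: the set of loop addresses that
// currently have a cached analysis result.
class ToyCachedKeys {
  std::unordered_set<const ToyLoop *> Keys;

public:
  void noteCached(const ToyLoop &L) { Keys.insert(&L); }
  void noteInvalidated(const ToyLoop &L) { Keys.erase(&L); }
  bool hasEntryFor(const ToyLoop &L) const { return Keys.count(&L) != 0; }
};

// Models a transform that destroys an outer loop and one of its sub loops.
// Returns the names of loops that still had cache entries when they were
// about to be destroyed, i.e. entries that would become stale.
std::vector<std::string> unswitchScenario(bool NotifyForSubLoop) {
  ToyLoop Outer{"for.cond1.preheader"};
  ToyLoop Sub{"for.body89"};
  ToyCachedKeys Cache;
  Cache.noteCached(Outer);
  Cache.noteCached(Sub);

  // The transform always reports the loop it was run on...
  Cache.noteInvalidated(Outer);
  // ...but reports the sub loop only when asked to.
  if (NotifyForSubLoop)
    Cache.noteInvalidated(Sub);

  std::vector<std::string> Stale;
  for (const ToyLoop *L : {&Outer, &Sub}) // loops about to be destroyed
    if (Cache.hasEntryFor(*L))
      Stale.push_back(L->Name);
  return Stale;
}
```

Here unswitchScenario(false) reports "for.body89" as having a leftover entry, and unswitchScenario(true) reports nothing. A check of this kind fires deterministically on the first missed notification, which may be easier to reduce than a crash that depends on allocator behaviour.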