Daniel Berlin
2015-Jul-21 22:34 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
Based on function names and structures, this is some version of GCC :) Any way you can post the entire .ll file? Because it's globalsmodref, it's hard to debug without the other functions, since it goes over all the functions to determine address takenness, etc :) On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:> Hi Chandler, > > We observed some regressions in our regular testing (despite I saw nothing > suspicious in my runs). I did accurate investigation and was able to > reproduce and track down the regression. > I found the exact request to GlobalsModRef that results in the performance > loss (I added a limit on number of requests into the implementation and > bisected the number to find the interesting request). > > Here are the details: > > We’re checking following two locations: > > (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() > %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x > %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i > (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() > @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, > align 8 > > and the function in question is “cselib_init”: > (lldb) p > ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() > (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) > > Corresponding underlying values: > (lldb) p UV2->dump() > @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, > align 8 > (lldb) p UV1->dump() > %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** @reg_values, > align 8, !tbaa !2 > > Backtrace: > (lldb) bt > * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous > namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, > LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at > GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason = step > over > * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous > namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, > LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at > GlobalsModRef.cpp:519 > frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to > (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, > LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at > GlobalsModRef.cpp:562 > frame #2: 0x00000001038d6aa8 > libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, > S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288 > frame #3: 0x0000000103a0a814 > libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, > MemLoc=0x00007fff5fbf6268, isLoad=true, ScanIt=llvm::BasicBlock::iterator at > 0x00007fff5fbf4390, BB=0x000000010a19ffa0, QueryInst=0x000000010a1a20c8) + > 1908 at MemoryDependenceAnalysis.cpp:570 > frame #4: 0x0000000103a0ffa5 > libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, > QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, > BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165 > at MemoryDependenceAnalysis.cpp:965 > frame #5: 0x0000000103a0e3a9 > libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, > QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, > Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, > Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, SkipFirstBlock=false) > + 5897 at MemoryDependenceAnalysis.cpp:1200 > frame #6: 0x0000000103a0cb3b > libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, > QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at > MemoryDependenceAnalysis.cpp:911 > frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous > namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, > LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 > frame #8: 0x0000000103408eef libLTO.dylib`(anonymous > namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8) > + 1551 at GVN.cpp:1905 > frame #9: 0x00000001034080fd libLTO.dylib`(anonymous > namespace)::GVN::processInstruction(this=0x000000010e6ce680, > I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 > frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous > namespace)::GVN::processBlock(this=0x000000010e6ce680, > BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 > frame #11: 0x0000000103401755 libLTO.dylib`(anonymous > namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, > F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 > frame #12: 0x0000000103400fef libLTO.dylib`(anonymous > namespace)::GVN::runOnFunction(this=0x000000010e6ce680, > F=0x00000001085f69f8) + 623 at GVN.cpp:2352 > frame #13: 0x00000001027cd05b > libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, > F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 > frame #14: 0x00000001027cd375 > libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, > M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 > frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous > namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, > M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 > frame #16: 0x00000001027cd636 > libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, > M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 > frame #17: 0x00000001027ce521 > libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, > M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 > > > The function body is in the attached file. > > > > GlobalsModRef reports NoAlias for this pair, here: > if (GV1 || GV2) { > // If the global's address is taken, pretend we don't know it's a > pointer to > // the global. > if (GV1 && !NonAddressTakenGlobals.count(GV1)) > GV1 = nullptr; > if (GV2 && !NonAddressTakenGlobals.count(GV2)) > GV2 = nullptr; > > // If the two pointers are derived from two different non-addr-taken > // globals, or if one is and the other isn't, we know these can't alias. > if ((GV1 || GV2) && GV1 != GV2) > return NoAlias; > > // Otherwise if they are both derived from the same addr-taken global, > we > // can't know the two accesses don't overlap. > } > > > Thanks, > Michael > > On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich > <evgeny.astigeevich at arm.com> wrote: >> >> It’s Dhrystone. > > Dhrystone has historically not been a good indicator of real-world > performance fluctuations, especially at this small of a shift. > > I'd like to see if we see any fluctuation on larger and more realistic > application benchmarks. One advantage of the flag being set is that we > should get runs from folks who have automatic builds and runs periodically > from trunk. Those should help give an accurate picture. > >> >> >> >> From: Chandler Carruth [mailto:chandlerc at gmail.com] >> Sent: 17 July 2015 16:10 >> >> >> To: Evgeny Astigeevich; Chandler Carruth >> Cc: LLVM Developers Mailing List >> >> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >> >> >> Can you say what Benchmark or give a test case so we understand the nature >> of the regression? As Gerolf said, that will be important to understand what >> is best to do. >> >> >> >> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich >> <Evgeny.Astigeevich at arm.com> wrote: >> >> Yes, the regression is stable. I double checked this. A full benchmark run >> consists of at least 10 sub-runs to validate the score. >> >> I also checked if there were regressions of this benchmark across >> different ARM hardware versions. I found all regressions of this benchmark >> were in range 1.6%-2%. >> >> >> >> Kind regards, >> >> Evgeny Astigeevich >> >> >> >> From: Chandler Carruth [mailto:chandlerc at gmail.com] >> Sent: 17 July 2015 07:52 >> To: Evgeny Astigeevich; Chandler Carruth >> Cc: LLVM Developers Mailing List; Michael Zolotukhin >> >> >> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >> >> >> Hey, thanks for benchmarking. >> >> >> >> How stable is the 2% regression? >> >> >> >> Michael ran some benchmarks with GlobalsModRef completely disabled and the >> only differences were in the noise. This was a complete spec2k6 run along >> with some others. Based on the number of benchmarks run there, I'm going to >> go ahead and submit these patches, but if you can clarify the impact here, >> we can look at potentially some other tradeoff. I'm not particularly set on >> one set of defaults, etc, I just don't want to keep patches held up based on >> that. We can flip the default back and forth as new data arrives. >> >> >> >> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich >> <evgeny.astigeevich at arm.com> wrote: >> >> Hi Chandler, >> >> >> >> I ran couple benchmarks with LTO turned on and your patches on ARM >> hardware. >> >> There were no performance degradation of one benchmark and 2% slowdown of >> another benchmark. >> >> >> >> Kind regards, >> >> Evgeny Astigeevich >> >> >> >> >> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >> Behalf Of Evgeny Astigeevich >> Sent: 15 July 2015 15:12 >> >> >> To: 'Chandler Carruth'; Gerolf Hoflehner >> Cc: LLVM Developers Mailing List >> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >> >> >> Hi Chandler, >> >> >> >> I would like to run some benchmarks on ARM hardware and to look at impact >> of your patches on LTO. >> >> >> >> Kind regards, >> >> Evgeny Astigeevich >> >> >> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >> Behalf Of Chandler Carruth >> >> >> Sent: 15 July 2015 10:45 >> To: Chandler Carruth; Gerolf Hoflehner >> Cc: LLVM Developers Mailing List >> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >> >> >> I've fixed the obvious bugs I spotted in r242281. These should be pure >> correctness improvements. >> >> >> >> I've sent the two patches I'm imagining to address the core issue here: >> >> http://reviews.llvm.org/D11213 >> >> http://reviews.llvm.org/D11214 >> >> >> >> Currently, I have the unsafe alias results disabled by default, but with a >> flag that can re-enable them if needed. I don't feel really strongly about >> which way the default is set -- but that may be because I don't have lots of >> users relying on LTO. I'll let others indicate which way they would be most >> comfortable. >> >> >> >> Some IRC conversations indicated that early benchmark results with GMR >> completely disabled weren't showing really significant swings, so maybe this >> relatively small reduction in power of GMR won't be too problematic for >> folks. Either way, I'm open to the different approaches. It's D11214 that I >> care a lot about. =] >> >> >> >> >> >> Thanks for all the thoughts here! >> >> -Chandler >> >> >> >> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com> >> wrote: >> >> Replying here, but several of the questions raised boil down to "couldn't >> you make the usage of GetUnderlyingObject conservatively correct?". I'll try >> and address that. >> >> >> >> I think this *is* the right approach, but I think it is very hard to do >> without effectively disabling this part of GlobalsModRef. That is, the easy >> ways are likely to fire very frequently IMO. >> >> >> >> The core idea is to detect a "no information" state coming out of >> GetUnderlyingObject (likely be providing a custom version just for >> GlobalsModRef and tailored to its use cases). This is particularly effective >> at avoiding the problems with the recursion limit. But let's look at what >> cases we *wouldn't* return that. Here are the cases I see when I thought >> about this last night with Hal, roughly in descending likelihood I would >> guess: >> >> >> >> 1) We detect some global or an alloca. In that case, even BasicAA would be >> sufficient to provide no-alias. GMR shouldn't be relevant. >> >> >> >> 2) We detect a phi, select, or inttoptr, and stop there. >> >> >> >> 3) We detect a load and stop there. >> >> >> >> 4) We detect a return from a function. >> >> >> >> 5) We detect an argument to the function. >> >> >> >> I strongly suspect the vast majority of queries hit #1. That's why BasicAA >> is *so* effective. Both #4 and #5 I think are actually reasonable places for >> GMR to potentially say "no-alias" and provide useful definitive information. >> But I also suspect these are the least common. >> >> >> >> So let's look at #2 and #3 because I think they're interesting. For these, >> I think it is extremely hard to return "no-alias". It seems extremely easy >> for a reasonable and innocuous change to the IR to introduce a phi or a >> select into one side of the GetUnderlyingObject but not the other. If that >> ever happens, we can't return "no-alias" for #2, or we need to add really >> expensive updates. It also seems reasonable (if not as likely) to want >> adding a store and load to the IR to not trigger a miscompile. If it is >> possible for a valid optimization pass to do reg2mem on an SSA value, then >> that could happen to only one side of the paired GetUnderlyingObject and >> break GMR with #3. If that seems like an unreasonable thing to do, consider >> loop re-rolling or other transformations which may need to take things in >> SSA form at move them out of SSA form. Even if we then try immediately to >> put it back *into* SSA form, before we do that we create a point where GMR >> cannot correctly return no-alias. >> >> >> >> So ultimately, I don't think we want to rely on GMR returning "no-alias" >> for either #2 or #3 because of the challenge of actually updating it in all >> of the circumstances that could break them. That means that *only* #4 and #5 >> are going to return "no-alias" usefully. And even then, function inlining >> and function outlining both break #4 and #5, so you have to preclude those >> transforms while GMR is active. And I have serious doubts about these >> providing enough value to be worth the cost. >> >> >> >> >> >> I think the better way to approach this is the other way around. Rather >> than doing a backwards analysis to see if one location reaches and global >> and the other location doesn't reach a global, I think it would be much more >> effective to re-cast this as a forward analysis that determines all the >> memory locations in a function that come from outside the function, and use >> that to drive the no-alias responses. >> >> >> >> >> >> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner at apple.com> >> wrote: >> >> I wouldn’t be willing to give up performance for hypothetical issues. >> Please protect all your changes with options. For some of your concerns it >> is probably hard to provide a test case that shows an/the actual issue. >> >> >> >> I certainly agree that it will be very hard to provide a test case and >> extremely rare to see this in the wild for most of these issues. As long as >> I can remove the problematic update API we currently have (which as its an >> API change can't really be put behind flags), I'm happy to have flags >> control whether or not GMR uses the unsound / stale information to try to >> answer alias queries. Do you have any opinion about what the default value >> of the flags should be? >> >> >> >> I'll go ahead and prepare the patches, as it seems like we're all ending >> up in the same position, and just wondering about the precise tradeoffs we >> want to settle on. >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> ______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Michael Zolotukhin
2015-Jul-21 22:43 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
> On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > > Based on function names and structures, this is some version of GCC :)Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I only did that on IRC).> Any way you can post the entire .ll file?It’s an LTO build, so it’d be troublesome.. I tried to print the module in lldb, but after a several minutes it hasn’t even finished printing globals (which I assume is the very beginning).> > Because it's globalsmodref, it's hard to debug without the other > functions, since it goes over all the functions to determine address > takenness, etc :)Yep, I understand that - I’m still in debugger though, so if you’re interested in some particular data, I can try collecting it. I can try to dump the module too, but it might be not-practical in the end:) Michael> > > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin > <mzolotukhin at apple.com> wrote: >> Hi Chandler, >> >> We observed some regressions in our regular testing (despite I saw nothing >> suspicious in my runs). I did accurate investigation and was able to >> reproduce and track down the regression. >> I found the exact request to GlobalsModRef that results in the performance >> loss (I added a limit on number of requests into the implementation and >> bisected the number to find the interesting request). >> >> Here are the details: >> >> We’re checking following two locations: >> >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() >> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x >> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, >> align 8 >> >> and the function in question is “cselib_init”: >> (lldb) p >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() >> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) >> >> Corresponding underlying values: >> (lldb) p UV2->dump() >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, >> align 8 >> (lldb) p UV1->dump() >> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** @reg_values, >> align 8, !tbaa !2 >> >> Backtrace: >> (lldb) bt >> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason = step >> over >> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> GlobalsModRef.cpp:519 >> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to >> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at >> GlobalsModRef.cpp:562 >> frame #2: 0x00000001038d6aa8 >> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288 >> frame #3: 0x0000000103a0a814 >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, >> MemLoc=0x00007fff5fbf6268, isLoad=true, ScanIt=llvm::BasicBlock::iterator at >> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, QueryInst=0x000000010a1a20c8) + >> 1908 at MemoryDependenceAnalysis.cpp:570 >> frame #4: 0x0000000103a0ffa5 >> libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165 >> at MemoryDependenceAnalysis.cpp:965 >> frame #5: 0x0000000103a0e3a9 >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, >> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, >> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, >> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, SkipFirstBlock=false) >> + 5897 at MemoryDependenceAnalysis.cpp:1200 >> frame #6: 0x0000000103a0cb3b >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at >> MemoryDependenceAnalysis.cpp:911 >> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous >> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, >> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 >> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous >> namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8) >> + 1551 at GVN.cpp:1905 >> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous >> namespace)::GVN::processInstruction(this=0x000000010e6ce680, >> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 >> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous >> namespace)::GVN::processBlock(this=0x000000010e6ce680, >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 >> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous >> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, >> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 >> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 >> frame #13: 0x00000001027cd05b >> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, >> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 >> frame #14: 0x00000001027cd375 >> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 >> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous >> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, >> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 >> frame #16: 0x00000001027cd636 >> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 >> frame #17: 0x00000001027ce521 >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 >> >> >> The function body is in the attached file. >> >> >> >> GlobalsModRef reports NoAlias for this pair, here: >> if (GV1 || GV2) { >> // If the global's address is taken, pretend we don't know it's a >> pointer to >> // the global. >> if (GV1 && !NonAddressTakenGlobals.count(GV1)) >> GV1 = nullptr; >> if (GV2 && !NonAddressTakenGlobals.count(GV2)) >> GV2 = nullptr; >> >> // If the two pointers are derived from two different non-addr-taken >> // globals, or if one is and the other isn't, we know these can't alias. >> if ((GV1 || GV2) && GV1 != GV2) >> return NoAlias; >> >> // Otherwise if they are both derived from the same addr-taken global, >> we >> // can't know the two accesses don't overlap. >> } >> >> >> Thanks, >> Michael >> >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com> wrote: >> >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich >> <evgeny.astigeevich at arm.com> wrote: >>> >>> It’s Dhrystone. >> >> Dhrystone has historically not been a good indicator of real-world >> performance fluctuations, especially at this small of a shift. >> >> I'd like to see if we see any fluctuation on larger and more realistic >> application benchmarks. One advantage of the flag being set is that we >> should get runs from folks who have automatic builds and runs periodically >> from trunk. Those should help give an accurate picture. >> >>> >>> >>> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >>> Sent: 17 July 2015 16:10 >>> >>> >>> To: Evgeny Astigeevich; Chandler Carruth >>> Cc: LLVM Developers Mailing List >>> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> Can you say what Benchmark or give a test case so we understand the nature >>> of the regression? As Gerolf said, that will be important to understand what >>> is best to do. >>> >>> >>> >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich >>> <Evgeny.Astigeevich at arm.com> wrote: >>> >>> Yes, the regression is stable. I double checked this. A full benchmark run >>> consists of at least 10 sub-runs to validate the score. >>> >>> I also checked if there were regressions of this benchmark across >>> different ARM hardware versions. I found all regressions of this benchmark >>> were in range 1.6%-2%. >>> >>> >>> >>> Kind regards, >>> >>> Evgeny Astigeevich >>> >>> >>> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >>> Sent: 17 July 2015 07:52 >>> To: Evgeny Astigeevich; Chandler Carruth >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin >>> >>> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> Hey, thanks for benchmarking. >>> >>> >>> >>> How stable is the 2% regression? >>> >>> >>> >>> Michael ran some benchmarks with GlobalsModRef completely disabled and the >>> only differences were in the noise. This was a complete spec2k6 run along >>> with some others. Based on the number of benchmarks run there, I'm going to >>> go ahead and submit these patches, but if you can clarify the impact here, >>> we can look at potentially some other tradeoff. I'm not particularly set on >>> one set of defaults, etc, I just don't want to keep patches held up based on >>> that. We can flip the default back and forth as new data arrives. >>> >>> >>> >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich >>> <evgeny.astigeevich at arm.com> wrote: >>> >>> Hi Chandler, >>> >>> >>> >>> I ran couple benchmarks with LTO turned on and your patches on ARM >>> hardware. >>> >>> There were no performance degradation of one benchmark and 2% slowdown of >>> another benchmark. >>> >>> >>> >>> Kind regards, >>> >>> Evgeny Astigeevich >>> >>> >>> >>> >>> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >>> Behalf Of Evgeny Astigeevich >>> Sent: 15 July 2015 15:12 >>> >>> >>> To: 'Chandler Carruth'; Gerolf Hoflehner >>> Cc: LLVM Developers Mailing List >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> Hi Chandler, >>> >>> >>> >>> I would like to run some benchmarks on ARM hardware and to look at impact >>> of your patches on LTO. >>> >>> >>> >>> Kind regards, >>> >>> Evgeny Astigeevich >>> >>> >>> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >>> Behalf Of Chandler Carruth >>> >>> >>> Sent: 15 July 2015 10:45 >>> To: Chandler Carruth; Gerolf Hoflehner >>> Cc: LLVM Developers Mailing List >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> I've fixed the obvious bugs I spotted in r242281. These should be pure >>> correctness improvements. >>> >>> >>> >>> I've sent the two patches I'm imagining to address the core issue here: >>> >>> http://reviews.llvm.org/D11213 >>> >>> http://reviews.llvm.org/D11214 >>> >>> >>> >>> Currently, I have the unsafe alias results disabled by default, but with a >>> flag that can re-enable them if needed. I don't feel really strongly about >>> which way the default is set -- but that may be because I don't have lots of >>> users relying on LTO. I'll let others indicate which way they would be most >>> comfortable. >>> >>> >>> >>> Some IRC conversations indicated that early benchmark results with GMR >>> completely disabled weren't showing really significant swings, so maybe this >>> relatively small reduction in power of GMR won't be too problematic for >>> folks. Either way, I'm open to the different approaches. It's D11214 that I >>> care a lot about. =] >>> >>> >>> >>> >>> >>> Thanks for all the thoughts here! >>> >>> -Chandler >>> >>> >>> >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com> >>> wrote: >>> >>> Replying here, but several of the questions raised boil down to "couldn't >>> you make the usage of GetUnderlyingObject conservatively correct?". I'll try >>> and address that. >>> >>> >>> >>> I think this *is* the right approach, but I think it is very hard to do >>> without effectively disabling this part of GlobalsModRef. That is, the easy >>> ways are likely to fire very frequently IMO. >>> >>> >>> >>> The core idea is to detect a "no information" state coming out of >>> GetUnderlyingObject (likely be providing a custom version just for >>> GlobalsModRef and tailored to its use cases). This is particularly effective >>> at avoiding the problems with the recursion limit. But let's look at what >>> cases we *wouldn't* return that. Here are the cases I see when I thought >>> about this last night with Hal, roughly in descending likelihood I would >>> guess: >>> >>> >>> >>> 1) We detect some global or an alloca. In that case, even BasicAA would be >>> sufficient to provide no-alias. GMR shouldn't be relevant. >>> >>> >>> >>> 2) We detect a phi, select, or inttoptr, and stop there. >>> >>> >>> >>> 3) We detect a load and stop there. >>> >>> >>> >>> 4) We detect a return from a function. >>> >>> >>> >>> 5) We detect an argument to the function. >>> >>> >>> >>> I strongly suspect the vast majority of queries hit #1. That's why BasicAA >>> is *so* effective. Both #4 and #5 I think are actually reasonable places for >>> GMR to potentially say "no-alias" and provide useful definitive information. >>> But I also suspect these are the least common. >>> >>> >>> >>> So let's look at #2 and #3 because I think they're interesting. For these, >>> I think it is extremely hard to return "no-alias". It seems extremely easy >>> for a reasonable and innocuous change to the IR to introduce a phi or a >>> select into one side of the GetUnderlyingObject but not the other. If that >>> ever happens, we can't return "no-alias" for #2, or we need to add really >>> expensive updates. It also seems reasonable (if not as likely) to want >>> adding a store and load to the IR to not trigger a miscompile. If it is >>> possible for a valid optimization pass to do reg2mem on an SSA value, then >>> that could happen to only one side of the paired GetUnderlyingObject and >>> break GMR with #3. If that seems like an unreasonable thing to do, consider >>> loop re-rolling or other transformations which may need to take things in >>> SSA form at move them out of SSA form. Even if we then try immediately to >>> put it back *into* SSA form, before we do that we create a point where GMR >>> cannot correctly return no-alias. >>> >>> >>> >>> So ultimately, I don't think we want to rely on GMR returning "no-alias" >>> for either #2 or #3 because of the challenge of actually updating it in all >>> of the circumstances that could break them. That means that *only* #4 and #5 >>> are going to return "no-alias" usefully. And even then, function inlining >>> and function outlining both break #4 and #5, so you have to preclude those >>> transforms while GMR is active. And I have serious doubts about these >>> providing enough value to be worth the cost. >>> >>> >>> >>> >>> >>> I think the better way to approach this is the other way around. Rather >>> than doing a backwards analysis to see if one location reaches and global >>> and the other location doesn't reach a global, I think it would be much more >>> effective to re-cast this as a forward analysis that determines all the >>> memory locations in a function that come from outside the function, and use >>> that to drive the no-alias responses. >>> >>> >>> >>> >>> >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner at apple.com> >>> wrote: >>> >>> I wouldn’t be willing to give up performance for hypothetical issues. >>> Please protect all your changes with options. For some of your concerns it >>> is probably hard to provide a test case that shows an/the actual issue. >>> >>> >>> >>> I certainly agree that it will be very hard to provide a test case and >>> extremely rare to see this in the wild for most of these issues. As long as >>> I can remove the problematic update API we currently have (which as its an >>> API change can't really be put behind flags), I'm happy to have flags >>> control whether or not GMR uses the unsound / stale information to try to >>> answer alias queries. Do you have any opinion about what the default value >>> of the flags should be? >>> >>> >>> >>> I'll go ahead and prepare the patches, as it seems like we're all ending >>> up in the same position, and just wondering about the precise tradeoffs we >>> want to settle on. >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> ______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>
Xinliang David Li
2015-Jul-21 22:50 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
reg_values is a file static variable in cselib.c -- so you might be able to reproduce the issue with a smaller reproducible. David On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:> > > On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > > > > Based on function names and structures, this is some version of GCC :) > Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I > only did that on IRC). > > > Any way you can post the entire .ll file? > It’s an LTO build, so it’d be troublesome.. I tried to print the module in > lldb, but after a several minutes it hasn’t even finished printing globals > (which I assume is the very beginning). > > > > > Because it's globalsmodref, it's hard to debug without the other > > functions, since it goes over all the functions to determine address > > takenness, etc :) > Yep, I understand that - I’m still in debugger though, so if you’re > interested in some particular data, I can try collecting it. I can try to > dump the module too, but it might be not-practical in the end:) > > Michael > > > > > > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin > > <mzolotukhin at apple.com> wrote: > >> Hi Chandler, > >> > >> We observed some regressions in our regular testing (despite I saw > nothing > >> suspicious in my runs). I did accurate investigation and was able to > >> reproduce and track down the regression. > >> I found the exact request to GlobalsModRef that results in the > performance > >> loss (I added a limit on number of requests into the implementation and > >> bisected the number to find the interesting request). > >> > >> Here are the details: > >> > >> We’re checking following two locations: > >> > >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() > >> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x > >> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i > >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() > >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* > null, > >> align 8 > >> > >> and the function in question is “cselib_init”: > >> (lldb) p > >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() > >> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) > >> > >> Corresponding underlying values: > >> (lldb) p UV2->dump() > >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* > null, > >> align 8 > >> (lldb) p UV1->dump() > >> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** > @reg_values, > >> align 8, !tbaa !2 > >> > >> Backtrace: > >> (lldb) bt > >> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous > >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, > >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at > >> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason > step > >> over > >> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous > >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, > >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at > >> GlobalsModRef.cpp:519 > >> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to > >> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, > >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at > >> GlobalsModRef.cpp:562 > >> frame #2: 0x00000001038d6aa8 > >> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, > >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at > AliasAnalysis.cpp:288 > >> frame #3: 0x0000000103a0a814 > >> > libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, > >> MemLoc=0x00007fff5fbf6268, isLoad=true, > ScanIt=llvm::BasicBlock::iterator at > >> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, > QueryInst=0x000000010a1a20c8) + > >> 1908 at MemoryDependenceAnalysis.cpp:570 > >> frame #4: 0x0000000103a0ffa5 > >> > libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, > >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, > >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + > 2165 > >> at MemoryDependenceAnalysis.cpp:965 > >> frame #5: 0x0000000103a0e3a9 > >> > libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, > >> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, > >> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, > >> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, > SkipFirstBlock=false) > >> + 5897 at MemoryDependenceAnalysis.cpp:1200 > >> frame #6: 0x0000000103a0cb3b > >> > libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, > >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at > >> MemoryDependenceAnalysis.cpp:911 > >> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous > >> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, > >> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 > >> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous > >> namespace)::GVN::processLoad(this=0x000000010e6ce680, > L=0x000000010a1a20c8) > >> + 1551 at GVN.cpp:1905 > >> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous > >> namespace)::GVN::processInstruction(this=0x000000010e6ce680, > >> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 > >> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous > >> namespace)::GVN::processBlock(this=0x000000010e6ce680, > >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 > >> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous > >> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, > >> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 > >> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous > >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, > >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 > >> frame #13: 0x00000001027cd05b > >> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, > >> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 > >> frame #14: 0x00000001027cd375 > >> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, > >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 > >> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous > >> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, > >> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 > >> frame #16: 0x00000001027cd636 > >> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, > >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 > >> frame #17: 0x00000001027ce521 > >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, > >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 > >> > >> > >> The function body is in the attached file. > >> > >> > >> > >> GlobalsModRef reports NoAlias for this pair, here: > >> if (GV1 || GV2) { > >> // If the global's address is taken, pretend we don't know it's a > >> pointer to > >> // the global. > >> if (GV1 && !NonAddressTakenGlobals.count(GV1)) > >> GV1 = nullptr; > >> if (GV2 && !NonAddressTakenGlobals.count(GV2)) > >> GV2 = nullptr; > >> > >> // If the two pointers are derived from two different non-addr-taken > >> // globals, or if one is and the other isn't, we know these can't > alias. > >> if ((GV1 || GV2) && GV1 != GV2) > >> return NoAlias; > >> > >> // Otherwise if they are both derived from the same addr-taken > global, > >> we > >> // can't know the two accesses don't overlap. > >> } > >> > >> > >> Thanks, > >> Michael > >> > >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com> > wrote: > >> > >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich > >> <evgeny.astigeevich at arm.com> wrote: > >>> > >>> It’s Dhrystone. > >> > >> Dhrystone has historically not been a good indicator of real-world > >> performance fluctuations, especially at this small of a shift. > >> > >> I'd like to see if we see any fluctuation on larger and more realistic > >> application benchmarks. One advantage of the flag being set is that we > >> should get runs from folks who have automatic builds and runs > periodically > >> from trunk. Those should help give an accurate picture. > >> > >>> > >>> > >>> > >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] > >>> Sent: 17 July 2015 16:10 > >>> > >>> > >>> To: Evgeny Astigeevich; Chandler Carruth > >>> Cc: LLVM Developers Mailing List > >>> > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely > broken > >>> > >>> > >>> > >>> Can you say what Benchmark or give a test case so we understand the > nature > >>> of the regression? As Gerolf said, that will be important to > understand what > >>> is best to do. > >>> > >>> > >>> > >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich > >>> <Evgeny.Astigeevich at arm.com> wrote: > >>> > >>> Yes, the regression is stable. I double checked this. A full benchmark > run > >>> consists of at least 10 sub-runs to validate the score. > >>> > >>> I also checked if there were regressions of this benchmark across > >>> different ARM hardware versions. I found all regressions of this > benchmark > >>> were in range 1.6%-2%. > >>> > >>> > >>> > >>> Kind regards, > >>> > >>> Evgeny Astigeevich > >>> > >>> > >>> > >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] > >>> Sent: 17 July 2015 07:52 > >>> To: Evgeny Astigeevich; Chandler Carruth > >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin > >>> > >>> > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely > broken > >>> > >>> > >>> > >>> Hey, thanks for benchmarking. > >>> > >>> > >>> > >>> How stable is the 2% regression? > >>> > >>> > >>> > >>> Michael ran some benchmarks with GlobalsModRef completely disabled and > the > >>> only differences were in the noise. This was a complete spec2k6 run > along > >>> with some others. Based on the number of benchmarks run there, I'm > going to > >>> go ahead and submit these patches, but if you can clarify the impact > here, > >>> we can look at potentially some other tradeoff. I'm not particularly > set on > >>> one set of defaults, etc, I just don't want to keep patches held up > based on > >>> that. We can flip the default back and forth as new data arrives. > >>> > >>> > >>> > >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich > >>> <evgeny.astigeevich at arm.com> wrote: > >>> > >>> Hi Chandler, > >>> > >>> > >>> > >>> I ran couple benchmarks with LTO turned on and your patches on ARM > >>> hardware. > >>> > >>> There were no performance degradation of one benchmark and 2% slowdown > of > >>> another benchmark. > >>> > >>> > >>> > >>> Kind regards, > >>> > >>> Evgeny Astigeevich > >>> > >>> > >>> > >>> > >>> > >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On > >>> Behalf Of Evgeny Astigeevich > >>> Sent: 15 July 2015 15:12 > >>> > >>> > >>> To: 'Chandler Carruth'; Gerolf Hoflehner > >>> Cc: LLVM Developers Mailing List > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely > broken > >>> > >>> > >>> > >>> Hi Chandler, > >>> > >>> > >>> > >>> I would like to run some benchmarks on ARM hardware and to look at > impact > >>> of your patches on LTO. > >>> > >>> > >>> > >>> Kind regards, > >>> > >>> Evgeny Astigeevich > >>> > >>> > >>> > >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On > >>> Behalf Of Chandler Carruth > >>> > >>> > >>> Sent: 15 July 2015 10:45 > >>> To: Chandler Carruth; Gerolf Hoflehner > >>> Cc: LLVM Developers Mailing List > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely > broken > >>> > >>> > >>> > >>> I've fixed the obvious bugs I spotted in r242281. These should be pure > >>> correctness improvements. > >>> > >>> > >>> > >>> I've sent the two patches I'm imagining to address the core issue here: > >>> > >>> http://reviews.llvm.org/D11213 > >>> > >>> http://reviews.llvm.org/D11214 > >>> > >>> > >>> > >>> Currently, I have the unsafe alias results disabled by default, but > with a > >>> flag that can re-enable them if needed. I don't feel really strongly > about > >>> which way the default is set -- but that may be because I don't have > lots of > >>> users relying on LTO. I'll let others indicate which way they would be > most > >>> comfortable. > >>> > >>> > >>> > >>> Some IRC conversations indicated that early benchmark results with GMR > >>> completely disabled weren't showing really significant swings, so > maybe this > >>> relatively small reduction in power of GMR won't be too problematic for > >>> folks. Either way, I'm open to the different approaches. It's D11214 > that I > >>> care a lot about. =] > >>> > >>> > >>> > >>> > >>> > >>> Thanks for all the thoughts here! > >>> > >>> -Chandler > >>> > >>> > >>> > >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com > > > >>> wrote: > >>> > >>> Replying here, but several of the questions raised boil down to > "couldn't > >>> you make the usage of GetUnderlyingObject conservatively correct?". > I'll try > >>> and address that. > >>> > >>> > >>> > >>> I think this *is* the right approach, but I think it is very hard to do > >>> without effectively disabling this part of GlobalsModRef. That is, the > easy > >>> ways are likely to fire very frequently IMO. > >>> > >>> > >>> > >>> The core idea is to detect a "no information" state coming out of > >>> GetUnderlyingObject (likely be providing a custom version just for > >>> GlobalsModRef and tailored to its use cases). This is particularly > effective > >>> at avoiding the problems with the recursion limit. But let's look at > what > >>> cases we *wouldn't* return that. Here are the cases I see when I > thought > >>> about this last night with Hal, roughly in descending likelihood I > would > >>> guess: > >>> > >>> > >>> > >>> 1) We detect some global or an alloca. In that case, even BasicAA > would be > >>> sufficient to provide no-alias. GMR shouldn't be relevant. > >>> > >>> > >>> > >>> 2) We detect a phi, select, or inttoptr, and stop there. > >>> > >>> > >>> > >>> 3) We detect a load and stop there. > >>> > >>> > >>> > >>> 4) We detect a return from a function. > >>> > >>> > >>> > >>> 5) We detect an argument to the function. > >>> > >>> > >>> > >>> I strongly suspect the vast majority of queries hit #1. That's why > BasicAA > >>> is *so* effective. Both #4 and #5 I think are actually reasonable > places for > >>> GMR to potentially say "no-alias" and provide useful definitive > information. > >>> But I also suspect these are the least common. > >>> > >>> > >>> > >>> So let's look at #2 and #3 because I think they're interesting. For > these, > >>> I think it is extremely hard to return "no-alias". It seems extremely > easy > >>> for a reasonable and innocuous change to the IR to introduce a phi or a > >>> select into one side of the GetUnderlyingObject but not the other. If > that > >>> ever happens, we can't return "no-alias" for #2, or we need to add > really > >>> expensive updates. It also seems reasonable (if not as likely) to want > >>> adding a store and load to the IR to not trigger a miscompile. If it is > >>> possible for a valid optimization pass to do reg2mem on an SSA value, > then > >>> that could happen to only one side of the paired GetUnderlyingObject > and > >>> break GMR with #3. If that seems like an unreasonable thing to do, > consider > >>> loop re-rolling or other transformations which may need to take things > in > >>> SSA form at move them out of SSA form. Even if we then try immediately > to > >>> put it back *into* SSA form, before we do that we create a point where > GMR > >>> cannot correctly return no-alias. > >>> > >>> > >>> > >>> So ultimately, I don't think we want to rely on GMR returning > "no-alias" > >>> for either #2 or #3 because of the challenge of actually updating it > in all > >>> of the circumstances that could break them. That means that *only* #4 > and #5 > >>> are going to return "no-alias" usefully. And even then, function > inlining > >>> and function outlining both break #4 and #5, so you have to preclude > those > >>> transforms while GMR is active. And I have serious doubts about these > >>> providing enough value to be worth the cost. > >>> > >>> > >>> > >>> > >>> > >>> I think the better way to approach this is the other way around. Rather > >>> than doing a backwards analysis to see if one location reaches and > global > >>> and the other location doesn't reach a global, I think it would be > much more > >>> effective to re-cast this as a forward analysis that determines all the > >>> memory locations in a function that come from outside the function, > and use > >>> that to drive the no-alias responses. > >>> > >>> > >>> > >>> > >>> > >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner < > ghoflehner at apple.com> > >>> wrote: > >>> > >>> I wouldn’t be willing to give up performance for hypothetical issues. > >>> Please protect all your changes with options. For some of your > concerns it > >>> is probably hard to provide a test case that shows an/the actual issue. > >>> > >>> > >>> > >>> I certainly agree that it will be very hard to provide a test case and > >>> extremely rare to see this in the wild for most of these issues. As > long as > >>> I can remove the problematic update API we currently have (which as > its an > >>> API change can't really be put behind flags), I'm happy to have flags > >>> control whether or not GMR uses the unsound / stale information to try > to > >>> answer alias queries. Do you have any opinion about what the default > value > >>> of the flags should be? > >>> > >>> > >>> > >>> I'll go ahead and prepare the patches, as it seems like we're all > ending > >>> up in the same position, and just wondering about the precise > tradeoffs we > >>> want to settle on. > >>> > >>> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >>> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >>> > >>> ______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150721/0520ab1e/attachment.html>
Pete Cooper
2015-Jul-21 22:55 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
>From a quick look at the IR snippet you gave:%32 = load %struct.varray_head_tag*, %struct.varray_head_tag** @reg_values, align 8, !tbaa !2 ; <<<<< @reg_values %data.i = getelementptr inbounds %struct.varray_head_tag, %struct.varray_head_tag* %32, i64 0, i32 4 ; <<<<<<< %te.i = bitcast %union.varray_data_tag* %data.i to [1 x %struct.elt_list*]* ; <<<<<<<<<< %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i ; <<<<<<<<<<<< store %struct.elt_list* null, %struct.elt_list** %arrayidx.i, align 8, !tbaa !2 no alias seems to be the right choice here, although its something I’d have expected BasicAA to work out. I think we’re ultimately looking to see if a field of @reg_values aliases @reg_values itself, which is of course not possible. So we have something along the lines of struct varray_head_tag { type0 field0; type1 field1; type2 field2; type3 field3; varray_data_tag field4[some number of elements]; // this is what %data.i points to, and %arrayidx.i points to a specific element } reg_values; so by storing in to field4[%indvars.iv.i], we can’t be aliasing @reg_values. Unless i’ve got my number of pointers wrong of course, then it might be possible. Pete> On Jul 21, 2015, at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote: > > >> On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org> wrote: >> >> Based on function names and structures, this is some version of GCC :) > Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I only did that on IRC). > >> Any way you can post the entire .ll file? > It’s an LTO build, so it’d be troublesome.. I tried to print the module in lldb, but after a several minutes it hasn’t even finished printing globals (which I assume is the very beginning). > >> >> Because it's globalsmodref, it's hard to debug without the other >> functions, since it goes over all the functions to determine address >> takenness, etc :) > Yep, I understand that - I’m still in debugger though, so if you’re interested in some particular data, I can try collecting it. I can try to dump the module too, but it might be not-practical in the end:) > > Michael >> >> >> On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin >> <mzolotukhin at apple.com> wrote: >>> Hi Chandler, >>> >>> We observed some regressions in our regular testing (despite I saw nothing >>> suspicious in my runs). I did accurate investigation and was able to >>> reproduce and track down the regression. >>> I found the exact request to GlobalsModRef that results in the performance >>> loss (I added a limit on number of requests into the implementation and >>> bisected the number to find the interesting request). >>> >>> Here are the details: >>> >>> We’re checking following two locations: >>> >>> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() >>> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x >>> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i >>> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() >>> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, >>> align 8 >>> >>> and the function in question is “cselib_init”: >>> (lldb) p >>> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() >>> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) >>> >>> Corresponding underlying values: >>> (lldb) p UV2->dump() >>> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, >>> align 8 >>> (lldb) p UV1->dump() >>> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** @reg_values, >>> align 8, !tbaa !2 >>> >>> Backtrace: >>> (lldb) bt >>> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous >>> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >>> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason = step >>> over >>> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous >>> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >>> GlobalsModRef.cpp:519 >>> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to >>> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, >>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at >>> GlobalsModRef.cpp:562 >>> frame #2: 0x00000001038d6aa8 >>> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, >>> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288 >>> frame #3: 0x0000000103a0a814 >>> libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, >>> MemLoc=0x00007fff5fbf6268, isLoad=true, ScanIt=llvm::BasicBlock::iterator at >>> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, QueryInst=0x000000010a1a20c8) + >>> 1908 at MemoryDependenceAnalysis.cpp:570 >>> frame #4: 0x0000000103a0ffa5 >>> libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, >>> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, >>> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165 >>> at MemoryDependenceAnalysis.cpp:965 >>> frame #5: 0x0000000103a0e3a9 >>> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, >>> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, >>> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, >>> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, SkipFirstBlock=false) >>> + 5897 at MemoryDependenceAnalysis.cpp:1200 >>> frame #6: 0x0000000103a0cb3b >>> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, >>> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at >>> MemoryDependenceAnalysis.cpp:911 >>> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous >>> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, >>> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 >>> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous >>> namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8) >>> + 1551 at GVN.cpp:1905 >>> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous >>> namespace)::GVN::processInstruction(this=0x000000010e6ce680, >>> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 >>> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous >>> namespace)::GVN::processBlock(this=0x000000010e6ce680, >>> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 >>> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous >>> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, >>> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 >>> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous >>> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, >>> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 >>> frame #13: 0x00000001027cd05b >>> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, >>> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 >>> frame #14: 0x00000001027cd375 >>> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, >>> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 >>> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous >>> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, >>> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 >>> frame #16: 0x00000001027cd636 >>> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, >>> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 >>> frame #17: 0x00000001027ce521 >>> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, >>> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 >>> >>> >>> The function body is in the attached file. >>> >>> >>> >>> GlobalsModRef reports NoAlias for this pair, here: >>> if (GV1 || GV2) { >>> // If the global's address is taken, pretend we don't know it's a >>> pointer to >>> // the global. >>> if (GV1 && !NonAddressTakenGlobals.count(GV1)) >>> GV1 = nullptr; >>> if (GV2 && !NonAddressTakenGlobals.count(GV2)) >>> GV2 = nullptr; >>> >>> // If the two pointers are derived from two different non-addr-taken >>> // globals, or if one is and the other isn't, we know these can't alias. >>> if ((GV1 || GV2) && GV1 != GV2) >>> return NoAlias; >>> >>> // Otherwise if they are both derived from the same addr-taken global, >>> we >>> // can't know the two accesses don't overlap. >>> } >>> >>> >>> Thanks, >>> Michael >>> >>> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com> wrote: >>> >>> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich >>> <evgeny.astigeevich at arm.com> wrote: >>>> >>>> It’s Dhrystone. >>> >>> Dhrystone has historically not been a good indicator of real-world >>> performance fluctuations, especially at this small of a shift. >>> >>> I'd like to see if we see any fluctuation on larger and more realistic >>> application benchmarks. One advantage of the flag being set is that we >>> should get runs from folks who have automatic builds and runs periodically >>> from trunk. Those should help give an accurate picture. >>> >>>> >>>> >>>> >>>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >>>> Sent: 17 July 2015 16:10 >>>> >>>> >>>> To: Evgeny Astigeevich; Chandler Carruth >>>> Cc: LLVM Developers Mailing List >>>> >>>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>>> >>>> >>>> >>>> Can you say what Benchmark or give a test case so we understand the nature >>>> of the regression? As Gerolf said, that will be important to understand what >>>> is best to do. >>>> >>>> >>>> >>>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich >>>> <Evgeny.Astigeevich at arm.com> wrote: >>>> >>>> Yes, the regression is stable. I double checked this. A full benchmark run >>>> consists of at least 10 sub-runs to validate the score. >>>> >>>> I also checked if there were regressions of this benchmark across >>>> different ARM hardware versions. I found all regressions of this benchmark >>>> were in range 1.6%-2%. >>>> >>>> >>>> >>>> Kind regards, >>>> >>>> Evgeny Astigeevich >>>> >>>> >>>> >>>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >>>> Sent: 17 July 2015 07:52 >>>> To: Evgeny Astigeevich; Chandler Carruth >>>> Cc: LLVM Developers Mailing List; Michael Zolotukhin >>>> >>>> >>>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>>> >>>> >>>> >>>> Hey, thanks for benchmarking. >>>> >>>> >>>> >>>> How stable is the 2% regression? >>>> >>>> >>>> >>>> Michael ran some benchmarks with GlobalsModRef completely disabled and the >>>> only differences were in the noise. This was a complete spec2k6 run along >>>> with some others. Based on the number of benchmarks run there, I'm going to >>>> go ahead and submit these patches, but if you can clarify the impact here, >>>> we can look at potentially some other tradeoff. I'm not particularly set on >>>> one set of defaults, etc, I just don't want to keep patches held up based on >>>> that. We can flip the default back and forth as new data arrives. >>>> >>>> >>>> >>>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich >>>> <evgeny.astigeevich at arm.com> wrote: >>>> >>>> Hi Chandler, >>>> >>>> >>>> >>>> I ran couple benchmarks with LTO turned on and your patches on ARM >>>> hardware. >>>> >>>> There were no performance degradation of one benchmark and 2% slowdown of >>>> another benchmark. >>>> >>>> >>>> >>>> Kind regards, >>>> >>>> Evgeny Astigeevich >>>> >>>> >>>> >>>> >>>> >>>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >>>> Behalf Of Evgeny Astigeevich >>>> Sent: 15 July 2015 15:12 >>>> >>>> >>>> To: 'Chandler Carruth'; Gerolf Hoflehner >>>> Cc: LLVM Developers Mailing List >>>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>>> >>>> >>>> >>>> Hi Chandler, >>>> >>>> >>>> >>>> I would like to run some benchmarks on ARM hardware and to look at impact >>>> of your patches on LTO. >>>> >>>> >>>> >>>> Kind regards, >>>> >>>> Evgeny Astigeevich >>>> >>>> >>>> >>>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >>>> Behalf Of Chandler Carruth >>>> >>>> >>>> Sent: 15 July 2015 10:45 >>>> To: Chandler Carruth; Gerolf Hoflehner >>>> Cc: LLVM Developers Mailing List >>>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>>> >>>> >>>> >>>> I've fixed the obvious bugs I spotted in r242281. These should be pure >>>> correctness improvements. >>>> >>>> >>>> >>>> I've sent the two patches I'm imagining to address the core issue here: >>>> >>>> http://reviews.llvm.org/D11213 >>>> >>>> http://reviews.llvm.org/D11214 >>>> >>>> >>>> >>>> Currently, I have the unsafe alias results disabled by default, but with a >>>> flag that can re-enable them if needed. I don't feel really strongly about >>>> which way the default is set -- but that may be because I don't have lots of >>>> users relying on LTO. I'll let others indicate which way they would be most >>>> comfortable. >>>> >>>> >>>> >>>> Some IRC conversations indicated that early benchmark results with GMR >>>> completely disabled weren't showing really significant swings, so maybe this >>>> relatively small reduction in power of GMR won't be too problematic for >>>> folks. Either way, I'm open to the different approaches. It's D11214 that I >>>> care a lot about. =] >>>> >>>> >>>> >>>> >>>> >>>> Thanks for all the thoughts here! >>>> >>>> -Chandler >>>> >>>> >>>> >>>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com> >>>> wrote: >>>> >>>> Replying here, but several of the questions raised boil down to "couldn't >>>> you make the usage of GetUnderlyingObject conservatively correct?". I'll try >>>> and address that. >>>> >>>> >>>> >>>> I think this *is* the right approach, but I think it is very hard to do >>>> without effectively disabling this part of GlobalsModRef. That is, the easy >>>> ways are likely to fire very frequently IMO. >>>> >>>> >>>> >>>> The core idea is to detect a "no information" state coming out of >>>> GetUnderlyingObject (likely be providing a custom version just for >>>> GlobalsModRef and tailored to its use cases). This is particularly effective >>>> at avoiding the problems with the recursion limit. But let's look at what >>>> cases we *wouldn't* return that. Here are the cases I see when I thought >>>> about this last night with Hal, roughly in descending likelihood I would >>>> guess: >>>> >>>> >>>> >>>> 1) We detect some global or an alloca. In that case, even BasicAA would be >>>> sufficient to provide no-alias. GMR shouldn't be relevant. >>>> >>>> >>>> >>>> 2) We detect a phi, select, or inttoptr, and stop there. >>>> >>>> >>>> >>>> 3) We detect a load and stop there. >>>> >>>> >>>> >>>> 4) We detect a return from a function. >>>> >>>> >>>> >>>> 5) We detect an argument to the function. >>>> >>>> >>>> >>>> I strongly suspect the vast majority of queries hit #1. That's why BasicAA >>>> is *so* effective. Both #4 and #5 I think are actually reasonable places for >>>> GMR to potentially say "no-alias" and provide useful definitive information. >>>> But I also suspect these are the least common. >>>> >>>> >>>> >>>> So let's look at #2 and #3 because I think they're interesting. For these, >>>> I think it is extremely hard to return "no-alias". It seems extremely easy >>>> for a reasonable and innocuous change to the IR to introduce a phi or a >>>> select into one side of the GetUnderlyingObject but not the other. If that >>>> ever happens, we can't return "no-alias" for #2, or we need to add really >>>> expensive updates. It also seems reasonable (if not as likely) to want >>>> adding a store and load to the IR to not trigger a miscompile. If it is >>>> possible for a valid optimization pass to do reg2mem on an SSA value, then >>>> that could happen to only one side of the paired GetUnderlyingObject and >>>> break GMR with #3. If that seems like an unreasonable thing to do, consider >>>> loop re-rolling or other transformations which may need to take things in >>>> SSA form at move them out of SSA form. Even if we then try immediately to >>>> put it back *into* SSA form, before we do that we create a point where GMR >>>> cannot correctly return no-alias. >>>> >>>> >>>> >>>> So ultimately, I don't think we want to rely on GMR returning "no-alias" >>>> for either #2 or #3 because of the challenge of actually updating it in all >>>> of the circumstances that could break them. That means that *only* #4 and #5 >>>> are going to return "no-alias" usefully. And even then, function inlining >>>> and function outlining both break #4 and #5, so you have to preclude those >>>> transforms while GMR is active. And I have serious doubts about these >>>> providing enough value to be worth the cost. >>>> >>>> >>>> >>>> >>>> >>>> I think the better way to approach this is the other way around. Rather >>>> than doing a backwards analysis to see if one location reaches and global >>>> and the other location doesn't reach a global, I think it would be much more >>>> effective to re-cast this as a forward analysis that determines all the >>>> memory locations in a function that come from outside the function, and use >>>> that to drive the no-alias responses. >>>> >>>> >>>> >>>> >>>> >>>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner at apple.com> >>>> wrote: >>>> >>>> I wouldn’t be willing to give up performance for hypothetical issues. >>>> Please protect all your changes with options. For some of your concerns it >>>> is probably hard to provide a test case that shows an/the actual issue. >>>> >>>> >>>> >>>> I certainly agree that it will be very hard to provide a test case and >>>> extremely rare to see this in the wild for most of these issues. As long as >>>> I can remove the problematic update API we currently have (which as its an >>>> API change can't really be put behind flags), I'm happy to have flags >>>> control whether or not GMR uses the unsound / stale information to try to >>>> answer alias queries. Do you have any opinion about what the default value >>>> of the flags should be? >>>> >>>> >>>> >>>> I'll go ahead and prepare the patches, as it seems like we're all ending >>>> up in the same position, and just wondering about the precise tradeoffs we >>>> want to settle on. >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> ______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Michael Zolotukhin
2015-Jul-21 22:58 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
> On Jul 21, 2015, at 3:50 PM, Xinliang David Li <xinliangli at gmail.com> wrote: > > reg_values is a file static variable in cselib.c -- so you might be able to reproduce the issue with a smaller reproducible.The problem here is that we haven’t defined the issue:) The original question was “does GlobalsModRef give us anything, or can we just turn it off?”, so my recent reply shows that turning it off hurts performance on 403.gcc. Michael> > David > > On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote: > > > On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org>> wrote: > > > > Based on function names and structures, this is some version of GCC :) > Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I only did that on IRC). > > > Any way you can post the entire .ll file? > It’s an LTO build, so it’d be troublesome.. I tried to print the module in lldb, but after a several minutes it hasn’t even finished printing globals (which I assume is the very beginning). > > > > > Because it's globalsmodref, it's hard to debug without the other > > functions, since it goes over all the functions to determine address > > takenness, etc :) > Yep, I understand that - I’m still in debugger though, so if you’re interested in some particular data, I can try collecting it. I can try to dump the module too, but it might be not-practical in the end:) > > Michael > > > > > > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin > > <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote: > >> Hi Chandler, > >> > >> We observed some regressions in our regular testing (despite I saw nothing > >> suspicious in my runs). I did accurate investigation and was able to > >> reproduce and track down the regression. > >> I found the exact request to GlobalsModRef that results in the performance > >> loss (I added a limit on number of requests into the implementation and > >> bisected the number to find the interesting request). > >> > >> Here are the details: > >> > >> We’re checking following two locations: > >> > >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() > >> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x > >> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i > >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() > >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, > >> align 8 > >> > >> and the function in question is “cselib_init”: > >> (lldb) p > >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() > >> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) > >> > >> Corresponding underlying values: > >> (lldb) p UV2->dump() > >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, > >> align 8 > >> (lldb) p UV1->dump() > >> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** @reg_values, > >> align 8, !tbaa !2 > >> > >> Backtrace: > >> (lldb) bt > >> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous > >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, > >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at > >> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason = step > >> over > >> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous > >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, > >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at > >> GlobalsModRef.cpp:519 > >> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to > >> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, > >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at > >> GlobalsModRef.cpp:562 > >> frame #2: 0x00000001038d6aa8 > >> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, > >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288 > >> frame #3: 0x0000000103a0a814 > >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, > >> MemLoc=0x00007fff5fbf6268, isLoad=true, ScanIt=llvm::BasicBlock::iterator at > >> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, QueryInst=0x000000010a1a20c8) + > >> 1908 at MemoryDependenceAnalysis.cpp:570 > >> frame #4: 0x0000000103a0ffa5 > >> libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, > >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, > >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165 > >> at MemoryDependenceAnalysis.cpp:965 > >> frame #5: 0x0000000103a0e3a9 > >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, > >> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, > >> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, > >> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, SkipFirstBlock=false) > >> + 5897 at MemoryDependenceAnalysis.cpp:1200 > >> frame #6: 0x0000000103a0cb3b > >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, > >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at > >> MemoryDependenceAnalysis.cpp:911 > >> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous > >> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, > >> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 > >> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous > >> namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8) > >> + 1551 at GVN.cpp:1905 > >> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous > >> namespace)::GVN::processInstruction(this=0x000000010e6ce680, > >> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 > >> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous > >> namespace)::GVN::processBlock(this=0x000000010e6ce680, > >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 > >> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous > >> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, > >> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 > >> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous > >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, > >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 > >> frame #13: 0x00000001027cd05b > >> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, > >> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 > >> frame #14: 0x00000001027cd375 > >> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, > >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 > >> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous > >> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, > >> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 > >> frame #16: 0x00000001027cd636 > >> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, > >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 > >> frame #17: 0x00000001027ce521 > >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, > >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 > >> > >> > >> The function body is in the attached file. > >> > >> > >> > >> GlobalsModRef reports NoAlias for this pair, here: > >> if (GV1 || GV2) { > >> // If the global's address is taken, pretend we don't know it's a > >> pointer to > >> // the global. > >> if (GV1 && !NonAddressTakenGlobals.count(GV1)) > >> GV1 = nullptr; > >> if (GV2 && !NonAddressTakenGlobals.count(GV2)) > >> GV2 = nullptr; > >> > >> // If the two pointers are derived from two different non-addr-taken > >> // globals, or if one is and the other isn't, we know these can't alias. > >> if ((GV1 || GV2) && GV1 != GV2) > >> return NoAlias; > >> > >> // Otherwise if they are both derived from the same addr-taken global, > >> we > >> // can't know the two accesses don't overlap. > >> } > >> > >> > >> Thanks, > >> Michael > >> > >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com <mailto:chandlerc at gmail.com>> wrote: > >> > >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich > >> <evgeny.astigeevich at arm.com <mailto:evgeny.astigeevich at arm.com>> wrote: > >>> > >>> It’s Dhrystone. > >> > >> Dhrystone has historically not been a good indicator of real-world > >> performance fluctuations, especially at this small of a shift. > >> > >> I'd like to see if we see any fluctuation on larger and more realistic > >> application benchmarks. One advantage of the flag being set is that we > >> should get runs from folks who have automatic builds and runs periodically > >> from trunk. Those should help give an accurate picture. > >> > >>> > >>> > >>> > >>> From: Chandler Carruth [mailto:chandlerc at gmail.com <mailto:chandlerc at gmail.com>] > >>> Sent: 17 July 2015 16:10 > >>> > >>> > >>> To: Evgeny Astigeevich; Chandler Carruth > >>> Cc: LLVM Developers Mailing List > >>> > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken > >>> > >>> > >>> > >>> Can you say what Benchmark or give a test case so we understand the nature > >>> of the regression? As Gerolf said, that will be important to understand what > >>> is best to do. > >>> > >>> > >>> > >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich > >>> <Evgeny.Astigeevich at arm.com <mailto:Evgeny.Astigeevich at arm.com>> wrote: > >>> > >>> Yes, the regression is stable. I double checked this. A full benchmark run > >>> consists of at least 10 sub-runs to validate the score. > >>> > >>> I also checked if there were regressions of this benchmark across > >>> different ARM hardware versions. I found all regressions of this benchmark > >>> were in range 1.6%-2%. > >>> > >>> > >>> > >>> Kind regards, > >>> > >>> Evgeny Astigeevich > >>> > >>> > >>> > >>> From: Chandler Carruth [mailto:chandlerc at gmail.com <mailto:chandlerc at gmail.com>] > >>> Sent: 17 July 2015 07:52 > >>> To: Evgeny Astigeevich; Chandler Carruth > >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin > >>> > >>> > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken > >>> > >>> > >>> > >>> Hey, thanks for benchmarking. > >>> > >>> > >>> > >>> How stable is the 2% regression? > >>> > >>> > >>> > >>> Michael ran some benchmarks with GlobalsModRef completely disabled and the > >>> only differences were in the noise. This was a complete spec2k6 run along > >>> with some others. Based on the number of benchmarks run there, I'm going to > >>> go ahead and submit these patches, but if you can clarify the impact here, > >>> we can look at potentially some other tradeoff. I'm not particularly set on > >>> one set of defaults, etc, I just don't want to keep patches held up based on > >>> that. We can flip the default back and forth as new data arrives. > >>> > >>> > >>> > >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich > >>> <evgeny.astigeevich at arm.com <mailto:evgeny.astigeevich at arm.com>> wrote: > >>> > >>> Hi Chandler, > >>> > >>> > >>> > >>> I ran couple benchmarks with LTO turned on and your patches on ARM > >>> hardware. > >>> > >>> There were no performance degradation of one benchmark and 2% slowdown of > >>> another benchmark. > >>> > >>> > >>> > >>> Kind regards, > >>> > >>> Evgeny Astigeevich > >>> > >>> > >>> > >>> > >>> > >>> From: llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu>] On > >>> Behalf Of Evgeny Astigeevich > >>> Sent: 15 July 2015 15:12 > >>> > >>> > >>> To: 'Chandler Carruth'; Gerolf Hoflehner > >>> Cc: LLVM Developers Mailing List > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken > >>> > >>> > >>> > >>> Hi Chandler, > >>> > >>> > >>> > >>> I would like to run some benchmarks on ARM hardware and to look at impact > >>> of your patches on LTO. > >>> > >>> > >>> > >>> Kind regards, > >>> > >>> Evgeny Astigeevich > >>> > >>> > >>> > >>> From: llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu>] On > >>> Behalf Of Chandler Carruth > >>> > >>> > >>> Sent: 15 July 2015 10:45 > >>> To: Chandler Carruth; Gerolf Hoflehner > >>> Cc: LLVM Developers Mailing List > >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken > >>> > >>> > >>> > >>> I've fixed the obvious bugs I spotted in r242281. These should be pure > >>> correctness improvements. > >>> > >>> > >>> > >>> I've sent the two patches I'm imagining to address the core issue here: > >>> > >>> http://reviews.llvm.org/D11213 <http://reviews.llvm.org/D11213> > >>> > >>> http://reviews.llvm.org/D11214 <http://reviews.llvm.org/D11214> > >>> > >>> > >>> > >>> Currently, I have the unsafe alias results disabled by default, but with a > >>> flag that can re-enable them if needed. I don't feel really strongly about > >>> which way the default is set -- but that may be because I don't have lots of > >>> users relying on LTO. I'll let others indicate which way they would be most > >>> comfortable. > >>> > >>> > >>> > >>> Some IRC conversations indicated that early benchmark results with GMR > >>> completely disabled weren't showing really significant swings, so maybe this > >>> relatively small reduction in power of GMR won't be too problematic for > >>> folks. Either way, I'm open to the different approaches. It's D11214 that I > >>> care a lot about. =] > >>> > >>> > >>> > >>> > >>> > >>> Thanks for all the thoughts here! > >>> > >>> -Chandler > >>> > >>> > >>> > >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com <mailto:chandlerc at gmail.com>> > >>> wrote: > >>> > >>> Replying here, but several of the questions raised boil down to "couldn't > >>> you make the usage of GetUnderlyingObject conservatively correct?". I'll try > >>> and address that. > >>> > >>> > >>> > >>> I think this *is* the right approach, but I think it is very hard to do > >>> without effectively disabling this part of GlobalsModRef. That is, the easy > >>> ways are likely to fire very frequently IMO. > >>> > >>> > >>> > >>> The core idea is to detect a "no information" state coming out of > >>> GetUnderlyingObject (likely be providing a custom version just for > >>> GlobalsModRef and tailored to its use cases). This is particularly effective > >>> at avoiding the problems with the recursion limit. But let's look at what > >>> cases we *wouldn't* return that. Here are the cases I see when I thought > >>> about this last night with Hal, roughly in descending likelihood I would > >>> guess: > >>> > >>> > >>> > >>> 1) We detect some global or an alloca. In that case, even BasicAA would be > >>> sufficient to provide no-alias. GMR shouldn't be relevant. > >>> > >>> > >>> > >>> 2) We detect a phi, select, or inttoptr, and stop there. > >>> > >>> > >>> > >>> 3) We detect a load and stop there. > >>> > >>> > >>> > >>> 4) We detect a return from a function. > >>> > >>> > >>> > >>> 5) We detect an argument to the function. > >>> > >>> > >>> > >>> I strongly suspect the vast majority of queries hit #1. That's why BasicAA > >>> is *so* effective. Both #4 and #5 I think are actually reasonable places for > >>> GMR to potentially say "no-alias" and provide useful definitive information. > >>> But I also suspect these are the least common. > >>> > >>> > >>> > >>> So let's look at #2 and #3 because I think they're interesting. For these, > >>> I think it is extremely hard to return "no-alias". It seems extremely easy > >>> for a reasonable and innocuous change to the IR to introduce a phi or a > >>> select into one side of the GetUnderlyingObject but not the other. If that > >>> ever happens, we can't return "no-alias" for #2, or we need to add really > >>> expensive updates. It also seems reasonable (if not as likely) to want > >>> adding a store and load to the IR to not trigger a miscompile. If it is > >>> possible for a valid optimization pass to do reg2mem on an SSA value, then > >>> that could happen to only one side of the paired GetUnderlyingObject and > >>> break GMR with #3. If that seems like an unreasonable thing to do, consider > >>> loop re-rolling or other transformations which may need to take things in > >>> SSA form at move them out of SSA form. Even if we then try immediately to > >>> put it back *into* SSA form, before we do that we create a point where GMR > >>> cannot correctly return no-alias. > >>> > >>> > >>> > >>> So ultimately, I don't think we want to rely on GMR returning "no-alias" > >>> for either #2 or #3 because of the challenge of actually updating it in all > >>> of the circumstances that could break them. That means that *only* #4 and #5 > >>> are going to return "no-alias" usefully. And even then, function inlining > >>> and function outlining both break #4 and #5, so you have to preclude those > >>> transforms while GMR is active. And I have serious doubts about these > >>> providing enough value to be worth the cost. > >>> > >>> > >>> > >>> > >>> > >>> I think the better way to approach this is the other way around. Rather > >>> than doing a backwards analysis to see if one location reaches and global > >>> and the other location doesn't reach a global, I think it would be much more > >>> effective to re-cast this as a forward analysis that determines all the > >>> memory locations in a function that come from outside the function, and use > >>> that to drive the no-alias responses. > >>> > >>> > >>> > >>> > >>> > >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner at apple.com <mailto:ghoflehner at apple.com>> > >>> wrote: > >>> > >>> I wouldn’t be willing to give up performance for hypothetical issues. > >>> Please protect all your changes with options. For some of your concerns it > >>> is probably hard to provide a test case that shows an/the actual issue. > >>> > >>> > >>> > >>> I certainly agree that it will be very hard to provide a test case and > >>> extremely rare to see this in the wild for most of these issues. As long as > >>> I can remove the problematic update API we currently have (which as its an > >>> API change can't really be put behind flags), I'm happy to have flags > >>> control whether or not GMR uses the unsound / stale information to try to > >>> answer alias queries. Do you have any opinion about what the default value > >>> of the flags should be? > >>> > >>> > >>> > >>> I'll go ahead and prepare the patches, as it seems like we're all ending > >>> up in the same position, and just wondering about the precise tradeoffs we > >>> want to settle on. > >>> > >>> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> > >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >>> > >>> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >>> > >>> > >>> ______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >> > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150721/0ee4a211/attachment.html>
Chandler Carruth
2015-Jul-21 23:12 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
FWIW, after talking extensively on IRC with Michael and Hal, I think I can add support for handling this in a sound and principled way to GlobalsModRef. I'll be looking into a patch next. On Tue, Jul 21, 2015 at 4:06 PM Michael Zolotukhin <mzolotukhin at apple.com> wrote:> On Jul 21, 2015, at 3:50 PM, Xinliang David Li <xinliangli at gmail.com> > wrote: > > reg_values is a file static variable in cselib.c -- so you might be able > to reproduce the issue with a smaller reproducible. > > The problem here is that we haven’t defined the issue:) The original > question was “does GlobalsModRef give us anything, or can we just turn it > off?”, so my recent reply shows that turning it off hurts performance on > 403.gcc. > > Michael > > > David > > On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com > > wrote: > > >> > On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org> wrote: >> > >> > Based on function names and structures, this is some version of GCC :) >> Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably >> I only did that on IRC). >> >> > Any way you can post the entire .ll file? >> It’s an LTO build, so it’d be troublesome.. I tried to print the module >> in lldb, but after a several minutes it hasn’t even finished printing >> globals (which I assume is the very beginning). >> >> > >> > Because it's globalsmodref, it's hard to debug without the other >> > functions, since it goes over all the functions to determine address >> > takenness, etc :) >> Yep, I understand that - I’m still in debugger though, so if you’re >> interested in some particular data, I can try collecting it. I can try to >> dump the module too, but it might be not-practical in the end:) >> >> Michael >> > > >> > >> > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin >> > <mzolotukhin at apple.com> wrote: >> >> Hi Chandler, >> >> >> >> We observed some regressions in our regular testing (despite I saw >> nothing >> >> suspicious in my runs). I did accurate investigation and was able to >> >> reproduce and track down the regression. >> >> I found the exact request to GlobalsModRef that results in the >> performance >> >> loss (I added a limit on number of requests into the implementation and >> >> bisected the number to find the interesting request). >> >> >> >> Here are the details: >> >> >> >> We’re checking following two locations: >> >> >> >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() >> >> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x >> >> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i >> >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() >> >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* >> null, >> >> align 8 >> >> >> >> and the function in question is “cselib_init”: >> >> (lldb) p >> >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() >> >> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) >> >> >> >> Corresponding underlying values: >> >> (lldb) p UV2->dump() >> >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* >> null, >> >> align 8 >> >> (lldb) p UV1->dump() >> >> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** >> @reg_values, >> >> align 8, !tbaa !2 >> >> >> >> Backtrace: >> >> (lldb) bt >> >> * thread #1: tid = 0x120baaf, 0x00000001038b752a >> libLTO.dylib`(anonymous >> >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> >> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason >> step >> >> over >> >> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous >> >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> >> GlobalsModRef.cpp:519 >> >> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to >> >> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, >> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at >> >> GlobalsModRef.cpp:562 >> >> frame #2: 0x00000001038d6aa8 >> >> >> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, >> >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at >> AliasAnalysis.cpp:288 >> >> frame #3: 0x0000000103a0a814 >> >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, >> >> MemLoc=0x00007fff5fbf6268, isLoad=true, >> ScanIt=llvm::BasicBlock::iterator at >> >> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, >> QueryInst=0x000000010a1a20c8) + >> >> 1908 at MemoryDependenceAnalysis.cpp:570 >> >> frame #4: 0x0000000103a0ffa5 >> >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, >> >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, >> >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + >> 2165 >> >> at MemoryDependenceAnalysis.cpp:965 >> >> frame #5: 0x0000000103a0e3a9 >> >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, >> >> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, >> >> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, >> >> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, >> SkipFirstBlock=false) >> >> + 5897 at MemoryDependenceAnalysis.cpp:1200 >> >> frame #6: 0x0000000103a0cb3b >> >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, >> >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at >> >> MemoryDependenceAnalysis.cpp:911 >> >> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous >> >> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, >> >> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 >> >> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous >> >> namespace)::GVN::processLoad(this=0x000000010e6ce680, >> L=0x000000010a1a20c8) >> >> + 1551 at GVN.cpp:1905 >> >> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous >> >> namespace)::GVN::processInstruction(this=0x000000010e6ce680, >> >> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 >> >> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous >> >> namespace)::GVN::processBlock(this=0x000000010e6ce680, >> >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 >> >> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous >> >> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, >> >> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 >> >> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous >> >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, >> >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 >> >> frame #13: 0x00000001027cd05b >> >> >> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, >> >> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 >> >> frame #14: 0x00000001027cd375 >> >> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, >> >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 >> >> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous >> >> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, >> >> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 >> >> frame #16: 0x00000001027cd636 >> >> >> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, >> >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 >> >> frame #17: 0x00000001027ce521 >> >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, >> >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 >> >> >> >> >> >> The function body is in the attached file. >> >> >> >> >> >> >> >> GlobalsModRef reports NoAlias for this pair, here: >> >> if (GV1 || GV2) { >> >> // If the global's address is taken, pretend we don't know it's a >> >> pointer to >> >> // the global. >> >> if (GV1 && !NonAddressTakenGlobals.count(GV1)) >> >> GV1 = nullptr; >> >> if (GV2 && !NonAddressTakenGlobals.count(GV2)) >> >> GV2 = nullptr; >> >> >> >> // If the two pointers are derived from two different non-addr-taken >> >> // globals, or if one is and the other isn't, we know these can't >> alias. >> >> if ((GV1 || GV2) && GV1 != GV2) >> >> return NoAlias; >> >> >> >> // Otherwise if they are both derived from the same addr-taken >> global, >> >> we >> >> // can't know the two accesses don't overlap. >> >> } >> >> >> >> >> >> Thanks, >> >> Michael >> >> >> >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com> >> wrote: >> >> >> >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich >> >> <evgeny.astigeevich at arm.com> wrote: >> >>> >> >>> It’s Dhrystone. >> >> >> >> Dhrystone has historically not been a good indicator of real-world >> >> performance fluctuations, especially at this small of a shift. >> >> >> >> I'd like to see if we see any fluctuation on larger and more realistic >> >> application benchmarks. One advantage of the flag being set is that we >> >> should get runs from folks who have automatic builds and runs >> periodically >> >> from trunk. Those should help give an accurate picture. >> >> >> >>> >> >>> >> >>> >> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >> >>> Sent: 17 July 2015 16:10 >> >>> >> >>> >> >>> To: Evgeny Astigeevich; Chandler Carruth >> >>> Cc: LLVM Developers Mailing List >> >>> >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely >> broken >> >>> >> >>> >> >>> >> >>> Can you say what Benchmark or give a test case so we understand the >> nature >> >>> of the regression? As Gerolf said, that will be important to >> understand what >> >>> is best to do. >> >>> >> >>> >> >>> >> >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich >> >>> <Evgeny.Astigeevich at arm.com> wrote: >> >>> >> >>> Yes, the regression is stable. I double checked this. A full >> benchmark run >> >>> consists of at least 10 sub-runs to validate the score. >> >>> >> >>> I also checked if there were regressions of this benchmark across >> >>> different ARM hardware versions. I found all regressions of this >> benchmark >> >>> were in range 1.6%-2%. >> >>> >> >>> >> >>> >> >>> Kind regards, >> >>> >> >>> Evgeny Astigeevich >> >>> >> >>> >> >>> >> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >> >>> Sent: 17 July 2015 07:52 >> >>> To: Evgeny Astigeevich; Chandler Carruth >> >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin >> >>> >> >>> >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely >> broken >> >>> >> >>> >> >>> >> >>> Hey, thanks for benchmarking. >> >>> >> >>> >> >>> >> >>> How stable is the 2% regression? >> >>> >> >>> >> >>> >> >>> Michael ran some benchmarks with GlobalsModRef completely disabled >> and the >> >>> only differences were in the noise. This was a complete spec2k6 run >> along >> >>> with some others. Based on the number of benchmarks run there, I'm >> going to >> >>> go ahead and submit these patches, but if you can clarify the impact >> here, >> >>> we can look at potentially some other tradeoff. I'm not particularly >> set on >> >>> one set of defaults, etc, I just don't want to keep patches held up >> based on >> >>> that. We can flip the default back and forth as new data arrives. >> >>> >> >>> >> >>> >> >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich >> >>> <evgeny.astigeevich at arm.com> wrote: >> >>> >> >>> Hi Chandler, >> >>> >> >>> >> >>> >> >>> I ran couple benchmarks with LTO turned on and your patches on ARM >> >>> hardware. >> >>> >> >>> There were no performance degradation of one benchmark and 2% >> slowdown of >> >>> another benchmark. >> >>> >> >>> >> >>> >> >>> Kind regards, >> >>> >> >>> Evgeny Astigeevich >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] >> On >> >>> Behalf Of Evgeny Astigeevich >> >>> Sent: 15 July 2015 15:12 >> >>> >> >>> >> >>> To: 'Chandler Carruth'; Gerolf Hoflehner >> >>> Cc: LLVM Developers Mailing List >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely >> broken >> >>> >> >>> >> >>> >> >>> Hi Chandler, >> >>> >> >>> >> >>> >> >>> I would like to run some benchmarks on ARM hardware and to look at >> impact >> >>> of your patches on LTO. >> >>> >> >>> >> >>> >> >>> Kind regards, >> >>> >> >>> Evgeny Astigeevich >> >>> >> >>> >> >>> >> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] >> On >> >>> Behalf Of Chandler Carruth >> >>> >> >>> >> >>> Sent: 15 July 2015 10:45 >> >>> To: Chandler Carruth; Gerolf Hoflehner >> >>> Cc: LLVM Developers Mailing List >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely >> broken >> >>> >> >>> >> >>> >> >>> I've fixed the obvious bugs I spotted in r242281. These should be pure >> >>> correctness improvements. >> >>> >> >>> >> >>> >> >>> I've sent the two patches I'm imagining to address the core issue >> here: >> >>> >> > >>> http://reviews.llvm.org/D11213 >> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11213&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=EsY1fGxczRcp-UjYSFv1rWmYXBpmsX7OsckxJsRUy8I&s=d4561YywaWoYOw4i0HZ8j4PKNAigcNH-78hxy3aFemE&e=> >> >>> >> >>> http://reviews.llvm.org/D11214 >> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11214&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=EsY1fGxczRcp-UjYSFv1rWmYXBpmsX7OsckxJsRUy8I&s=zQJk7H1XW9yvEMDPaDEh1JRp5-XJjqLM2PaWlK5fZU0&e=> >> > >> >>> >> >>> >> >>> >> >>> Currently, I have the unsafe alias results disabled by default, but >> with a >> >>> flag that can re-enable them if needed. I don't feel really strongly >> about >> >>> which way the default is set -- but that may be because I don't have >> lots of >> >>> users relying on LTO. I'll let others indicate which way they would >> be most >> >>> comfortable. >> >>> >> >>> >> >>> >> >>> Some IRC conversations indicated that early benchmark results with GMR >> >>> completely disabled weren't showing really significant swings, so >> maybe this >> >>> relatively small reduction in power of GMR won't be too problematic >> for >> >>> folks. Either way, I'm open to the different approaches. It's D11214 >> that I >> >>> care a lot about. =] >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> Thanks for all the thoughts here! >> >>> >> >>> -Chandler >> >>> >> >>> >> >>> >> >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth < >> chandlerc at gmail.com> >> >>> wrote: >> >>> >> >>> Replying here, but several of the questions raised boil down to >> "couldn't >> >>> you make the usage of GetUnderlyingObject conservatively correct?". >> I'll try >> >>> and address that. >> >>> >> >>> >> >>> >> >>> I think this *is* the right approach, but I think it is very hard to >> do >> >>> without effectively disabling this part of GlobalsModRef. That is, >> the easy >> >>> ways are likely to fire very frequently IMO. >> >>> >> >>> >> >>> >> >>> The core idea is to detect a "no information" state coming out of >> >>> GetUnderlyingObject (likely be providing a custom version just for >> >>> GlobalsModRef and tailored to its use cases). This is particularly >> effective >> >>> at avoiding the problems with the recursion limit. But let's look at >> what >> >>> cases we *wouldn't* return that. Here are the cases I see when I >> thought >> >>> about this last night with Hal, roughly in descending likelihood I >> would >> >>> guess: >> >>> >> >>> >> >>> >> >>> 1) We detect some global or an alloca. In that case, even BasicAA >> would be >> >>> sufficient to provide no-alias. GMR shouldn't be relevant. >> >>> >> >>> >> >>> >> >>> 2) We detect a phi, select, or inttoptr, and stop there. >> >>> >> >>> >> >>> >> >>> 3) We detect a load and stop there. >> >>> >> >>> >> >>> >> >>> 4) We detect a return from a function. >> >>> >> >>> >> >>> >> >>> 5) We detect an argument to the function. >> >>> >> >>> >> >>> >> >>> I strongly suspect the vast majority of queries hit #1. That's why >> BasicAA >> >>> is *so* effective. Both #4 and #5 I think are actually reasonable >> places for >> >>> GMR to potentially say "no-alias" and provide useful definitive >> information. >> >>> But I also suspect these are the least common. >> >>> >> >>> >> >>> >> >>> So let's look at #2 and #3 because I think they're interesting. For >> these, >> >>> I think it is extremely hard to return "no-alias". It seems extremely >> easy >> >>> for a reasonable and innocuous change to the IR to introduce a phi or >> a >> >>> select into one side of the GetUnderlyingObject but not the other. If >> that >> >>> ever happens, we can't return "no-alias" for #2, or we need to add >> really >> >>> expensive updates. It also seems reasonable (if not as likely) to want >> >>> adding a store and load to the IR to not trigger a miscompile. If it >> is >> >>> possible for a valid optimization pass to do reg2mem on an SSA value, >> then >> >>> that could happen to only one side of the paired GetUnderlyingObject >> and >> >>> break GMR with #3. If that seems like an unreasonable thing to do, >> consider >> >>> loop re-rolling or other transformations which may need to take >> things in >> >>> SSA form at move them out of SSA form. Even if we then try >> immediately to >> >>> put it back *into* SSA form, before we do that we create a point >> where GMR >> >>> cannot correctly return no-alias. >> >>> >> >>> >> >>> >> >>> So ultimately, I don't think we want to rely on GMR returning >> "no-alias" >> >>> for either #2 or #3 because of the challenge of actually updating it >> in all >> >>> of the circumstances that could break them. That means that *only* #4 >> and #5 >> >>> are going to return "no-alias" usefully. And even then, function >> inlining >> >>> and function outlining both break #4 and #5, so you have to preclude >> those >> >>> transforms while GMR is active. And I have serious doubts about these >> >>> providing enough value to be worth the cost. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> I think the better way to approach this is the other way around. >> Rather >> >>> than doing a backwards analysis to see if one location reaches and >> global >> >>> and the other location doesn't reach a global, I think it would be >> much more >> >>> effective to re-cast this as a forward analysis that determines all >> the >> >>> memory locations in a function that come from outside the function, >> and use >> >>> that to drive the no-alias responses. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner < >> ghoflehner at apple.com> >> >>> wrote: >> >>> >> >>> I wouldn’t be willing to give up performance for hypothetical issues. >> >>> Please protect all your changes with options. For some of your >> concerns it >> >>> is probably hard to provide a test case that shows an/the actual >> issue. >> >>> >> >>> >> >>> >> >>> I certainly agree that it will be very hard to provide a test case and >> >>> extremely rare to see this in the wild for most of these issues. As >> long as >> >>> I can remove the problematic update API we currently have (which as >> its an >> >>> API change can't really be put behind flags), I'm happy to have flags >> >>> control whether or not GMR uses the unsound / stale information to >> try to >> >>> answer alias queries. Do you have any opinion about what the default >> value >> >>> of the flags should be? >> >>> >> >>> >> >>> >> >>> I'll go ahead and prepare the patches, as it seems like we're all >> ending >> >>> up in the same position, and just wondering about the precise >> tradeoffs we >> >>> want to settle on. >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> >>> >> > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>> >> >>> >> >>> ______________________________________________ >> >>> LLVM Developers mailing list >> > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150721/143747ee/attachment.html>
Pete Cooper
2015-Jul-22 00:12 UTC
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
> On Jul 21, 2015, at 4:12 PM, Chandler Carruth <chandlerc at google.com> wrote: > > FWIW, after talking extensively on IRC with Michael and Hal, I think I can add support for handling this in a sound and principled way to GlobalsModRef. I'll be looking into a patch next.Out of curiosity, is this able to be done in any other AA passes? I thought BasicAA might even be able to handle it. David mentioned tbaa which normally would handle this, but can’t because we have 'tbaa !2' on both load and store in this case. My thinking is that it shouldn’t matter that @reg_values is a global, so we shouldn’t need the AA pass which handles globals. It could just as easily be a pointer argument to the function with this sequence of code operating on it. But then maybe i’m missing something about AA, i’m still learning how it works in cases like this. Anyway, these aren’t questions which are in any way blocking a fix going in to GlobalsModRef, just something i’m curious about. Cheers Pete> > On Tue, Jul 21, 2015 at 4:06 PM Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote: >> On Jul 21, 2015, at 3:50 PM, Xinliang David Li <xinliangli at gmail.com <mailto:xinliangli at gmail.com>> wrote: >> >> reg_values is a file static variable in cselib.c -- so you might be able to reproduce the issue with a smaller reproducible. > > The problem here is that we haven’t defined the issue:) The original question was “does GlobalsModRef give us anything, or can we just turn it off?”, so my recent reply shows that turning it off hurts performance on 403.gcc. > > Michael >> >> David >> > >> On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote: > >> >> > On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org>> wrote: >> > >> > Based on function names and structures, this is some version of GCC :) >> Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I only did that on IRC). >> >> > Any way you can post the entire .ll file? >> It’s an LTO build, so it’d be troublesome.. I tried to print the module in lldb, but after a several minutes it hasn’t even finished printing globals (which I assume is the very beginning). >> >> > >> > Because it's globalsmodref, it's hard to debug without the other >> > functions, since it goes over all the functions to determine address >> > takenness, etc :) >> Yep, I understand that - I’m still in debugger though, so if you’re interested in some particular data, I can try collecting it. I can try to dump the module too, but it might be not-practical in the end:) >> >> Michael > >> > >> > >> > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin >> > <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote: >> >> Hi Chandler, >> >> >> >> We observed some regressions in our regular testing (despite I saw nothing >> >> suspicious in my runs). I did accurate investigation and was able to >> >> reproduce and track down the regression. >> >> I found the exact request to GlobalsModRef that results in the performance >> >> loss (I added a limit on number of requests into the implementation and >> >> bisected the number to find the interesting request). >> >> >> >> Here are the details: >> >> >> >> We’re checking following two locations: >> >> >> >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() >> >> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x >> >> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i >> >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() >> >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, >> >> align 8 >> >> >> >> and the function in question is “cselib_init”: >> >> (lldb) p >> >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() >> >> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11) >> >> >> >> Corresponding underlying values: >> >> (lldb) p UV2->dump() >> >> @reg_values = internal unnamed_addr global %struct.varray_head_tag* null, >> >> align 8 >> >> (lldb) p UV1->dump() >> >> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag** @reg_values, >> >> align 8, !tbaa !2 >> >> >> >> Backtrace: >> >> (lldb) bt >> >> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous >> >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> >> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason = step >> >> over >> >> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous >> >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> >> GlobalsModRef.cpp:519 >> >> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to >> >> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30, >> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at >> >> GlobalsModRef.cpp:562 >> >> frame #2: 0x00000001038d6aa8 >> >> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, >> >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288 >> >> frame #3: 0x0000000103a0a814 >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0, >> >> MemLoc=0x00007fff5fbf6268, isLoad=true, ScanIt=llvm::BasicBlock::iterator at >> >> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, QueryInst=0x000000010a1a20c8) + >> >> 1908 at MemoryDependenceAnalysis.cpp:570 >> >> frame #4: 0x0000000103a0ffa5 >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, >> >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, >> >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165 >> >> at MemoryDependenceAnalysis.cpp:965 >> >> frame #5: 0x0000000103a0e3a9 >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0, >> >> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8, >> >> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0, >> >> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, SkipFirstBlock=false) >> >> + 5897 at MemoryDependenceAnalysis.cpp:1200 >> >> frame #6: 0x0000000103a0cb3b >> >> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, >> >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at >> >> MemoryDependenceAnalysis.cpp:911 >> >> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous >> >> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680, >> >> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706 >> >> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous >> >> namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8) >> >> + 1551 at GVN.cpp:1905 >> >> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous >> >> namespace)::GVN::processInstruction(this=0x000000010e6ce680, >> >> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220 >> >> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous >> >> namespace)::GVN::processBlock(this=0x000000010e6ce680, >> >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 >> >> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous >> >> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680, >> >> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677 >> >> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous >> >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, >> >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 >> >> frame #13: 0x00000001027cd05b >> >> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810, >> >> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520 >> >> frame #14: 0x00000001027cd375 >> >> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810, >> >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 >> >> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous >> >> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0, >> >> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596 >> >> frame #16: 0x00000001027cd636 >> >> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, >> >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 >> >> frame #17: 0x00000001027ce521 >> >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, >> >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 >> >> >> >> >> >> The function body is in the attached file. >> >> >> >> >> >> >> >> GlobalsModRef reports NoAlias for this pair, here: >> >> if (GV1 || GV2) { >> >> // If the global's address is taken, pretend we don't know it's a >> >> pointer to >> >> // the global. >> >> if (GV1 && !NonAddressTakenGlobals.count(GV1)) >> >> GV1 = nullptr; >> >> if (GV2 && !NonAddressTakenGlobals.count(GV2)) >> >> GV2 = nullptr; >> >> >> >> // If the two pointers are derived from two different non-addr-taken >> >> // globals, or if one is and the other isn't, we know these can't alias. >> >> if ((GV1 || GV2) && GV1 != GV2) >> >> return NoAlias; >> >> >> >> // Otherwise if they are both derived from the same addr-taken global, >> >> we >> >> // can't know the two accesses don't overlap. >> >> } >> >> >> >> >> >> Thanks, >> >> Michael >> >> >> >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com <mailto:chandlerc at gmail.com>> wrote: >> >> >> >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich >> >> <evgeny.astigeevich at arm.com <mailto:evgeny.astigeevich at arm.com>> wrote: >> >>> >> >>> It’s Dhrystone. >> >> >> >> Dhrystone has historically not been a good indicator of real-world >> >> performance fluctuations, especially at this small of a shift. >> >> >> >> I'd like to see if we see any fluctuation on larger and more realistic >> >> application benchmarks. One advantage of the flag being set is that we >> >> should get runs from folks who have automatic builds and runs periodically >> >> from trunk. Those should help give an accurate picture. >> >> >> >>> >> >>> >> >>> >> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com <mailto:chandlerc at gmail.com>] >> >>> Sent: 17 July 2015 16:10 >> >>> >> >>> >> >>> To: Evgeny Astigeevich; Chandler Carruth >> >>> Cc: LLVM Developers Mailing List >> >>> >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >>> >> >>> >> >>> >> >>> Can you say what Benchmark or give a test case so we understand the nature >> >>> of the regression? As Gerolf said, that will be important to understand what >> >>> is best to do. >> >>> >> >>> >> >>> >> >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich >> >>> <Evgeny.Astigeevich at arm.com <mailto:Evgeny.Astigeevich at arm.com>> wrote: >> >>> >> >>> Yes, the regression is stable. I double checked this. A full benchmark run >> >>> consists of at least 10 sub-runs to validate the score. >> >>> >> >>> I also checked if there were regressions of this benchmark across >> >>> different ARM hardware versions. I found all regressions of this benchmark >> >>> were in range 1.6%-2%. >> >>> >> >>> >> >>> >> >>> Kind regards, >> >>> >> >>> Evgeny Astigeevich >> >>> >> >>> >> >>> >> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com <mailto:chandlerc at gmail.com>] >> >>> Sent: 17 July 2015 07:52 >> >>> To: Evgeny Astigeevich; Chandler Carruth >> >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin >> >>> >> >>> >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >>> >> >>> >> >>> >> >>> Hey, thanks for benchmarking. >> >>> >> >>> >> >>> >> >>> How stable is the 2% regression? >> >>> >> >>> >> >>> >> >>> Michael ran some benchmarks with GlobalsModRef completely disabled and the >> >>> only differences were in the noise. This was a complete spec2k6 run along >> >>> with some others. Based on the number of benchmarks run there, I'm going to >> >>> go ahead and submit these patches, but if you can clarify the impact here, >> >>> we can look at potentially some other tradeoff. I'm not particularly set on >> >>> one set of defaults, etc, I just don't want to keep patches held up based on >> >>> that. We can flip the default back and forth as new data arrives. >> >>> >> >>> >> >>> >> >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich >> >>> <evgeny.astigeevich at arm.com <mailto:evgeny.astigeevich at arm.com>> wrote: >> >>> >> >>> Hi Chandler, >> >>> >> >>> >> >>> >> >>> I ran couple benchmarks with LTO turned on and your patches on ARM >> >>> hardware. >> >>> >> >>> There were no performance degradation of one benchmark and 2% slowdown of >> >>> another benchmark. >> >>> >> >>> >> >>> >> >>> Kind regards, >> >>> >> >>> Evgeny Astigeevich >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> From: llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu>] On >> >>> Behalf Of Evgeny Astigeevich >> >>> Sent: 15 July 2015 15:12 >> >>> >> >>> >> >>> To: 'Chandler Carruth'; Gerolf Hoflehner >> >>> Cc: LLVM Developers Mailing List >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >>> >> >>> >> >>> >> >>> Hi Chandler, >> >>> >> >>> >> >>> >> >>> I would like to run some benchmarks on ARM hardware and to look at impact >> >>> of your patches on LTO. >> >>> >> >>> >> >>> >> >>> Kind regards, >> >>> >> >>> Evgeny Astigeevich >> >>> >> >>> >> >>> >> >>> From: llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu>] On >> >>> Behalf Of Chandler Carruth >> >>> >> >>> >> >>> Sent: 15 July 2015 10:45 >> >>> To: Chandler Carruth; Gerolf Hoflehner >> >>> Cc: LLVM Developers Mailing List >> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >> >>> >> >>> >> >>> >> >>> I've fixed the obvious bugs I spotted in r242281. These should be pure >> >>> correctness improvements. >> >>> >> >>> >> >>> >> >>> I've sent the two patches I'm imagining to address the core issue here: >> >>> > >> >>> http://reviews.llvm.org/D11213 <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11213&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=EsY1fGxczRcp-UjYSFv1rWmYXBpmsX7OsckxJsRUy8I&s=d4561YywaWoYOw4i0HZ8j4PKNAigcNH-78hxy3aFemE&e=> >> >>> >> >>> http://reviews.llvm.org/D11214 <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11214&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=EsY1fGxczRcp-UjYSFv1rWmYXBpmsX7OsckxJsRUy8I&s=zQJk7H1XW9yvEMDPaDEh1JRp5-XJjqLM2PaWlK5fZU0&e=> >> >> >>> >> >>> >> >>> >> >>> Currently, I have the unsafe alias results disabled by default, but with a >> >>> flag that can re-enable them if needed. I don't feel really strongly about >> >>> which way the default is set -- but that may be because I don't have lots of >> >>> users relying on LTO. I'll let others indicate which way they would be most >> >>> comfortable. >> >>> >> >>> >> >>> >> >>> Some IRC conversations indicated that early benchmark results with GMR >> >>> completely disabled weren't showing really significant swings, so maybe this >> >>> relatively small reduction in power of GMR won't be too problematic for >> >>> folks. Either way, I'm open to the different approaches. It's D11214 that I >> >>> care a lot about. =] >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> Thanks for all the thoughts here! >> >>> >> >>> -Chandler >> >>> >> >>> >> >>> >> >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com <mailto:chandlerc at gmail.com>> >> >>> wrote: >> >>> >> >>> Replying here, but several of the questions raised boil down to "couldn't >> >>> you make the usage of GetUnderlyingObject conservatively correct?". I'll try >> >>> and address that. >> >>> >> >>> >> >>> >> >>> I think this *is* the right approach, but I think it is very hard to do >> >>> without effectively disabling this part of GlobalsModRef. That is, the easy >> >>> ways are likely to fire very frequently IMO. >> >>> >> >>> >> >>> >> >>> The core idea is to detect a "no information" state coming out of >> >>> GetUnderlyingObject (likely be providing a custom version just for >> >>> GlobalsModRef and tailored to its use cases). This is particularly effective >> >>> at avoiding the problems with the recursion limit. But let's look at what >> >>> cases we *wouldn't* return that. Here are the cases I see when I thought >> >>> about this last night with Hal, roughly in descending likelihood I would >> >>> guess: >> >>> >> >>> >> >>> >> >>> 1) We detect some global or an alloca. In that case, even BasicAA would be >> >>> sufficient to provide no-alias. GMR shouldn't be relevant. >> >>> >> >>> >> >>> >> >>> 2) We detect a phi, select, or inttoptr, and stop there. >> >>> >> >>> >> >>> >> >>> 3) We detect a load and stop there. >> >>> >> >>> >> >>> >> >>> 4) We detect a return from a function. >> >>> >> >>> >> >>> >> >>> 5) We detect an argument to the function. >> >>> >> >>> >> >>> >> >>> I strongly suspect the vast majority of queries hit #1. That's why BasicAA >> >>> is *so* effective. Both #4 and #5 I think are actually reasonable places for >> >>> GMR to potentially say "no-alias" and provide useful definitive information. >> >>> But I also suspect these are the least common. >> >>> >> >>> >> >>> >> >>> So let's look at #2 and #3 because I think they're interesting. For these, >> >>> I think it is extremely hard to return "no-alias". It seems extremely easy >> >>> for a reasonable and innocuous change to the IR to introduce a phi or a >> >>> select into one side of the GetUnderlyingObject but not the other. If that >> >>> ever happens, we can't return "no-alias" for #2, or we need to add really >> >>> expensive updates. It also seems reasonable (if not as likely) to want >> >>> adding a store and load to the IR to not trigger a miscompile. If it is >> >>> possible for a valid optimization pass to do reg2mem on an SSA value, then >> >>> that could happen to only one side of the paired GetUnderlyingObject and >> >>> break GMR with #3. If that seems like an unreasonable thing to do, consider >> >>> loop re-rolling or other transformations which may need to take things in >> >>> SSA form at move them out of SSA form. Even if we then try immediately to >> >>> put it back *into* SSA form, before we do that we create a point where GMR >> >>> cannot correctly return no-alias. >> >>> >> >>> >> >>> >> >>> So ultimately, I don't think we want to rely on GMR returning "no-alias" >> >>> for either #2 or #3 because of the challenge of actually updating it in all >> >>> of the circumstances that could break them. That means that *only* #4 and #5 >> >>> are going to return "no-alias" usefully. And even then, function inlining >> >>> and function outlining both break #4 and #5, so you have to preclude those >> >>> transforms while GMR is active. And I have serious doubts about these >> >>> providing enough value to be worth the cost. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> I think the better way to approach this is the other way around. Rather >> >>> than doing a backwards analysis to see if one location reaches and global >> >>> and the other location doesn't reach a global, I think it would be much more >> >>> effective to re-cast this as a forward analysis that determines all the >> >>> memory locations in a function that come from outside the function, and use >> >>> that to drive the no-alias responses. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner at apple.com <mailto:ghoflehner at apple.com>> >> >>> wrote: >> >>> >> >>> I wouldn’t be willing to give up performance for hypothetical issues. >> >>> Please protect all your changes with options. For some of your concerns it >> >>> is probably hard to provide a test case that shows an/the actual issue. >> >>> >> >>> >> >>> >> >>> I certainly agree that it will be very hard to provide a test case and >> >>> extremely rare to see this in the wild for most of these issues. As long as >> >>> I can remove the problematic update API we currently have (which as its an >> >>> API change can't really be put behind flags), I'm happy to have flags >> >>> control whether or not GMR uses the unsound / stale information to try to >> >>> answer alias queries. Do you have any opinion about what the default value >> >>> of the flags should be? >> >>> >> >>> >> >>> >> >>> I'll go ahead and prepare the patches, as it seems like we're all ending >> >>> up in the same position, and just wondering about the precise tradeoffs we >> >>> want to settle on. >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> >>> > >> >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list > >> >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list > >> >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >>> >> >>> >> >>> ______________________________________________ >> >>> LLVM Developers mailing list > >> >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list > >> >>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list > >> >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >> >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list > >> >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150721/959f0c67/attachment.html>