Daniel Berlin
2012-May-22 14:56 UTC
[LLVMdev] Some small changes to the memory dependence analysis API
Hey folks , I was thinking of making some small, optional-for-callers only changes to the memdep API, and before I did it, thought i'd post here and make sure nobody thought it was egregiously stupid. There are two related issues i'm trying to solve, both of which occur in uses like GVN, where we are trying to prove equivalence between a load/store/call with some other load/store/call: 1. Memdep sometimes walks back too far, making it slow. 2. Memdep sometimes walks back too far, making it add dependencies that are confusing to GVN. As a motivating example, take this wonderful piece of code from the real world: _ZN4llvm4castINS_8CallInstEPKNS_5ValueEEENS_10cast_rettyIT_T0_E8ret_typeERKS7_.exit.i.i.i.i.i.i.i.i.i.i.i: ; preds = %entry %1 = getelementptr inbounds %"class.llvm::Instruction"* %I, i64 0, i32 0, i32 0 %arrayidx.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = getelementptr inbounds %"class.llvm::Value"* %1, i64 -1, i32 4 %2 = load %"class.llvm::Type"** %arrayidx.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i, align 8 %SubclassID.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = getelementptr inbounds %"class.llvm::Type"* %2, i64 0, i32 1 %3 = bitcast i32* %SubclassID.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i to i8* %4 = load i8* %3, align 1, !tbaa !0 %cmp.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = icmp ne i8 %4, 2 %tobool.i.i.i.i.i.i.i.i.i.i.i.i = icmp eq %"class.llvm::Type"* %2, null %or.cond.i.i.i.i.i.i.i.i.i.i.i.i = or i1 %cmp.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i, %tobool.i.i.i.i.i.i.i.i.i.i.i.i br i1 %or.cond.i.i.i.i.i.i.i.i.i.i.i.i, label %return, label %_ZN4llvm3isaINS_14DbgDeclareInstEPNS_11InstructionEEEbRKT0_.exit.i _ZN4llvm3isaINS_14DbgDeclareInstEPNS_11InstructionEEEbRKT0_.exit.i: ; preds = %_ZN4llvm4castINS_8CallInstEPKNS_5ValueEEENS_10cast_rettyIT_T0_E8ret_typeERKS7_.exit.i.i.i.i.i.i.i.i.i.i.i %5 = bitcast %"class.llvm::Type"* %2 to %"class.llvm::Function"* %call1.i.i.i.i.i.i.i.i.i.i.i.i = tail call i32 @_ZNK4llvm8Function14getIntrinsicIDEv(%"class.llvm::Function"* %5) nounwind readonly %cmp.i.i.i.i.i.i.i = icmp ne i32 %call1.i.i.i.i.i.i.i.i.i.i.i.i, 142 %tobool = icmp eq %"class.llvm::Instruction"* %I, null %or.cond = or i1 %cmp.i.i.i.i.i.i.i, %tobool br i1 %or.cond, label %_ZN4llvm3isaINS_12DbgValueInstEPNS_11InstructionEEEbRKT0_.exit.i, label %return _ZN4llvm3isaINS_12DbgValueInstEPNS_11InstructionEEEbRKT0_.exit.i: ; preds = %_ZN4llvm3isaINS_14DbgDeclareInstEPNS_11InstructionEEEbRKT0_.exit.i %6 = bitcast %"class.llvm::Type"* %2 to %"class.llvm::Function"* %call1.i.i.i.i.i.i.i.i.i.i.i.i13 = tail call i32 @_ZNK4llvm8Function14getIntrinsicIDEv(%"class.llvm::Function"* %6) nounwind readonly %cmp.i.i.i.i.i.i.i14 = icmp eq i32 %call1.i.i.i.i.i.i.i.i.i.i.i.i13, 143 %phitmp18 = icmp ne %"class.llvm::Instruction"* %I, null %phitmp18. = and i1 %cmp.i.i.i.i.i.i.i14, %phitmp18 ret i1 %phitmp18. We are trying to prove the equivalence of the two tail calls. If you call getNonLocalCallDependency on the second call, you end up with two dependencies[1]. One of these dependencies we care about, the first tail call, and the other we don't, the load way back at %4. I could teach GVN that this particular clobber dependency is irrelevant for this call (and i will :P). But the reality is we don't even need to be looking back that far for dependencies. I'd like to extend the interface for getNonLocallCallDependency optionally take a set of stopping point instructions (or BB's, depending on how expensive testing on every instruction ends up being), and to not look for dependencies back past these points (at least for whatever path it's currently following. You can't short-circuit the whole thing because there may be a path backwards to the same instruction with a different set of dependencies). That way, it stops at the things we are trying to prove equivalence with. You ca see how if you start to insert a lot more blocks in between the "thing we are checking against", and "the next dependency it thinks is real", you can get into hunting for a lot of dependencies we don't care about. Besides being faster, short circuiting like this eliminates more things. A similar issue occurs with getNonLocalPointerDependency (though there the issue is slightly different, we have a set of loads we want to know if this load is equivalent to, and if we walk past those loads without it seeing a def/clobber, we know the answer is "no", and we don't need to know the closest deps) [1] As a side note, whether you get two dependencies or one depends on whether you remove the bitcast at %6, and replace it with the bitcast at %5, because getCallSiteDependencyFrom tries to recognize when it finds two of the same readonly call. The above is actually the output of a larger testcase that was already run through GVN, but if you run GVN on it *again*, it will delete the call the second time because it will now replace the bitcast before calling getNonLocalCallDependency. Suffice to say, the reason it doesn't do so the first time on the larger testcase is because of the issue i'm talking about above. Trying to shove a large amount of value based equivalence logic into memdep seemed like "not the best approach".
Maybe Matching Threads
- [LLVMdev] DSA - LocalDataStructures pass does not create DSGraphs
- [RFC] jump threading on std::pair<int, bool>
- [LLVMdev] Question about boolean type variable generation of Global Variable Optimization
- [RFC] "Properly" Derive Function/Argument/Parameter Attributes
- Dereferenceable load semantics & LICM