Christian Convey
2015-Jun-15 01:35 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
> > > The algorithm maintains a may-point-to graph. Unfortunately the > algorithm > > doesn't delete an "A-->B" edge when there's a strong update of "A" but > the > > value copied into "A" isn't a pointer. So the interpretation of "A" > having > > only one outbound edge (to "B") is a little ambiguous. It means "'A' > > definitely points to 'B', or 'A' doesn't hold a valid pointer." > > > Define "valid pointer please"? >Sorry, I can see how my phrasing raised a red flag. The original version of the algorithm I'm looking at was designed to analyze C source code, not LLVM IR. I'm in the process of adapting its dataflow equations for IR. The algorithm assumes that a correct C program can't just compute pointer values *ex nihilo*; that they can only by obtained from certain syntactic structures like variable declarations, or calls to *malloc*, or pointer literals. The AA algorithm reckons that dereferencing a runtime value obtained by some other mechanism is so likely to be a bug, that they can skip worrying about it. The AA algorithm uses dataflow analysis to monitor the possible propagation of those values through the program code, and it represents those flows by updates to the may-point-to graph. If at some code point CP, a may-point-to graph vertex "B" has no outbound edges, that's equivalent to saying that the AA has concluded the runtime memory modeled by "B" does not contain any pointer that a correct program has any business trying to dereference. So to restate my point in the earlier email: if there's a strong-update of the form "*A = 42;" (in C parlance), it would be nice to have this AA algorithm remove any may-point-to graph edges originating at "A" at that point. But, for the sake of efficiency (in the author's judgment), such assignments are simply ignored by the dataflow equations. And so any existing may-point-to edges originating at "A" are allowed to remain in existence.> > One solution would be for me to adapt the algorithm to remove this > > ambiguity. But if possible I'd like to keep the algorithm as close to the > > published version as possible, so I'd rather find another solution. > > Why? > Published versions are often ... wrong, not well engineered, etc :) >I'm not dead-set against modifying it, I'm just biased against doing it without a good reason. I'm relatively new to implementing AA algorithms, and the author seems to have put a great deal of thought into this algorithm. So I'm trying to follow a policy of "If it's not broken, don't fix it." Also, the more I can remain faithful to the algorithm's original writeup, the less I'm on the hook to write my own documentation for my implementation.> > > > Another approach is to add a value to the AliasResult enumeration, > > indicating "MustAlias or NoAlias, I'm not sure which". But I'm not sure > if > > any downstream analyses could make use of a result like that. > > Above, you say you want to not return MustAlias. > Here you say it's not clear that any downstream results could make use > of better info. > > Before you go and try to figure out what should change, you really > need to actually determine whether the info you have is valuable. > > I would do this by finding a pass you think you can improve with your > extra info, and seeing if it improves (add a temporary hack AA > function or something that gives info about this) by giving it must/no > vs may. > > If something improves, great, we can figure out whether it's worth the > tradeoffs/etc and help you figure out what to do. > If nothing improves, it may not be worth you spending your time on it. >Thanks, will do. I appreciate the feedback! - Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150614/6d9e3a14/attachment.html>
Daniel Berlin
2015-Jun-15 05:43 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Sun, Jun 14, 2015 at 6:35 PM, Christian Convey <christian.convey at gmail.com> wrote:>> > The algorithm maintains a may-point-to graph. Unfortunately the >> > algorithm >> > doesn't delete an "A-->B" edge when there's a strong update of "A" but >> > the >> > value copied into "A" isn't a pointer. So the interpretation of "A" >> > having >> > only one outbound edge (to "B") is a little ambiguous. It means "'A' >> > definitely points to 'B', or 'A' doesn't hold a valid pointer." >> >> >> Define "valid pointer please"? > > > Sorry, I can see how my phrasing raised a red flag. > > The original version of the algorithm I'm looking at was designed to analyze > C source code, not LLVM IR. I'm in the process of adapting its dataflow > equations for IR. > > The algorithm assumes that a correct C program can't just compute pointer > values ex nihilo; that they can only by obtained from certain syntactic > structures like variable declarations, or calls to malloc, or pointer > literals.While true in theory, this is 100% wrong in practice ;)> The AA algorithm reckons that dereferencing a runtime value > obtained by some other mechanism is so likely to be a bug, that they can > skip worrying about it.Given that things like "calculating vtable addresses", etc, end up looking indistinguishably the same at the IR level, you can't really.> > The AA algorithm uses dataflow analysis to monitor the possible propagation > of those values through the program code, and it represents those flows by > updates to the may-point-to graph. If at some code point CP, a may-point-to > graph vertex "B" has no outbound edges, that's equivalent to saying that the > AA has concluded the runtime memory modeled by "B" does not contain any > pointer that a correct program has any business trying to dereference.FWIW: When i first did GCC's current points-to analysis, I did the same thing. It eliminated "non-pointer" values along the same lines. This broke roughly "the entire world". I tried to find some subset i felt was worthwhile and where it was okay, but gave up after a while.
Christian Convey
2015-Jun-15 11:15 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 1:43 AM, Daniel Berlin <dberlin at dberlin.org> wrote:> > The AA algorithm uses dataflow analysis to monitor the possible > propagation > > of those values through the program code, and it represents those flows > by > > updates to the may-point-to graph. If at some code point CP, a > may-point-to > > graph vertex "B" has no outbound edges, that's equivalent to saying that > the > > AA has concluded the runtime memory modeled by "B" does not contain any > > pointer that a correct program has any business trying to dereference. > > FWIW: When i first did GCC's current points-to analysis, I did the > same thing. It eliminated "non-pointer" values along the same lines. > This broke roughly "the entire world". >Whoa, thanks for the warning.> I tried to find some subset i felt was worthwhile and where it was > okay, but gave up after a while. >I'm not quite sure which things you're referring to in that statement. Would you mind clarifying? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150615/dde54bcd/attachment.html>
Maybe Matching Threads
- [LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
- [LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
- [LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
- [LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
- [LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?