Christian Convey
2015-Jun-15 21:20 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 4:54 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> > I'm mostly going from Robert Wilson's 1997 PhD thesis, although I'm
> > pretty sure I've seen a lot of the same ideas elsewhere as well.
>
> Yes, using summary/transfer functions has been tried a lot.
> Note: the numbers in most of these PhD theses do *not* get borne out in
> practice.
>
> See, e.g., http://www.cse.unsw.edu.au/~ysui/papers/spe14.pdf
>
> vs
>
> http://impact.crhc.illinois.edu/shared/report/phd-thesis-erik-nystrom.pdf
>
> vs
>
> the Wilson thesis.

Cool, I'll give those papers a read, thanks.

> In particular, I expect the Wilson algorithm is going to fall over
> very badly on anything of even medium size.

Interesting. Do you mean "fall over" in terms of running time, or
precision, or something else?

> Nowadays, the thing of the day seems to be using CFL-reachability-based
> formulations and doing context-sensitivity on demand.

Out of curiosity, what's the appeal (aside from academic novelty) of the
CFL approaches?

From a personal perspective, I'm particularly interested in the maximum
analytic precision each AA approach can attain, almost without regard to
how much time or memory the computation takes to run.

So one thing I found appealing about Wilson's thesis was his "location
set" approach for providing field-sensitivity. I also liked Alain
Deutsch's 1994 paper "Interprocedural may-alias analysis for pointers:
Beyond k-limiting" for the same reason. But I didn't take a crack at
implementing *that* because, based on the paper's complexity, I thought
I'd go clinically insane trying to turn it into code :)
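In case it's useful context, my reading of the location-set idea boils down
to roughly the following toy C++ sketch (my own invented names, ignoring
access sizes; this is not code from the thesis or from LLVM). A single
(block, offset, stride) triple can stand for, say, one field of every
element of an array of structs, which is where the field-sensitivity comes
from:

#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// A location set denotes the byte offsets {offset + i*stride | i >= 0}
// within one abstract memory block, so one set can summarize e.g.
// "field f of every element of an array of structs".
struct LocationSet {
  const void *block;  // the abstract memory block (alloca, global, ...)
  int64_t offset;     // offset of the first location in the block
  int64_t stride;     // 0 means exactly one location (a scalar field)
};

// Conservative may-overlap test. Distinct blocks never overlap; within a
// block, offset1 + i*s1 == offset2 + j*s2 has an integer solution iff
// gcd(s1, s2) divides offset2 - offset1 (we over-approximate by allowing
// any integers i, j).
bool mayOverlap(const LocationSet &a, const LocationSet &b) {
  if (a.block != b.block)
    return false;
  if (a.stride == 0 && b.stride == 0)
    return a.offset == b.offset;
  return (b.offset - a.offset) % std::gcd(a.stride, b.stride) == 0;
}

For example, two fields at offsets 0 and 4 of an array of 8-byte structs
give the sets (0 mod 8) and (4 mod 8), which this test correctly keeps
apart.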
Daniel Berlin
2015-Jun-15 22:03 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 2:20 PM, Christian Convey
<christian.convey at gmail.com> wrote:
> On Mon, Jun 15, 2015 at 4:54 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> > I'm mostly going from Robert Wilson's 1997 PhD thesis, although I'm
>> > pretty sure I've seen a lot of the same ideas elsewhere as well.
>>
>> Yes, using summary/transfer functions has been tried a lot.
>> Note: the numbers in most of these PhD theses do *not* get borne out in
>> practice.
>>
>> See, e.g., http://www.cse.unsw.edu.au/~ysui/papers/spe14.pdf
>>
>> vs
>>
>> http://impact.crhc.illinois.edu/shared/report/phd-thesis-erik-nystrom.pdf
>>
>> vs
>>
>> the Wilson thesis.
>
> Cool, I'll give those papers a read, thanks.
>
>> In particular, I expect the Wilson algorithm is going to fall over
>> very badly on anything of even medium size.
>
> Interesting. Do you mean "fall over" in terms of running time, or
> precision, or something else?

Running time.

Wilson's approach took 16 seconds for 4500 lines of code. Let's assume his
implementation actually did things right. (Most research implementations
do things like "ignore effects of external function calls", which
significantly reduces the measured runtime in practice.) Let's further
assume that the scaling of computer speed from 1995 to now eats your
entire exponential factor, making the algorithm in practice linear (this
is not even close to true).

His algorithm, on a million lines of code, would then take about an hour
to run (16 s * 1,000,000 / 4,500 ~ 3,600 s), which would be pretty
unacceptable. In practice, I think you'll find it probably never
finishes :)

>> Nowadays, the thing of the day seems to be using CFL-reachability-based
>> formulations and doing context-sensitivity on demand.
>
> Out of curiosity, what's the appeal (aside from academic novelty) of the
> CFL approaches?

They are demand-driven and easy to add context sensitivity to. At the same
time, on an all-pairs basis, they are competitive with constraint-based
Andersen's solvers. You can't really do much better than that: it means
you pay pretty close to the minimal cost for the stuff you query.

> From a personal perspective, I'm particularly interested in the maximum
> analytic precision each AA approach can attain, almost without regard to
> how much time or memory the computation takes to run.

I'm wildly curious why.

> So one thing I found appealing about Wilson's thesis was his "location
> set" approach for providing field-sensitivity.

I never saw this paper (it was too old for where I started), but GCC uses
something similar. We create variables for each field of a given variable.
They are linked, in a linked list with an offset and size (if known), to
the variables that contain them. Constraints contain stride info, and
intersection is then used to compute which sub-variables a constraint
touches during solving.

It is a variant of this:
http://homepages.mcs.vuw.ac.nz/~djp/files/paste04.ps

Note a few things:

- Some programs have so many sub-variables that you will run out of
  memory. This is true of the location-set approach as well.
- GCC only created variables for possibly-used field ranges.
- I can point you at GCC bugs offline if you are interested.

> I also liked Alain Deutsch's 1994 paper "Interprocedural may-alias
> analysis for pointers: Beyond k-limiting" for the same reason. But I
> didn't take a crack at implementing *that* because, based on the paper's
> complexity, I thought I'd go clinically insane trying to turn it into
> code :)
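To make the intersection step concrete, it looks roughly like this
(an illustrative sketch with invented names, not GCC's actual
tree-ssa-structalias code):

#include <cstdint>
#include <vector>

// One sub-variable per possibly-used field range of a variable.
struct SubVariable {
  int64_t offset;  // byte offset within the containing variable
  int64_t size;    // size in bytes, or -1 if unknown
};

// One strided access from a constraint: it may touch len bytes starting
// at start + k*stride, for k in [0, count).
struct Access {
  int64_t start;
  int64_t len;
  int64_t stride;  // 0 means a single, non-strided access
  int64_t count;   // 1 if stride == 0
};

// Return indices of the sub-variables the access may touch, by interval
// intersection. Unknown-size fields conservatively extend to the end.
std::vector<size_t> touchedFields(const std::vector<SubVariable> &fields,
                                  const Access &acc) {
  std::vector<size_t> hit;
  for (size_t i = 0; i != fields.size(); ++i) {
    int64_t fBegin = fields[i].offset;
    int64_t fEnd = fields[i].size < 0 ? INT64_MAX
                                      : fBegin + fields[i].size;
    for (int64_t k = 0; k != acc.count; ++k) {
      int64_t aBegin = acc.start + k * acc.stride;
      if (aBegin < fEnd && fBegin < aBegin + acc.len) {
        hit.push_back(i);  // half-open intervals overlap
        break;
      }
    }
  }
  return hit;
}

A production solver would avoid the inner loop over strided repetitions,
but the interval intersection is the essential step.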
Christian Convey
2015-Jun-15 23:00 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 6:03 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> > From a personal perspective, I'm particularly interested in the maximum
> > analytic precision each AA approach can attain, almost without regard
> > to how much time or memory the computation takes to run.
>
> I'm wildly curious why.

One reason is that I'm simply curious about the trade-off space between AA
precision and AA running time. Since I don't make compilers for a living,
I have the luxury of staring at corners of the design space which it would
be idiotic to actually include in LLVM :)

Another reason is that in past work, I've sometimes worked on code where
I'd gladly accept a 10-day compilation time if it bought me an extra 10%
in performance. So I sometimes wonder what's involved in making such
optimization passes available, even though people wouldn't want to use
them on a day-to-day basis.

> Constraints contain stride info, and intersection is then used to
> compute which sub-variables a constraint touches during solving.
> It is a variant of this:
> http://homepages.mcs.vuw.ac.nz/~djp/files/paste04.ps

Thanks, I'll try to give it a read.

> Some programs have so many sub-variables that you will run out of
> memory. This is true of the location-set approach as well.

I'm surprised you ran into that kind of trouble with memory. Was that back
in 32-bit days?

> GCC only created variables for possibly-used field ranges.
> I can point you at GCC bugs offline if you are interested.

Yeah, if you don't mind, I'd be grateful for the links.