Jia Chen via llvm-dev
2016-Jun-21 22:29 UTC
[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary
Dear LLVM Community, This is a brief summary of what I've done so far for CFL-AA, and what I plan to do next. tl;dr: CFL-AA is getting saner. Low-hanging fruits on its improvement have almost been picked up. I can either make CFL-AA more precise (with certain performance cost), or teach other passes to capitalize on CFL-AA better as the next step. Comments and suggestions are more than welcomed. Before my project starts, I asked on the mailing list for testing CFL-AA. According to Geoff Berry (http://lists.llvm.org/pipermail/llvm-dev/2016-May/099900.html), - Enabling CFL-AA in the complilation pipeline did not cause any correctness issues. - Enabling CFL-AA in the complilation pipeline did not affect performance of the compiled program in any significant way. In other words, CFL-AA was very conservative at that moment, to the point that it did not offer much beyond what BasicAA can do. Fortunately, over the last month the situation has been improved. Here is a list of precision enhancement I made for CFL-AA: - [r271322] Remove aliasing relations between GEP pointers and GEP indices. Before this patch, CFL-AA will claim that a and b may-alias each other in the following codes: int a[10], b[10]; a[N] = ...; b[N] = ...; This seemingly insane behavior was actually there to safeguard certain cases where people do crazy stuffs with pointer arithmetics. Later we figured out that those crazy behaviors were in fact UB and therefore we could drop support for it. - [r271421] Teach CFL-AA to understand function attributes. CFL-AA used to treat opaque function calls in a very conservative manner. However, we know that library calls such as malloc(), free() and strcmp() are marked with special attributes we can use to help resolve aliasing queries. With this patch, we can safely rely on CFL-AA to say "noalias" for a and b in the following code snippet: char* a = (char*) malloc(len); char* b = (char*) malloc(len); ... // popluate string a and b here if (strcmp(a, b) == 0) ... - [r272040] Improve precision for inttoptr/ptrtoint. If CFL-AA found this snippet: int a, b, *p = &a, *q = &b; int c= (int)p + (int)q; It used to report may-alias for p and q, just because both of them are casted to integers. This was later relaxed in a way that pointers are only treated conservatively when you cast it back from integers. - [r272335, r272690, r273219, r273229] Improvement on inter-procedural analysis. This is the one single feature of CFL-AA that enables it to really offer something that BasicAA doesn't. The work is still ongoing, but the central idea here is that we can make CFL-AA look past function calls and yield almost the same result as if the function call was inlined. So if we have int *p = ..., *q = /* something other than *p */; foo(&p, &q); ... // use p and q here As long as CFL-AA can see the body of foo(), and as long as foo() doesn't do anything that makes p alias q, CFL-AA won't count p and q as may-alias pairs when analyzing the codes after the call to foo(). I don't think this is currently possible with BasicAA. Looking forward, there are certainly many other ways to further improve CFL-AA's precision (e.g. adding field sensitivity, and making the analysis inclusion-based rather than equality-based). However, I feel like I've almost pushed the analysis to a point where in exchange for more precision we may inevitably start to pay extra cost and observe some noticeable performance drops. If the performance problem turns out to be not that bad, I could just keep going where I'm headed. But if it becomes a problem, there is also a plan B: Notice that what I've said is all about alias queries. In LLVM, clients of alias analysis also care about other things like mod-ref info, in which area CFL-AA hasn't improved that much. Also, code snippets I listed before to justify my patches look pathetically simple and very unrealistic. What I really want to look into is some real-world client passes running on real-world benchmarks, and see if any precision gets lost during the interaction between the client pass and the alias analysis pass. If you have any comments or suggestions, please let me know. Also, if you have any concrete examples in mind that existing AAs (including CFL-AA) can't do but you them to be handled properly, please tell me! I am more than happy to make everyone's code a little bit faster :) -- Best Regards, -- Jia Chen
Joerg Sonnenberger via llvm-dev
2016-Jun-21 22:50 UTC
[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary
On Tue, Jun 21, 2016 at 05:29:13PM -0500, Jia Chen via llvm-dev wrote:> - [r271322] Remove aliasing relations between GEP pointers and GEP indices. > Before this patch, CFL-AA will claim that a and b may-alias each other in > the following codes: > > int a[10], b[10]; > a[N] = ...; > b[N] = ...; > > This seemingly insane behavior was actually there to safeguard certain cases > where people do crazy stuffs with pointer arithmetics. Later we figured out > that those crazy behaviors were in fact UB and therefore we could drop > support for it.What exactly is the semantic justification here? I'm asking because there are a number of crucial use cases for aliasing global arrayish variables behind the compiler like linker sets. We had quite some fun in NetBSD with newer GCC introducing breakage by exploiting UB in fun ways. It would be nice to avoid breaking valid applications in system/embedded land. Joerg
George Burgess IV via llvm-dev
2016-Jun-21 23:07 UTC
[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary
+Hal The use-case we dropped support for, specifically, was: ``` void foo() { int a; int *ap = &a; int *ap2 = (int*)((char*)NULL + (uintptr_t)&a); // now ap and ap2 may alias } ``` Our justification is that LLVM (specifically, BasicAA) already explicitly doesn't support this pattern, so we shouldn't have to support it in CFLAA. On Tue, Jun 21, 2016 at 3:50 PM, Joerg Sonnenberger via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On Tue, Jun 21, 2016 at 05:29:13PM -0500, Jia Chen via llvm-dev wrote: > > - [r271322] Remove aliasing relations between GEP pointers and GEP > indices. > > Before this patch, CFL-AA will claim that a and b may-alias each other in > > the following codes: > > > > int a[10], b[10]; > > a[N] = ...; > > b[N] = ...; > > > > This seemingly insane behavior was actually there to safeguard certain > cases > > where people do crazy stuffs with pointer arithmetics. Later we figured > out > > that those crazy behaviors were in fact UB and therefore we could drop > > support for it. > > What exactly is the semantic justification here? I'm asking because > there are a number of crucial use cases for aliasing global arrayish > variables behind the compiler like linker sets. We had quite some fun in > NetBSD with newer GCC introducing breakage by exploiting UB in fun ways. > It would be nice to avoid breaking valid applications in system/embedded > land. > > Joerg > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160621/34ce8366/attachment.html>