thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Better Alias Analysis By Default

If this information is useful, please help other people find it:
Share via:

Jia Chen via llvm-dev

2016-Jun-21 22:29 UTC

[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary

Dear LLVM Community,

This is a brief summary of what I've done so far for CFL-AA, and what I 
plan to do next.

tl;dr: CFL-AA is getting saner. Low-hanging fruits on its improvement 
have almost been picked up. I can either make CFL-AA more precise (with 
certain performance cost), or teach other passes to capitalize on CFL-AA 
better as the next step. Comments and suggestions are more than welcomed.

Before my project starts, I asked on the mailing list for testing 
CFL-AA. According to Geoff Berry 
(http://lists.llvm.org/pipermail/llvm-dev/2016-May/099900.html),
- Enabling CFL-AA in the complilation pipeline did not cause any 
correctness issues.
- Enabling CFL-AA in the complilation pipeline did not affect 
performance of the compiled program in any significant way.

In other words, CFL-AA was very conservative at that moment, to the 
point that it did not offer much beyond what BasicAA can do. 
Fortunately, over the last month the situation has been improved. Here 
is a list of precision enhancement I made for CFL-AA:

- [r271322] Remove aliasing relations between GEP pointers and GEP 
indices. Before this patch, CFL-AA will claim that a and b may-alias 
each other in the following codes:

int a[10], b[10];
a[N] = ...;
b[N] = ...;

This seemingly insane behavior was actually there to safeguard certain 
cases where people do crazy stuffs with pointer arithmetics. Later we 
figured out that those crazy behaviors were in fact UB and therefore we 
could drop support for it.

- [r271421] Teach CFL-AA to understand function attributes. CFL-AA used 
to treat opaque function calls in a very conservative manner. However, 
we know that library calls such as malloc(), free() and strcmp() are 
marked with special attributes we can use to help resolve aliasing 
queries. With this patch, we can safely rely on CFL-AA to say
"noalias"
for a and b in the following code snippet:

char* a = (char*) malloc(len);
char* b = (char*) malloc(len);
...  // popluate string a and b here
if (strcmp(a, b) == 0)
   ...

- [r272040] Improve precision for inttoptr/ptrtoint. If CFL-AA found 
this snippet:

int a, b, *p = &a, *q = &b;
int c= (int)p + (int)q;

It used to report may-alias for p and q, just because both of them are 
casted to integers. This was later relaxed in a way that pointers are 
only treated conservatively when you cast it back from integers.

- [r272335, r272690, r273219, r273229] Improvement on inter-procedural 
analysis. This is the one single feature of CFL-AA that enables it to 
really offer something that BasicAA doesn't. The work is still ongoing, 
but the central idea here is that we can make CFL-AA look past function 
calls and yield almost the same result as if the function call was 
inlined. So if we have

int *p = ..., *q = /* something other than *p */;
foo(&p, &q);
... // use p and q here

As long as CFL-AA can see the body of foo(), and as long as foo() 
doesn't do anything that makes p alias q, CFL-AA won't count p and q as 
may-alias pairs when analyzing the codes after the call to foo(). I 
don't think this is currently possible with BasicAA.

Looking forward, there are certainly many other ways to further improve 
CFL-AA's precision (e.g. adding field sensitivity, and making the 
analysis inclusion-based rather than equality-based). However, I feel 
like I've almost pushed the analysis to a point where in exchange for 
more precision we may inevitably start to pay extra cost and observe 
some noticeable performance drops. If the performance problem turns out 
to be not that bad, I could just keep going where I'm headed.

But if it becomes a problem, there is also a plan B: Notice that what 
I've said is all about alias queries. In LLVM, clients of alias analysis 
also care about other things like mod-ref info, in which area CFL-AA 
hasn't improved that much. Also, code snippets I listed before to 
justify my patches look pathetically simple and very unrealistic. What I 
really want to look into is some real-world client passes running on 
real-world benchmarks, and see if any precision gets lost during the 
interaction between the client pass and the alias analysis pass.

If you have any comments or suggestions, please let me know. Also, if 
you have any concrete examples in mind that existing AAs (including 
CFL-AA) can't do but you them to be handled properly, please tell me! I 
am more than happy to make everyone's code a little bit faster :)


-- 
Best Regards,

--
Jia Chen

Joerg Sonnenberger via llvm-dev

2016-Jun-21 22:50 UTC

head link

[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary

On Tue, Jun 21, 2016 at 05:29:13PM -0500, Jia Chen via llvm-dev
wrote:> - [r271322] Remove aliasing relations between GEP pointers and GEP indices.
> Before this patch, CFL-AA will claim that a and b may-alias each other in
> the following codes:
> 
> int a[10], b[10];
> a[N] = ...;
> b[N] = ...;
> 
> This seemingly insane behavior was actually there to safeguard certain
cases
> where people do crazy stuffs with pointer arithmetics. Later we figured out
> that those crazy behaviors were in fact UB and therefore we could drop
> support for it.
What exactly is the semantic justification here? I'm asking because
there are a number of crucial use cases for aliasing global arrayish
variables behind the compiler like linker sets. We had quite some fun in
NetBSD with newer GCC introducing breakage by exploiting UB in fun ways.
It would be nice to avoid breaking valid applications in system/embedded
land.

Joerg

George Burgess IV via llvm-dev

2016-Jun-21 23:07 UTC

head link

[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary

+Hal

The use-case we dropped support for, specifically, was:

```
void foo() {
  int a;
  int *ap = &a;
  int *ap2 = (int*)((char*)NULL + (uintptr_t)&a);
  // now ap and ap2 may alias
}
```

Our justification is that LLVM (specifically, BasicAA) already explicitly
doesn't support this pattern, so we shouldn't have to support it in
CFLAA.

On Tue, Jun 21, 2016 at 3:50 PM, Joerg Sonnenberger via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Tue, Jun 21, 2016 at 05:29:13PM -0500, Jia Chen via llvm-dev wrote:
> > - [r271322] Remove aliasing relations between GEP pointers and GEP
> indices.
> > Before this patch, CFL-AA will claim that a and b may-alias each other
in
> > the following codes:
> >
> > int a[10], b[10];
> > a[N] = ...;
> > b[N] = ...;
> >
> > This seemingly insane behavior was actually there to safeguard certain
> cases
> > where people do crazy stuffs with pointer arithmetics. Later we
figured
> out
> > that those crazy behaviors were in fact UB and therefore we could drop
> > support for it.
>
> What exactly is the semantic justification here? I'm asking because
> there are a number of crucial use cases for aliasing global arrayish
> variables behind the compiler like linker sets. We had quite some fun in
> NetBSD with newer GCC introducing breakage by exploiting UB in fun ways.
> It would be nice to avoid breaking valid applications in system/embedded
> land.
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160621/34ce8366/attachment.html>

llvm dev - Jun 2016 - [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary

[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary

[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary

[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary