thr3ads.net - llvm dev - [llvm-dev] Alias analysis results [May 2017]

If this information is useful, please help other people find it:
Share via:

Ivan A. Kosarev via llvm-dev

2017-May-11 08:50 UTC

[llvm-dev] Alias analysis results

Hello,

I'm trying to get the whole picture of what we are doing in terms
of alias analysis in LLVM. Being new to this I may be missing
some well known things so my apologies if I seem to ask obvious
things.

There are lots of questions arise in my head as I read related
bug reports, sources and documentation, of which looks to be the
most simple is,

Why the current TBAA implementation returns NoAlias for pairs of
accesses for which it has no idea if they really alias?

Let me elaborate on this. My understanding of what the TBAA
implementation is supposed to respond is whether a given pair of
accesses is allowed to alias by the rules of the input language.
(Which in turn leads me to another question, that is, whether
we are really trying to perform C/C++ alias analysis in a
[language-neutral] codegen pass, but let's postpone this for
later.)

This consideration assumes that what the TBAA implementation
actually returns as a result is orthogonal to the
Must/No/MayAlias set, that used to be the result type of various
AA, including TBAA. I would expect that the result of the
complete alias analysis includes both the information on whether
given accesses alias and, as a separate element, whether they are
allowed to alias by the rules of the language. Then we may have
combinations like (MustAlias, AllowedAlias) that seem to be the
common case and combinations like (MustAlias, NotAllowedAlias)
that I would expect to a) generate the breaks-alias-rules kind of
warnings and b) proceed further as any other MustAlias case.

The latter combination is what I would expect for illegal type
puns, for whatever definition of "illegal". Note that for such
cases we currently get MustAlias from BasicAA and NoAlias from
TBAA. In some comments this is considered okay whereas I have to
admit that I don't understand how that may possibly be okay,
meaning it's hard to imagine how a pair of accesses can be
aliasing and non-aliasing at the same time. To me it appears
like that the root cause if this inconsistency is not that any of
these analyses is not correct, but that we are trying to
represent their responses in an inadequate form and we are trying
to use BasicAA as a faster replacement for TBAA in simple cases.

Any thoughts are highly appreciated.

Thanks,

--

George Burgess IV via llvm-dev

2017-May-12 01:48 UTC

head link

[llvm-dev] Alias analysis results

+hal

Hi,

I'm not an expert on all of our AA bits, but I've been in the area some.
If
I'm wrong, I'm sure someone will jump in and we'll both end up
learning
something. :)
> Why the current TBAA implementation returns NoAlias for pairs of
> accesses for which it has no idea if they really alias?
For the same reason that LLVM assumes many other things it can't prove. The
language says "if you do ${x}, the behavior is undefined," so
optimizing on
the assumption that the user isn't doing ${x} is OK.

That said, optimizing on aliasing assumptions has been contentious in the
past, so frontends that emit TBAA data (e.g. clang) generally support flags
like -fno-strict-aliasing to relax this behavior.
> that I would expect to a) generate [...] warnings and [...]
FWIW, it's generally nontrivial to emit helpful warnings from LLVM. This is
because LLVM doesn't have the AST that the IR was generated from, so it has
no source-level information to speak of (except the bits we encode in
debuginfo, but that doesn't exactly make for great diags).

In other words, a warning emitted by LLVM would probably be as helpful as
"hey, somewhere in the function '_Z3fooi', or one of the functions
that we
happened to inline into it, we found some pretty obvious type punning."
> it's hard to imagine how a pair of accesses can be
> aliasing and non-aliasing at the same time
As you've noted, this is an artifact of taking two logically independent
pieces of information and merging them into one. While this might not be
ideal from a theoretical point of view, I don't see why an optimization
would care about the difference between "proven NoAlias" and
"proven
illegal to alias." In either case, it's allowed to optimize based on
the
assumption that those things don't alias. So, from a practical standpoint,
I don't understand what making that distinction would buy us.
> we are trying to use BasicAA as a faster replacement for
> TBAA in simple cases
AIUI, the idea of putting BasicAA in front of TBAA isn't an efficiency
hack: its purpose is to catch "obvious" type punning, and handle it
gracefully for the user (your 'b)' above).

You might find Hal's recent work on TySan (https://reviews.llvm.org/D32197)
interesting; it's a sanitizer that tries to catch and complain about type
punning at run-time. :)

George

On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hello,
>
> I'm trying to get the whole picture of what we are doing in terms
> of alias analysis in LLVM. Being new to this I may be missing
> some well known things so my apologies if I seem to ask obvious
> things.
>
> There are lots of questions arise in my head as I read related
> bug reports, sources and documentation, of which looks to be the
> most simple is,
>
> Why the current TBAA implementation returns NoAlias for pairs of
> accesses for which it has no idea if they really alias?
>
> Let me elaborate on this. My understanding of what the TBAA
> implementation is supposed to respond is whether a given pair of
> accesses is allowed to alias by the rules of the input language.
> (Which in turn leads me to another question, that is, whether
> we are really trying to perform C/C++ alias analysis in a
> [language-neutral] codegen pass, but let's postpone this for
> later.)
>
> This consideration assumes that what the TBAA implementation
> actually returns as a result is orthogonal to the
> Must/No/MayAlias set, that used to be the result type of various
> AA, including TBAA. I would expect that the result of the
> complete alias analysis includes both the information on whether
> given accesses alias and, as a separate element, whether they are
> allowed to alias by the rules of the language. Then we may have
> combinations like (MustAlias, AllowedAlias) that seem to be the
> common case and combinations like (MustAlias, NotAllowedAlias)
> that I would expect to a) generate the breaks-alias-rules kind of
> warnings and b) proceed further as any other MustAlias case.
>
> The latter combination is what I would expect for illegal type
> puns, for whatever definition of "illegal". Note that for such
> cases we currently get MustAlias from BasicAA and NoAlias from
> TBAA. In some comments this is considered okay whereas I have to
> admit that I don't understand how that may possibly be okay,
> meaning it's hard to imagine how a pair of accesses can be
> aliasing and non-aliasing at the same time. To me it appears
> like that the root cause if this inconsistency is not that any of
> these analyses is not correct, but that we are trying to
> represent their responses in an inadequate form and we are trying
> to use BasicAA as a faster replacement for TBAA in simple cases.
>
> Any thoughts are highly appreciated.
>
> Thanks,
>
> --
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/2223ed47/attachment-0001.html>

Flamedoge via llvm-dev

2017-May-12 02:24 UTC

head link

[llvm-dev] Alias analysis results

> (MustAlias, NotAllowedAlias)
>The latter combination is what I would expect for illegal type
>puns, for whatever definition of "illegal". Note that for such
>cases we currently get MustAlias from BasicAA and NoAlias from
>TBAA. In some comments this is considered okay whereas I have to
>admit that I don't understand how that may possibly be okay,
>meaning it's hard to imagine how a pair of accesses can be
>aliasing and non-aliasing at the same time. To me it appears
>like that the root cause if this inconsistency is not that any of
>these analyses is not correct, but that we are trying to
>represent their responses in an inadequate form
There was another thread on how broken the handling of C/C++ union is with
TBAA. TBAA does not model unions and therefore it used to return NoAlias
between union member accesses which was the cause for few misoptimizations.
I believe the hotfix for this was put into clang, disabling tbaa metadata
for unions.

Kevin

On Thu, May 11, 2017 at 6:48 PM, George Burgess IV via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> +hal
>
> Hi,
>
> I'm not an expert on all of our AA bits, but I've been in the area
some.
> If I'm wrong, I'm sure someone will jump in and we'll both end
up learning
> something. :)
>
> > Why the current TBAA implementation returns NoAlias for pairs of
> > accesses for which it has no idea if they really alias?
>
> For the same reason that LLVM assumes many other things it can't prove.
> The language says "if you do ${x}, the behavior is undefined," so
> optimizing on the assumption that the user isn't doing ${x} is OK.
>
> That said, optimizing on aliasing assumptions has been contentious in the
> past, so frontends that emit TBAA data (e.g. clang) generally support flags
> like -fno-strict-aliasing to relax this behavior.
>
> > that I would expect to a) generate [...] warnings and [...]
>
> FWIW, it's generally nontrivial to emit helpful warnings from LLVM.
This
> is because LLVM doesn't have the AST that the IR was generated from, so
it
> has no source-level information to speak of (except the bits we encode in
> debuginfo, but that doesn't exactly make for great diags).
>
> In other words, a warning emitted by LLVM would probably be as helpful as
> "hey, somewhere in the function '_Z3fooi', or one of the
functions that we
> happened to inline into it, we found some pretty obvious type
punning."
>
> > it's hard to imagine how a pair of accesses can be
> > aliasing and non-aliasing at the same time
>
> As you've noted, this is an artifact of taking two logically
independent
> pieces of information and merging them into one. While this might not be
> ideal from a theoretical point of view, I don't see why an optimization
> would care about the difference between "proven NoAlias" and
"proven
> illegal to alias." In either case, it's allowed to optimize based
on the
> assumption that those things don't alias. So, from a practical
standpoint,
> I don't understand what making that distinction would buy us.
>
> > we are trying to use BasicAA as a faster replacement for
> > TBAA in simple cases
>
> AIUI, the idea of putting BasicAA in front of TBAA isn't an efficiency
> hack: its purpose is to catch "obvious" type punning, and handle
it
> gracefully for the user (your 'b)' above).
>
> You might find Hal's recent work on TySan
(https://reviews.llvm.org/D32197)
> interesting; it's a sanitizer that tries to catch and complain about
type
> punning at run-time. :)
>
> George
>
> On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hello,
>>
>> I'm trying to get the whole picture of what we are doing in terms
>> of alias analysis in LLVM. Being new to this I may be missing
>> some well known things so my apologies if I seem to ask obvious
>> things.
>>
>> There are lots of questions arise in my head as I read related
>> bug reports, sources and documentation, of which looks to be the
>> most simple is,
>>
>> Why the current TBAA implementation returns NoAlias for pairs of
>> accesses for which it has no idea if they really alias?
>>
>> Let me elaborate on this. My understanding of what the TBAA
>> implementation is supposed to respond is whether a given pair of
>> accesses is allowed to alias by the rules of the input language.
>> (Which in turn leads me to another question, that is, whether
>> we are really trying to perform C/C++ alias analysis in a
>> [language-neutral] codegen pass, but let's postpone this for
>> later.)
>>
>> This consideration assumes that what the TBAA implementation
>> actually returns as a result is orthogonal to the
>> Must/No/MayAlias set, that used to be the result type of various
>> AA, including TBAA. I would expect that the result of the
>> complete alias analysis includes both the information on whether
>> given accesses alias and, as a separate element, whether they are
>> allowed to alias by the rules of the language. Then we may have
>> combinations like (MustAlias, AllowedAlias) that seem to be the
>> common case and combinations like (MustAlias, NotAllowedAlias)
>> that I would expect to a) generate the breaks-alias-rules kind of
>> warnings and b) proceed further as any other MustAlias case.
>>
>> The latter combination is what I would expect for illegal type
>> puns, for whatever definition of "illegal". Note that for
such
>> cases we currently get MustAlias from BasicAA and NoAlias from
>> TBAA. In some comments this is considered okay whereas I have to
>> admit that I don't understand how that may possibly be okay,
>> meaning it's hard to imagine how a pair of accesses can be
>> aliasing and non-aliasing at the same time. To me it appears
>> like that the root cause if this inconsistency is not that any of
>> these analyses is not correct, but that we are trying to
>> represent their responses in an inadequate form and we are trying
>> to use BasicAA as a faster replacement for TBAA in simple cases.
>>
>> Any thoughts are highly appreciated.
>>
>> Thanks,
>>
>> --
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/b8111c3c/attachment.html>

Daniel Berlin via llvm-dev

2017-Jun-19 18:38 UTC

head link

[llvm-dev] Alias analysis results

On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hello,
>
> I'm trying to get the whole picture of what we are doing in terms
> of alias analysis in LLVM. Being new to this I may be missing
> some well known things so my apologies if I seem to ask obvious
> things.
>
> There are lots of questions arise in my head as I read related
> bug reports, sources and documentation, of which looks to be the
> most simple is,
>
> Why the current TBAA implementation returns NoAlias for pairs of
> accesses for which it has no idea if they really alias?
>
TBAA is broken in various ways, but in general, it is because it knows they
can't legally do so.

>
> Let me elaborate on this. My understanding of what the TBAA
> implementation is supposed to respond is whether a given pair of
> accesses is allowed to alias by the rules of the input language.
>No.
TBAA is a bad name for what LLVM is doing, because it's not really TBAA at
that point.  I've objected to the name before as it confuses people.
It's really anti-alias sets/subsets structured as a tree.  Loads/stores are
tagged with which set they belong to.

In C/C++, those sets are  generated by  TBAA rules.  You could generate
them however you want.

(Which in turn leads me to another question, that is,
whether> we are really trying to perform C/C++ alias analysis in a
> [language-neutral] codegen pass, but let's postpone this for
> later.)
>
No, we are not trying to do so.

>
> This consideration assumes that what the TBAA implementation
> actually returns as a result is orthogonal to the
> Must/No/MayAlias set,

No, it is exactly a noalias set.
>
> The latter combination is what I would expect for illegal type
> puns, for whatever definition of "illegal". Note that for such
> cases we currently get MustAlias from BasicAA and NoAlias from
> TBAA.

This is, IMHO, very very broken and should be fixed in clang.

> In some comments this is considered okay whereas I have to
> admit that I don't understand how that may possibly be okay,
> meaning it's hard to imagine how a pair of accesses can be
> aliasing and non-aliasing at the same time.

Note that it will always be the case
It is entirely possible for one alias analysis to say must alias, and
another to say no-alias, depending on the source of the info.



> To me it appears
> like that the root cause if this inconsistency is not that any of
> these analyses is not correct, but that we are trying to
> represent their responses in an inadequate form and we are trying
> to use BasicAA as a faster replacement for TBAA in simple cases.
>
I don't really see this at all.

THe use of BasicAA to go with TBAA is, IMHO, just broken.
it's not trying to be faster, it's doing it for correctness.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170619/68e8b584/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - May 2017 - Alias analysis results

[llvm-dev] Alias analysis results

[llvm-dev] Alias analysis results

[llvm-dev] Alias analysis results

[llvm-dev] Alias analysis results

Possibly Parallel Threads