Hello, I'm trying to get the whole picture of what we are doing in terms of alias analysis in LLVM. Being new to this I may be missing some well known things so my apologies if I seem to ask obvious things. There are lots of questions arise in my head as I read related bug reports, sources and documentation, of which looks to be the most simple is, Why the current TBAA implementation returns NoAlias for pairs of accesses for which it has no idea if they really alias? Let me elaborate on this. My understanding of what the TBAA implementation is supposed to respond is whether a given pair of accesses is allowed to alias by the rules of the input language. (Which in turn leads me to another question, that is, whether we are really trying to perform C/C++ alias analysis in a [language-neutral] codegen pass, but let's postpone this for later.) This consideration assumes that what the TBAA implementation actually returns as a result is orthogonal to the Must/No/MayAlias set, that used to be the result type of various AA, including TBAA. I would expect that the result of the complete alias analysis includes both the information on whether given accesses alias and, as a separate element, whether they are allowed to alias by the rules of the language. Then we may have combinations like (MustAlias, AllowedAlias) that seem to be the common case and combinations like (MustAlias, NotAllowedAlias) that I would expect to a) generate the breaks-alias-rules kind of warnings and b) proceed further as any other MustAlias case. The latter combination is what I would expect for illegal type puns, for whatever definition of "illegal". Note that for such cases we currently get MustAlias from BasicAA and NoAlias from TBAA. In some comments this is considered okay whereas I have to admit that I don't understand how that may possibly be okay, meaning it's hard to imagine how a pair of accesses can be aliasing and non-aliasing at the same time. To me it appears like that the root cause if this inconsistency is not that any of these analyses is not correct, but that we are trying to represent their responses in an inadequate form and we are trying to use BasicAA as a faster replacement for TBAA in simple cases. Any thoughts are highly appreciated. Thanks, --
+hal Hi, I'm not an expert on all of our AA bits, but I've been in the area some. If I'm wrong, I'm sure someone will jump in and we'll both end up learning something. :)> Why the current TBAA implementation returns NoAlias for pairs of > accesses for which it has no idea if they really alias?For the same reason that LLVM assumes many other things it can't prove. The language says "if you do ${x}, the behavior is undefined," so optimizing on the assumption that the user isn't doing ${x} is OK. That said, optimizing on aliasing assumptions has been contentious in the past, so frontends that emit TBAA data (e.g. clang) generally support flags like -fno-strict-aliasing to relax this behavior.> that I would expect to a) generate [...] warnings and [...]FWIW, it's generally nontrivial to emit helpful warnings from LLVM. This is because LLVM doesn't have the AST that the IR was generated from, so it has no source-level information to speak of (except the bits we encode in debuginfo, but that doesn't exactly make for great diags). In other words, a warning emitted by LLVM would probably be as helpful as "hey, somewhere in the function '_Z3fooi', or one of the functions that we happened to inline into it, we found some pretty obvious type punning."> it's hard to imagine how a pair of accesses can be > aliasing and non-aliasing at the same timeAs you've noted, this is an artifact of taking two logically independent pieces of information and merging them into one. While this might not be ideal from a theoretical point of view, I don't see why an optimization would care about the difference between "proven NoAlias" and "proven illegal to alias." In either case, it's allowed to optimize based on the assumption that those things don't alias. So, from a practical standpoint, I don't understand what making that distinction would buy us.> we are trying to use BasicAA as a faster replacement for > TBAA in simple casesAIUI, the idea of putting BasicAA in front of TBAA isn't an efficiency hack: its purpose is to catch "obvious" type punning, and handle it gracefully for the user (your 'b)' above). You might find Hal's recent work on TySan (https://reviews.llvm.org/D32197) interesting; it's a sanitizer that tries to catch and complain about type punning at run-time. :) George On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hello, > > I'm trying to get the whole picture of what we are doing in terms > of alias analysis in LLVM. Being new to this I may be missing > some well known things so my apologies if I seem to ask obvious > things. > > There are lots of questions arise in my head as I read related > bug reports, sources and documentation, of which looks to be the > most simple is, > > Why the current TBAA implementation returns NoAlias for pairs of > accesses for which it has no idea if they really alias? > > Let me elaborate on this. My understanding of what the TBAA > implementation is supposed to respond is whether a given pair of > accesses is allowed to alias by the rules of the input language. > (Which in turn leads me to another question, that is, whether > we are really trying to perform C/C++ alias analysis in a > [language-neutral] codegen pass, but let's postpone this for > later.) > > This consideration assumes that what the TBAA implementation > actually returns as a result is orthogonal to the > Must/No/MayAlias set, that used to be the result type of various > AA, including TBAA. I would expect that the result of the > complete alias analysis includes both the information on whether > given accesses alias and, as a separate element, whether they are > allowed to alias by the rules of the language. Then we may have > combinations like (MustAlias, AllowedAlias) that seem to be the > common case and combinations like (MustAlias, NotAllowedAlias) > that I would expect to a) generate the breaks-alias-rules kind of > warnings and b) proceed further as any other MustAlias case. > > The latter combination is what I would expect for illegal type > puns, for whatever definition of "illegal". Note that for such > cases we currently get MustAlias from BasicAA and NoAlias from > TBAA. In some comments this is considered okay whereas I have to > admit that I don't understand how that may possibly be okay, > meaning it's hard to imagine how a pair of accesses can be > aliasing and non-aliasing at the same time. To me it appears > like that the root cause if this inconsistency is not that any of > these analyses is not correct, but that we are trying to > represent their responses in an inadequate form and we are trying > to use BasicAA as a faster replacement for TBAA in simple cases. > > Any thoughts are highly appreciated. > > Thanks, > > -- > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/2223ed47/attachment-0001.html>
> (MustAlias, NotAllowedAlias) >The latter combination is what I would expect for illegal type >puns, for whatever definition of "illegal". Note that for such >cases we currently get MustAlias from BasicAA and NoAlias from >TBAA. In some comments this is considered okay whereas I have to >admit that I don't understand how that may possibly be okay, >meaning it's hard to imagine how a pair of accesses can be >aliasing and non-aliasing at the same time. To me it appears >like that the root cause if this inconsistency is not that any of >these analyses is not correct, but that we are trying to >represent their responses in an inadequate formThere was another thread on how broken the handling of C/C++ union is with TBAA. TBAA does not model unions and therefore it used to return NoAlias between union member accesses which was the cause for few misoptimizations. I believe the hotfix for this was put into clang, disabling tbaa metadata for unions. Kevin On Thu, May 11, 2017 at 6:48 PM, George Burgess IV via llvm-dev < llvm-dev at lists.llvm.org> wrote:> +hal > > Hi, > > I'm not an expert on all of our AA bits, but I've been in the area some. > If I'm wrong, I'm sure someone will jump in and we'll both end up learning > something. :) > > > Why the current TBAA implementation returns NoAlias for pairs of > > accesses for which it has no idea if they really alias? > > For the same reason that LLVM assumes many other things it can't prove. > The language says "if you do ${x}, the behavior is undefined," so > optimizing on the assumption that the user isn't doing ${x} is OK. > > That said, optimizing on aliasing assumptions has been contentious in the > past, so frontends that emit TBAA data (e.g. clang) generally support flags > like -fno-strict-aliasing to relax this behavior. > > > that I would expect to a) generate [...] warnings and [...] > > FWIW, it's generally nontrivial to emit helpful warnings from LLVM. This > is because LLVM doesn't have the AST that the IR was generated from, so it > has no source-level information to speak of (except the bits we encode in > debuginfo, but that doesn't exactly make for great diags). > > In other words, a warning emitted by LLVM would probably be as helpful as > "hey, somewhere in the function '_Z3fooi', or one of the functions that we > happened to inline into it, we found some pretty obvious type punning." > > > it's hard to imagine how a pair of accesses can be > > aliasing and non-aliasing at the same time > > As you've noted, this is an artifact of taking two logically independent > pieces of information and merging them into one. While this might not be > ideal from a theoretical point of view, I don't see why an optimization > would care about the difference between "proven NoAlias" and "proven > illegal to alias." In either case, it's allowed to optimize based on the > assumption that those things don't alias. So, from a practical standpoint, > I don't understand what making that distinction would buy us. > > > we are trying to use BasicAA as a faster replacement for > > TBAA in simple cases > > AIUI, the idea of putting BasicAA in front of TBAA isn't an efficiency > hack: its purpose is to catch "obvious" type punning, and handle it > gracefully for the user (your 'b)' above). > > You might find Hal's recent work on TySan (https://reviews.llvm.org/D32197) > interesting; it's a sanitizer that tries to catch and complain about type > punning at run-time. :) > > George > > On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hello, >> >> I'm trying to get the whole picture of what we are doing in terms >> of alias analysis in LLVM. Being new to this I may be missing >> some well known things so my apologies if I seem to ask obvious >> things. >> >> There are lots of questions arise in my head as I read related >> bug reports, sources and documentation, of which looks to be the >> most simple is, >> >> Why the current TBAA implementation returns NoAlias for pairs of >> accesses for which it has no idea if they really alias? >> >> Let me elaborate on this. My understanding of what the TBAA >> implementation is supposed to respond is whether a given pair of >> accesses is allowed to alias by the rules of the input language. >> (Which in turn leads me to another question, that is, whether >> we are really trying to perform C/C++ alias analysis in a >> [language-neutral] codegen pass, but let's postpone this for >> later.) >> >> This consideration assumes that what the TBAA implementation >> actually returns as a result is orthogonal to the >> Must/No/MayAlias set, that used to be the result type of various >> AA, including TBAA. I would expect that the result of the >> complete alias analysis includes both the information on whether >> given accesses alias and, as a separate element, whether they are >> allowed to alias by the rules of the language. Then we may have >> combinations like (MustAlias, AllowedAlias) that seem to be the >> common case and combinations like (MustAlias, NotAllowedAlias) >> that I would expect to a) generate the breaks-alias-rules kind of >> warnings and b) proceed further as any other MustAlias case. >> >> The latter combination is what I would expect for illegal type >> puns, for whatever definition of "illegal". Note that for such >> cases we currently get MustAlias from BasicAA and NoAlias from >> TBAA. In some comments this is considered okay whereas I have to >> admit that I don't understand how that may possibly be okay, >> meaning it's hard to imagine how a pair of accesses can be >> aliasing and non-aliasing at the same time. To me it appears >> like that the root cause if this inconsistency is not that any of >> these analyses is not correct, but that we are trying to >> represent their responses in an inadequate form and we are trying >> to use BasicAA as a faster replacement for TBAA in simple cases. >> >> Any thoughts are highly appreciated. >> >> Thanks, >> >> -- >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/b8111c3c/attachment.html>
On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hello, > > I'm trying to get the whole picture of what we are doing in terms > of alias analysis in LLVM. Being new to this I may be missing > some well known things so my apologies if I seem to ask obvious > things. > > There are lots of questions arise in my head as I read related > bug reports, sources and documentation, of which looks to be the > most simple is, > > Why the current TBAA implementation returns NoAlias for pairs of > accesses for which it has no idea if they really alias? >TBAA is broken in various ways, but in general, it is because it knows they can't legally do so.> > Let me elaborate on this. My understanding of what the TBAA > implementation is supposed to respond is whether a given pair of > accesses is allowed to alias by the rules of the input language. >No. TBAA is a bad name for what LLVM is doing, because it's not really TBAA at that point. I've objected to the name before as it confuses people. It's really anti-alias sets/subsets structured as a tree. Loads/stores are tagged with which set they belong to. In C/C++, those sets are generated by TBAA rules. You could generate them however you want. (Which in turn leads me to another question, that is, whether> we are really trying to perform C/C++ alias analysis in a > [language-neutral] codegen pass, but let's postpone this for > later.) >No, we are not trying to do so.> > This consideration assumes that what the TBAA implementation > actually returns as a result is orthogonal to the > Must/No/MayAlias set,No, it is exactly a noalias set.> > The latter combination is what I would expect for illegal type > puns, for whatever definition of "illegal". Note that for such > cases we currently get MustAlias from BasicAA and NoAlias from > TBAA.This is, IMHO, very very broken and should be fixed in clang.> In some comments this is considered okay whereas I have to > admit that I don't understand how that may possibly be okay, > meaning it's hard to imagine how a pair of accesses can be > aliasing and non-aliasing at the same time.Note that it will always be the case It is entirely possible for one alias analysis to say must alias, and another to say no-alias, depending on the source of the info.> To me it appears > like that the root cause if this inconsistency is not that any of > these analyses is not correct, but that we are trying to > represent their responses in an inadequate form and we are trying > to use BasicAA as a faster replacement for TBAA in simple cases. >I don't really see this at all. THe use of BasicAA to go with TBAA is, IMHO, just broken. it's not trying to be faster, it's doing it for correctness. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170619/68e8b584/attachment.html>