Hi Daniel, Sorry for taking so long to respond. I spoke with a colleague more familiar with llvm who thought he could clear up my confusion, but we both came out of the conversation confused. I will try my best to explain the ambiguity. In an DAG, alias queries would be completely unambiguous. Every instruction would only be executed once, and every SSA value really would have a single static assignment. Any instruction would have a single value per execution, and when you ask alias(x,y) it would be a question of whether those single memory addresses overlapped. However, once backedges are introduced, and instruction can be executed multiple times. Every time the execution passes over a backedge, any instructions that are executed again, may have completely different values. Now, when you ask alias(x,y), it's ambiguous which pairs of addresses that query considers. In some cases, it seems clear. For example, in the loops we have been talking about, it seems like you are saying that it only compares the values in the same iteration, but does not say anything about aliasing between iterations. These are perfectly reasonable semantics for loops, but it does not define what the alias queries mean in general. For example, consider the following CFG: If we ask a query alias(x,y) about instructions in basic blocks B and C respectively, what will those answers mean? There is always a backedge between their executions so are they considered to be different iterations and therefore they never alias? I get that there may be some way of understanding the alias queries in which "dynamic instructions" do not matter to the answer, but whatever that way is, it must also be possible to use that way to explain things in terms of the "dynamic instructions" that the CPU ends up executing. And at least for me that would resolve any confusion I have about the AliasAnalysis. Thank you again for your time, Jeremy On Thu, Aug 14, 2014 at 10:46 AM, Daniel Berlin <dberlin at dberlin.org> wrote:> Let me take your previous example in an attempt to hopefully explain why > "dynamic instructions" do not matter to the answer: > You asked about: > > > std::vector<int> A(100); > >> int* x,y; > >> x=&A[0]; > >> > >> for(int i=0; i<100; i++) { > >> y=&A[i]; > >> *y=*x; > >> x=&A[i+1]; > >> } > > > > This is not what it looks like in LLVM. > > In LLVM, it looks like this: > > std::vector<int> A(100); >> int* x,y; >> > >> > x_1=GEP A, 0, 0 >> > for(int i=0; i<100; i++) { >> > i_2 = phi (0, i_3) > x_2 = phi(x_1, x_3) > y_1 = GEP A, 0, i_2 > temp = load x_2 > store y_1, temp > temp2 = add i_2, 1 > x_3 = GEP A, 0, temp2 > i_3 = add i_2, 1 > } > > As you can see, every time you redefine the value of the pointer x to a > new value, it gets a new name. > > alias(x_1, y_1) = mayalias > alias(x_2, y_1) = mustalias > alias(x_3, y_1) = mayalias > > > The only interesting question, IMHO, is whether alias(x_3,y_1) is MayAlias > or NoAlias. > x_2 clearly has the same value as y_1 at all times throughout the program. > > > > > On Thu, Aug 14, 2014 at 6:54 AM, Daniel Berlin <dberlin at dberlin.org> > wrote: > >> On Thu, Aug 14, 2014 at 6:37 AM, Daniel Berlin <dberlin at dberlin.org> >> wrote: >> > On Wed, Aug 13, 2014 at 8:35 PM, Jeremy Salwen <jeremysalwen at gmail.com> >> wrote: >> >> Hey Daniel, >> >> >> >> Thanks again for the help. I'm still a bit confused about the >> interface to >> >> the alias analysis. It seems like we are talking about different >> >> interfaces. >> > Sorry, yes, i was thinking of a different interface. >> > >> >> Has it changed from what the documentation says? As far as I >> >> can tell, the documentation takes a specific Value*, and no information >> >> about which dynamic execution it is talking about. >> > >> > The actual standard low level interface is: >> > >> > virtual AliasResult alias(const Location &LocA, const Location &LocB); >> > (There is a value wrapper for this) >> > >> > I think part of the confusion here is that you keep redefining >> > pointers and expecting the results of queries to change. >> > >> > Outside of globals, LLVM is in SSA form. Thus, these are queries will >> > end up being about specific pointers that will only be defined once. >> > In all of your examples, you have asked about local variables. These >> > local variables will be put in SSA. Every time you redefine the >> > pointer value, it will generate a new definition that is separate from >> > the old ones. >> > If you transform your examples above into globals, you may get >> > different answers. >> > >> > >> > >> >> >> >> When you say "Right. It's a query about specific loads and stores." it >> >> sounds like you are saying that it does consider dynamic executions, >> and >> >> that you can make queries about specific dynamic executions. I can't >> seem >> >> to find this option anywhere in the API.. >> > Yes, sorry, I was thinking of the MemoryDependence interface, which >> > does give you that information. >> > AliasAnalysis does not. >> > >> >> >> >> Perhaps someone else could clear the confusion, because I feel like >> there is >> >> a gap of communication. >> > >> > >> >> >> >> For example, in this code what would the result be for the first store >> and >> >> the first load? >> > >> >> >> >>> #include <cstdio> >> >>> #include <cstdlib> >> >>> int main(int c, char** argv) { >> >>> int* A=(int*)malloc(100*sizeof(int)); >> >>> >> >>> if(c>1) { >> >>> for(int i=0; i<5; i++) { >> >>> A[i]=5; >> >>> } >> >>> } else { >> >>> A[1]=5; >> >>> } >> >>> printf("%d\n",A[4]); >> >>> free(A); >> >>> } >> >> >> >> >> > >> >> MustAlias? Because whenever they are executed, they do alias. Or >> MayAlias? >> > Since you seem to be having trouble going all the way down the rabbit >> > hole here, let me take a different approach. >> > >> > In this example, A[i] is not an LLVM location or value *. >> > You will never see A[i] in the LLVM IR to query about. >> > You will see two pointers as this: >> > %i.0 = phi i32 [ 0, %4 ], [ %11, %10 ] >> > %8 = sext i32 %i.0 to i64 >> > %9 = getelementptr inbounds i32* %2, i64 %8 >> > store i32 5, i32* %9, align 4 >> > >> > and >> > %16 = getelementptr inbounds i32* %2, i64 4 >> > %17 = load i32* %16, align 4 >> > >> > The query will be alias(%9, %16) >> > >> >> >> > In this case, it cannot prove it is MustAlias, but it is actually >> MustAlias. >> > It will, however, return MayAlias because it is not powerful enough to >> prove it. >> >> Sigh, too early in the morning, pasted the right output, wrote the >> answer while looking at the wrong pass output. >> These two pointers will always be MayAlias. >> LLVM later actually deletes the loop, and transforms the above into a >> memset + a load from memset, and ends up with two mustalias pointers >> in that case. >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140820/5a4ee3bf/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 6953 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140820/5a4ee3bf/attachment.png>
----- Original Message -----> From: "Jeremy Salwen" <jeremysalwen at gmail.com> > To: "Daniel Berlin" <dberlin at dberlin.org> > Cc: "llvmdev" <llvmdev at cs.uiuc.edu> > Sent: Wednesday, August 20, 2014 7:58:46 PM > Subject: Re: [LLVMdev] Alias Analysis Semantics > > Hi Daniel, > > Sorry for taking so long to respond. I spoke with a colleague more > familiar with llvm who thought he could clear up my confusion, but > we both came out of the conversation confused. I will try my best to > explain the ambiguity. > > In an DAG, alias queries would be completely unambiguous. Every > instruction would only be executed once, and every SSA value really > would have a single static assignment. Any instruction would have a > single value per execution, and when you ask alias(x,y) it would be > a question of whether those single memory addresses overlapped. > > However, once backedges are introduced, and instruction can be > executed multiple times. Every time the execution passes over a > backedge, any instructions that are executed again, may have > completely different values. Now, when you ask alias(x,y), it's > ambiguous which pairs of addresses that query considers. > > In some cases, it seems clear. For example, in the loops we have been > talking about, it seems like you are saying that it only compares > the values in the same iteration, but does not say anything about > aliasing between iterations. These are perfectly reasonable > semantics for loops, but it does not define what the alias queries > mean in general. For example, consider the following CFG: > > > > If we ask a query alias(x,y) about instructions in basic blocks B and > C respectively, what will those answers mean? There is always a > backedge between their executions so are they considered to be > different iterations and therefore they never alias?Perhaps Danny will have a better answer, but I think about it this way: AliasAnalysis only answers queries about variable relationships at points along the dynamic execution path of the program. For a query to be well formed, there much be some point (some instruction in the LLVM IR) at which both values being queried are simultaneously live. In your example, there is no block dominated by both B and C, and so no point in the program where it would be legal to refer to both x and y. As a result, the query itself is meaningless. Do you have some transformation or analysis in mind that would issue such a query? -Hal> > > I get that there may be some way of understanding the alias queries > in which "dynamic instructions" do not matter to the answer, but > whatever that way is, it must also be possible to use that way to > explain things in terms of the "dynamic instructions" that the CPU > ends up executing. And at least for me that would resolve any > confusion I have about the AliasAnalysis. > > > Thank you again for your time, > > Jeremy > > > > > On Thu, Aug 14, 2014 at 10:46 AM, Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > Let me take your previous example in an attempt to hopefully explain > why "dynamic instructions" do not matter to the answer: > You asked about: > > > > std::vector<int> A(100); > >> int* x,y; > >> x=&A[0]; > >> > >> for(int i=0; i<100; i++) { > >> y=&A[i]; > >> *y=*x; > >> x=&A[i+1]; > >> } > > > > > > This is not what it looks like in LLVM. > > > In LLVM, it looks like this: > > > > > > > > > std::vector<int> A(100); > > int* x,y; > > > > > > > > > x_1=GEP A, 0, 0 > > > > for(int i=0; i<100; i++) { > i_2 = phi (0, i_3) > x_2 = phi(x_1, x_3) > y_1 = GEP A, 0, i_2 > temp = load x_2 > store y_1, temp > temp2 = add i_2, 1 > x_3 = GEP A, 0, temp2 > i_3 = add i_2, 1 > } > > > As you can see, every time you redefine the value of the pointer x to > a new value, it gets a new name. > > > alias(x_1, y_1) = mayalias > alias(x_2, y_1) = mustalias > alias(x_3, y_1) = mayalias > > > > > The only interesting question, IMHO, is whether alias(x_3,y_1) is > MayAlias or NoAlias. > x_2 clearly has the same value as y_1 at all times throughout the > program. > > > > > > > > > On Thu, Aug 14, 2014 at 6:54 AM, Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > > On Thu, Aug 14, 2014 at 6:37 AM, Daniel Berlin < dberlin at dberlin.org > > wrote: > > On Wed, Aug 13, 2014 at 8:35 PM, Jeremy Salwen < > > jeremysalwen at gmail.com > wrote: > >> Hey Daniel, > >> > >> Thanks again for the help. I'm still a bit confused about the > >> interface to > >> the alias analysis. It seems like we are talking about different > >> interfaces. > > Sorry, yes, i was thinking of a different interface. > > > >> Has it changed from what the documentation says? As far as I > >> can tell, the documentation takes a specific Value*, and no > >> information > >> about which dynamic execution it is talking about. > > > > The actual standard low level interface is: > > > > virtual AliasResult alias(const Location &LocA, const Location > > &LocB); > > (There is a value wrapper for this) > > > > I think part of the confusion here is that you keep redefining > > pointers and expecting the results of queries to change. > > > > Outside of globals, LLVM is in SSA form. Thus, these are queries > > will > > end up being about specific pointers that will only be defined > > once. > > In all of your examples, you have asked about local variables. > > These > > local variables will be put in SSA. Every time you redefine the > > pointer value, it will generate a new definition that is separate > > from > > the old ones. > > If you transform your examples above into globals, you may get > > different answers. > > > > > > > >> > >> When you say "Right. It's a query about specific loads and > >> stores." it > >> sounds like you are saying that it does consider dynamic > >> executions, and > >> that you can make queries about specific dynamic executions. I > >> can't seem > >> to find this option anywhere in the API.. > > Yes, sorry, I was thinking of the MemoryDependence interface, which > > does give you that information. > > AliasAnalysis does not. > > > >> > >> Perhaps someone else could clear the confusion, because I feel > >> like there is > >> a gap of communication. > > > > > >> > >> For example, in this code what would the result be for the first > >> store and > >> the first load? > > > >> > >>> #include <cstdio> > >>> #include <cstdlib> > >>> int main(int c, char** argv) { > >>> int* A=(int*)malloc(100*sizeof(int)); > >>> > >>> if(c>1) { > >>> for(int i=0; i<5; i++) { > >>> A[i]=5; > >>> } > >>> } else { > >>> A[1]=5; > >>> } > >>> printf("%d\n",A[4]); > >>> free(A); > >>> } > >> > >> > > > >> MustAlias? Because whenever they are executed, they do alias. Or > >> MayAlias? > > Since you seem to be having trouble going all the way down the > > rabbit > > hole here, let me take a different approach. > > > > In this example, A[i] is not an LLVM location or value *. > > You will never see A[i] in the LLVM IR to query about. > > You will see two pointers as this: > > %i.0 = phi i32 [ 0, %4 ], [ %11, %10 ] > > %8 = sext i32 %i.0 to i64 > > %9 = getelementptr inbounds i32* %2, i64 %8 > > store i32 5, i32* %9, align 4 > > > > and > > %16 = getelementptr inbounds i32* %2, i64 4 > > %17 = load i32* %16, align 4 > > > > The query will be alias(%9, %16) > > > > > > In this case, it cannot prove it is MustAlias, but it is actually > > MustAlias. > > It will, however, return MayAlias because it is not powerful enough > > to prove it. > > Sigh, too early in the morning, pasted the right output, wrote the > answer while looking at the wrong pass output. > These two pointers will always be MayAlias. > LLVM later actually deletes the loop, and transforms the above into a > memset + a load from memset, and ends up with two mustalias pointers > in that case. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Hi Hal, Thank you for your email, that makes a lot of sense to me. I am working on some tools to use memory profiling to speculatively replace memory loads and stores with value forwarding in hardware implementations. I'd like to compare the profiled data to static alias analysis, so it would be super useful if there was a way to answer the questions about aliasing across backedges that AliasAnalysis considers meaningless. Perhaps the MemoryDependence would allow this? Thank you, Jeremy On Wed, Aug 20, 2014 at 11:58 PM, Hal Finkel <hfinkel at anl.gov> wrote:> ----- Original Message ----- > > From: "Jeremy Salwen" <jeremysalwen at gmail.com> > > To: "Daniel Berlin" <dberlin at dberlin.org> > > Cc: "llvmdev" <llvmdev at cs.uiuc.edu> > > Sent: Wednesday, August 20, 2014 7:58:46 PM > > Subject: Re: [LLVMdev] Alias Analysis Semantics > > > > Hi Daniel, > > > > Sorry for taking so long to respond. I spoke with a colleague more > > familiar with llvm who thought he could clear up my confusion, but > > we both came out of the conversation confused. I will try my best to > > explain the ambiguity. > > > > In an DAG, alias queries would be completely unambiguous. Every > > instruction would only be executed once, and every SSA value really > > would have a single static assignment. Any instruction would have a > > single value per execution, and when you ask alias(x,y) it would be > > a question of whether those single memory addresses overlapped. > > > > However, once backedges are introduced, and instruction can be > > executed multiple times. Every time the execution passes over a > > backedge, any instructions that are executed again, may have > > completely different values. Now, when you ask alias(x,y), it's > > ambiguous which pairs of addresses that query considers. > > > > In some cases, it seems clear. For example, in the loops we have been > > talking about, it seems like you are saying that it only compares > > the values in the same iteration, but does not say anything about > > aliasing between iterations. These are perfectly reasonable > > semantics for loops, but it does not define what the alias queries > > mean in general. For example, consider the following CFG: > > > > > > > > If we ask a query alias(x,y) about instructions in basic blocks B and > > C respectively, what will those answers mean? There is always a > > backedge between their executions so are they considered to be > > different iterations and therefore they never alias? > > Perhaps Danny will have a better answer, but I think about it this way: > AliasAnalysis only answers queries about variable relationships at points > along the dynamic execution path of the program. For a query to be well > formed, there much be some point (some instruction in the LLVM IR) at which > both values being queried are simultaneously live. In your example, there > is no block dominated by both B and C, and so no point in the program where > it would be legal to refer to both x and y. As a result, the query itself > is meaningless. Do you have some transformation or analysis in mind that > would issue such a query? > > -Hal > > > > > > > I get that there may be some way of understanding the alias queries > > in which "dynamic instructions" do not matter to the answer, but > > whatever that way is, it must also be possible to use that way to > > explain things in terms of the "dynamic instructions" that the CPU > > ends up executing. And at least for me that would resolve any > > confusion I have about the AliasAnalysis. > > > > > > Thank you again for your time, > > > > Jeremy > > > > > > > > > > On Thu, Aug 14, 2014 at 10:46 AM, Daniel Berlin < dberlin at dberlin.org > > > wrote: > > > > > > > > Let me take your previous example in an attempt to hopefully explain > > why "dynamic instructions" do not matter to the answer: > > You asked about: > > > > > > > std::vector<int> A(100); > > >> int* x,y; > > >> x=&A[0]; > > >> > > >> for(int i=0; i<100; i++) { > > >> y=&A[i]; > > >> *y=*x; > > >> x=&A[i+1]; > > >> } > > > > > > > > > > > This is not what it looks like in LLVM. > > > > > > In LLVM, it looks like this: > > > > > > > > > > > > > > > > > > std::vector<int> A(100); > > > > int* x,y; > > > > > > > > > > > > > > > > > > x_1=GEP A, 0, 0 > > > > > > > > for(int i=0; i<100; i++) { > > i_2 = phi (0, i_3) > > x_2 = phi(x_1, x_3) > > y_1 = GEP A, 0, i_2 > > temp = load x_2 > > store y_1, temp > > temp2 = add i_2, 1 > > x_3 = GEP A, 0, temp2 > > i_3 = add i_2, 1 > > } > > > > > > As you can see, every time you redefine the value of the pointer x to > > a new value, it gets a new name. > > > > > > alias(x_1, y_1) = mayalias > > alias(x_2, y_1) = mustalias > > alias(x_3, y_1) = mayalias > > > > > > > > > > The only interesting question, IMHO, is whether alias(x_3,y_1) is > > MayAlias or NoAlias. > > x_2 clearly has the same value as y_1 at all times throughout the > > program. > > > > > > > > > > > > > > > > > > On Thu, Aug 14, 2014 at 6:54 AM, Daniel Berlin < dberlin at dberlin.org > > > wrote: > > > > > > > > > > On Thu, Aug 14, 2014 at 6:37 AM, Daniel Berlin < dberlin at dberlin.org > > > wrote: > > > On Wed, Aug 13, 2014 at 8:35 PM, Jeremy Salwen < > > > jeremysalwen at gmail.com > wrote: > > >> Hey Daniel, > > >> > > >> Thanks again for the help. I'm still a bit confused about the > > >> interface to > > >> the alias analysis. It seems like we are talking about different > > >> interfaces. > > > Sorry, yes, i was thinking of a different interface. > > > > > >> Has it changed from what the documentation says? As far as I > > >> can tell, the documentation takes a specific Value*, and no > > >> information > > >> about which dynamic execution it is talking about. > > > > > > The actual standard low level interface is: > > > > > > virtual AliasResult alias(const Location &LocA, const Location > > > &LocB); > > > (There is a value wrapper for this) > > > > > > I think part of the confusion here is that you keep redefining > > > pointers and expecting the results of queries to change. > > > > > > Outside of globals, LLVM is in SSA form. Thus, these are queries > > > will > > > end up being about specific pointers that will only be defined > > > once. > > > In all of your examples, you have asked about local variables. > > > These > > > local variables will be put in SSA. Every time you redefine the > > > pointer value, it will generate a new definition that is separate > > > from > > > the old ones. > > > If you transform your examples above into globals, you may get > > > different answers. > > > > > > > > > > > >> > > >> When you say "Right. It's a query about specific loads and > > >> stores." it > > >> sounds like you are saying that it does consider dynamic > > >> executions, and > > >> that you can make queries about specific dynamic executions. I > > >> can't seem > > >> to find this option anywhere in the API.. > > > Yes, sorry, I was thinking of the MemoryDependence interface, which > > > does give you that information. > > > AliasAnalysis does not. > > > > > >> > > >> Perhaps someone else could clear the confusion, because I feel > > >> like there is > > >> a gap of communication. > > > > > > > > >> > > >> For example, in this code what would the result be for the first > > >> store and > > >> the first load? > > > > > >> > > >>> #include <cstdio> > > >>> #include <cstdlib> > > >>> int main(int c, char** argv) { > > >>> int* A=(int*)malloc(100*sizeof(int)); > > >>> > > >>> if(c>1) { > > >>> for(int i=0; i<5; i++) { > > >>> A[i]=5; > > >>> } > > >>> } else { > > >>> A[1]=5; > > >>> } > > >>> printf("%d\n",A[4]); > > >>> free(A); > > >>> } > > >> > > >> > > > > > >> MustAlias? Because whenever they are executed, they do alias. Or > > >> MayAlias? > > > Since you seem to be having trouble going all the way down the > > > rabbit > > > hole here, let me take a different approach. > > > > > > In this example, A[i] is not an LLVM location or value *. > > > You will never see A[i] in the LLVM IR to query about. > > > You will see two pointers as this: > > > %i.0 = phi i32 [ 0, %4 ], [ %11, %10 ] > > > %8 = sext i32 %i.0 to i64 > > > %9 = getelementptr inbounds i32* %2, i64 %8 > > > store i32 5, i32* %9, align 4 > > > > > > and > > > %16 = getelementptr inbounds i32* %2, i64 4 > > > %17 = load i32* %16, align 4 > > > > > > The query will be alias(%9, %16) > > > > > > > > > > In this case, it cannot prove it is MustAlias, but it is actually > > > MustAlias. > > > It will, however, return MayAlias because it is not powerful enough > > > to prove it. > > > > Sigh, too early in the morning, pasted the right output, wrote the > > answer while looking at the wrong pass output. > > These two pointers will always be MayAlias. > > LLVM later actually deletes the loop, and transforms the above into a > > memset + a load from memset, and ends up with two mustalias pointers > > in that case. > > > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140821/fb997df6/attachment.html>