Shuai Wang via llvm-dev
2020-Jul-09 10:51 UTC
[llvm-dev] Understand alias-analysis results
Hey Matt, That's awesome. Thank you very much for all the information and clarification! Just a few follow up questions. Could you kindly shed some lights on it? Thank you! 1. I tried to tweak the code in the following way: - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/n9rGrs - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/Uc6h5Y And i note that the outputs are: Alias sets for function 'main': Alias Set Tracker: 2 alias sets for 4 pointer values. *AliasSet[0x563faa6c6260, 5] may alias, Mod/Ref Pointers: (i8* %0, LocationSize::precise(4)), (i32* %a, LocationSize::precise(4)), (i8* %1, LocationSize::precise(4)), (i32* %b, LocationSize::precise(4))* 1 Unknown instructions: call void @NOALIAS(i8* nonnull %0, i8* nonnull %1) #3 *AliasSet[0x563faa6b45e0, 1] must alias, Mod/Ref forwarding to 0x563faa6c6260* ===== Alias Analysis Evaluator Report ==== 6 Total Alias Queries Performed 4 no alias responses (66.6%) * 0 may alias responses (0.0%)* 0 partial alias responses (0.0%) *2 must alias responses (33.3%)* I am trying to interpret the outputs, so if I understand correctly, the output indicates that we have an alias set of 4 pointers which "potentially" point to the same memory region, correct? Then is there any more accurate analysis pass that I could use to somewhat infer that "there are two must alias sets, each set has two pointers"? Correct me if I was wrong here.. Using my local opt (version 6.0), I tried to iterate all feasible alias analysis passes but the results are not changed. Also, what is the "must alias, Mod/Ref forwarding to 0x563faa6c6260"? And how to interpret that we have "2 must alias responses"? Where does it come from? And why do we have "0 may alias response"? I would expect to have at least "4 may alias responses" as well? 2. I note that using the latest opt (version 11.0?) gives different outputs with my local opt (version 6.0). For opt (version 6.0), it reports: 2 alias sets for 2 pointer values. More importantly, can I expect to get generally better alias analysis results when switching to version 11.0? Thank you very much! Best, Shuai On Thu, Jul 9, 2020 at 6:14 PM Matt P. Dziubinski <matdzb at gmail.com> wrote:> On 7/9/2020 10:15, Shuai Wang via llvm-dev wrote: > > Hello, > > > > I am performing alias analysis toward the following simple code: > > > > [...] > > > > I checked the generated .ll code, and it shows that within the main > > function and NOALIAS functions, there is only a "ret" statement, with no > > global or local variables used. Could anyone shed some lights on where > > the "1 may alias" come from? And is there a way that I can force the > > alias analysis algorithm to focus only the "main" function? Thank you > > very much. > > Hi! > > Here's more information after initializing the variables (assuming the > intent in the source code was, e.g., to initialize `a` and `b` to `0` > and the pointers `f1` and `f2` to `NULL`, using aggregate initialization > for `s`): > - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/WT7V3E > - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/Veswa4 > > Alias sets for function 'main': Alias Set Tracker: 1 alias sets for 2 > pointer values. > AliasSet[0x55ec7f9a23e0, 3] may alias, Mod/Ref Pointers: (i8* %0, > LocationSize::precise(4)), (i32* %a, LocationSize::precise(4)) > > Note that in the original source code `a`, `b` are > uninitialized--consequently, attempting to access `s[a].f1` and > `s[b].f2` is undefined behavior (as we're using automatic storage > duration objects `a` and `b` while their values are indeterminate): > https://taas.trust-in-soft.com/tsnippet/t/acff56c8 > > Cf. https://cigix.me/c17#6.7.9.p10 ("If an object that has automatic > storage duration is not initialized explicitly, its value is > indeterminate.") & https://cigix.me/c17#J.2.p1 > ("The behavior is undefined in the following circumstances: [...] The > value of an object with automatic storage duration is used while it is > indeterminate"). > > As such, you can notice that most of the code is going to be optimized > away between mem2reg and dead argument elimination: > https://llvm.godbolt.org/z/iEdKE_ > > (Similarly, even if `a` and `b` were initialized to `0`, we only wrote > to `f1` for `s[0]` and `s[1]`, so accessing `s[b].f2` is again using an > object while it is indeterminate and undefined behavior.) > > *** IR Dump After Promote Memory to Register *** > > ; the following corresponds to loading `s[a].f1` > %3 = load i32, i32* %a, align 4, !tbaa !7 > %idxprom = sext i32 %3 to i64 > %arrayidx3 = getelementptr inbounds [2 x %struct.MyStruct], [2 x > %struct.MyStruct]* %s, i64 0, i64 %idxprom > %f14 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* > %arrayidx3, i32 0, i32 0 > %4 = load i32*, i32** %f14, align 16, !tbaa !2 > %5 = bitcast i32* %4 to i8* > > ; the following corresponds to loading `s[b].f2` > %6 = load i32, i32* %b, align 4, !tbaa !7 > %idxprom5 = sext i32 %6 to i64 > %arrayidx6 = getelementptr inbounds [2 x %struct.MyStruct], [2 x > %struct.MyStruct]* %s, i64 0, i64 %idxprom5 > %f2 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* > %arrayidx6, i32 0, i32 1 > %7 = load i32*, i32** %f2, align 8, !tbaa !9 > %8 = bitcast i32* %7 to i8* > call void @NOALIAS(i8* %5, i8* %8) > > *** IR Dump After Dead Argument Elimination *** > ; note how the arguments have been rewritten to `undef` in the following: > call void @NOALIAS(i8* undef, i8* undef) > > > And is there a way that I can force the alias analysis algorithm to > focus only the "main" function? > > One way is to make the definition of `NOALIAS` unavailable (as if > external) by only providing the declaration (as in the above examples). > > Best, > Matt >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200709/623335fc/attachment.html>
Shuai Wang via llvm-dev
2020-Jul-09 11:24 UTC
[llvm-dev] Understand alias-analysis results
And another case: - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/SGeJZw - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/dNi-k2 Is there any chance that we can smoothly infer that: - x and &c are "must alias" - x and b are "must alias" I don't know how to interpret the current results, in particular the following outputs: AliasSet[0x5584ab7e5f30, 1] must alias, Mod/Ref Pointers: (i32** %x, LocationSize::precise(8)) AliasSet[0x5584ab7e6020, 1] must alias, Mod Pointers: (i32* %y, LocationSize::precise(4)) It means we have two "must alias" sets, each of which contain only one pointer? That seems quite confusing to me.. Best, Shuai On Thu, Jul 9, 2020 at 6:51 PM Shuai Wang <wangshuai901 at gmail.com> wrote:> Hey Matt, > > That's awesome. Thank you very much for all the information and > clarification! Just a few follow up questions. Could you kindly shed some > lights on it? Thank you! > > 1. I tried to tweak the code in the following way: > > - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/n9rGrs > - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/Uc6h5Y > > And i note that the outputs are: > > Alias sets for function 'main': > > Alias Set Tracker: 2 alias sets for 4 pointer values. > > *AliasSet[0x563faa6c6260, 5] may alias, Mod/Ref Pointers: (i8* %0, LocationSize::precise(4)), (i32* %a, LocationSize::precise(4)), (i8* %1, LocationSize::precise(4)), (i32* %b, LocationSize::precise(4))* > > 1 Unknown instructions: call void @NOALIAS(i8* nonnull %0, i8* nonnull %1) #3 > > *AliasSet[0x563faa6b45e0, 1] must alias, Mod/Ref forwarding to 0x563faa6c6260* > > > ===== Alias Analysis Evaluator Report ====> > 6 Total Alias Queries Performed > > 4 no alias responses (66.6%) > > * 0 may alias responses (0.0%)* > > 0 partial alias responses (0.0%) > > *2 must alias responses (33.3%)* > > > I am trying to interpret the outputs, so if I understand correctly, the > output indicates that we have an alias set of 4 pointers which > "potentially" point to the same memory region, correct? Then is there any > more accurate analysis pass that I could use to somewhat infer that "there > are two must alias sets, each set has two pointers"? Correct me if I was > wrong here.. Using my local opt (version 6.0), I tried to iterate all > feasible alias analysis passes but the results are not changed. > > Also, what is the "must alias, Mod/Ref forwarding to 0x563faa6c6260"? > > And how to interpret that we have "2 must alias responses"? Where does it > come from? And why do we have "0 may alias response"? I would expect to > have at least "4 may alias responses" as well? > > 2. I note that using the latest opt (version 11.0?) gives different > outputs with my local opt (version 6.0). For opt (version 6.0), it reports: > 2 alias sets for 2 pointer values. > > More importantly, can I expect to get generally better alias analysis > results when switching to version 11.0? > > Thank you very much! > > Best, > Shuai > > > > On Thu, Jul 9, 2020 at 6:14 PM Matt P. Dziubinski <matdzb at gmail.com> > wrote: > >> On 7/9/2020 10:15, Shuai Wang via llvm-dev wrote: >> > Hello, >> > >> > I am performing alias analysis toward the following simple code: >> > >> > [...] >> > >> > I checked the generated .ll code, and it shows that within the main >> > function and NOALIAS functions, there is only a "ret" statement, with >> no >> > global or local variables used. Could anyone shed some lights on where >> > the "1 may alias" come from? And is there a way that I can force the >> > alias analysis algorithm to focus only the "main" function? Thank you >> > very much. >> >> Hi! >> >> Here's more information after initializing the variables (assuming the >> intent in the source code was, e.g., to initialize `a` and `b` to `0` >> and the pointers `f1` and `f2` to `NULL`, using aggregate initialization >> for `s`): >> - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/WT7V3E >> - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/Veswa4 >> >> Alias sets for function 'main': Alias Set Tracker: 1 alias sets for 2 >> pointer values. >> AliasSet[0x55ec7f9a23e0, 3] may alias, Mod/Ref Pointers: (i8* %0, >> LocationSize::precise(4)), (i32* %a, LocationSize::precise(4)) >> >> Note that in the original source code `a`, `b` are >> uninitialized--consequently, attempting to access `s[a].f1` and >> `s[b].f2` is undefined behavior (as we're using automatic storage >> duration objects `a` and `b` while their values are indeterminate): >> https://taas.trust-in-soft.com/tsnippet/t/acff56c8 >> >> Cf. https://cigix.me/c17#6.7.9.p10 ("If an object that has automatic >> storage duration is not initialized explicitly, its value is >> indeterminate.") & https://cigix.me/c17#J.2.p1 >> ("The behavior is undefined in the following circumstances: [...] The >> value of an object with automatic storage duration is used while it is >> indeterminate"). >> >> As such, you can notice that most of the code is going to be optimized >> away between mem2reg and dead argument elimination: >> https://llvm.godbolt.org/z/iEdKE_ >> >> (Similarly, even if `a` and `b` were initialized to `0`, we only wrote >> to `f1` for `s[0]` and `s[1]`, so accessing `s[b].f2` is again using an >> object while it is indeterminate and undefined behavior.) >> >> *** IR Dump After Promote Memory to Register *** >> >> ; the following corresponds to loading `s[a].f1` >> %3 = load i32, i32* %a, align 4, !tbaa !7 >> %idxprom = sext i32 %3 to i64 >> %arrayidx3 = getelementptr inbounds [2 x %struct.MyStruct], [2 x >> %struct.MyStruct]* %s, i64 0, i64 %idxprom >> %f14 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* >> %arrayidx3, i32 0, i32 0 >> %4 = load i32*, i32** %f14, align 16, !tbaa !2 >> %5 = bitcast i32* %4 to i8* >> >> ; the following corresponds to loading `s[b].f2` >> %6 = load i32, i32* %b, align 4, !tbaa !7 >> %idxprom5 = sext i32 %6 to i64 >> %arrayidx6 = getelementptr inbounds [2 x %struct.MyStruct], [2 x >> %struct.MyStruct]* %s, i64 0, i64 %idxprom5 >> %f2 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* >> %arrayidx6, i32 0, i32 1 >> %7 = load i32*, i32** %f2, align 8, !tbaa !9 >> %8 = bitcast i32* %7 to i8* >> call void @NOALIAS(i8* %5, i8* %8) >> >> *** IR Dump After Dead Argument Elimination *** >> ; note how the arguments have been rewritten to `undef` in the following: >> call void @NOALIAS(i8* undef, i8* undef) >> >> > And is there a way that I can force the alias analysis algorithm to >> focus only the "main" function? >> >> One way is to make the definition of `NOALIAS` unavailable (as if >> external) by only providing the declaration (as in the above examples). >> >> Best, >> Matt >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200709/e9698365/attachment-0001.html>
Matt P. Dziubinski via llvm-dev
2020-Jul-09 13:58 UTC
[llvm-dev] Understand alias-analysis results
Hi again! Replying in chronological order:> On Thu, Jul 9, 2020 at 6:51 PM Shuai Wang <wangshuai901 at gmail.com > <mailto:wangshuai901 at gmail.com>> wrote: > > Hey Matt, > > That's awesome. Thank you very much for all the information and > clarification! Just a few follow up questions. Could you kindly shed > some lights on it? Thank you! > > 1. I tried to tweak the code in the following way: [...] > > I am trying to interpret the outputs, so if I understand correctly, > the output indicates that we have an alias set of 4 pointers which > "potentially" point to the same memory region, correct? Then is > there any more accurate analysis pass that I could use to somewhat > infer that "there are two must alias sets, each set has two > pointers"? Correct me if I was wrong here.. Using my local opt > (version 6.0), I tried to iterate all feasible alias analysis passes > but the results are not changed.Seems correct, I don't think you'll get more precise results out of the basic-aa pass, note that it has limited context sensitivity: https://llvm.org/docs/AliasAnalysis.html#the-basic-aa-pass Compare the results for `test_simple`, `test_in_array`, and `test_same_underlying_object_different_indices`: https://github.com/llvm/llvm-project/blob/release/10.x/llvm/test/Analysis/BasicAA/struct-geps.ll> Also, what is the "must alias, Mod/Ref forwarding to 0x563faa6c6260"?If alias sets have been merged, you'll get the attached node forwarding to the root node; note the comment for `getForwardedTarget` making a reference to union-find: https://github.com/llvm/llvm-project/blob/release/10.x/llvm/include/llvm/Analysis/AliasSetTracker.h#L281 (with "merge" corresponding to the union-find collapsing for "union", https://en.wikipedia.org/wiki/Disjoint-set_data_structure#Union). You can see how `AliasSet::mergeSetIn` (called, e.g., by `AliasSetTracker::mergeAliasSetsForPointer`) sets up forwarding for `AS.Forward`: https://github.com/llvm/llvm-project/blob/release/10.x/llvm/lib/Analysis/AliasSetTracker.cpp#L51, https://github.com/llvm/llvm-project/blob/release/10.x/llvm/lib/Analysis/AliasSetTracker.cpp#L301 FWIW, you can use a tracer or manually step through `opt` in a debugger to follow the function calls.> And how to interpret that we have "2 must alias responses"? Where > does it come from? And why do we have "0 may alias response"? I > would expect to have at least "4 may alias responses" as well?No, "MayAlias" and "MustAlias" are distinct elements in the lattice, cf. https://llvm.org/docs/AliasAnalysis.html#must-may-or-no There's a good explanation of the alias analysis queries and responses in the following talk (particularly the part starting with "AA Query" around the 22 min. mark): “Pointers, Alias & ModRef Analyses” (2018 EuroLLVM Developers’ Meeting: A. Sbirlea & N. Lopes) https://www.youtube.com/watch?v=r0XVS4Atl3U When you return `AliasResult` from your analysis you choose one: https://llvm.org/doxygen/namespacellvm.html#ae1738272abcf2ac638b97e7dc6360cfd You can see a simple example here (`TARAAResult::alias`): https://blog.tartanllama.xyz/llvm-alias-analysis/> 2. I note that using the latest opt (version 11.0?) gives different > outputs with my local opt (version 6.0). For opt (version 6.0), it > reports: 2 alias sets for 2 pointer values. > > More importantly, can I expect to get generally better alias > analysis results when switching to version 11.0?I'd assume that generally it shouldn't get worse :-)> > Thank you very much!On 7/9/2020 13:24, Shuai Wang wrote: > And another case: > > - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/SGeJZw > - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/dNi-k2 > > Is there any chance that we can smoothly infer that: > - x and &c are "must alias" > - x and b are "must alias" > > I don't know how to interpret the current results, in particular the > following outputs: > > AliasSet[0x5584ab7e5f30, 1] must alias, Mod/Ref Pointers: (i32** %x, > LocationSize::precise(8)) > AliasSet[0x5584ab7e6020, 1] must alias, Mod Pointers: (i32* %y, > LocationSize::precise(4)) > > It means we have two "must alias" sets, each of which contain only one > pointer? That seems quite confusing to me.. You can add -print-all-alias-modref-info for more detailed information: https://llvm.godbolt.org/z/9njGqx -- you'll notice "MustAlias: i32* %c, i8* %6". Adding `-evaluate-aa-metadata` for `load` and `store` instructions, https://llvm.godbolt.org/z/YaW1Mb, you'll notice "MustAlias: %0 = load i32**, i32*** %a, align 8 <-> store i32** %b, i32*** %a, align 8" However, from your results we can already note: AliasSet[0x5584ab7e5d00, 5] may alias, Mod/Ref Pointers: (i32* %c, LocationSize::precise(4)), (i32** %b, LocationSize::precise(8)), (i32** %0, LocationSize::precise(8)), (i32* %2, LocationSize::precise(4)) Note how in the C source code pointer `b` points to int `c` (`b = &c;`) corresponding to the memory locations (same object in memory when loading from `c` or `*b`). However, pointers `x` and `y` are distinct objects in memory themselves. In general, remember not to confuse pointers with what they point to--here also distinct, since `x` points `b` but `y` points to `c` (I mention this specifically since the desired inference of "x and b are "must alias"" cannot be correct--these are not the same objects in memory). Best, Matt