On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com> wrote:> Hi Daniel, > > Thanks for your feedback. I would prefer not to write a new AA. Can't we > directly implement that traversal in BasicAA? >Can I ask why? Outside of the "well, it's another pass", i mean? BasicAA is stateless, so you can't cache, and you really don't want to redo these walks repeatedly (especially when the answer doesn't change unless AA is invalidated). It would be really expensive. Doing this in a separate analysis pass, caching the answer, and producing AA results, seems to me exactly the right thing.> Otherwise, I'll investigate why this i64 was generated in the first place > (but like you, I don't really want to know why :-) >Reid walked over to my desk and told me all the gory details - the clang ABI lowering of structs like these for anything but x86 basically says "oh, this really should be 2 8 byte GPR's, and the way it makes this happen is by using i64's :) If you want to do it at a clang level, the right thing to do is to fixup the ABI lowerings for pointers to keep them pointers in this case. For something simpler, if you wanted, you could do the *exact* opposite of happens now and you'd get better results. This is, pass everything that needs to be 2 8 byte regs as 8 byte pointers, and cast it back to something if it's not really a pointer, using ptrtoint. if it's not really a pointer, we don't care what AA says about it. If it is a pointer, now we don't get bad AA answers, because there is no inttoptr being used like there is now. Of course, i have no idea if this strategy is going to produce correct ABI results on all platforms, it depends on whether they treat pointers as special. Olivier> > 2015-03-13 18:51 GMT-04:00 Daniel Berlin <dberlin at dberlin.org>: > >> >> >> On Fri, Mar 13, 2015 at 2:54 PM Daniel Berlin <dberlin at dberlin.org> >> wrote: >> >>> On Fri, Mar 13, 2015 at 2:39 PM Olivier H Sallenave <ohsallen at us.ibm.com> >>> wrote: >>> >>>> Hi, >>>> >>>> I have the following C loop to vectorize: >>>> >>>> struct box { >>>> double* source; >>>> }; >>>> >>>> void test(double* restrict result, struct box my_struct, int len) >>>> { >>>> for (int i=0 ; i<len; i++) { >>>> result[i] = my_struct.source[i] * my_struct.source[i]; >>>> } >>>> } >>>> >>>> There are two references in the loop, result[i] (restrict) and >>>> my_struct.source[i] (readonly). The compiler should easily figure out that >>>> they do not alias. >>>> >>>> Compiling for x86, the loop alias analysis works just fine: >>>> AST: Alias Set Tracker: 2 alias sets for 2 pointer values. >>>> AliasSet[0x7fd8e2f32290, 1] must alias, No access Pointers: (double* >>>> %arrayidx5, 18446744073709551615) >>>> AliasSet[0x7fd8e2f322e0, 1] must alias, No access Pointers: (double* >>>> %arrayidx, 18446744073709551615) >>>> >>>> Compiling for PPC with -target powerpc64le-ibm-linux-gnu, the two >>>> addresses now alias: >>>> AST: Alias Set Tracker: 1 alias sets for 2 pointer values. >>>> AliasSet[0x7f931bd5bdc0, 2] may alias, No access Pointers: (double* >>>> %arrayidx5, 18446744073709551615), (double* %arrayidx, 18446744073709551615) >>>> >>>> BasicAA is used for both targets by default. The difference is that in >>>> PPC, the IR obtained from Clang takes an i64 as parameter instead of a >>>> double* for my_struct. >>>> >>> >>> I don't even want to know why this would be the case :) >>> >>> >>>> This parameter is then coerced into double* using an inttoptr >>>> instruction. The code in BasicAliasAnalysis.cpp which is triggered for x86 >>>> is the following: >>>> >>>> // Function arguments can't alias with things that are known to be >>>> // unambigously identified at the function level. >>>> if ((isa<Argument>(O1) && isIdentifiedFunctionLocal(O2)) || >>>> (isa<Argument>(O2) && isIdentifiedFunctionLocal(O1))) >>>> return NoAlias; >>>> >>>> isIdentifiedFunctionLocal(V) returns true for a noalias argument (such >>>> as result), but the other address (my_struct) must be a function argument >>>> in order to return NoAlias, which is not the case anymore for PPC (since >>>> my_struct is now the result from an inttoptr instruction). If I understand, >>>> the problem is that we cannot trust the fact that locals do not alias with >>>> restrict parameters (because the compiler could generate some locals which >>>> alias)? >>>> >>> Yes, because pointers *based on* the noalias'd argument are legal >>> aliases. >>> >>> So if you don't know it's an argument or an identified local, it could >>> be based on the restricted pointer, and thus, alias it. >>> >>> >>> If someone has suggestions about this, that would help a lot. >>>> >>> >>> The only way you could prove something in this case would be to walk the >>> chain and prove the value comes directly from an argument with no >>> modification. >>> >> >> Actually, you could do the opposite, too, pretty cheaply. >> >> You could write a new pass or AA. >> It traverses chains in the reverse direction (IE it goes from the >> arguments, and walks down the immediate use chain, marking things as based >> on arguments or not), and makes a lookup table of things it can prove are >> also unmolested identified objects. >> (which would be the result of inttoptr in your case). >> >> You can then use this simple lookup table to answer the >> isIdentifiedObject question better. >> (You'd have to make isIdentifiedObject part of the AA interface, or take >> an optional table, blah blah blah) >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150315/a2177f32/attachment.html>
Olivier Sallenave
2015-Mar-16 02:01 UTC
[LLVMdev] Alias analysis issue with structs on PPC
Reason is that I want this case to be supported by default, without specific flags (should we change the default AA for PPC if we do a separate one?). Fixing at the Clang level might be the right thing to do... But if there are other cases like this, the traversal would definitely help in general to get a proper handling of restrict. Olivier On Sunday, March 15, 2015, Daniel Berlin <dberlin at dberlin.org> wrote:> > > On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com > <javascript:_e(%7B%7D,'cvml','ol.sall at gmail.com');>> wrote: > >> Hi Daniel, >> >> Thanks for your feedback. I would prefer not to write a new AA. Can't we >> directly implement that traversal in BasicAA? >> > Can I ask why? > Outside of the "well, it's another pass", i mean? > > BasicAA is stateless, so you can't cache, and you really don't want to > redo these walks repeatedly (especially when the answer doesn't change > unless AA is invalidated). It would be really expensive. > Doing this in a separate analysis pass, caching the answer, and producing > AA results, seems to me exactly the right thing. > > >> Otherwise, I'll investigate why this i64 was generated in the first place >> (but like you, I don't really want to know why :-) >> > > Reid walked over to my desk and told me all the gory details - the clang > ABI lowering of structs like these for anything but x86 basically says > "oh, this really should be 2 8 byte GPR's, and the way it makes this > happen is by using i64's :) > > If you want to do it at a clang level, the right thing to do is to fixup > the ABI lowerings for pointers to keep them pointers in this case. > > For something simpler, if you wanted, you could do the *exact* opposite of > happens now and you'd get better results. > > This is, pass everything that needs to be 2 8 byte regs as 8 byte > pointers, and cast it back to something if it's not really a pointer, using > ptrtoint. > > if it's not really a pointer, we don't care what AA says about it. > If it is a pointer, now we don't get bad AA answers, because there is no > inttoptr being used like there is now. > > Of course, i have no idea if this strategy is going to produce correct ABI > results on all platforms, it depends on whether they treat pointers as > special. > > Olivier >> >> 2015-03-13 18:51 GMT-04:00 Daniel Berlin <dberlin at dberlin.org >> <javascript:_e(%7B%7D,'cvml','dberlin at dberlin.org');>>: >> >>> >>> >>> On Fri, Mar 13, 2015 at 2:54 PM Daniel Berlin <dberlin at dberlin.org >>> <javascript:_e(%7B%7D,'cvml','dberlin at dberlin.org');>> wrote: >>> >>>> On Fri, Mar 13, 2015 at 2:39 PM Olivier H Sallenave < >>>> ohsallen at us.ibm.com >>>> <javascript:_e(%7B%7D,'cvml','ohsallen at us.ibm.com');>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I have the following C loop to vectorize: >>>>> >>>>> struct box { >>>>> double* source; >>>>> }; >>>>> >>>>> void test(double* restrict result, struct box my_struct, int len) >>>>> { >>>>> for (int i=0 ; i<len; i++) { >>>>> result[i] = my_struct.source[i] * my_struct.source[i]; >>>>> } >>>>> } >>>>> >>>>> There are two references in the loop, result[i] (restrict) and >>>>> my_struct.source[i] (readonly). The compiler should easily figure out that >>>>> they do not alias. >>>>> >>>>> Compiling for x86, the loop alias analysis works just fine: >>>>> AST: Alias Set Tracker: 2 alias sets for 2 pointer values. >>>>> AliasSet[0x7fd8e2f32290, 1] must alias, No access Pointers: (double* >>>>> %arrayidx5, 18446744073709551615) >>>>> AliasSet[0x7fd8e2f322e0, 1] must alias, No access Pointers: (double* >>>>> %arrayidx, 18446744073709551615) >>>>> >>>>> Compiling for PPC with -target powerpc64le-ibm-linux-gnu, the two >>>>> addresses now alias: >>>>> AST: Alias Set Tracker: 1 alias sets for 2 pointer values. >>>>> AliasSet[0x7f931bd5bdc0, 2] may alias, No access Pointers: (double* >>>>> %arrayidx5, 18446744073709551615), (double* %arrayidx, 18446744073709551615) >>>>> >>>>> BasicAA is used for both targets by default. The difference is that in >>>>> PPC, the IR obtained from Clang takes an i64 as parameter instead of a >>>>> double* for my_struct. >>>>> >>>> >>>> I don't even want to know why this would be the case :) >>>> >>>> >>>>> This parameter is then coerced into double* using an inttoptr >>>>> instruction. The code in BasicAliasAnalysis.cpp which is triggered for x86 >>>>> is the following: >>>>> >>>>> // Function arguments can't alias with things that are known to be >>>>> // unambigously identified at the function level. >>>>> if ((isa<Argument>(O1) && isIdentifiedFunctionLocal(O2)) || >>>>> (isa<Argument>(O2) && isIdentifiedFunctionLocal(O1))) >>>>> return NoAlias; >>>>> >>>>> isIdentifiedFunctionLocal(V) returns true for a noalias argument (such >>>>> as result), but the other address (my_struct) must be a function argument >>>>> in order to return NoAlias, which is not the case anymore for PPC (since >>>>> my_struct is now the result from an inttoptr instruction). If I understand, >>>>> the problem is that we cannot trust the fact that locals do not alias with >>>>> restrict parameters (because the compiler could generate some locals which >>>>> alias)? >>>>> >>>> Yes, because pointers *based on* the noalias'd argument are legal >>>> aliases. >>>> >>>> So if you don't know it's an argument or an identified local, it could >>>> be based on the restricted pointer, and thus, alias it. >>>> >>>> >>>> If someone has suggestions about this, that would help a lot. >>>>> >>>> >>>> The only way you could prove something in this case would be to walk >>>> the chain and prove the value comes directly from an argument with no >>>> modification. >>>> >>> >>> Actually, you could do the opposite, too, pretty cheaply. >>> >>> You could write a new pass or AA. >>> It traverses chains in the reverse direction (IE it goes from the >>> arguments, and walks down the immediate use chain, marking things as based >>> on arguments or not), and makes a lookup table of things it can prove are >>> also unmolested identified objects. >>> (which would be the result of inttoptr in your case). >>> >>> You can then use this simple lookup table to answer the >>> isIdentifiedObject question better. >>> (You'd have to make isIdentifiedObject part of the AA interface, or take >>> an optional table, blah blah blah) >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu >>> <javascript:_e(%7B%7D,'cvml','LLVMdev at cs.uiuc.edu');> >>> http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150315/001e2549/attachment.html>
On Sun, Mar 15, 2015 at 7:01 PM Olivier Sallenave <ol.sall at gmail.com> wrote:> Reason is that I want this case to be supported by default, without > specific flags (should we change the default AA for PPC if we do a separate > one?).Yeah, just change the default AA to also include this in the stack. It seems like it would be a win in all cases.> > Fixing at the Clang level might be the right thing to do... But if there > are other cases like this, the traversal would definitely help in general > to get a proper handling of restrict. > > You mean noalias ;)(restrict has weird semantics, and doesn't exist at the LLVM level. noalias has sane semantics, and is close to restrict, but better) In any case, this will help more than just your case, so it seems obvious to me that unless it's expensive, it should be on by default. Olivier> > > On Sunday, March 15, 2015, Daniel Berlin <dberlin at dberlin.org> wrote: > >> >> >> On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com> >> wrote: >> >>> Hi Daniel, >>> >>> Thanks for your feedback. I would prefer not to write a new AA. Can't we >>> directly implement that traversal in BasicAA? >>> >> Can I ask why? >> Outside of the "well, it's another pass", i mean? >> >> BasicAA is stateless, so you can't cache, and you really don't want to >> redo these walks repeatedly (especially when the answer doesn't change >> unless AA is invalidated). It would be really expensive. >> Doing this in a separate analysis pass, caching the answer, and producing >> AA results, seems to me exactly the right thing. >> >> >>> Otherwise, I'll investigate why this i64 was generated in the first >>> place (but like you, I don't really want to know why :-) >>> >> >> Reid walked over to my desk and told me all the gory details - the clang >> ABI lowering of structs like these for anything but x86 basically says >> "oh, this really should be 2 8 byte GPR's, and the way it makes this >> happen is by using i64's :) >> >> If you want to do it at a clang level, the right thing to do is to fixup >> the ABI lowerings for pointers to keep them pointers in this case. >> >> For something simpler, if you wanted, you could do the *exact* opposite >> of happens now and you'd get better results. >> >> This is, pass everything that needs to be 2 8 byte regs as 8 byte >> pointers, and cast it back to something if it's not really a pointer, using >> ptrtoint. >> >> if it's not really a pointer, we don't care what AA says about it. >> If it is a pointer, now we don't get bad AA answers, because there is no >> inttoptr being used like there is now. >> >> Of course, i have no idea if this strategy is going to produce correct >> ABI results on all platforms, it depends on whether they treat pointers as >> special. >> >> Olivier >>> >>> 2015-03-13 18:51 GMT-04:00 Daniel Berlin <dberlin at dberlin.org>: >>> >>>> >>>> >>>> On Fri, Mar 13, 2015 at 2:54 PM Daniel Berlin <dberlin at dberlin.org> >>>> wrote: >>>> >>>>> On Fri, Mar 13, 2015 at 2:39 PM Olivier H Sallenave < >>>>> ohsallen at us.ibm.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have the following C loop to vectorize: >>>>>> >>>>>> struct box { >>>>>> double* source; >>>>>> }; >>>>>> >>>>>> void test(double* restrict result, struct box my_struct, int len) >>>>>> { >>>>>> for (int i=0 ; i<len; i++) { >>>>>> result[i] = my_struct.source[i] * my_struct.source[i]; >>>>>> } >>>>>> } >>>>>> >>>>>> There are two references in the loop, result[i] (restrict) and >>>>>> my_struct.source[i] (readonly). The compiler should easily figure out that >>>>>> they do not alias. >>>>>> >>>>>> Compiling for x86, the loop alias analysis works just fine: >>>>>> AST: Alias Set Tracker: 2 alias sets for 2 pointer values. >>>>>> AliasSet[0x7fd8e2f32290, 1] must alias, No access Pointers: >>>>>> (double* %arrayidx5, 18446744073709551615) >>>>>> AliasSet[0x7fd8e2f322e0, 1] must alias, No access Pointers: >>>>>> (double* %arrayidx, 18446744073709551615) >>>>>> >>>>>> Compiling for PPC with -target powerpc64le-ibm-linux-gnu, the two >>>>>> addresses now alias: >>>>>> AST: Alias Set Tracker: 1 alias sets for 2 pointer values. >>>>>> AliasSet[0x7f931bd5bdc0, 2] may alias, No access Pointers: (double* >>>>>> %arrayidx5, 18446744073709551615), (double* %arrayidx, 18446744073709551615) >>>>>> >>>>>> BasicAA is used for both targets by default. The difference is that >>>>>> in PPC, the IR obtained from Clang takes an i64 as parameter instead of a >>>>>> double* for my_struct. >>>>>> >>>>> >>>>> I don't even want to know why this would be the case :) >>>>> >>>>> >>>>>> This parameter is then coerced into double* using an inttoptr >>>>>> instruction. The code in BasicAliasAnalysis.cpp which is triggered for x86 >>>>>> is the following: >>>>>> >>>>>> // Function arguments can't alias with things that are known to >>>>>> be >>>>>> // unambigously identified at the function level. >>>>>> if ((isa<Argument>(O1) && isIdentifiedFunctionLocal(O2)) || >>>>>> (isa<Argument>(O2) && isIdentifiedFunctionLocal(O1))) >>>>>> return NoAlias; >>>>>> >>>>>> isIdentifiedFunctionLocal(V) returns true for a noalias argument >>>>>> (such as result), but the other address (my_struct) must be a function >>>>>> argument in order to return NoAlias, which is not the case anymore for PPC >>>>>> (since my_struct is now the result from an inttoptr instruction). If I >>>>>> understand, the problem is that we cannot trust the fact that locals do not >>>>>> alias with restrict parameters (because the compiler could generate some >>>>>> locals which alias)? >>>>>> >>>>> Yes, because pointers *based on* the noalias'd argument are legal >>>>> aliases. >>>>> >>>>> So if you don't know it's an argument or an identified local, it could >>>>> be based on the restricted pointer, and thus, alias it. >>>>> >>>>> >>>>> If someone has suggestions about this, that would help a lot. >>>>>> >>>>> >>>>> The only way you could prove something in this case would be to walk >>>>> the chain and prove the value comes directly from an argument with no >>>>> modification. >>>>> >>>> >>>> Actually, you could do the opposite, too, pretty cheaply. >>>> >>>> You could write a new pass or AA. >>>> It traverses chains in the reverse direction (IE it goes from the >>>> arguments, and walks down the immediate use chain, marking things as based >>>> on arguments or not), and makes a lookup table of things it can prove are >>>> also unmolested identified objects. >>>> (which would be the result of inttoptr in your case). >>>> >>>> You can then use this simple lookup table to answer the >>>> isIdentifiedObject question better. >>>> (You'd have to make isIdentifiedObject part of the AA interface, or >>>> take an optional table, blah blah blah) >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150316/6634ea18/attachment.html>
----- Original Message -----> From: "Daniel Berlin" <dberlin at dberlin.org> > To: "Olivier Sallenave" <ol.sall at gmail.com> > Cc: llvmdev at cs.uiuc.edu > Sent: Sunday, March 15, 2015 6:48:41 PM > Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC> On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave < ol.sall at gmail.com > > wrote:> > Hi Daniel, >> > Thanks for your feedback. I would prefer not to write a new AA. > > Can't > > we directly implement that traversal in BasicAA? > > Can I ask why? > Outside of the "well, it's another pass", i mean?> BasicAA is stateless, so you can't cache, and you really don't want > to redo these walks repeatedly (especially when the answer doesn't > change unless AA is invalidated). It would be really expensive. > Doing this in a separate analysis pass, caching the answer, and > producing AA results, seems to me exactly the right thing.> > Otherwise, I'll investigate why this i64 was generated in the first > > place (but like you, I don't really want to know why :-) > > Reid walked over to my desk and told me all the gory details - the > clang ABI lowering of structs like these for anything but x86 > basically says "oh, this really should be 2 8 byte GPR's, and the > way it makes this happen is by using i64's :)> If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this > case.So this is an artifact of the way that we pass structures, and constructing a general solution at the ABI level might be tricky. I've cc'd Uli, who did most of the recent work here. For the single-element struct case, we could fix this by keeping it a pointer type. The relevant code in Clang is in lib/CodeGen/TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType and nearby code). But that does not really address the underlying issue: If I take your example and modify it so that we have: struct box { double* source; double* source2; }; then the parameter is passed as: define void @test(double* noalias nocapture %result, [2 x i64] %my_struct.coerce, i32 signext %len) #0 and is extracted the same way: %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] %my_struct.coerce, 0 %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double* but, it is also important to realize that the i64 here in the array type is actually chosen to satisfy alignment requirements. If we have this: typedef float __attribute__((ext_vector_type(4))) vt; struct box { double* source; double* source2; vt v; }; then the struct is passed as: define void @test(double* noalias nocapture %result, [2 x i128] %my_struct.coerce, i32 signext %len) and the extraction code looks like: %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] %my_struct.coerce, 0 %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 %my_struct.coerce.fca.0.extract, 64 %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 %my_struct.sroa.0.sroa.0.0.extract.shift to i64 %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to double* so just using pointer types instead of i64 will help common cases, but will not address the general issue. Now part of this does some down to using array parameters as a substitute for byval/direct parameters. As I recall, this was done because it allowed a natural partial decomposition between GPRs and stack for structures that straddle the number of available parameter-passing GPRs. If we could accomplish that with regular byval parameters and regular direct parameters, then we'd not need any of this array coercion, and the system, including for the purposes of aliasing analysis, would work as intended. There may be some infrastructure work required in the backend (SelectionDAG builder, etc.) -- Uli, if you know please comment -- but I think moving away from the array coercions might be the right solution, even if that requires some infrastructure enhancements. -Hal> For something simpler, if you wanted, you could do the *exact* > opposite of happens now and you'd get better results.> This is, pass everything that needs to be 2 8 byte regs as 8 byte > pointers, and cast it back to something if it's not really a > pointer, using ptrtoint.> if it's not really a pointer, we don't care what AA says about it. > If it is a pointer, now we don't get bad AA answers, because there is > no inttoptr being used like there is now.> Of course, i have no idea if this strategy is going to produce > correct ABI results on all platforms, it depends on whether they > treat pointers as special.> > Olivier >> > 2015-03-13 18:51 GMT-04:00 Daniel Berlin < dberlin at dberlin.org > : >> > > On Fri, Mar 13, 2015 at 2:54 PM Daniel Berlin < > > > dberlin at dberlin.org > > > > > > > wrote: > > >> > > > On Fri, Mar 13, 2015 at 2:39 PM Olivier H Sallenave < > > > > ohsallen at us.ibm.com > wrote: > > > > > >> > > > > Hi, > > > > > > > > > >> > > > > I have the following C loop to vectorize: > > > > > > > > > >> > > > > struct box { > > > > > > > > > > > > > > > double* source; > > > > > > > > > > > > > > > }; > > > > > > > > > >> > > > > void test(double* restrict result, struct box my_struct, int > > > > > len) > > > > > > > > > > > > > > > { > > > > > > > > > > > > > > > for (int i=0 ; i<len; i++) { > > > > > > > > > > > > > > > result[i] = my_struct.source[i] * my_struct.source[i]; > > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > } > > > > > > > > > >> > > > > There are two references in the loop, result[i] (restrict) > > > > > and > > > > > my_struct.source[i] (readonly). The compiler should easily > > > > > figure > > > > > out that they do not alias. > > > > > > > > > >> > > > > Compiling for x86, the loop alias analysis works just fine: > > > > > > > > > > > > > > > AST: Alias Set Tracker: 2 alias sets for 2 pointer values. > > > > > > > > > > > > > > > AliasSet[0x7fd8e2f32290, 1] must alias, No access Pointers: > > > > > (double* > > > > > %arrayidx5, 18446744073709551615) > > > > > > > > > > > > > > > AliasSet[0x7fd8e2f322e0, 1] must alias, No access Pointers: > > > > > (double* > > > > > %arrayidx, 18446744073709551615) > > > > > > > > > >> > > > > Compilin g for PPC with -target powerpc64le-ibm-linux-gnu, > > > > > the > > > > > two > > > > > addresses now alias: > > > > > > > > > > > > > > > AST: Alias Set Tracker: 1 alias sets for 2 pointer values. > > > > > > > > > > > > > > > AliasSet[0x7f931bd5bdc0, 2] may alias, No access Pointers: > > > > > (double* > > > > > %arrayidx5, 18446744073709551615), (double* %arrayidx, > > > > > 18446744073709551615) > > > > > > > > > >> > > > > BasicAA is used for both targets by default. The difference > > > > > is > > > > > that > > > > > in PPC, the IR obtained from Clang takes an i64 as parameter > > > > > instead > > > > > of a double* for my_struct. > > > > > > > > > > > > > > I don't even want to know why this would be the case :) > > > > > >> > > > > This parameter is then coerced into double* using an inttoptr > > > > > instruction. The code in BasicAliasAnalysis.cpp which is > > > > > triggered > > > > > for x86 is the following: > > > > > > > > > >> > > > > // Function arguments can't alias with things that are known > > > > > to > > > > > be > > > > > > > > > > > > > > > // unambigously identified at the function level. > > > > > > > > > > > > > > > if (( isa<Argument> ( O1 ) && isId entifiedFunctionLocal ( O2 > > > > > )) > > > > > || > > > > > > > > > > > > > > > ( isa<Argument> ( O2 ) && isIdenti fiedFunctionLocal ( O1 ))) > > > > > > > > > > > > > > > return NoAlias ; > > > > > > > > > >> > > > > isIdentifiedFunctionLocal(V) returns true for a noalias > > > > > argument > > > > > (such as result), but the other address (my_struct) must be a > > > > > function argument in order to return NoAlias, which is not > > > > > the > > > > > case > > > > > anymore for PPC (since my_struct is now the result from an > > > > > inttoptr > > > > > instruction). If I understand, the problem is that we cannot > > > > > trust > > > > > the fact that locals do not alias with restrict parameters > > > > > (because > > > > > the compiler could generate some locals which alias)? > > > > > > > > > > > > > > Yes, because pointers *based on* the noalias'd argument are > > > > legal > > > > aliases. > > > > > >> > > > So if you don't know it's an argument or an identified local, > > > > it > > > > could be based on the restricted pointer, and thus, alias it. > > > > > >> > > > > If someone has suggestions about this, that would help a lot. > > > > > > > > > > > > > > The only way you could prove something in this case would be to > > > > walk > > > > the chain and prove the value comes directly from an argument > > > > with > > > > no modification. > > > > > > > > > Actually, you could do the opposite, too, pretty cheaply. > > >> > > You could write a new pass or AA. > > > > > > It traverses chains in the reverse direction (IE it goes from the > > > arguments, and walks down the immediate use chain, marking things > > > as > > > based on arguments or not), and makes a lookup table of things it > > > can prove are also unmolested identified objects. > > > > > > (which would be the result of inttoptr in your case). > > >> > > You can then use this simple lookup table to answer the > > > isIdentifiedObject question better. > > > > > > (You'd have to make isIdentifiedObject part of the AA interface, > > > or > > > take an optional table, blah blah blah) > > >> > > _______________________________________________ > > > > > > LLVM Developers mailing list > > > > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150316/4e84e2c7/attachment.html>
On Mon, Mar 16, 2015 at 9:56 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > so just using pointer types instead of i64 will help common cases, but > will not address the general issue. Now part of this does some down to > using array parameters as a substitute for byval/direct parameters. As I > recall, this was done because it allowed a natural partial decomposition > between GPRs and stack for structures that straddle the number of available > parameter-passing GPRs. If we could accomplish that with regular byval > parameters and regular direct parameters, then we'd not need any of this > array coercion, and the system, including for the purposes of aliasing > analysis, would work as intended. There may be some infrastructure work > required in the backend (SelectionDAG builder, etc.) -- Uli, if you know > please comment -- but I think moving away from the array coercions might be > the right solution, even if that requires some infrastructure enhancements. >So, every backend interprets 'byval' differently, but it usually means "pass this whole thing in stack memory". It also requires extra copies through memory at the IR level, so I don't think we should be moving towards this construct. If you want to pass things in registers, it's usually best to use SSA values. Even though the extra 'extractvalue' instructions look expensive in the IR, they lower down to simple virtual register copies in the selection dag. The shift and trunc, on the other hand, don't model the machine code at all, and it would be good if we could eliminate them. I wonder if we could solve this parameter alignment problem via the 'align' parameter attribute. Unfortunately, I think for pointer types it's already overloaded to describe the alignment of the pointee and not the argument itself. In fact, I think you did this Hal. :) I think, in the long term, we should probably use a direct FCA. I believe this is what ARM does. It's also nice to flatten the FCA if we can detect that we're in a simple case where no interesting alignment is required. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150316/861bc32b/attachment.html>
Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20:> If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most of the recent work here. > > For the single-element struct case, we could fix this by keeping it > a pointer type. The relevant code in Clang is in lib/CodeGen/ > TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType and > nearby code). But that does not really address the underlying issue: > > If I take your example and modify it so that we have: > > struct box { > double* source; > double* source2; > }; > > then the parameter is passed as: > > define void @test(double* noalias nocapture %result, [2 x i64] % > my_struct.coerce, i32 signext %len) #0 > > and is extracted the same way: > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] % > my_struct.coerce, 0 > %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double* > > but, it is also important to realize that the i64 here in the array > type is actually chosen to satisfy alignment requirements. If we havethis:> > typedef float __attribute__((ext_vector_type(4))) vt; > struct box { > double* source; > double* source2; > vt v; > }; > > then the struct is passed as: > > define void @test(double* noalias nocapture %result, [2 x i128] % > my_struct.coerce, i32 signext %len) > > and the extraction code looks like: > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] % > my_struct.coerce, 0 > %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 % > my_struct.coerce.fca.0.extract, 64 > %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 % > my_struct.sroa.0.sroa.0.0.extract.shift to i64 > %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to double* > > so just using pointer types instead of i64 will help common cases, > but will not address the general issue. Now part of this does some > down to using array parameters as a substitute for byval/direct > parameters. As I recall, this was done because it allowed a natural > partial decomposition between GPRs and stack for structures that > straddle the number of available parameter-passing GPRs. If we could > accomplish that with regular byval parameters and regular direct > parameters, then we'd not need any of this array coercion, and the > system, including for the purposes of aliasing analysis, would work > as intended. There may be some infrastructure work required in the > backend (SelectionDAG builder, etc.) -- Uli, if you know please > comment -- but I think moving away from the array coercions might be > the right solution, even if that requires some infrastructureenhancements. I still think "byval" is the wrong approach here. Using "byval", the LLVM target-independent codegen layers assume the ABI requires actually passing a *pointer* to the argument -- but that's not true on PowerPC64, no pointer is in fact being passed. This means that the back-end, in those cases where we use byval, has to fake up an address when common code asks for the value of the argument. This is not too bad if the argument was in fact actually passed (fully) on the stack, but if it wasn't, we have to write the argument (partially) to the stack, and possibly even allocate stack space, just to satisfy common code ... Now, passing such arguments as "direct" might make more sense. But with the current infrastructure, this doesn't really work either. For one, there is currently no place to attach alignment information to a "direct" argument, and the back-end is not able in all cases to reconstruct the required on-stack alignment from the LLVM type info. In addition, passing a struct type "direct" causes clang codegen common code in many cases to "flatten" the argument, i.e. pass the struct elements as separate arguments on the LLVM IR level. This may in some cases be OK since it results in the same binary ABI; but in other cases it is definitively wrong. Still, if we need to move away from coerced array types, fixing the infrastructure such that "direct" can be used would be my prefered solution. If I understood the original email thread correctly, the current situation does not lead to incorrect code generation, just potentially suboptimal code generation? In this case, I guess an incremental solution could be employed; we could start with avoiding the array coercion in those cases where using the struct type as direct type is already safe even with the current infrastructure, and then expand the set of types where this is true by successively improving the infrastructure. Bye, Ulrich