Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20:> If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most of the recent work here. > > For the single-element struct case, we could fix this by keeping it > a pointer type. The relevant code in Clang is in lib/CodeGen/ > TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType and > nearby code). But that does not really address the underlying issue: > > If I take your example and modify it so that we have: > > struct box { > double* source; > double* source2; > }; > > then the parameter is passed as: > > define void @test(double* noalias nocapture %result, [2 x i64] % > my_struct.coerce, i32 signext %len) #0 > > and is extracted the same way: > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] % > my_struct.coerce, 0 > %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double* > > but, it is also important to realize that the i64 here in the array > type is actually chosen to satisfy alignment requirements. If we havethis:> > typedef float __attribute__((ext_vector_type(4))) vt; > struct box { > double* source; > double* source2; > vt v; > }; > > then the struct is passed as: > > define void @test(double* noalias nocapture %result, [2 x i128] % > my_struct.coerce, i32 signext %len) > > and the extraction code looks like: > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] % > my_struct.coerce, 0 > %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 % > my_struct.coerce.fca.0.extract, 64 > %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 % > my_struct.sroa.0.sroa.0.0.extract.shift to i64 > %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to double* > > so just using pointer types instead of i64 will help common cases, > but will not address the general issue. Now part of this does some > down to using array parameters as a substitute for byval/direct > parameters. As I recall, this was done because it allowed a natural > partial decomposition between GPRs and stack for structures that > straddle the number of available parameter-passing GPRs. If we could > accomplish that with regular byval parameters and regular direct > parameters, then we'd not need any of this array coercion, and the > system, including for the purposes of aliasing analysis, would work > as intended. There may be some infrastructure work required in the > backend (SelectionDAG builder, etc.) -- Uli, if you know please > comment -- but I think moving away from the array coercions might be > the right solution, even if that requires some infrastructureenhancements. I still think "byval" is the wrong approach here. Using "byval", the LLVM target-independent codegen layers assume the ABI requires actually passing a *pointer* to the argument -- but that's not true on PowerPC64, no pointer is in fact being passed. This means that the back-end, in those cases where we use byval, has to fake up an address when common code asks for the value of the argument. This is not too bad if the argument was in fact actually passed (fully) on the stack, but if it wasn't, we have to write the argument (partially) to the stack, and possibly even allocate stack space, just to satisfy common code ... Now, passing such arguments as "direct" might make more sense. But with the current infrastructure, this doesn't really work either. For one, there is currently no place to attach alignment information to a "direct" argument, and the back-end is not able in all cases to reconstruct the required on-stack alignment from the LLVM type info. In addition, passing a struct type "direct" causes clang codegen common code in many cases to "flatten" the argument, i.e. pass the struct elements as separate arguments on the LLVM IR level. This may in some cases be OK since it results in the same binary ABI; but in other cases it is definitively wrong. Still, if we need to move away from coerced array types, fixing the infrastructure such that "direct" can be used would be my prefered solution. If I understood the original email thread correctly, the current situation does not lead to incorrect code generation, just potentially suboptimal code generation? In this case, I guess an incremental solution could be employed; we could start with avoiding the array coercion in those cases where using the struct type as direct type is already safe even with the current infrastructure, and then expand the set of types where this is true by successively improving the infrastructure. Bye, Ulrich
----- Original Message -----> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Daniel Berlin" <dberlin at dberlin.org>, llvmdev at cs.uiuc.edu, "Olivier Sallenave" <ol.sall at gmail.com> > Sent: Tuesday, March 17, 2015 4:56:48 AM > Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC > > Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > > > If you want to do it at a clang level, the right thing to do is to > > fixup the ABI lowerings for pointers to keep them pointers in this > > case. > > So this is an artifact of the way that we pass structures, and > > constructing a general solution at the ABI level might be tricky. > > I've cc'd Uli, who did most of the recent work here. > > > > For the single-element struct case, we could fix this by keeping it > > a pointer type. The relevant code in Clang is in lib/CodeGen/ > > TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType > > and > > nearby code). But that does not really address the underlying > > issue: > > > > If I take your example and modify it so that we have: > > > > struct box { > > double* source; > > double* source2; > > }; > > > > then the parameter is passed as: > > > > define void @test(double* noalias nocapture %result, [2 x i64] % > > my_struct.coerce, i32 signext %len) #0 > > > > and is extracted the same way: > > > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] % > > my_struct.coerce, 0 > > %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double* > > > > but, it is also important to realize that the i64 here in the array > > type is actually chosen to satisfy alignment requirements. If we > > have > this: > > > > typedef float __attribute__((ext_vector_type(4))) vt; > > struct box { > > double* source; > > double* source2; > > vt v; > > }; > > > > then the struct is passed as: > > > > define void @test(double* noalias nocapture %result, [2 x i128] % > > my_struct.coerce, i32 signext %len) > > > > and the extraction code looks like: > > > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] % > > my_struct.coerce, 0 > > %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 % > > my_struct.coerce.fca.0.extract, 64 > > %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 % > > my_struct.sroa.0.sroa.0.0.extract.shift to i64 > > %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to > > double* > > > > so just using pointer types instead of i64 will help common cases, > > but will not address the general issue. Now part of this does some > > down to using array parameters as a substitute for byval/direct > > parameters. As I recall, this was done because it allowed a natural > > partial decomposition between GPRs and stack for structures that > > straddle the number of available parameter-passing GPRs. If we > > could > > accomplish that with regular byval parameters and regular direct > > parameters, then we'd not need any of this array coercion, and the > > system, including for the purposes of aliasing analysis, would work > > as intended. There may be some infrastructure work required in the > > backend (SelectionDAG builder, etc.) -- Uli, if you know please > > comment -- but I think moving away from the array coercions might > > be > > the right solution, even if that requires some infrastructure > enhancements. > > I still think "byval" is the wrong approach here. Using "byval", > the LLVM target-independent codegen layers assume the ABI requires > actually passing a *pointer* to the argument -- but that's not true > on PowerPC64, no pointer is in fact being passed. > > This means that the back-end, in those cases where we use byval, > has to fake up an address when common code asks for the value of > the argument. This is not too bad if the argument was in fact > actually passed (fully) on the stack, but if it wasn't, we have > to write the argument (partially) to the stack, and possibly > even allocate stack space, just to satisfy common code ... > > Now, passing such arguments as "direct" might make more sense. But > with the current infrastructure, this doesn't really work either. > For one, there is currently no place to attach alignment information > to a "direct" argument, and the back-end is not able in all cases to > reconstruct the required on-stack alignment from the LLVM type info. > In addition, passing a struct type "direct" causes clang codegen > common code in many cases to "flatten" the argument, i.e. pass the > struct elements as separate arguments on the LLVM IR level. This > may in some cases be OK since it results in the same binary ABI; > but in other cases it is definitively wrong. > > Still, if we need to move away from coerced array types, fixing > the infrastructure such that "direct" can be used would be my > prefered solution. If I understood the original email thread > correctly, the current situation does not lead to incorrect code > generation, just potentially suboptimal code generation? In this > case, I guess an incremental solution could be employed; we could > start with avoiding the array coercion in those cases where using > the struct type as direct type is already safe even with the > current infrastructure, and then expand the set of types where > this is true by successively improving the infrastructure.That's correct, the code is correct, but potentially suboptimal. I agree with your assessment, and an incremental approach certainly makes sense. Regarding the general case, which requires infrastructure enhancement, we just need to decide what improvement is needed. As I mentioned in the e-mail to Reid, we could certainly attach an 'align' attribute to a 'direct' struct argument, and update the infrastructure so that it means the right thing. I did a quick experiment, and changed the IR from the previous example so that we have: %struct.box3 = type { double*, double*, <4 x float> } and the function taking the parameter like this: define void @test3(double* noalias nocapture %result, %struct.box3 %my_struct, i32 %len) and then extracting the first member like this: %0 = extractvalue %struct.box3 %my_struct, 0 and I ran it though the standard pipeline at -O3, and nothing changed that (so we, at least, don't always break up these FCA arguments into separate parameters). What I mean above is that it might make sense to change things so that we're allowed to use this: define void @test3(double* noalias nocapture %result, %struct.box3 %my_struct align 16, i32 %len) and then I imagine that the SDAGBuilder needs to keep track of which parameter values being lowered are part of a structure (in the same way as it keeps track of this for arrays currently), if it does not do this already. Thanks again, Hal> > Bye, > Ulrich > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
----- Original Message -----> From: "Hal Finkel" <hfinkel at anl.gov> > To: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com> > Cc: llvmdev at cs.uiuc.edu > Sent: Tuesday, March 17, 2015 6:55:11 AM > Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC > > ----- Original Message ----- > > From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com> > > To: "Hal Finkel" <hfinkel at anl.gov> > > Cc: "Daniel Berlin" <dberlin at dberlin.org>, llvmdev at cs.uiuc.edu, > > "Olivier Sallenave" <ol.sall at gmail.com> > > Sent: Tuesday, March 17, 2015 4:56:48 AM > > Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC > > > > Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > > > > > If you want to do it at a clang level, the right thing to do is > > > to > > > fixup the ABI lowerings for pointers to keep them pointers in > > > this > > > case. > > > So this is an artifact of the way that we pass structures, and > > > constructing a general solution at the ABI level might be tricky. > > > I've cc'd Uli, who did most of the recent work here. > > > > > > For the single-element struct case, we could fix this by keeping > > > it > > > a pointer type. The relevant code in Clang is in lib/CodeGen/ > > > TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType > > > and > > > nearby code). But that does not really address the underlying > > > issue: > > > > > > If I take your example and modify it so that we have: > > > > > > struct box { > > > double* source; > > > double* source2; > > > }; > > > > > > then the parameter is passed as: > > > > > > define void @test(double* noalias nocapture %result, [2 x i64] % > > > my_struct.coerce, i32 signext %len) #0 > > > > > > and is extracted the same way: > > > > > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] % > > > my_struct.coerce, 0 > > > %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double* > > > > > > but, it is also important to realize that the i64 here in the > > > array > > > type is actually chosen to satisfy alignment requirements. If we > > > have > > this: > > > > > > typedef float __attribute__((ext_vector_type(4))) vt; > > > struct box { > > > double* source; > > > double* source2; > > > vt v; > > > }; > > > > > > then the struct is passed as: > > > > > > define void @test(double* noalias nocapture %result, [2 x i128] % > > > my_struct.coerce, i32 signext %len) > > > > > > and the extraction code looks like: > > > > > > %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] % > > > my_struct.coerce, 0 > > > %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 % > > > my_struct.coerce.fca.0.extract, 64 > > > %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 % > > > my_struct.sroa.0.sroa.0.0.extract.shift to i64 > > > %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to > > > double* > > > > > > so just using pointer types instead of i64 will help common > > > cases, > > > but will not address the general issue. Now part of this does > > > some > > > down to using array parameters as a substitute for byval/direct > > > parameters. As I recall, this was done because it allowed a > > > natural > > > partial decomposition between GPRs and stack for structures that > > > straddle the number of available parameter-passing GPRs. If we > > > could > > > accomplish that with regular byval parameters and regular direct > > > parameters, then we'd not need any of this array coercion, and > > > the > > > system, including for the purposes of aliasing analysis, would > > > work > > > as intended. There may be some infrastructure work required in > > > the > > > backend (SelectionDAG builder, etc.) -- Uli, if you know please > > > comment -- but I think moving away from the array coercions might > > > be > > > the right solution, even if that requires some infrastructure > > enhancements. > > > > I still think "byval" is the wrong approach here. Using "byval", > > the LLVM target-independent codegen layers assume the ABI requires > > actually passing a *pointer* to the argument -- but that's not true > > on PowerPC64, no pointer is in fact being passed. > > > > This means that the back-end, in those cases where we use byval, > > has to fake up an address when common code asks for the value of > > the argument. This is not too bad if the argument was in fact > > actually passed (fully) on the stack, but if it wasn't, we have > > to write the argument (partially) to the stack, and possibly > > even allocate stack space, just to satisfy common code ... > > > > Now, passing such arguments as "direct" might make more sense. But > > with the current infrastructure, this doesn't really work either. > > For one, there is currently no place to attach alignment > > information > > to a "direct" argument, and the back-end is not able in all cases > > to > > reconstruct the required on-stack alignment from the LLVM type > > info. > > In addition, passing a struct type "direct" causes clang codegen > > common code in many cases to "flatten" the argument, i.e. pass the > > struct elements as separate arguments on the LLVM IR level. This > > may in some cases be OK since it results in the same binary ABI; > > but in other cases it is definitively wrong. > > > > Still, if we need to move away from coerced array types, fixing > > the infrastructure such that "direct" can be used would be my > > prefered solution. If I understood the original email thread > > correctly, the current situation does not lead to incorrect code > > generation, just potentially suboptimal code generation? In this > > case, I guess an incremental solution could be employed; we could > > start with avoiding the array coercion in those cases where using > > the struct type as direct type is already safe even with the > > current infrastructure, and then expand the set of types where > > this is true by successively improving the infrastructure. > > That's correct, the code is correct, but potentially suboptimal. I > agree with your assessment, and an incremental approach certainly > makes sense. Regarding the general case, which requires > infrastructure enhancement, we just need to decide what improvement > is needed. As I mentioned in the e-mail to Reid, we could certainly > attach an 'align' attribute to a 'direct' struct argument, and > update the infrastructure so that it means the right thing. > > I did a quick experiment, and changed the IR from the previous > example so that we have: > > %struct.box3 = type { double*, double*, <4 x float> } > > and the function taking the parameter like this: > > define void @test3(double* noalias nocapture %result, %struct.box3 > %my_struct, i32 %len) > > and then extracting the first member like this: > > %0 = extractvalue %struct.box3 %my_struct, 0 > > and I ran it though the standard pipeline at -O3, and nothing changed > that (so we, at least, don't always break up these FCA arguments > into separate parameters). What I mean above is that it might make > sense to change things so that we're allowed to use this: > > define void @test3(double* noalias nocapture %result, %struct.box3 > %my_struct align 16, i32 %len)Minor correction: The syntax should be: define void @test3(double* noalias nocapture %result, %struct.box3 align 16 %my_struct, i32 %len) and this is currently allowed by the verifier, but I think that it currently does not mean anything (the alignment is ignored because the argument does not have pointer type). -Hal> > and then I imagine that the SDAGBuilder needs to keep track of which > parameter values being lowered are part of a structure (in the same > way as it keeps track of this for arrays currently), if it does not > do this already. > > Thanks again, > Hal > > > > > Bye, > > Ulrich > > > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory