Can you explain please why it works for this version of the function: double f(double *__restrict__ x, double *__restrict__ y, double *__restrict__ z); What is different here? There are stores here as well. Brent On Wed, Jan 25, 2012 at 12:34 AM, Roel Jordans <r.jordans at tue.nl> wrote:> Hi Brent, > > I think this is a problem in the easy-cse transform. In this transform load > operations can be replaced by their subexpression, in this case the > propagated constant, based on the value of the 'CurrentGeneration' of memory > writes. This implies that any store operation invalidates the knowledge > about previously stored subexpressions. > > In general, this is a safe assumption but in this case it is missing quite > some optimization potential. > > The effect of this can be shown by moving the line %6 one up, to before the > previous store operation. This doesn't change the program behaviour but does > influence the optimization. > > More info on this is in lib/Transforms/Scalar/EarlyCSE.cpp (line 415 is a > good start) > > I don't have time to improve this at this moment so I'll leave that to you > (or anyone else that feels inspired). > > Cheers, > Roel > > > On 01/24/2012 03:59 PM, Brent Walker wrote: >> >> Hi Roel, >> the code you list below is precisely what I expect to get (of course >> the stores must happen but the constant folding should happen as >> well). >> >> It all looks very strange. LLVM is behaving as if the __restrict__ >> keyword was not used at all. Even more strange is the fact that for >> this function: >> >> double f(double *__restrict__ x, double *__restrict__ y, double >> *__restrict__ z) >> { >> *x = 1.0; >> *y = *x + 2; >> *z = *x + 3; >> >> return *x + *y + *z; >> } >> >> everything works as expected: >> >> define double @_Z1fPdS_S_(double* noalias nocapture %x, double* >> noalias nocapture %y, double* noalias nocapture %z) nounwind uwtable { >> store double 1.000000e+00, double* %x, align 8, !tbaa !0 >> store double 3.000000e+00, double* %y, align 8, !tbaa !0 >> store double 4.000000e+00, double* %z, align 8, !tbaa !0 >> ret double 8.000000e+00 >> } >> >> !0 = metadata !{metadata !"double", metadata !1} >> !1 = metadata !{metadata !"omnipotent char", metadata !2} >> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >> >> So I can't figure out what the mechanism is for telling llvm that two >> pointers that are local variables (and not input arguments) to a >> function do not alias each other. I hope one of the developers can >> shed some light on this. >> >> Thank you in advance for any help, >> Brent >> >> >> On Tue, Jan 24, 2012 at 9:01 PM, Roel Jordans<r.jordans at tue.nl> wrote: >>> >>> Hi Brent, >>> >>> Looking at your code I can see at least one reason why some of the store >>> operations remain in the output since you are (through x, y, and z) >>> writing in memory which exists outside of your function (p). >>> >>> Constant propagation also seems to work in the first few lines, *y = *x >>> +1 (%3) is stored directly. >>> >>> The strange thing to me is that the same doesn't happen for *z = *x + 2. >>> Here *x is loaded again and the addition is still performed... >>> >>> From this point on, constant propagation seems to stop working >>> completely. Looking at the IR I would have expected something like the >>> following: >>> >>> define double @f(double** nocapture %p) nounwind uwtable { >>> %1 = load double** %p, align 8, !tbaa !0 >>> %2 = getelementptr inbounds double** %p, i64 1 >>> %3 = load double** %2, align 8, !tbaa !0 >>> %4 = getelementptr inbounds double** %p, i64 2 >>> %5 = load double** %4, align 8, !tbaa !0 >>> store double 1.000000e+00, double* %1, align 8, !tbaa !3 >>> store double 3.000000e+00, double* %3, align 8, !tbaa !3 >>> store double 4.000000e+00, double* %5, align 8, !tbaa !3 >>> ret double 8.000000e+00 >>> } >>> >>> !0 = metadata !{metadata !"any pointer", metadata !1} >>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>> !3 = metadata !{metadata !"double", metadata !1} >>> >>> I am afraid I can't really help you in telling what went wrong here and >>> caused the missing optimizations, but I agree with you that the result >>> in not what I would have expected. >>> >>> Cheers, >>> Roel >>> On 01/23/2012 03:31 AM, Brent Walker wrote: >>>> >>>> Hi LLVMers, >>>> I would like to ask a question regarding aliasing. Suppose I have the >>>> following program: >>>> >>>> double f(double** p ) >>>> { >>>> double a,b,c; >>>> double * x =&a; >>>> double * y =&b; >>>> double * z =&c; >>>> >>>> *x = 1; >>>> *y = *x + 2; >>>> *z = *x + 3; >>>> >>>> return *x+*y+*z; >>>> } >>>> >>>> LLVM can tell that the three pointers do not alias each other so can >>>> perform the constant folding at compile time. >>>> >>>> define double @f(double** nocapture %p) nounwind uwtable readnone { >>>> ret double 8.000000e+00 >>>> } >>>> >>>> Now consider the function below. I know (in my particluar case) that >>>> the pointers in the p array do not alias each other. I tried to >>>> communicate this information to llvm/clang via the __restrict__ >>>> qualifier but it does not seem to have an effect. Can you please >>>> suggest what is wrong. How can I achieve what I want? >>>> >>>> double f(double** p ) >>>> { >>>> double *__restrict__ x = p[0]; >>>> double *__restrict__ y = p[1]; >>>> double *__restrict__ z = p[2]; >>>> >>>> *x = 1; >>>> *y = *x + 2; >>>> *z = *x + 3; >>>> >>>> return *x+*y+*z; >>>> } >>>> >>>> define double @f(double** nocapture %p) nounwind uwtable { >>>> %1 = load double** %p, align 8, !tbaa !0 >>>> %2 = getelementptr inbounds double** %p, i64 1 >>>> %3 = load double** %2, align 8, !tbaa !0 >>>> %4 = getelementptr inbounds double** %p, i64 2 >>>> %5 = load double** %4, align 8, !tbaa !0 >>>> store double 1.000000e+00, double* %1, align 8, !tbaa !3 >>>> store double 3.000000e+00, double* %3, align 8, !tbaa !3 >>>> %6 = load double* %1, align 8, !tbaa !3 >>>> %7 = fadd double %6, 3.000000e+00 >>>> store double %7, double* %5, align 8, !tbaa !3 >>>> %8 = load double* %1, align 8, !tbaa !3 >>>> %9 = load double* %3, align 8, !tbaa !3 >>>> %10 = fadd double %8, %9 >>>> %11 = fadd double %10, %7 >>>> ret double %11 >>>> } >>>> >>>> !0 = metadata !{metadata !"any pointer", metadata !1} >>>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>>> !3 = metadata !{metadata !"double", metadata !1} >>>> >>>> Thank you for any help. >>>> >>>> Brent >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
I have no clue, I didn't have time to look into that example yet. How does the IR (before optimization) differ from the other version? Roel On 01/24/2012 04:45 PM, Brent Walker wrote:> Can you explain please why it works for this version of the function: > > double f(double *__restrict__ x, double *__restrict__ y, double > *__restrict__ z); > > What is different here? There are stores here as well. > > Brent > > > On Wed, Jan 25, 2012 at 12:34 AM, Roel Jordans<r.jordans at tue.nl> wrote: >> Hi Brent, >> >> I think this is a problem in the easy-cse transform. In this transform load >> operations can be replaced by their subexpression, in this case the >> propagated constant, based on the value of the 'CurrentGeneration' of memory >> writes. This implies that any store operation invalidates the knowledge >> about previously stored subexpressions. >> >> In general, this is a safe assumption but in this case it is missing quite >> some optimization potential. >> >> The effect of this can be shown by moving the line %6 one up, to before the >> previous store operation. This doesn't change the program behaviour but does >> influence the optimization. >> >> More info on this is in lib/Transforms/Scalar/EarlyCSE.cpp (line 415 is a >> good start) >> >> I don't have time to improve this at this moment so I'll leave that to you >> (or anyone else that feels inspired). >> >> Cheers, >> Roel >> >> >> On 01/24/2012 03:59 PM, Brent Walker wrote: >>> >>> Hi Roel, >>> the code you list below is precisely what I expect to get (of course >>> the stores must happen but the constant folding should happen as >>> well). >>> >>> It all looks very strange. LLVM is behaving as if the __restrict__ >>> keyword was not used at all. Even more strange is the fact that for >>> this function: >>> >>> double f(double *__restrict__ x, double *__restrict__ y, double >>> *__restrict__ z) >>> { >>> *x = 1.0; >>> *y = *x + 2; >>> *z = *x + 3; >>> >>> return *x + *y + *z; >>> } >>> >>> everything works as expected: >>> >>> define double @_Z1fPdS_S_(double* noalias nocapture %x, double* >>> noalias nocapture %y, double* noalias nocapture %z) nounwind uwtable { >>> store double 1.000000e+00, double* %x, align 8, !tbaa !0 >>> store double 3.000000e+00, double* %y, align 8, !tbaa !0 >>> store double 4.000000e+00, double* %z, align 8, !tbaa !0 >>> ret double 8.000000e+00 >>> } >>> >>> !0 = metadata !{metadata !"double", metadata !1} >>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>> >>> So I can't figure out what the mechanism is for telling llvm that two >>> pointers that are local variables (and not input arguments) to a >>> function do not alias each other. I hope one of the developers can >>> shed some light on this. >>> >>> Thank you in advance for any help, >>> Brent >>> >>> >>> On Tue, Jan 24, 2012 at 9:01 PM, Roel Jordans<r.jordans at tue.nl> wrote: >>>> >>>> Hi Brent, >>>> >>>> Looking at your code I can see at least one reason why some of the store >>>> operations remain in the output since you are (through x, y, and z) >>>> writing in memory which exists outside of your function (p). >>>> >>>> Constant propagation also seems to work in the first few lines, *y = *x >>>> +1 (%3) is stored directly. >>>> >>>> The strange thing to me is that the same doesn't happen for *z = *x + 2. >>>> Here *x is loaded again and the addition is still performed... >>>> >>>> From this point on, constant propagation seems to stop working >>>> completely. Looking at the IR I would have expected something like the >>>> following: >>>> >>>> define double @f(double** nocapture %p) nounwind uwtable { >>>> %1 = load double** %p, align 8, !tbaa !0 >>>> %2 = getelementptr inbounds double** %p, i64 1 >>>> %3 = load double** %2, align 8, !tbaa !0 >>>> %4 = getelementptr inbounds double** %p, i64 2 >>>> %5 = load double** %4, align 8, !tbaa !0 >>>> store double 1.000000e+00, double* %1, align 8, !tbaa !3 >>>> store double 3.000000e+00, double* %3, align 8, !tbaa !3 >>>> store double 4.000000e+00, double* %5, align 8, !tbaa !3 >>>> ret double 8.000000e+00 >>>> } >>>> >>>> !0 = metadata !{metadata !"any pointer", metadata !1} >>>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>>> !3 = metadata !{metadata !"double", metadata !1} >>>> >>>> I am afraid I can't really help you in telling what went wrong here and >>>> caused the missing optimizations, but I agree with you that the result >>>> in not what I would have expected. >>>> >>>> Cheers, >>>> Roel >>>> On 01/23/2012 03:31 AM, Brent Walker wrote: >>>>> >>>>> Hi LLVMers, >>>>> I would like to ask a question regarding aliasing. Suppose I have the >>>>> following program: >>>>> >>>>> double f(double** p ) >>>>> { >>>>> double a,b,c; >>>>> double * x =&a; >>>>> double * y =&b; >>>>> double * z =&c; >>>>> >>>>> *x = 1; >>>>> *y = *x + 2; >>>>> *z = *x + 3; >>>>> >>>>> return *x+*y+*z; >>>>> } >>>>> >>>>> LLVM can tell that the three pointers do not alias each other so can >>>>> perform the constant folding at compile time. >>>>> >>>>> define double @f(double** nocapture %p) nounwind uwtable readnone { >>>>> ret double 8.000000e+00 >>>>> } >>>>> >>>>> Now consider the function below. I know (in my particluar case) that >>>>> the pointers in the p array do not alias each other. I tried to >>>>> communicate this information to llvm/clang via the __restrict__ >>>>> qualifier but it does not seem to have an effect. Can you please >>>>> suggest what is wrong. How can I achieve what I want? >>>>> >>>>> double f(double** p ) >>>>> { >>>>> double *__restrict__ x = p[0]; >>>>> double *__restrict__ y = p[1]; >>>>> double *__restrict__ z = p[2]; >>>>> >>>>> *x = 1; >>>>> *y = *x + 2; >>>>> *z = *x + 3; >>>>> >>>>> return *x+*y+*z; >>>>> } >>>>> >>>>> define double @f(double** nocapture %p) nounwind uwtable { >>>>> %1 = load double** %p, align 8, !tbaa !0 >>>>> %2 = getelementptr inbounds double** %p, i64 1 >>>>> %3 = load double** %2, align 8, !tbaa !0 >>>>> %4 = getelementptr inbounds double** %p, i64 2 >>>>> %5 = load double** %4, align 8, !tbaa !0 >>>>> store double 1.000000e+00, double* %1, align 8, !tbaa !3 >>>>> store double 3.000000e+00, double* %3, align 8, !tbaa !3 >>>>> %6 = load double* %1, align 8, !tbaa !3 >>>>> %7 = fadd double %6, 3.000000e+00 >>>>> store double %7, double* %5, align 8, !tbaa !3 >>>>> %8 = load double* %1, align 8, !tbaa !3 >>>>> %9 = load double* %3, align 8, !tbaa !3 >>>>> %10 = fadd double %8, %9 >>>>> %11 = fadd double %10, %7 >>>>> ret double %11 >>>>> } >>>>> >>>>> !0 = metadata !{metadata !"any pointer", metadata !1} >>>>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>>>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>>>> !3 = metadata !{metadata !"double", metadata !1} >>>>> >>>>> Thank you for any help. >>>>> >>>>> Brent >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
I think the problem here is that the IR doesn't have any way to attach restrict information to loads/stores/pointers. It works on arguments because they can be given the 'noalias' attribute, and then the alias analyzer must understand what that means. Pete On Jan 24, 2012, at 7:47 AM, Roel Jordans wrote:> I have no clue, I didn't have time to look into that example yet. > > How does the IR (before optimization) differ from the other version? > > Roel > > On 01/24/2012 04:45 PM, Brent Walker wrote: >> Can you explain please why it works for this version of the function: >> >> double f(double *__restrict__ x, double *__restrict__ y, double >> *__restrict__ z); >> >> What is different here? There are stores here as well. >> >> Brent >> >> >> On Wed, Jan 25, 2012 at 12:34 AM, Roel Jordans<r.jordans at tue.nl> wrote: >>> Hi Brent, >>> >>> I think this is a problem in the easy-cse transform. In this transform load >>> operations can be replaced by their subexpression, in this case the >>> propagated constant, based on the value of the 'CurrentGeneration' of memory >>> writes. This implies that any store operation invalidates the knowledge >>> about previously stored subexpressions. >>> >>> In general, this is a safe assumption but in this case it is missing quite >>> some optimization potential. >>> >>> The effect of this can be shown by moving the line %6 one up, to before the >>> previous store operation. This doesn't change the program behaviour but does >>> influence the optimization. >>> >>> More info on this is in lib/Transforms/Scalar/EarlyCSE.cpp (line 415 is a >>> good start) >>> >>> I don't have time to improve this at this moment so I'll leave that to you >>> (or anyone else that feels inspired). >>> >>> Cheers, >>> Roel >>> >>> >>> On 01/24/2012 03:59 PM, Brent Walker wrote: >>>> >>>> Hi Roel, >>>> the code you list below is precisely what I expect to get (of course >>>> the stores must happen but the constant folding should happen as >>>> well). >>>> >>>> It all looks very strange. LLVM is behaving as if the __restrict__ >>>> keyword was not used at all. Even more strange is the fact that for >>>> this function: >>>> >>>> double f(double *__restrict__ x, double *__restrict__ y, double >>>> *__restrict__ z) >>>> { >>>> *x = 1.0; >>>> *y = *x + 2; >>>> *z = *x + 3; >>>> >>>> return *x + *y + *z; >>>> } >>>> >>>> everything works as expected: >>>> >>>> define double @_Z1fPdS_S_(double* noalias nocapture %x, double* >>>> noalias nocapture %y, double* noalias nocapture %z) nounwind uwtable { >>>> store double 1.000000e+00, double* %x, align 8, !tbaa !0 >>>> store double 3.000000e+00, double* %y, align 8, !tbaa !0 >>>> store double 4.000000e+00, double* %z, align 8, !tbaa !0 >>>> ret double 8.000000e+00 >>>> } >>>> >>>> !0 = metadata !{metadata !"double", metadata !1} >>>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>>> >>>> So I can't figure out what the mechanism is for telling llvm that two >>>> pointers that are local variables (and not input arguments) to a >>>> function do not alias each other. I hope one of the developers can >>>> shed some light on this. >>>> >>>> Thank you in advance for any help, >>>> Brent >>>> >>>> >>>> On Tue, Jan 24, 2012 at 9:01 PM, Roel Jordans<r.jordans at tue.nl> wrote: >>>>> >>>>> Hi Brent, >>>>> >>>>> Looking at your code I can see at least one reason why some of the store >>>>> operations remain in the output since you are (through x, y, and z) >>>>> writing in memory which exists outside of your function (p). >>>>> >>>>> Constant propagation also seems to work in the first few lines, *y = *x >>>>> +1 (%3) is stored directly. >>>>> >>>>> The strange thing to me is that the same doesn't happen for *z = *x + 2. >>>>> Here *x is loaded again and the addition is still performed... >>>>> >>>>> From this point on, constant propagation seems to stop working >>>>> completely. Looking at the IR I would have expected something like the >>>>> following: >>>>> >>>>> define double @f(double** nocapture %p) nounwind uwtable { >>>>> %1 = load double** %p, align 8, !tbaa !0 >>>>> %2 = getelementptr inbounds double** %p, i64 1 >>>>> %3 = load double** %2, align 8, !tbaa !0 >>>>> %4 = getelementptr inbounds double** %p, i64 2 >>>>> %5 = load double** %4, align 8, !tbaa !0 >>>>> store double 1.000000e+00, double* %1, align 8, !tbaa !3 >>>>> store double 3.000000e+00, double* %3, align 8, !tbaa !3 >>>>> store double 4.000000e+00, double* %5, align 8, !tbaa !3 >>>>> ret double 8.000000e+00 >>>>> } >>>>> >>>>> !0 = metadata !{metadata !"any pointer", metadata !1} >>>>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>>>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>>>> !3 = metadata !{metadata !"double", metadata !1} >>>>> >>>>> I am afraid I can't really help you in telling what went wrong here and >>>>> caused the missing optimizations, but I agree with you that the result >>>>> in not what I would have expected. >>>>> >>>>> Cheers, >>>>> Roel >>>>> On 01/23/2012 03:31 AM, Brent Walker wrote: >>>>>> >>>>>> Hi LLVMers, >>>>>> I would like to ask a question regarding aliasing. Suppose I have the >>>>>> following program: >>>>>> >>>>>> double f(double** p ) >>>>>> { >>>>>> double a,b,c; >>>>>> double * x =&a; >>>>>> double * y =&b; >>>>>> double * z =&c; >>>>>> >>>>>> *x = 1; >>>>>> *y = *x + 2; >>>>>> *z = *x + 3; >>>>>> >>>>>> return *x+*y+*z; >>>>>> } >>>>>> >>>>>> LLVM can tell that the three pointers do not alias each other so can >>>>>> perform the constant folding at compile time. >>>>>> >>>>>> define double @f(double** nocapture %p) nounwind uwtable readnone { >>>>>> ret double 8.000000e+00 >>>>>> } >>>>>> >>>>>> Now consider the function below. I know (in my particluar case) that >>>>>> the pointers in the p array do not alias each other. I tried to >>>>>> communicate this information to llvm/clang via the __restrict__ >>>>>> qualifier but it does not seem to have an effect. Can you please >>>>>> suggest what is wrong. How can I achieve what I want? >>>>>> >>>>>> double f(double** p ) >>>>>> { >>>>>> double *__restrict__ x = p[0]; >>>>>> double *__restrict__ y = p[1]; >>>>>> double *__restrict__ z = p[2]; >>>>>> >>>>>> *x = 1; >>>>>> *y = *x + 2; >>>>>> *z = *x + 3; >>>>>> >>>>>> return *x+*y+*z; >>>>>> } >>>>>> >>>>>> define double @f(double** nocapture %p) nounwind uwtable { >>>>>> %1 = load double** %p, align 8, !tbaa !0 >>>>>> %2 = getelementptr inbounds double** %p, i64 1 >>>>>> %3 = load double** %2, align 8, !tbaa !0 >>>>>> %4 = getelementptr inbounds double** %p, i64 2 >>>>>> %5 = load double** %4, align 8, !tbaa !0 >>>>>> store double 1.000000e+00, double* %1, align 8, !tbaa !3 >>>>>> store double 3.000000e+00, double* %3, align 8, !tbaa !3 >>>>>> %6 = load double* %1, align 8, !tbaa !3 >>>>>> %7 = fadd double %6, 3.000000e+00 >>>>>> store double %7, double* %5, align 8, !tbaa !3 >>>>>> %8 = load double* %1, align 8, !tbaa !3 >>>>>> %9 = load double* %3, align 8, !tbaa !3 >>>>>> %10 = fadd double %8, %9 >>>>>> %11 = fadd double %10, %7 >>>>>> ret double %11 >>>>>> } >>>>>> >>>>>> !0 = metadata !{metadata !"any pointer", metadata !1} >>>>>> !1 = metadata !{metadata !"omnipotent char", metadata !2} >>>>>> !2 = metadata !{metadata !"Simple C/C++ TBAA", null} >>>>>> !3 = metadata !{metadata !"double", metadata !1} >>>>>> >>>>>> Thank you for any help. >>>>>> >>>>>> Brent >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Jan 24, 2012, at 7:45 AM, Brent Walker wrote:> Can you explain please why it works for this version of the function: > > double f(double *__restrict__ x, double *__restrict__ y, double > *__restrict__ z); > > What is different here? There are stores here as well.LLVM ignores restrict everywhere except function parameters. This is a compromise aimed at a sweet spot in the balance of compiler complexity vs. optimization opportunity. - Many analysis and optimization techniques naturally apply to whole functions. When restrict appears on a local variable inside a function, its special aliasing property applies to only a subset of the function. It's awkward to teach such code to understand and respect local scope boundaries, in general. - Function boundaries are often the boundaries of analysis. Interprocedural analysis can be expensive and complex, so many optimization passes are limited to thinking about one function at a time. And even interprocedural analysis passes are bounded by shared library boundaries. While local variables can often be analyzed automatically (as in your first example), function paramters are often incoming mystery values, so they are where restrict is most often interesting. This compromise does mean that some opportunities are lost (as in your second example), but from clang's perspective these cases are rare. Dan
Thank you for your reply. The compromise you describe below, is it a compromise in the LLVM back end or in clang? I run into this while building a compiler for a small DSL language for which I generate functions that receive a context from which they extract a bunch of pointers to doubles from which inputs are passed to the function (I just used C/clang in my examples to illustrate the problem). Unless I can mark these extracted pointers as non-aliasing, performance suffers a lot. It's not just CSE that suffers -- LLVM does not do copy propagation either so we keep loading the same values from memory over and over again. See the example here: double f(double a, double * x, double *y, double * z) { *x = a; *y = a+1; *z = *x + 3; return *x + *y + *z; } define double @f(double %a, double* nocapture %x, double* nocapture %y, double* nocapture %z) nounwind uwtable { store double %a, double* %x, align 8, !tbaa !0 %1 = fadd double %a, 1.000000e+00 store double %1, double* %y, align 8, !tbaa !0 %2 = load double* %x, align 8, !tbaa !0 %3 = fadd double %2, 3.000000e+00 store double %3, double* %z, align 8, !tbaa !0 %4 = load double* %x, align 8, !tbaa !0 %5 = load double* %y, align 8, !tbaa !0 %6 = fadd double %4, %5 %7 = fadd double %6, %3 ret double %7 } !0 = metadata !{metadata !"double", metadata !1} !1 = metadata !{metadata !"omnipotent char", metadata !2} !2 = metadata !{metadata !"Simple C/C++ TBAA", null} (I used arguments here without __restrict__ which has the same effect as loading my pointers from context as locals). As you can see we keep loading the value of x from memory, even though we just stored a local into it. Given that I am generating LLVM IR directly (via the C++ interface) can you suggest someway I could pass the noalias attribute onto the locals? One work around is of course to generate two functions as follows: double f1( struct ctx* ctx ) { return f2(ctx->a, ctx->x, ctx->y, ctx->z); } double f2( double a, double *__restrict__ x, double *__restrict__ y, double *__restrict__ z) { *x = a; *y = a+1; *z = *x + 3; return *x + *y + *z; } but if at all possible I would like to avoid such acrobatics. Thank you in advance for any help. Brent On Wed, Jan 25, 2012 at 4:52 AM, Dan Gohman <gohman at apple.com> wrote:> On Jan 24, 2012, at 7:45 AM, Brent Walker wrote: > >> Can you explain please why it works for this version of the function: >> >> double f(double *__restrict__ x, double *__restrict__ y, double >> *__restrict__ z); >> >> What is different here? There are stores here as well. > > LLVM ignores restrict everywhere except function parameters. This is a > compromise aimed at a sweet spot in the balance of compiler complexity > vs. optimization opportunity. > > - Many analysis and optimization techniques naturally apply to whole > functions. When restrict appears on a local variable inside a > function, its special aliasing property applies to only a subset of > the function. It's awkward to teach such code to understand and > respect local scope boundaries, in general. > > - Function boundaries are often the boundaries of analysis. > Interprocedural analysis can be expensive and complex, so many > optimization passes are limited to thinking about one function > at a time. And even interprocedural analysis passes are > bounded by shared library boundaries. While local variables can > often be analyzed automatically (as in your first example), > function paramters are often incoming mystery values, so they > are where restrict is most often interesting. > > This compromise does mean that some opportunities are lost (as in > your second example), but from clang's perspective these cases are > rare. > > Dan >