Hey everyone, I'm looking at the following two C functions: http://pastebin.com/mYWCj6d8 Second one is about 50% slower than the first one because in the second case d->data[i * 3 + {X, Y, Z}] loads are not moved out of the second loop and are computed every time j increments, even though they don't depend on it. If I move those out manually, the performance of the two is equal. What could be the reason for this behaviour? I looked at LICM logs; the compiler hoists GEPs but stops short of loading the actual values. Does it have something to do with possible aliasing? Thanks, Dimitri. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130117/eb82f46e/attachment.html>
On Jan 17, 2013, at 9:27 PM, Dimitri Tcaciuc <dtcaciuc at gmail.com> wrote:> Hey everyone, > > I'm looking at the following two C functions: > > http://pastebin.com/mYWCj6d8 > > Second one is about 50% slower than the first one because in the second case d->data[i * 3 + {X, Y, Z}] loads are not moved out of the second loop and are computed every time j increments, even though they don't depend on it. If I move those out manually, the performance of the two is equal. > > What could be the reason for this behaviour? I looked at LICM logs; the compiler hoists GEPs but stops short of loading the actual values. Does it have something to do with possible aliasing?Yes. Almost certainly, the compiler can't tell that the load from the struct (to get the array pointer) doesn't alias with the stores to the array itself. This is exactly the kind of thing that type-based alias analysis (-fstrict-aliasing) can help with. Use with caution, as it can break some type-unsafe idioms. --Owen -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130117/3b809c0b/attachment.html>
Ownen, thanks for the reply. The above, however, was already compiled with -fstrict-aliasing and -O3; doesn't look like it makes much of a difference in this case. As with my earlier question, I'm actually using C as a model to create a custom frontend, which I intend to bless with more strict aliasing rules by default. The Array is going to be a language intrinsic so I'm free to make a suitable tbaa tree. The problem is I haven't found a good example of how would such length-aware pointer container structure be annotated so that loads from the data pointer are optimized as though we didn't access it through a struct. The struct IR looks something like %struct.Array = type { double*, i64 } Then we get the GEP to data, followed by a load %data = getelementptr inbounds %struct.Array* %d, i64 0, i32 0 %0 = load double** %data, align 8, !tbaa !0 where relevant TBAA looks like !0 = metadata !{metadata !"any pointer", metadata !1} !1 = metadata !{metadata !"omnipotent char", metadata !2} !2 = metadata !{metadata !"Simple C/C++ TBAA"} !3 = metadata !{metadata !"double", metadata !1} So if I understand correctly, because double and any pointer are leafs of omnipotent char, they can easily alias. Thus, having something like !4 = metadata !{metadata !"Array data", metadata !2} %0 = load double** %data, align 8, !tbaa !4 should do the trick? Dimitri. On Thu, Jan 17, 2013 at 9:33 PM, Owen Anderson <resistor at mac.com> wrote:> > On Jan 17, 2013, at 9:27 PM, Dimitri Tcaciuc <dtcaciuc at gmail.com> wrote: > > Hey everyone, > > I'm looking at the following two C functions: > > http://pastebin.com/mYWCj6d8 > > Second one is about 50% slower than the first one because in the second > case d->data[i * 3 + {X, Y, Z}] loads are not moved out of the second loop > and are computed every time j increments, even though they don't depend on > it. If I move those out manually, the performance of the two is equal. > > What could be the reason for this behaviour? I looked at LICM logs; the > compiler hoists GEPs but stops short of loading the actual values. Does it > have something to do with possible aliasing? > > > Yes. > > Almost certainly, the compiler can't tell that the load from the struct > (to get the array pointer) doesn't alias with the stores to the array > itself. This is exactly the kind of thing that type-based alias analysis > (-fstrict-aliasing) can help with. Use with caution, as it can break some > type-unsafe idioms. > > --Owen > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130117/e8a55ba5/attachment.html>
On 1/17/2013 11:33 PM, Owen Anderson wrote:> > Almost certainly, the compiler can't tell that the load from the struct > (to get the array pointer) doesn't alias with the stores to the array > itself. This is exactly the kind of thing that type-based alias > analysis (-fstrict-aliasing) can help with. Use with caution, as it > can break some type-unsafe idioms.The problem is actually the stores to out->data. While "d" and "out" are restricted pointers, it tells us nothing about the relation between "d->data" and "out->data", since the "data" member is a pointer itself. Since both "d->data" and "out->data" are both of type double, the type-based alias analysis won't help. In the case when you pass "double*" to the function, the restrict keyword tells us that the arrays of doubles themselves do not alias, which allows the compiler to infer that the loads and stores in the loop do not depend on each other. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation