Jonas Paulsson via llvm-dev
2018-Sep-20 15:55 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
Hi,

I found a difference between Clang and GCC in alias handling. This was with a benchmark where Clang was considerably slower, and in a hot function which does many more loads from the same address due to stores between the uses. In other words, a value is loaded and used, another value is stored, and then the first value is loaded once again before its second use. This happens many times, with three loads instead of one for each value. GCC only emits one load.

The values are the arguments to this function:

    void su3_projector( su3_vector *a, su3_vector *b, su3_matrix *c ){
      register int i,j;
      register double tmp,tmp2;
      for(i=0;i<3;i++)for(j=0;j<3;j++){
        tmp2 = a->c[i].real * b->c[j].real;
        tmp = a->c[i].imag * b->c[j].imag;
        c->e[i][j].real = tmp + tmp2;
        tmp2 = a->c[i].real * b->c[j].imag;
        tmp = a->c[i].imag * b->c[j].real;
        c->e[i][j].imag = tmp - tmp2;
      }
    }

The types are:

    typedef struct { complex e[3][3]; } su3_matrix;
    typedef struct { complex c[3]; } su3_vector;

So the question here is if the su3_vector and su3_matrix pointers may alias? If they may alias, then clang is right in reloading after each store. If the standard says they cannot alias, then gcc is right in only loading the values once each.

It seems to me that either GCC is too aggressive or LLVM is too conservative, but I don't know which one it is... As far as I understand, there is the fact of the different struct types of the arguments (which means they cannot alias), but also the question if su3_vector is included in su3_matrix (which would mean they may alias).

I made a reduced test case, where the same difference seems to be present. It has just one struct type which contains a matrix of doubles. A store to an element of the struct via a pointer is surrounded with two loads of a global double variable. Only Clang emits two loads.

    typedef struct {
      double c[3][3];
    } STRUCT_TY;

    double e = 0.0;
    STRUCT_TY *f;
    int g = 0;
    void h() {
      int i = e;
      f->c[0][i] = g;
      g = e;
    }

clang -O3 -march=z13:

    h:                                  # @h
    # %bb.0:                            # %entry
            larl    %r1, e
            ld      %f0, 0(%r1)         // LOAD E
            lrl     %r2, g
            cfdbr   %r0, 5, %f0         // CONVERT E
            lgfr    %r0, %r0            // EXTEND E
            cdfbr   %f0, %r2
            lgrl    %r2, f
            sllg    %r3, %r0, 3
            std     %f0, 0(%r3,%r2)     // STORE F ELEMENT
            ld      %f0, 0(%r1)         // 2nd LOAD E <<<<<<<
            cfdbr   %r0, 5, %f0         // CONVERT
            strl    %r0, g              // 2nd USE
            br      %r14

gcc -O3 -march=z13:

    h:
    .LFB0:
            .cfi_startproc
            larl    %r1,e
            ld      %f0,0(%r1)          // LOAD E
            lrl     %r2,g
            lgrl    %r3,f
            cfdbr   %r1,5,%f0           // CONVERT E
            cdfbr   %f0,%r2
            lgfr    %r2,%r1             // EXTEND E
            sllg    %r2,%r2,3
            std     %f0,0(%r2,%r3)      // STORE F ELEMENT
            strl    %r1,g               // 2nd USE
            br      %r14

I hope somebody with enough experience and knowledge can guide the way here as this seems to be quite important.

/Jonas
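Whichever compiler turns out to be right, the reload can be avoided at the source level by reading `e` into a local before the store. The following is only a minimal sketch of the reduced test case rewritten that way; the function name `h_single_load` and the local `e_once` are hypothetical and do not come from the benchmark. It behaves like the original `h()` only when `f` does not actually point at `e`, which is the same assumption the GCC output appears to rely on.

```c
typedef struct {
    double c[3][3];
} STRUCT_TY;

double e = 0.0;
STRUCT_TY *f;
int g = 0;

/* Hypothetical variant of h(): e is read once into a local, so the
 * second use no longer depends on the compiler's aliasing conclusion. */
void h_single_load(void) {
    double e_once = e;   /* single source-level read of e */
    int i = e_once;      /* double -> int conversion, as in h() */
    f->c[0][i] = g;      /* store into the struct's array */
    g = e_once;          /* second use reuses the local value */
}
```

If `f` could legitimately point at storage overlapping `e`, this variant would change the result, so it is a workaround under that assumption and not an answer to the standards question.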
Friedman, Eli via llvm-dev
2018-Sep-20 19:25 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
On 9/20/2018 8:55 AM, Jonas Paulsson via llvm-dev wrote:
> So the question here is if the su3_vector and su3_matrix pointers may
> alias? If they may alias, then clang is right in reloading after each
> store. If the standard says they cannot alias, then gcc is right in
> only loading the values once each.
>
> It seems to me that either GCC is too aggressive or LLVM is too
> conservative, but I don't know which one it is... As far as I
> understand, there is the fact of the different struct types of the
> arguments (which means they cannot alias), but also the question if
> su3_vector is included in su3_matrix (which would mean they may alias).

clang currently emits relatively conservative TBAA info... see CodeGenFunction::EmitLValueForField for the struct handling. It should be straightforward to add equivalent handling for array indexing. Not sure about the correctness off the top of my head.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
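As background for the TBAA remark, the sketch below shows the general rule that type-based alias analysis builds on. It is not taken from Clang or from the benchmark, and both function names are hypothetical. A store through an `int` lvalue cannot legally modify a `double` object, so a compiler may keep the first read of `*d` in a register. A store through a `double` lvalue may hit the same object, so the second read has to observe it unless something else rules out the overlap.

```c
/* Store through an int lvalue between two reads of *d.
 * Under the C effective-type rules *p cannot be the double that d
 * points to, so a compiler may reuse the first loaded value. */
double read_around_int_store(const double *d, int *p) {
    double first = *d;
    *p = 1;
    return first + *d;
}

/* Store through a double lvalue between two reads of *d.
 * Here p may point at the same object as d, so the second read
 * must see the stored value when they do overlap. */
double read_around_double_store(const double *d, double *p) {
    double first = *d;
    *p = 1.0;
    return first + *d;
}
```

The reduced test case falls in the second category as far as the scalar types go (a `double` store next to a `double` load), which is why the extra information that the store goes through a `STRUCT_TY` member matters.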
Matthieu Brucher via llvm-dev
2018-Sep-20 21:40 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
Hi,

I would say that GCC is wrong and should also have a version where a could be equal to b. There is no restrict keyword, so they could be equal.

Cheers,

Matthieu

--
Quantitative analyst, Ph.D.
Blog: http://blog.audio-tk.com/
LinkedIn: http://www.linkedin.com/in/matthieubrucher
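Since `restrict` comes up here, this is a minimal sketch of how the function could state the no-overlap promise explicitly instead of leaving it to type-based reasoning. The definition of `complex` is an assumption (the thread only shows that it has `real` and `imag` members used as doubles), and the name `su3_projector_restrict` is hypothetical. Only `c` is qualified, so `a` and `b` may still point to the same vector.

```c
/* Assumed layout: the thread does not show the definition of complex. */
typedef struct { double real; double imag; } complex;
typedef struct { complex e[3][3]; } su3_matrix;
typedef struct { complex c[3]; } su3_vector;

/* Hypothetical variant: restrict on c promises that the matrix written
 * through c is not also accessed through a or b during the call. */
void su3_projector_restrict(const su3_vector *a, const su3_vector *b,
                            su3_matrix *restrict c) {
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            double tmp2 = a->c[i].real * b->c[j].real;
            double tmp = a->c[i].imag * b->c[j].imag;
            c->e[i][j].real = tmp + tmp2;
            tmp2 = a->c[i].real * b->c[j].imag;
            tmp = a->c[i].imag * b->c[j].real;
            c->e[i][j].imag = tmp - tmp2;
        }
    }
}
```

With this signature, passing a `c` that overlaps `*a` or `*b` becomes the caller's error, so either compiler is free to load each input element once. Whether the unqualified original already permits that is the open question of the thread.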
Jonas Paulsson via llvm-dev
2018-Sep-21 07:21 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
> I would say that GCC is wrong and should also have a version where a
> could be equal to b. There is no restrict keyword, so they could be
> equal.

This was between a/b and c, not between a and b. Could you explain your opinion in a bit more detail, please?

/Jonas