Wonsun Ahn
2012-Feb-15 12:41 UTC
[LLVMdev] Performance problems with FORTRAN allocatable arrays
I've noticed that LLVM does a bad job of optimizing array indexing code for FORTRAN arrays declared using the ALLOCATABLE keyword.

For example, if you have something like the following:

      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
      ...
      ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) )
      ...
      DO L = 1, 5
      DO K = K1, K2
      DO J = J1, J2
      DO I = I1, I2
         II  = I + IADD
         IBD = II - IBDD
         ICD = II + IBDD

         QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) +
     >                         5.0D0 * Q( II,J,K,L,N) -
     >                                 Q(ICD,J,K,L,N))
      END DO
      END DO
      END DO
      END DO

Most of the code needed to calculate the address of QAV(I,J,K,L) should be hoisted out of the innermost loop, since J, K, and L are invariant inside it. But I'm not seeing this happen, because LLVM's alias analysis cannot distinguish between the loads of the array dimensions for QAV and the store to QAV(I,J,K,L). I've tried all the alias analyses available in the standard distribution, including type-based alias analysis and scalar-evolution alias analysis. But if you think about it, the array dimensions of QAV are 'metadata' and should not alias with any actual accesses in the program. I've compiled the same code with GCC, and it was able to hoist most of the address calculations out as expected. GCC was able to hoist the address calculations for Q as well.

This is an actual piece of code in SPECCPU2006 437.leslie3d, and the loop I analyzed is at line 1630 of file tml.f. 437.leslie3d performs very poorly because of this and similar problems.

Is there any way to enable this optimization? Is there a way to flag in the IR that a particular location is array dimension metadata?

Thanks,
Wonsun
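The hoisting described in the message above can be sketched in C, as a rough model of what a front end might lower an allocatable array access to. Everything here is an assumption for illustration: the `desc4` descriptor layout, the helper names, and the simplification of Q to rank 4 are hypothetical and are not taken from llvm-gfortran. The sketch only shows the two loop shapes and that they compute the same result; it makes no claim about which one a particular compiler emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical rank-4 array descriptor ("dope vector"). The field names and
   layout are assumptions for illustration, not llvm-gfortran's actual ABI. */
typedef struct {
    double *base;    /* start of the allocated storage */
    long offset;     /* folds all lower bounds into a single constant */
    long stride[4];  /* element stride per dimension (column-major) */
} desc4;

/* Build a descriptor for bounds lb[d]..ub[d] with zero-initialised storage. */
desc4 desc4_alloc(const long lb[4], const long ub[4]) {
    desc4 d;
    long n = 1;
    d.offset = 0;
    for (int dim = 0; dim < 4; dim++) {
        assert(ub[dim] >= lb[dim]);
        d.stride[dim] = n;
        d.offset -= lb[dim] * n;
        n *= ub[dim] - lb[dim] + 1;
    }
    d.base = calloc((size_t)n, sizeof(double));
    assert(d.base != NULL);
    return d;
}

/* Full element index: reads the offset and all four strides. */
long index4(const desc4 *d, long i, long j, long k, long l) {
    return d->offset + i * d->stride[0] + j * d->stride[1]
                     + k * d->stride[2] + l * d->stride[3];
}

/* Shape of the unoptimised inner loop: the descriptor fields are re-read and
   the whole index is recomputed on every iteration. */
void scale_naive(desc4 *qav, const desc4 *q,
                 long imax, long j, long k, long l) {
    for (long i = 0; i <= imax; i++)
        qav->base[index4(qav, i, j, k, l)] =
            2.0 * q->base[index4(q, i, j, k, l)];
}

/* Hand-hoisted form: the J, K, L terms and the strides are loop-invariant,
   so they are computed once before the loop. Assumes index 0 lies within
   the bounds of the first dimension, as it does for bounds -2:IMAX+2. */
void scale_hoisted(desc4 *qav, const desc4 *q,
                   long imax, long j, long k, long l) {
    double *dst = qav->base + index4(qav, 0, j, k, l);
    const double *src = q->base + index4(q, 0, j, k, l);
    const long ds = qav->stride[0], ss = q->stride[0];
    for (long i = 0; i <= imax; i++)
        dst[i * ds] = 2.0 * src[i * ss];
}
```

The hoisted form is only valid if the stores through `dst` cannot modify the descriptor fields themselves, which is the no-alias fact the message says LLVM fails to establish. Calling both functions on descriptors built with the same bounds, for example lower bounds {-2, -2, -2, 1}, should leave identical contents in the two destination arrays.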
Duncan Sands
2012-Feb-15 14:34 UTC
[LLVMdev] Performance problems with FORTRAN allocatable arrays
Hi Wonsun, can you please provide a testcase.

Best wishes, Duncan.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Wonsun Ahn
2012-Feb-15 16:14 UTC
[LLVMdev] Performance problems with FORTRAN allocatable arrays
Hi Duncan,

Here is the test case:

------------------------------------- snip -------------------------------------
      MODULE LES3D_DATA
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER IMAX, JMAX, KMAX, ND
      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:,:) :: Q
      END MODULE LES3D_DATA

      PROGRAM LES3D
      USE LES3D_DATA
      IMPLICIT REAL*8(A-H,O-Z)
      READ(5,*) IMAX, JMAX, KMAX, ND
      ALLOCATE(Q(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND,2),
     >         QAV(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND))
      DO L = 1, 5
         DO K = 0, KMAX
            DO J = 0, JMAX
               DO I = 0, IMAX
                  QAV(I,J,K,L) = 2.0D0 * Q(I,J,K,L,1)
               END DO
            END DO
         END DO
      END DO
      stop
      END
------------------------------------- snip -------------------------------------

I compiled the above using the following commands:

llvm-gfortran -O3 <source name> -c -emit-llvm -o <bytecode name>

If you disassemble the bytecode, you'll see that the matrix address calculations for QAV and Q are not hoisted out of the loop.

Thanks,
Wonsun