Wonsun Ahn
2012-Feb-15 12:41 UTC
[LLVMdev] Performance problems with FORTRAN allocatable arrays
I've noticed that LLVM does a bad job of optimizing array indexing
code for FORTRAN arrays declared using the ALLOCATABLE keyword.
For example if you have something like the following:
DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
...
ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) )
...
DO L = 1, 5
DO K = K1, K2
DO J = J1, J2
DO I = I1, I2
II = I + IADD
IBD = II - IBDD
ICD = II + IBDD
QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N)
+> 5.0D0 * Q( II,J,K,L,N) -
> Q(ICD,J,K,L,N))
END DO
END DO
END DO
END DO
Most of the code needed to calculate the address of QAV(I,J,K,L)
should be hoisted out of the loop since J, K, and L are constant
inside the loop. But I'm not seeing this happening because LLVM's
alias analysis cannot distinguish between the loads of the array
dimensions for QAV and the store to QAV(I,J,K,L). I've tried all the
alias analyses available in the standard distribution, including type
based analysis and scalar evolution. But if you think about it, the
array dimensions of QAV is 'metadata' and should not alias with any
actual accesses in the program. I've compiled the same code with GCC
and it was able to hoist most of the address calculations out as
expected. GCC was able to hoist address calculations for Q also.
This is an actual piece of code in SPECCPU2006 437.leslie3d and the
loop I analyzed is in line 1630 of file tml.f. 437.leslie3d suffers
horrible performance problems because of this and similar problems.
Is there anyway to enable this optimization? Is there a way to flag in
the IR that a particular locations is array dimension meta data?
Thanks,
Wonsun
Duncan Sands
2012-Feb-15 14:34 UTC
[LLVMdev] Performance problems with FORTRAN allocatable arrays
Hi Wonsun, can you please provide a testcase. Best wishes, Duncan.> I've noticed that LLVM does a bad job of optimizing array indexing > code for FORTRAN arrays declared using the ALLOCATABLE keyword. > > For example if you have something like the following: > > DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV > ... > ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) ) > ... > DO L = 1, 5 > DO K = K1, K2 > DO J = J1, J2 > DO I = I1, I2 > II = I + IADD > IBD = II - IBDD > ICD = II + IBDD > > QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) + >> 5.0D0 * Q( II,J,K,L,N) - >> Q(ICD,J,K,L,N)) > END DO > END DO > END DO > END DO > > Most of the code needed to calculate the address of QAV(I,J,K,L) > should be hoisted out of the loop since J, K, and L are constant > inside the loop. But I'm not seeing this happening because LLVM's > alias analysis cannot distinguish between the loads of the array > dimensions for QAV and the store to QAV(I,J,K,L). I've tried all the > alias analyses available in the standard distribution, including type > based analysis and scalar evolution. But if you think about it, the > array dimensions of QAV is 'metadata' and should not alias with any > actual accesses in the program. I've compiled the same code with GCC > and it was able to hoist most of the address calculations out as > expected. GCC was able to hoist address calculations for Q also. > > This is an actual piece of code in SPECCPU2006 437.leslie3d and the > loop I analyzed is in line 1630 of file tml.f. 437.leslie3d suffers > horrible performance problems because of this and similar problems. > > Is there anyway to enable this optimization? Is there a way to flag in > the IR that a particular locations is array dimension meta data? > > Thanks, > Wonsun > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Wonsun Ahn
2012-Feb-15 16:14 UTC
[LLVMdev] Performance problems with FORTRAN allocatable arrays
Hi Duncan,
Here is the test case:
------------------------------------- snip -------------------------------------
MODULE LES3D_DATA
IMPLICIT REAL*8 (A-H,O-Z)
INTEGER IMAX, JMAX, KMAX, ND
DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:,:) :: Q
END MODULE LES3D_DATA
PROGRAM LES3D
USE LES3D_DATA
IMPLICIT REAL*8(A-H,O-Z)
READ(5,*) IMAX, JMAX, KMAX, ND
ALLOCATE(Q(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND,2),
> QAV(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND))
DO L = 1, 5
DO K = 0, KMAX
DO J = 0, JMAX
DO I = 0, IMAX
QAV(I,J,K,L) = 2.0D0 * Q(I,J,K,L,1)
END DO
END DO
END DO
END DO
stop
END
------------------------------------- snip -------------------------------------
I compiled the above using the following commands:
llvm-gfortran -O3 <source name> -c -emit-llvm -o <bytecode name>
If you disassemble the bytecode, you'll see that the matrix address
calculations for QAV and Q are not hoisted out of the loop.
Thanks,
Wonsun
On Wed, Feb 15, 2012 at 8:34 AM, Duncan Sands <baldrick at free.fr>
wrote:> Hi Wonsun, can you please provide a testcase.
>
> Best wishes, Duncan.
>
>> I've noticed that LLVM does a bad job of optimizing array indexing
>> code for FORTRAN arrays declared using the ALLOCATABLE keyword.
>>
>> For example if you have something like the following:
>>
>> DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
>> ...
>> ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) )
>> ...
>> DO L = 1, 5
>> DO K = K1, K2
>> DO J = J1, J2
>> DO I = I1, I2
>> II = I + IADD
>> IBD = II - IBDD
>> ICD = II + IBDD
>>
>> QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) +
>>> 5.0D0 * Q( II,J,K,L,N) -
>>> Q(ICD,J,K,L,N))
>> END DO
>> END DO
>> END DO
>> END DO
>>
>> Most of the code needed to calculate the address of QAV(I,J,K,L)
>> should be hoisted out of the loop since J, K, and L are constant
>> inside the loop. But I'm not seeing this happening because
LLVM's
>> alias analysis cannot distinguish between the loads of the array
>> dimensions for QAV and the store to QAV(I,J,K,L). I've tried all
the
>> alias analyses available in the standard distribution, including type
>> based analysis and scalar evolution. But if you think about it, the
>> array dimensions of QAV is 'metadata' and should not alias with
any
>> actual accesses in the program. I've compiled the same code with
GCC
>> and it was able to hoist most of the address calculations out as
>> expected. GCC was able to hoist address calculations for Q also.
>>
>> This is an actual piece of code in SPECCPU2006 437.leslie3d and the
>> loop I analyzed is in line 1630 of file tml.f. 437.leslie3d suffers
>> horrible performance problems because of this and similar problems.
>>
>> Is there anyway to enable this optimization? Is there a way to flag in
>> the IR that a particular locations is array dimension meta data?
>>
>> Thanks,
>> Wonsun
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev