thr3ads.net - llvm dev - [LLVMdev] Performance problems with FORTRAN allocatable arrays [Feb 2012]


Wonsun Ahn

2012-Feb-15 12:41 UTC

[LLVMdev] Performance problems with FORTRAN allocatable arrays

I've noticed that LLVM does a bad job of optimizing array indexing
code for FORTRAN arrays declared using the ALLOCATABLE keyword.

For example if you have something like the following:

DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
...
ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) )
...
DO L = 1, 5
   DO K = K1, K2
      DO J = J1, J2
         DO I = I1, I2
            II  =  I + IADD
            IBD = II - IBDD
            ICD = II + IBDD

            QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) +
>                                     5.0D0 * Q( II,J,K,L,N) -
>                                             Q(ICD,J,K,L,N))
         END DO
      END DO
   END DO
END DO

Most of the code needed to calculate the address of QAV(I,J,K,L)
should be hoisted out of the innermost loop, since J, K, and L are
invariant there. But I'm not seeing this happen, because LLVM's
alias analysis cannot distinguish between the loads of the array
dimensions for QAV and the store to QAV(I,J,K,L). I've tried all the
alias analyses available in the standard distribution, including
type-based analysis and scalar evolution. But if you think about it, the
array dimensions of QAV are 'metadata' and should not alias with any
actual array accesses in the program. I've compiled the same code with
GCC and it was able to hoist most of the address calculations out as
expected. GCC was able to hoist the address calculations for Q as well.
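
To make the complaint concrete, here is a minimal sketch in C of what the lowered indexing code roughly looks like. It is not the code llvm-gfortran emits: the descriptor layout (desc2d), its field names, and the helpers desc_init, scale_naive and scale_hoisted are all made up for illustration, and the array is cut down to two dimensions. The first loop reads the descriptor fields on every iteration, which is what happens when the compiler must assume the store may overwrite them. The second is the hand-hoisted form one would hope the optimizer produces.

```c
#include <assert.h>
#include <stddef.h>

enum { LB = -2 };  /* lower bound of both dimensions, as in QAV(-2:IMAX+2, ...) */

/* Hypothetical 2-D array descriptor; layout and names are assumptions. */
typedef struct {
    double   *base;       /* start of the allocated storage */
    ptrdiff_t offset;     /* makes index (LB, LB) map to element 0 */
    ptrdiff_t stride[2];  /* element stride of each dimension */
} desc2d;

/* Global descriptors, standing in for module variables such as QAV and Q. */
desc2d dst, src;

/* Point a descriptor at column-major storage with ni elements per column. */
void desc_init(desc2d *d, double *storage, ptrdiff_t ni) {
    assert(storage != NULL && ni > 0);
    d->base = storage;
    d->stride[0] = 1;
    d->stride[1] = ni;
    d->offset = -LB * d->stride[0] - LB * d->stride[1];
}

/* Naive form: every descriptor field is read again for each element. */
void scale_naive(int i1, int i2, int j) {
    for (int i = i1; i <= i2; ++i)
        dst.base[dst.offset + i * dst.stride[0] + j * dst.stride[1]] =
            2.0 * src.base[src.offset + i * src.stride[0] + j * src.stride[1]];
}

/* Hand-hoisted form: everything that does not depend on i is computed once. */
void scale_hoisted(int i1, int i2, int j) {
    double       *d = dst.base;
    const double *s = src.base;
    ptrdiff_t drow = dst.offset + j * dst.stride[1];
    ptrdiff_t srow = src.offset + j * src.stride[1];
    ptrdiff_t ds = dst.stride[0], ss = src.stride[0];
    for (int i = i1; i <= i2; ++i)
        d[drow + i * ds] = 2.0 * s[srow + i * ss];
}
```

Both functions compute the same result; the difference is only in how many loads and multiplies sit inside the loop. Note that a C compiler may well hoist the loads in scale_naive by itself, because C's type-based aliasing rules say a store to a double cannot change a ptrdiff_t or pointer field. The sketch shows the shape of the code, not the behaviour reported above for the Fortran front end.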

This is an actual piece of code in SPECCPU2006 437.leslie3d and the
loop I analyzed is at line 1630 of file tml.f. 437.leslie3d performs
very poorly because of this and similar issues.

Is there any way to enable this optimization? Is there a way to flag in
the IR that a particular location holds array dimension metadata?

Thanks,
Wonsun

Duncan Sands

2012-Feb-15 14:34 UTC

Hi Wonsun, can you please provide a testcase?

Best wishes, Duncan.

Wonsun Ahn

2012-Feb-15 16:14 UTC

Hi Duncan,

Here is the test case:

------------------------------------- snip -------------------------------------

      MODULE LES3D_DATA
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER IMAX, JMAX, KMAX, ND
      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV
      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:,:) :: Q
      END MODULE LES3D_DATA

      PROGRAM LES3D

      USE LES3D_DATA
      IMPLICIT REAL*8(A-H,O-Z)

      READ(5,*) IMAX, JMAX, KMAX, ND
      ALLOCATE(Q(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND,2),
     >       QAV(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND))

      DO L = 1, 5
         DO K = 0, KMAX
            DO J = 0, JMAX
               DO I = 0, IMAX
                  QAV(I,J,K,L) = 2.0D0 * Q(I,J,K,L,1)
               END DO
            END DO
         END DO
      END DO

      stop

      END

------------------------------------- snip -------------------------------------

I compiled the above using the following command:

llvm-gfortran -O3 <source name> -c -emit-llvm -o <bytecode name>

If you disassemble the resulting bitcode, you'll see that the array
address calculations for QAV and Q are not hoisted out of the loop.

Thanks,
Wonsun

