thr3ads.net - llvm dev - [LLVMdev] Scalar replacement of arrays [Mar 2012]

If this information is useful, please help other people find it:
Share via:

Nicolas Capens

2012-Mar-07 20:47 UTC

[LLVMdev] Scalar replacement of arrays

Hi all,

I'm implementing a virtual processor which features dynamic register 
indexing, and I'm struggling to make LLVM 3.0 produce good code for it. 
The register set is implemented as an LLVM array so it can be 
dynamically indexed using GEP. However, most of the time the virtual 
processor's registers are just statically indexed, and so I 
expected/hoped the code would be as optimal as when the virtual 
registers are implemented using individual scalars, which are allocated 
to the target machine's physical registers as much as possible. But that 
turns out not to be the case and I end up with code which constantly 
reads and writes memory to access my virtual registers.

The "Scalar Replacement of Aggregates" pass (scalarrepl) seems to be 
capable of splitting structures into separate fields so that mem2reg can 
produce efficient code which avoids redundant memory operations. But it 
skips my array entirely. Here's a small piece of C code which 
illustrates the problem:

int foo(int x, int y)
{
     int r[2];
     r[0] = x;
     r[1] = y;
     r[0] = r[0] + r[1];
     return r[0];
}

This gives me the following (x86) assembly code:

  sub         esp,8
  mov         eax,dword ptr [esp+0Ch]
  mov         dword ptr [esp],eax
  mov         eax,dword ptr [esp+10h]
  mov         dword ptr [esp+4],eax
  add         eax,dword ptr [esp]
  mov         dword ptr [esp],eax
  add         esp,8
  ret

If I replace the array with two individual scalars, I get the following 
perfect result instead:

  mov         eax,dword ptr [esp+8]
  add         eax,dword ptr [esp+4]
  ret

Unfortunately, I don't think that having scalarrepl handle arrays will 
do the trick. It will work for the above trivial example, but my array 
of registers does get indexed dynamically from time to time, and this 
would completely prevent scalarrepl from doing anything, right?

Ideally LLVM should keep things in physical registers as long as 
possible, and when the virtual register array is being dynamically 
indexed it should write the physical registers back to the array...

So does anyone know if this can already be achieved using some other 
passes or settings? If not, what would be the best approach to implement it?

Thanks for any help,
Nicolas

Eli Friedman

2012-Mar-07 21:00 UTC

head link

[LLVMdev] Scalar replacement of arrays

On Wed, Mar 7, 2012 at 12:47 PM, Nicolas Capens
<nicolas.capens at gmail.com> wrote:> Hi all,
>
> I'm implementing a virtual processor which features dynamic register
> indexing, and I'm struggling to make LLVM 3.0 produce good code for it.
> The register set is implemented as an LLVM array so it can be
> dynamically indexed using GEP. However, most of the time the virtual
> processor's registers are just statically indexed, and so I
> expected/hoped the code would be as optimal as when the virtual
> registers are implemented using individual scalars, which are allocated
> to the target machine's physical registers as much as possible. But
that
> turns out not to be the case and I end up with code which constantly
> reads and writes memory to access my virtual registers.
>
> The "Scalar Replacement of Aggregates" pass (scalarrepl) seems to
be
> capable of splitting structures into separate fields so that mem2reg can
> produce efficient code which avoids redundant memory operations. But it
> skips my array entirely. Here's a small piece of C code which
> illustrates the problem:
>
> int foo(int x, int y)
> {
>     int r[2];
>     r[0] = x;
>     r[1] = y;
>     r[0] = r[0] + r[1];
>     return r[0];
> }
clang -O2 for that C code gives:

	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %eax
	addl	8(%ebp), %eax
	popl	%ebp
	ret

> If I replace the array with two individual scalars, I get the following
> perfect result instead:
>
>  mov         eax,dword ptr [esp+8]
>  add         eax,dword ptr [esp+4]
>  ret
>
> Unfortunately, I don't think that having scalarrepl handle arrays will
> do the trick. It will work for the above trivial example, but my array
> of registers does get indexed dynamically from time to time, and this
> would completely prevent scalarrepl from doing anything, right?
Yes; you wouldn't really want it to try.
> Ideally LLVM should keep things in physical registers as long as
> possible, and when the virtual register array is being dynamically
> indexed it should write the physical registers back to the array...
>
> So does anyone know if this can already be achieved using some other
> passes or settings? If not, what would be the best approach to implement
it?
Conceptually, we ought to be able to handle that sort of issue with a
combination of GVN and dead store elimination (DSE).  Unfortunately,
LLVM's DSE pass is rather weak. so that approach might not be so
effective in practice.

-Eli

Nicolas Capens

2012-Mar-08 18:14 UTC

head link

[LLVMdev] Scalar replacement of arrays

Hi Eli,

I was surprised to see that you're getting more optimal code. But I'm 
using the JIT so I realized there must be some extra optimization passes 
in -O2 which I wasn't using. It turns out that a second scalarrepl 
followed by ADCE did the trick for that small sample.

Initially it didn't work for my larger project though. So I dug a little 
deeper and found out that scalarrepl uses a default threshold of 128 
bytes for arrays. Increasing that to the size of the virtual processor's 
register array made it work as expected! That is, as long as the array 
isn't being dynamically indexed. In any case this is a little more 
promising than I thought since scalarrepl does handle arrays.

However, I'm not sure if that's going to help achieve optimal code for 
when the array is sometimes being dynamically indexed. Essentially there 
should be some kind of store to load copy propagation. As far as I know 
that's exactly what mem2reg does, except that it only considers scalars 
and not elements of arrays.

So would it be hard to extend mem2reg to also consider elements of 
arrays for promotion? It should obviously not perform the promotion when 
in between the store and load there's a dynamically indexed access to 
the array. Correct me if I'm wrong, but that seems it would be superior 
to scalarrepl itself (for arrays).

Is there anyone experienced with mem2reg who wants to implement this? If 
not, any advice on how to best approach this?

Thanks,
Nicolas

On 07/03/2012 4:00 PM, Eli Friedman wrote:> On Wed, Mar 7, 2012 at 12:47 PM, Nicolas Capens
> <nicolas.capens at gmail.com>  wrote:
>> Hi all,
>>
>> I'm implementing a virtual processor which features dynamic
register
>> indexing, and I'm struggling to make LLVM 3.0 produce good code for
it.
>> The register set is implemented as an LLVM array so it can be
>> dynamically indexed using GEP. However, most of the time the virtual
>> processor's registers are just statically indexed, and so I
>> expected/hoped the code would be as optimal as when the virtual
>> registers are implemented using individual scalars, which are allocated
>> to the target machine's physical registers as much as possible. But
that
>> turns out not to be the case and I end up with code which constantly
>> reads and writes memory to access my virtual registers.
>>
>> The "Scalar Replacement of Aggregates" pass (scalarrepl)
seems to be
>> capable of splitting structures into separate fields so that mem2reg
can
>> produce efficient code which avoids redundant memory operations. But it
>> skips my array entirely. Here's a small piece of C code which
>> illustrates the problem:
>>
>> int foo(int x, int y)
>> {
>>      int r[2];
>>      r[0] = x;
>>      r[1] = y;
>>      r[0] = r[0] + r[1];
>>      return r[0];
>> }
> clang -O2 for that C code gives:
>
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	12(%ebp), %eax
> 	addl	8(%ebp), %eax
> 	popl	%ebp
> 	ret
>
>
>> If I replace the array with two individual scalars, I get the following
>> perfect result instead:
>>
>>   mov         eax,dword ptr [esp+8]
>>   add         eax,dword ptr [esp+4]
>>   ret
>>
>> Unfortunately, I don't think that having scalarrepl handle arrays
will
>> do the trick. It will work for the above trivial example, but my array
>> of registers does get indexed dynamically from time to time, and this
>> would completely prevent scalarrepl from doing anything, right?
> Yes; you wouldn't really want it to try.
>
>> Ideally LLVM should keep things in physical registers as long as
>> possible, and when the virtual register array is being dynamically
>> indexed it should write the physical registers back to the array...
>>
>> So does anyone know if this can already be achieved using some other
>> passes or settings? If not, what would be the best approach to
implement it?
> Conceptually, we ought to be able to handle that sort of issue with a
> combination of GVN and dead store elimination (DSE).  Unfortunately,
> LLVM's DSE pass is rather weak. so that approach might not be so
> effective in practice.
>
> -Eli

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Mar 2012 - [LLVMdev] Scalar replacement of arrays

[LLVMdev] Scalar replacement of arrays

[LLVMdev] Scalar replacement of arrays

[LLVMdev] Scalar replacement of arrays

Maybe Matching Threads