thr3ads.net - llvm dev - [LLVMdev] Optimization opportunity [Aug 2004]

If this information is useful, please help other people find it:
Share via:

Jeff Cohen

2004-Aug-27 06:16 UTC

[LLVMdev] Optimization opportunity

There seems to be a disadvantage to the approach of allocating all
locals on the stack using alloca.  Consider the following code:

extern void func(int*);
extern int xyz();

void abc()
{
	int a = xyz();
	int i;

	{
		int b[10];
		for (i = 0; i < 10; i++) b[i] = xyz();
		func(b);
	}

	{
		int c[10];
		for (i = 0; i < 10; i++) c[i] = xyz();
		func(c);
	}

	func(&a);
}

We have two arrays, b and c, only one of which can exist at any given
time.   Both g++ and Microsoft VC++ map both arrays to the same bytes in
the stack frame.  LLVM doesn't.  As there isn't anything in the LLVM
assembly code to mark the scope of locals, it will take some analysis to
determine which locals can share the same memory locations.

Also, the store into the arrays generates two x86 machine instructions:

	lea %ECX, DWORD PTR [%ESP + 16]
	mov DWORD PTR [%ECX + <element offset>], %EAX

These can be combined into a single instruction.  I'm tempted to work on
this one myself :)

On the plus side, only LLVM unrolled the loops.

Chris Lattner

2004-Aug-27 07:20 UTC

head link

[LLVMdev] Optimization opportunity

On Thu, 26 Aug 2004, Jeff Cohen wrote:
> There seems to be a disadvantage to the approach of allocating all
> locals on the stack using alloca.  Consider the following code:
There is nothing intrinsic in LLVM that prevents this from happening, we
just have not yet implemented 'stack packing'.
> We have two arrays, b and c, only one of which can exist at any given
> time.   Both g++ and Microsoft VC++ map both arrays to the same bytes in
> the stack frame.  LLVM doesn't.  As there isn't anything in the
LLVM
> assembly code to mark the scope of locals, it will take some analysis to
> determine which locals can share the same memory locations.
Note that using scoping is really only a hack.  Those two arrays could
very well have been declared on entry to the function, and should be
optimized as such.  LLVM will eventually support this.
> Also, the store into the arrays generates two x86 machine instructions:
>
> 	lea %ECX, DWORD PTR [%ESP + 16]
> 	mov DWORD PTR [%ECX + <element offset>], %EAX
>
> These can be combined into a single instruction.  I'm tempted to work
on
> this one myself :)
Yup, there are several things that can be improved in the X86 instruction
selector.  If you submit a patch to implement this, we would be happy to
apply it.  Continuing improvements in the code generator should eventually
make this kind of thing fall out automatically, but for now they must be
implemented manually.

-Chris

-- 
http://llvm.org/
http://nondot.org/sabre/

Jeff Cohen

2004-Aug-29 04:21 UTC

head link

[LLVMdev] Optimization opportunity

On Fri, 27 Aug 2004 02:20:34 -0500 (CDT)
Chris Lattner <sabre at nondot.org> wrote:
> On Thu, 26 Aug 2004, Jeff Cohen wrote:
> 
> > Also, the store into the arrays generates two x86 machine
> > instructions:
> >
> > 	lea %ECX, DWORD PTR [%ESP + 16]
> > 	mov DWORD PTR [%ECX + <element offset>], %EAX
> >
> > These can be combined into a single instruction.  I'm tempted to
> > work on this one myself :)
> 
> Yup, there are several things that can be improved in the X86
> instruction selector.  If you submit a patch to implement this, we
> would be happy to apply it.  Continuing improvements in the code
> generator should eventually make this kind of thing fall out
> automatically, but for now they must be implemented manually.
> 
> -Chris
I succumbed to temptation and made the improvement.  Diffs are attached
for X86ISelSimple.cpp and X86InstrBuilder.h.

I determined that the reason two instructions are generated in the first
place, instead of being folded immediately into one, is because locals
do not have a physical offset assigned at that time.  There is a
peephole optimization pass after the stack frame is finalized, but the
problem with folding them there is that the physical registers have also
been assigned, so register CX (in this case) would be wasted.

So I generalized the concept of a "full address".  There were four
variables, base reg, scale, index reg and displacement, that were being
passed around.  I converted them into a single struct and added a new
type field that allows the base reg to instead be a frame index.  The
extra lea instruction is now folded immediately, and the code for
processing store instructions shrinks quite nicely.

This should be generalizable even further to handle global variables.  I
didn't go that far, because it appeared there may be other work
required.  It wasn't clear a displacement to a global was supported.

Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: X86ISelSimple.cpp.diff
Type: application/octet-stream
Size: 21366 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/c7756315/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: X86InstrBuilder.h.diff
Type: application/octet-stream
Size: 2211 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/c7756315/attachment-0001.obj>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Aug 2004 - [LLVMdev] Optimization opportunity

[LLVMdev] Optimization opportunity

[LLVMdev] Optimization opportunity

[LLVMdev] Optimization opportunity

Maybe Matching Threads