There seems to be a disadvantage to the approach of allocating all locals on the stack using alloca. Consider the following code: extern void func(int*); extern int xyz(); void abc() { int a = xyz(); int i; { int b[10]; for (i = 0; i < 10; i++) b[i] = xyz(); func(b); } { int c[10]; for (i = 0; i < 10; i++) c[i] = xyz(); func(c); } func(&a); } We have two arrays, b and c, only one of which can exist at any given time. Both g++ and Microsoft VC++ map both arrays to the same bytes in the stack frame. LLVM doesn't. As there isn't anything in the LLVM assembly code to mark the scope of locals, it will take some analysis to determine which locals can share the same memory locations. Also, the store into the arrays generates two x86 machine instructions: lea %ECX, DWORD PTR [%ESP + 16] mov DWORD PTR [%ECX + <element offset>], %EAX These can be combined into a single instruction. I'm tempted to work on this one myself :) On the plus side, only LLVM unrolled the loops.
On Thu, 26 Aug 2004, Jeff Cohen wrote:> There seems to be a disadvantage to the approach of allocating all > locals on the stack using alloca. Consider the following code:There is nothing intrinsic in LLVM that prevents this from happening, we just have not yet implemented 'stack packing'.> We have two arrays, b and c, only one of which can exist at any given > time. Both g++ and Microsoft VC++ map both arrays to the same bytes in > the stack frame. LLVM doesn't. As there isn't anything in the LLVM > assembly code to mark the scope of locals, it will take some analysis to > determine which locals can share the same memory locations.Note that using scoping is really only a hack. Those two arrays could very well have been declared on entry to the function, and should be optimized as such. LLVM will eventually support this.> Also, the store into the arrays generates two x86 machine instructions: > > lea %ECX, DWORD PTR [%ESP + 16] > mov DWORD PTR [%ECX + <element offset>], %EAX > > These can be combined into a single instruction. I'm tempted to work on > this one myself :)Yup, there are several things that can be improved in the X86 instruction selector. If you submit a patch to implement this, we would be happy to apply it. Continuing improvements in the code generator should eventually make this kind of thing fall out automatically, but for now they must be implemented manually. -Chris -- http://llvm.org/ http://nondot.org/sabre/
On Fri, 27 Aug 2004 02:20:34 -0500 (CDT) Chris Lattner <sabre at nondot.org> wrote:> On Thu, 26 Aug 2004, Jeff Cohen wrote: > > > Also, the store into the arrays generates two x86 machine > > instructions: > > > > lea %ECX, DWORD PTR [%ESP + 16] > > mov DWORD PTR [%ECX + <element offset>], %EAX > > > > These can be combined into a single instruction. I'm tempted to > > work on this one myself :) > > Yup, there are several things that can be improved in the X86 > instruction selector. If you submit a patch to implement this, we > would be happy to apply it. Continuing improvements in the code > generator should eventually make this kind of thing fall out > automatically, but for now they must be implemented manually. > > -ChrisI succumbed to temptation and made the improvement. Diffs are attached for X86ISelSimple.cpp and X86InstrBuilder.h. I determined that the reason two instructions are generated in the first place, instead of being folded immediately into one, is because locals do not have a physical offset assigned at that time. There is a peephole optimization pass after the stack frame is finalized, but the problem with folding them there is that the physical registers have also been assigned, so register CX (in this case) would be wasted. So I generalized the concept of a "full address". There were four variables, base reg, scale, index reg and displacement, that were being passed around. I converted them into a single struct and added a new type field that allows the base reg to instead be a frame index. The extra lea instruction is now folded immediately, and the code for processing store instructions shrinks quite nicely. This should be generalizable even further to handle global variables. I didn't go that far, because it appeared there may be other work required. It wasn't clear a displacement to a global was supported. Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: X86ISelSimple.cpp.diff Type: application/octet-stream Size: 21366 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/c7756315/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: X86InstrBuilder.h.diff Type: application/octet-stream Size: 2211 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/c7756315/attachment-0001.obj>