On Mon, Jul 11, 2011 at 2:44 PM, Eric Christopher <echristo at apple.com> wrote:
>
> On Jul 11, 2011, at 1:48 PM, Nick Lewycky wrote:
>
>> I discovered recently that RegAllocFast spills all the registers before every function call. This is the root cause of one of our recursive functions that takes about 150 bytes of stack when built with gcc (same at -O0 and -O2, or 120 bytes at llc -O2) taking 960 bytes of stack when built by llc -O0. That's pretty bad for situations where you have small stacks, which is not uncommon for threaded software on a 32-bit architecture, or a signal handler.
>>
>> I realize that some of this is intentional and we don't want to do optimization at -O0, but I'm really hoping there's something we could sensibly do to improve this. Here's an example:
>>
>>   extern "C" void foo(int);
>>
>>   void test() {
>>     foo(0);
>>     foo(1);
>>     foo(2);
>>   }
>>
>> This doesn't just spill out all the registers to the stack before each call, we also set up 0, 1 and 2 into regs first, then spill them and don't even get a chance to reuse stack slots. That's just bad:
>>
>>         pushq   %rax
>>         movl    $2, %edi
>>         movl    $1, %eax
>>         movl    $0, %ecx
>>         movl    %edi, 4(%rsp)   # 4-byte Spill
>>         movl    %ecx, %edi
>>         movl    %eax, (%rsp)    # 4-byte Spill
>>         callq   foo
>>         movl    (%rsp), %edi    # 4-byte Reload
>>         callq   foo
>>         movl    4(%rsp), %edi   # 4-byte Reload
>>         callq   foo
>>         popq    %rax
>>         ret
>>
>> Does anyone have any ideas what we could do that doesn't add to the compile time?
>
> This seems odd. I'd think that fast-isel should be able to materialize the constants when we want them rather than at the beginning of the block.

I'm not entirely sure why, but FastISel does intentionally materialize constants at the beginning of the block. See FastISel::enterLocalValueArea etc. Maybe Dan knows why?

-Eli
On Jul 11, 2011, at 3:06 PM, Eli Friedman wrote:
> On Mon, Jul 11, 2011 at 2:44 PM, Eric Christopher <echristo at apple.com> wrote:
>>
>> On Jul 11, 2011, at 1:48 PM, Nick Lewycky wrote:
>>
>> This seems odd. I'd think that fast-isel should be able to materialize the constants when we want them rather than at the beginning of the block.
>
> I'm not entirely sure why, but FastISel does intentionally materialize
> constants at the beginning of the block. See
> FastISel::enterLocalValueArea etc. Maybe Dan knows why?

Yeah, I'd seen that. Curious.

Might be a remnant of the old top down method to avoid materializing constants too many times.

-eric
On Jul 11, 2011, at 3:06 PM, Eli Friedman wrote:
> On Mon, Jul 11, 2011 at 2:44 PM, Eric Christopher <echristo at apple.com> wrote:
>>
>> On Jul 11, 2011, at 1:48 PM, Nick Lewycky wrote:
>>
>> This seems odd. I'd think that fast-isel should be able to materialize the constants when we want them rather than at the beginning of the block.
>
> I'm not entirely sure why, but FastISel does intentionally materialize
> constants at the beginning of the block. See
> FastISel::enterLocalValueArea etc. Maybe Dan knows why?

Going bottom-up, FastISel doesn't know when it'll see the first use of a value in a block. Cleverer schemes are possible.

Dan
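A toy model may make Dan's point concrete. The sketch below is not LLVM code: `ToyCall`, `selectBottomUp`, and the string pseudo-instructions are all invented for illustration. It walks a block from the last instruction to the first. The first time it meets a constant, it is looking at that constant's last use in program order, so, assuming it cannot look further up the block, the only position known to come before every use is the top of the block.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for "call foo(<constant>)". Not an LLVM type.
struct ToyCall {
  int constantArg;
};

// Selects a block bottom-up and returns pseudo-instructions in final
// program order. Constants are pinned to the top of the block and cached.
std::vector<std::string> selectBottomUp(const std::vector<ToyCall> &block) {
  std::deque<std::string> body;          // grows at the front as we walk upward
  std::vector<std::string> localValues;  // materializations placed at block top
  std::map<int, int> cache;              // constant value -> virtual register

  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    auto found = cache.find(it->constantArg);
    if (found == cache.end()) {
      // Earlier uses may still lie above this instruction, so the
      // materialization goes to the block top, where it precedes them all.
      int reg = static_cast<int>(cache.size());
      found = cache.emplace(it->constantArg, reg).first;
      localValues.push_back("v" + std::to_string(reg) + " = const " +
                            std::to_string(it->constantArg));
    }
    assert(found != cache.end());
    body.push_front("call foo(v" + std::to_string(found->second) + ")");
  }

  std::vector<std::string> out(localValues.begin(), localValues.end());
  out.insert(out.end(), body.begin(), body.end());
  return out;
}
```

For the three calls in Nick's example, this toy emits the constants 2, 1 and 0 at the top of the block, followed by the three calls. The constants for the later calls therefore stay live across the earlier calls, which is the situation in which the fast register allocator ends up spilling them.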
On Jul 11, 2011, at 3:41 PM, Dan Gohman wrote:
> On Jul 11, 2011, at 3:06 PM, Eli Friedman wrote:
>
>> On Mon, Jul 11, 2011 at 2:44 PM, Eric Christopher <echristo at apple.com> wrote:
>>>
>>> On Jul 11, 2011, at 1:48 PM, Nick Lewycky wrote:
>>>
>>> This seems odd. I'd think that fast-isel should be able to materialize the constants when we want them rather than at the beginning of the block.
>>
>> I'm not entirely sure why, but FastISel does intentionally materialize
>> constants at the beginning of the block. See
>> FastISel::enterLocalValueArea etc. Maybe Dan knows why?
>
>
> Going bottom-up, FastISel doesn't know when it'll see the first use of a value
> in a block. Cleverer schemes are possible.

Or less clever by not caching the result? :)

-eric
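One possible reading of the "less clever" idea, sketched in the same toy style: skip the cache and materialize the constant again immediately before each use. Everything here (`ToyCallSite`, `selectWithoutCache`, the pseudo-instruction strings) is hypothetical and only illustrates the trade-off; it is not how FastISel is implemented.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for "call foo(<constant>)". Not an LLVM type.
struct ToyCallSite {
  int constantArg;
};

// Selects a block bottom-up without caching: every use gets its own
// materialization, placed directly in front of the instruction that uses it.
std::vector<std::string>
selectWithoutCache(const std::vector<ToyCallSite> &block) {
  std::deque<std::string> body;  // grows at the front as we walk upward
  int nextReg = 0;               // fresh virtual register for every use

  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    std::string reg = "v" + std::to_string(nextReg++);
    // Push the call first, then its constant, so that the constant ends up
    // immediately before the call in final program order.
    body.push_front("call foo(" + reg + ")");
    body.push_front(reg + " = const " + std::to_string(it->constantArg));
  }

  // One materialization per use, so the output is exactly twice the input.
  assert(body.size() == 2 * block.size());
  return std::vector<std::string>(body.begin(), body.end());
}
```

In this model no constant is live across a call, so there is nothing to spill for it. The cost is that a constant used twice in a block is materialized twice, which is the duplication the cached scheme avoids.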