Duncan Sands
2010-Nov-05 16:51 UTC
[LLVMdev] Hoisting elements of array argument into registers
> I see the same with clang. I'm not sure why the optimizers do so much better
> when they can see that sp is a local array (the special initial values don't
> matter).

It is the scalar replacement of aggregates pass that puts everything into registers when sp is a local array. What happens is:

1. The tail recursion in wf is eliminated.
2. wf is inlined into g.
3. scalarrepl turns the (local) array accesses in g into registers.

Once everything is in registers, later passes clean everything up.

Ciao, Duncan.
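The M.c under discussion is not reproduced in this part of the thread. As a rough illustration only, the sketch below uses hypothetical stand-ins for wf and g (the bodies, the two-element layout of sp, and the countdown logic are all assumptions, not the original code). It shows the shape that the steps above apply to: a tail-recursive function whose state lives in an array, called on a local array.

```c
#include <assert.h>

/* Hypothetical stand-in for the thread's wf(): a tail-recursive loop whose
 * state lives in the array sp. Assumed layout: sp[0] is a countdown,
 * sp[1] is an accumulator. */
int wf(int *sp)
{
    if (sp[0] == 0)
        return sp[1];
    sp[1] += sp[0];
    sp[0] -= 1;
    return wf(sp);  /* tail call: a candidate for tail recursion elimination */
}

/* Hypothetical stand-in for g(): sp is a local array, so once wf has been
 * inlined here every access to sp is visible inside this one function. */
int g(int n)
{
    assert(n >= 0);  /* a negative countdown would never reach zero */
    int sp[2] = { n, 0 };
    return wf(sp);
}
```

In this sketch g(10) sums 10 + 9 + ... + 1. Whether a given LLVM version reduces g to register-only code is exactly what the thread is comparing, so the IR is best checked with the opt version at hand.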
Max Bolingbroke
2010-Nov-05 22:16 UTC
[LLVMdev] Hoisting elements of array argument into registers
Duncan Sands <baldrick <at> free.fr> writes:

> > I see the same with clang. I'm not sure why the optimizers do so much better
> > when they can see that sp is a local array (the special initial values don't
> > matter).
>
> It is the scalar replacement of aggregates pass that puts everything into
> registers when sp is a local array.

Yes, I had hoped that scalar replacement would get the array case. What surprised me is that it didn't. However, I can reproduce your result (good optimisation for g()) with LLVM HEAD. My earlier tests used "opt -O2", but once I tried again with (HEAD) Clang -O3 I got an optimised g().

However, scalar replacement can't help with functions like a (non-inlined) wf() where the structure of sp is unknown. Is there any hope for LLVM optimising such functions? Some combination of passes that will do what I want? This problem is essentially killing all opportunities for loop optimisation in Haskell right now, so we would dearly like to have a solution :-)

Cheers,
Max
David Peixotto
2010-Nov-06 22:00 UTC
[LLVMdev] Hoisting elements of array argument into registers
I am seeing the wf loop get optimized just fine with LLVM 2.8 (and almost as well with HEAD). I'm running on Mac OS X 10.6. I have an Apple-supplied llvm-gcc and a self-compiled LLVM 2.8. When I run

    $ llvm-gcc -emit-llvm -S M.c
    $ opt -O2 M.s | llvm-dis

I see that:

1. Tail recursion has been eliminated from wf.
2. The accesses to sp have been promoted to registers.
3. The loop has been translated to straight-line code that computes the result directly from the induction variables.

I am surprised that others are not seeing the same thing. If I download the llvm-gcc binary from http://llvm.org/releases/download.html#2.8 and run

    $ llvm-gcc -O2 -emit-llvm -S

I see the same results as running `opt -O2` by hand. However, if I use the Apple-supplied llvm-gcc and run `llvm-gcc -O2`, I see that the loop does not get optimized. I am assuming this is because the Apple-supplied llvm-gcc links with an older version of the LLVM optimizations.

If I optimize with LLVM HEAD I see basically the same results, except that it leaves in one redundant load.

I'm curious why others are seeing such different results. I don't see anything in wf that should prevent the sp accesses from being promoted to registers. I would think they should be promoted to registers, but we will still have to write the results back to sp before returning from wf, because in general we can't prove that sp is not accessed after returning from wf.

-David
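To make the point about writing results back concrete, here is a minimal hand-written sketch of the transformation on a hypothetical loop (the function names, the two-element layout of sp, and the loop body are assumptions for illustration, not the code from M.c). The second function keeps the array elements in locals for the duration of the loop and stores them back once before returning, so a caller that inspects sp afterwards sees the same values either way.

```c
#include <assert.h>

/* Hypothetical loop over an array argument, in the shape a tail-recursive
 * function has after tail recursion elimination. Every iteration loads
 * from and stores to sp. */
int wf_loop(int *sp)
{
    assert(sp[0] >= 0);  /* a negative countdown would never reach zero */
    while (sp[0] != 0) {
        sp[1] += sp[0];
        sp[0] -= 1;
    }
    return sp[1];
}

/* The same computation with the elements hoisted by hand: load once into
 * locals, loop on the locals, then write back before returning because
 * the caller may still read sp. */
int wf_hoisted(int *sp)
{
    int count = sp[0];
    int acc = sp[1];
    assert(count >= 0);
    while (count != 0) {
        acc += count;
        count -= 1;
    }
    sp[0] = count;  /* write-back: sp may be accessed after we return */
    sp[1] = acc;
    return acc;
}
```

The rewrite is safe in this sketch because the loop makes no calls and touches no memory other than sp; a loop that stored through another pointer would need alias information before the loads could be hoisted.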