Hi, I'd like to fix PR19590, which is about llvm.global_ctors containing functions that end up being empty after optimization (which causes the linker to add useless init_array entries to the output binary). globalopt removes empty functions from llvm.global_ctors, but by the time the function becomes empty globalopt has already run and it doesn't run again. I'm wondering what the best fix is: 1. Should globalopt run once more after all other passes have run? 2. Or should the llvm.global_ctors optimization code in globalopt be moved into a helper function somewhere that's called from both globalopt and a new optimization pass cdtoropt that does nothing but remove empty functions from llvm.global_ctors and llvm.global_dtors? Then only this new pass would be added after all other passes. (This new pass should run very quickly, it doesn't have to do much.) Thanks for any advice, Nico
Can we strengthen globalopt to symbolically execute at most one direct call to any given function, or is that crazy talk? On Wed, Apr 30, 2014 at 4:48 PM, Nico Weber <thakis at chromium.org> wrote:> Hi, > > I'd like to fix PR19590, which is about llvm.global_ctors containing > functions that end up being empty after optimization (which causes the > linker to add useless init_array entries to the output binary). > globalopt removes empty functions from llvm.global_ctors, but by the > time the function becomes empty globalopt has already run and it > doesn't run again. > > I'm wondering what the best fix is: > 1. Should globalopt run once more after all other passes have run? > 2. Or should the llvm.global_ctors optimization code in globalopt be > moved into a helper function somewhere that's called from both > globalopt and a new optimization pass cdtoropt that does nothing but > remove empty functions from llvm.global_ctors and llvm.global_dtors? > Then only this new pass would be added after all other passes. (This > new pass should run very quickly, it doesn't have to do much.) > > Thanks for any advice, > Nico > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140430/9f5f4db4/attachment.html>
I'm not sure I understand what you mean. The unoptimized IR is @llvm.global_ctors = appending global [1 x { i32, void ()* }] [{ i32, void ()* } { i32 65535, void ()* @_GLOBAL__I_a }] define internal void @_GLOBAL__I_a() section "__TEXT,__StaticInit,regular,pure_instructions" { entry: call void @__cxx_global_var_init() ret void } define internal void @__cxx_global_var_init() section "__TEXT,__StaticInit,regular,pure_instructions" { entry: %0 = load i32* @_ZN3Bar18LINKER_INITIALIZEDE, align 4 call void @_ZN3FooC1E17LinkerInitialized(%class.Foo* @foo, i32 %0) ret void } ; Function Attrs: ssp uwtable define linkonce_odr void @_ZN3FooC1E17LinkerInitialized(%class.Foo* %this, i32) unnamed_addr #0 align 2 { entry: %this.addr = alloca %class.Foo*, align 8 %.addr = alloca i32, align 4 store %class.Foo* %this, %class.Foo** %this.addr, align 8 store i32 %0, i32* %.addr, align 4 %this1 = load %class.Foo** %this.addr %1 = load i32* %.addr, align 4 call void @_ZN3FooC2E17LinkerInitialized(%class.Foo* %this1, i32 %1) ret void } …which is quite a bit of stuff. But other passes successfully reduce this to @llvm.global_ctors = appending global [1 x { i32, void ()* }] [{ i32, void ()* } { i32 65535, void ()* @_GLOBAL__I_a }] ; Function Attrs: nounwind readnone define internal void @_GLOBAL__I_a() #1 section "__TEXT,__StaticInit,regular,pure_instructions" { entry: ret void } which is much nicer, and globalopt is able to reduce the latter bit to an empty global_ctors list – the problem is that it runs before the IR is optimized that much. Just doing globalopt once more (or doing just the global_ctor bits of globalopt once more) is enough to fix the problem. The existing machinery seems to do all that's needed, it just needs to be used in a slightly different order as far as I understand. On Wed, Apr 30, 2014 at 4:59 PM, Reid Kleckner <rnk at google.com> wrote:> Can we strengthen globalopt to symbolically execute at most one direct call > to any given function, or is that crazy talk? > > > On Wed, Apr 30, 2014 at 4:48 PM, Nico Weber <thakis at chromium.org> wrote: >> >> Hi, >> >> I'd like to fix PR19590, which is about llvm.global_ctors containing >> functions that end up being empty after optimization (which causes the >> linker to add useless init_array entries to the output binary). >> globalopt removes empty functions from llvm.global_ctors, but by the >> time the function becomes empty globalopt has already run and it >> doesn't run again. >> >> I'm wondering what the best fix is: >> 1. Should globalopt run once more after all other passes have run? >> 2. Or should the llvm.global_ctors optimization code in globalopt be >> moved into a helper function somewhere that's called from both >> globalopt and a new optimization pass cdtoropt that does nothing but >> remove empty functions from llvm.global_ctors and llvm.global_dtors? >> Then only this new pass would be added after all other passes. (This >> new pass should run very quickly, it doesn't have to do much.) >> >> Thanks for any advice, >> Nico >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
On 30 April 2014 19:48, Nico Weber <thakis at chromium.org> wrote:> Hi, > > I'd like to fix PR19590, which is about llvm.global_ctors containing > functions that end up being empty after optimization (which causes the > linker to add useless init_array entries to the output binary). > globalopt removes empty functions from llvm.global_ctors, but by the > time the function becomes empty globalopt has already run and it > doesn't run again. > > I'm wondering what the best fix is: > 1. Should globalopt run once more after all other passes have run? > 2. Or should the llvm.global_ctors optimization code in globalopt be > moved into a helper function somewhere that's called from both > globalopt and a new optimization pass cdtoropt that does nothing but > remove empty functions from llvm.global_ctors and llvm.global_dtors? > Then only this new pass would be added after all other passes. (This > new pass should run very quickly, it doesn't have to do much.)How late do you need to move globalopt to get this to work? Do you get any performance regressions if you just move it there? Cheers, Rafael
I talked about this with Nick in person months ago, and my understanding is that GlobalOpt is also an enabling optimization that needs to run early. For example, if we can eliminate an initializer to an internal global with no other stores to it, we can propagate the result. Maybe we should run it twice. On Thu, May 1, 2014 at 11:01 AM, Rafael Espíndola < rafael.espindola at gmail.com> wrote:> On 30 April 2014 19:48, Nico Weber <thakis at chromium.org> wrote: > > Hi, > > > > I'd like to fix PR19590, which is about llvm.global_ctors containing > > functions that end up being empty after optimization (which causes the > > linker to add useless init_array entries to the output binary). > > globalopt removes empty functions from llvm.global_ctors, but by the > > time the function becomes empty globalopt has already run and it > > doesn't run again. > > > > I'm wondering what the best fix is: > > 1. Should globalopt run once more after all other passes have run? > > 2. Or should the llvm.global_ctors optimization code in globalopt be > > moved into a helper function somewhere that's called from both > > globalopt and a new optimization pass cdtoropt that does nothing but > > remove empty functions from llvm.global_ctors and llvm.global_dtors? > > Then only this new pass would be added after all other passes. (This > > new pass should run very quickly, it doesn't have to do much.) > > How late do you need to move globalopt to get this to work? Do you get > any performance regressions if you just move it there? > > Cheers, > Rafael > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140501/99741a9e/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Best way to clean up empty global_ctors
- [LLVMdev] Proposal: Adding an optional association field to llvm.global_ctors
- [LLVMdev] Inconsistent third field in global_ctors (was Re: [llvm] r214321 - UseListOrder: Visit global values)
- JIT, LTO and @llvm.global_ctors: Looking for advise
- [LLVMdev] Proposal: Adding an optional association field to llvm.global_ctors