Weiming Zhao
2014-Mar-12 17:43 UTC
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
Hi,

When I'm compiling code with -fvisibility=hidden -fPIC for ARM, I find that LLVM generates less optimized code than GCC. For example:

test.cpp:

    void init(void *);
    int g0[100];
    int g1[100];
    int g2[100];
    void foo() {
      init(&g0);
      init(&g1);
      init(&g2);
    }

Clang will emit 1 GOT entry for each GV and 2 instructions to get the address:

    ldr     r0, .LCPI0_2
    add     r0, r0, r4
    bl      _Z4initPv(PLT)

GCC does this only for the first GV. The remaining GV addresses are computed directly from it:

            ldr     r4, .L2
    .LPIC0:
            add     r4, pc, r4              @ get &g0 via GOT_PC relative
            mov     r0, r4
            bl      _Z4initPv(PLT)
            add     r0, r4, #400            @ get &g1
            bl      _Z4initPv(PLT)
            add     r0, r4, #800            @ get &g2
            ldmfd   sp!, {r4, lr}
            b       _Z4initPv(PLT)
    .L3:
            .align  2
    .L2:
            .word   .LANCHOR0-(.LPIC0+8)    @ 1 GOT offset entry

This looks like a missed optimization opportunity for LLVM, in both code size and performance. Any ideas? If so, I can open a bug and try to fix it.

Thanks,
Weiming

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
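The #400 and #800 immediates in the GCC output above follow from placing the three arrays back to back after a single anchor. A minimal sketch of that arithmetic, assuming 4-byte int and no alignment padding; anchorOffsets is a hypothetical helper, not part of GCC or LLVM:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the byte sizes of variables laid out back to
// back after one anchor, return each variable's byte offset from the anchor.
// Assumption: no alignment padding is needed between the variables.
std::vector<std::size_t> anchorOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t next = 0;  // offset at which the next variable starts
    for (std::size_t size : sizes) {
        offsets.push_back(next);
        next += size;
    }
    return offsets;
}
```

With three 400-byte arrays (int[100] with 4-byte int), the second and third offsets match the immediates GCC adds to r4, so only the anchor itself needs a PC-relative load.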
Tim Northover
2014-Mar-12 19:49 UTC
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
Hi Weiming,

On 12 March 2014 17:43, Weiming Zhao <weimingz at codeaurora.org> wrote:
> Clang will emit 1 GOT entry for each GV and 2 instructions to get the
> address:
>
> GCC does this only for the first GV. The rest GV address are computed
> directly:

This looks like it would be the job of lib/Transforms/GlobalMerge.cpp. It looks like ARM runs it in all cases; perhaps it doesn't understand some ELF linkage subtleties?

Cheers.

Tim.
Weiming Zhao
2014-Mar-12 20:54 UTC
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
Hi Tim,

Thanks for the pointer. It seems GlobalMerge only considers static/local GVs:

    if (!I->hasLocalLinkage() || I->isThreadLocal() || I->hasSection())
      continue;

Let me try some experiments in GlobalMerge. Another place might be ARMISelLowering.cpp :: LowerGlobalAddressELF(), but I think changing GlobalMerge makes more sense.

Thanks,
Weiming
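To make the effect of the quoted check concrete, here is a toy model of it. ToyGlobal and isMergeCandidate are illustrative stand-ins and not LLVM's GlobalVariable API; only the three properties named in the quoted condition are modeled:

```cpp
#include <cassert>

// Toy stand-in for the three properties the quoted check inspects.
// This is NOT LLVM's GlobalVariable class.
struct ToyGlobal {
    bool hasLocalLinkage;
    bool isThreadLocal;
    bool hasSection;
};

// Negation of the quoted skip condition: a global is considered for merging
// only if it has local linkage, is not thread-local, and has no explicit
// section.
bool isMergeCandidate(const ToyGlobal& g) {
    return g.hasLocalLinkage && !g.isThreadLocal && !g.hasSection;
}
```

Under this model a static array passes the filter, while g0, g1 and g2 from test.cpp do not: -fvisibility=hidden changes their visibility, but their linkage is still external.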
weimingz at codeaurora.org
2014-Mar-14 04:45 UTC
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
Hi Tim,

The global merge pass puts the GVs into a structure to guarantee that their addresses are contiguous. This works for static GVs, but for hidden global GVs it causes name resolution to fail when linking the .o files into a .so.

Any thoughts?

Thanks,
Weiming
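A source-level picture of the merge may help here. The sketch below is an approximation only: MergedGlobals, merged and memberAt are hypothetical names, and the real pass operates on LLVM IR rather than C++ source:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical merged layout; the type and variable names are illustrative.
struct MergedGlobals {
    int g0[100];
    int g1[100];
    int g2[100];
};

MergedGlobals merged;  // the only object that is actually defined

// Address of a member computed as base address plus a constant byte offset,
// analogous to the "add r0, r4, #400" form in the GCC output.
inline void* memberAt(std::size_t offset) {
    return reinterpret_cast<char*>(&merged) + offset;
}
```

In this picture only merged is defined as an object; g0, g1 and g2 survive as member offsets, not as symbols. A static GV is never referenced by name from another object file, so nothing is lost. A hidden GV can still be referenced by name from other .o files that end up in the same .so, which is consistent with the link failure described above.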