MAYUR PANDEY
2014-Nov-25 07:42 UTC
[LLVMdev] bx instruction getting generated in arm assembly for O1
Hi Jonathan,

The assembly generated with clang-3.5 is:

indirect_call:
    .fnstart
.Leh_func_begin0:
    ldr r0, .LCPI0_0
    ldr r1, .LCPI0_1
.LPC0_0:
    add r0, pc, r0
    ldr r0, [r1, r0]
    ldr r0, [r0]
    bx r0
    .align 2
.LCPI0_0:
    .long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
.LCPI0_1:
    .long indirect_func(GOT)
.Ltmp0:
    .size indirect_call, .Ltmp0-indirect_call
.Leh_func_end0:
    .fnend

With clang-3.4.2 the assembly generated is:

indirect_call:
    push {r11, lr}
    ldr r0, .LCPI0_0
    mov r11, sp
    ldr r1, .LCPI0_1
.LPC0_0:
    add r0, pc, r0
    ldr r0, [r1, r0]
    ldr r0, [r0]
    blx r0
    pop {r11, pc}
    .align 2
.LCPI0_0:
    .long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
.LCPI0_1:
    .long indirect_func(GOT)
.Ltmp0:
    .size indirect_call, .Ltmp0-indirect_call

Both listings were generated at -O1. The assembly generated by the trunk version of clang is similar to that of 3.5.

Thanks,
Mayur

------- Original Message -------
Sender : Jonathan Roelofs<jonathan@codesourcery.com>
Date : Nov 25, 2014 10:15 (GMT+09:00)
Title : Re: [LLVMdev] bx instruction getting generated in arm assembly for O1

On 11/24/14 8:00 AM, MAYUR PANDEY wrote:
> Hi,
>
> For the following test:
>
> int (*indirect_func)();
>
> int indirect_call()
> {
>     return indirect_func();
> }
>
> when generating the assembly with clang-3.5, for -march=armv5te, there is a
> difference in the assemblies generated with O0 and O1:
>
> In the assembly generated with O0, we are getting the "blx" instruction whereas
> with O1 we get "bx" (in 3.4.2 we used to get "blx" for both O0 and O1).
Can you post the asm that you're seeing for this function?

There's a related case to this on armv4t which Iain has a patch for, that I
think we forgot about... The problem there is that armv4t doesn't have blx at
all, so should be generating a sequence like: 'mov r0, ...; bx _Ltmp; _Ltmp: bl r0'.
>
> Is this because of this patch: [llvm] r214959 - ARM: do not generate BLX
> instructions on Cortex-M CPUs
I doubt it. armv5te isn't a cortex-m processor.


Cheers,

Jon
>
> Or am I missing something?
>
> Thanks,
>
> Mayur
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

--
Jon Roelofs
jonathan@codesourcery.com
CodeSourcery / Mentor Embedded
David Chisnall
2014-Nov-25 16:04 UTC
[LLVMdev] bx instruction getting generated in arm assembly for O1
Again, this looks correct - the only difference is that the first version is better optimised. In the clang-3.4.2 version, r11 is spilled because it is used to hold a copy of the stack pointer (mov r11, sp). The following:

> blx r0
> pop {r11, pc}

is restoring r11 and jumping to the saved link register (and adjusting the stack pointer: you've got to love AArch32 assembly, where a jump, stack pointer adjustment, and register reload is a single instruction). If r11 is not spilled, then we're left with:

    push lr
    ...
    blx r0
    pop pc

And this is equivalent to simply:

    bx r0

So, again, what is the bug that your test is testing for? Or are you just checking that clang 3.5 really is doing tail-call optimisation in trivial cases?

David
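To make the tail-call point concrete, here is a minimal C sketch contrasting the test case from the thread with a variant where the call is not in tail position. The names indirect_call_plus_one and forty_two are hypothetical and do not appear in the thread; which instructions a given compiler actually emits for either function is not verified here and depends on version, target, and flags.

```c
/* Function pointer from the original test case. */
int (*indirect_func)(void);

/* The call is in tail position: nothing remains to be done in this
 * function after indirect_func returns, so a compiler may replace the
 * call-and-return pair with a plain jump (bx on ARM) and let the callee
 * return directly to our caller. */
int indirect_call(void)
{
    return indirect_func();
}

/* Hypothetical variant, not from the thread: the addition has to run
 * after the callee returns, so control must come back here. One would
 * expect a real call (e.g. blx) with lr saved and restored around it. */
int indirect_call_plus_one(void)
{
    return indirect_func() + 1;
}

/* Hypothetical callee, present only so the sketch can be exercised. */
int forty_two(void)
{
    return 42;
}
```

Pointing indirect_func at forty_two and calling both functions gives the same results whether or not the first call is lowered to a jump, which is why the bx form is a valid optimisation rather than a change in behaviour.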