MAYUR PANDEY
2014-Nov-25 07:42 UTC
[LLVMdev] bx instruction getting generated in arm assembly for O1
<HTML><HEAD><TITLE>Samsung Enterprise Portal
mySingle</TITLE>
<META content=IE=5 http-equiv=X-UA-Compatible>
<META content="text/html; charset=utf-8"
http-equiv=Content-Type>
<STYLE id=mysingle_style type=text/css>P {
MARGIN-BOTTOM: 5px; FONT-SIZE: 9pt; FONT-FAMILY: Arial, arial;
MARGIN-TOP: 5px
}
TD {
MARGIN-BOTTOM: 5px; FONT-SIZE: 9pt; FONT-FAMILY: Arial, arial;
MARGIN-TOP: 5px
}
LI {
MARGIN-BOTTOM: 5px; FONT-SIZE: 9pt; FONT-FAMILY: Arial, arial;
MARGIN-TOP: 5px
}
BODY {
FONT-SIZE: 9pt; FONT-FAMILY: Arial, arial; MARGIN: 10px; LINE-HEIGHT:
1.4
}
</STYLE>
<META name=GENERATOR content=ActiveSquare></HEAD>
<BODY>
<P>Hi Jonathan,</P>
<P>The assembly generated by clang-3.5 is:</P>
<P>indirect_call:<BR> .fnstart<BR>.Leh_func_begin0:<BR>
ldr r0, .LCPI0_0<BR> ldr r1, .LCPI0_1<BR>.LPC0_0:<BR> add r0,
pc, r0<BR> ldr r0, [r1, r0]<BR> ldr r0, [r0]<BR> bx
r0<BR> .align 2<BR>.LCPI0_0:<BR> .long
_GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)<BR>.LCPI0_1:<BR> .long
indirect_func(GOT)<BR>.Ltmp0:<BR> .size indirect_call,
.Ltmp0-indirect_call<BR>.Leh_func_end0:<BR> .fnend</P>
<P> </P>
<P>with clang-3.4.2 the assembly generated is:</P>
<P>indirect_call:<BR> push {r11, lr}<BR> ldr r0,
.LCPI0_0<BR> mov r11, sp<BR> ldr r1,
.LCPI0_1<BR>.LPC0_0:<BR> add r0, pc, r0<BR> ldr r0, [r1,
r0]<BR> ldr r0, [r0]<BR> blx r0<BR> pop {r11, pc}<BR>
.align 2<BR>.LCPI0_0:<BR> .long
_GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)<BR>.LCPI0_1:<BR> .long
indirect_func(GOT)<BR>.Ltmp0:<BR> .size indirect_call,
.Ltmp0-indirect_call</P>
<P> </P>
<P>Both assemblies were generated at -O1. The assembly
generated by the trunk version of clang is similar to that of 3.5.</P>
<P> </P>
<P>Thanks,</P>
<P>Mayur</P>
<P> </P>
<P>------- <B>Original Message</B> -------</P>
<P><B>Sender</B> : Jonathan
Roelofs &lt;jonathan@codesourcery.com&gt;</P>
<P><B>Date</B> : Nov 25, 2014 10:15 (GMT+09:00)</P>
<P><B>Title</B> : Re: [LLVMdev] bx instruction getting
generated in arm assembly for O1</P>
<P> </P><BR><BR>On 11/24/14 8:00 AM, MAYUR PANDEY
wrote:<BR>> Hi,<BR>><BR>> For the following
test:<BR>><BR>> int
(*indirect_func)();<BR>><BR>> int
indirect_call()<BR>> {<BR>> return
indirect_func();<BR>> }<BR>><BR>> when generating the
assembly with clang-3.5, for -march=armv5te, there is a<BR>>
difference in the assemblies generated with O0 and
O1:<BR>><BR>> In the assembly generated with O0, we are
getting the "blx" instruction whereas<BR>> with O1 we get
"bx" (in 3.4.2 we used to get "blx" for both O0 and
O1).<BR>Can you post the asm that you're seeing for this
function?<BR><BR>There's a related case to this on armv4t which
Iain has a patch for, that I <BR>think we forgot about... The problem
there is that armv4t doesn't have blx at <BR>all, so should be
generating a sequence like: 'mov r0, ...; bx _Ltmp; _Ltmp: bl
r0'.<BR>><BR>> Is this because of this patch: [llvm]
r214959 - ARM: do not generate BLX<BR>> instructions on Cortex-M
CPUs<BR>I doubt it. armv5te isn't a cortex-m
processor.<BR><BR><BR>Cheers,<BR><BR>Jon<BR>><BR>>
Or I am missing something.<BR>><BR>>
Thanks,<BR>><BR>>
Mayur<BR>><BR>><BR>><BR>><BR>>
_______________________________________________<BR>> LLVM Developers
mailing list<BR>> LLVMdev@cs.uiuc.edu
http://llvm.cs.uiuc.edu<BR>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev<BR>><BR><BR>--
<BR>Jon Roelofs<BR>jonathan@codesourcery.com<BR>CodeSourcery /
Mentor Embedded<BR>
<P> </P><!--SP:mayur.p--><!--mayur.p:EP-->
<P> </P></BODY></HTML>
David Chisnall
2014-Nov-25 16:04 UTC
[LLVMdev] bx instruction getting generated in arm assembly for O1
Again, this looks correct - the only difference is that the first version is better optimised. In the 3.4.2 version, r11 is spilled because it is used to hold the stack pointer (mov r11, sp). The following:

>     blx r0
>     pop {r11, pc}

is restoring r11 and jumping to the saved link register (and adjusting the stack pointer: you've got to love AArch32 assembly, where a jump, stack pointer adjustment, and register reload is a single instruction). If r11 is not spilled, then we're left with:

    push lr
    ...
    blx r0
    pop pc

And this is equivalent to simply:

    bx r0

So, again, what is the bug that your test is testing for? Or are you just checking that clang 3.5 really is doing tail-call optimisation in trivial cases?

David

On 25 Nov 2014, at 07:42, MAYUR PANDEY <mayur.p at samsung.com> wrote:

> Hi Jonathan,
>
> The assembly generated in case of clang-3.5 is
>
> indirect_call:
>     .fnstart
> .Leh_func_begin0:
>     ldr r0, .LCPI0_0
>     ldr r1, .LCPI0_1
> .LPC0_0:
>     add r0, pc, r0
>     ldr r0, [r1, r0]
>     ldr r0, [r0]
>     bx r0
>     .align 2
> .LCPI0_0:
>     .long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
> .LCPI0_1:
>     .long indirect_func(GOT)
> .Ltmp0:
>     .size indirect_call, .Ltmp0-indirect_call
> .Leh_func_end0:
>     .fnend
>
> with clang-3.4.2 the assembly generated is:
>
> indirect_call:
>     push {r11, lr}
>     ldr r0, .LCPI0_0
>     mov r11, sp
>     ldr r1, .LCPI0_1
> .LPC0_0:
>     add r0, pc, r0
>     ldr r0, [r1, r0]
>     ldr r0, [r0]
>     blx r0
>     pop {r11, pc}
>     .align 2
> .LCPI0_0:
>     .long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
> .LCPI0_1:
>     .long indirect_func(GOT)
> .Ltmp0:
>     .size indirect_call, .Ltmp0-indirect_call
>
> Both assemblies are generated with O1 optimization. The assembly generated with trunk version of clang is similar to 3.5
>
> Thanks,
> Mayur
>
> ------- Original Message -------
> Sender : Jonathan Roelofs<jonathan at codesourcery.com>
> Date : Nov 25, 2014 10:15 (GMT+09:00)
> Title : Re: [LLVMdev] bx instruction getting generated in arm assembly for O1
>
> On 11/24/14 8:00 AM, MAYUR PANDEY wrote:
> > Hi,
> >
> > For the following test:
> >
> > int (*indirect_func)();
> >
> > int indirect_call()
> > {
> >     return indirect_func();
> > }
> >
> > when generating the assembly with clang-3.5, for -march=armv5te, there is a
> > difference in the assemblies generated with O0 and O1:
> >
> > In the assembly generated with O0, we are getting the "blx" instruction whereas
> > with O1 we get "bx" (in 3.4.2 we used to get "blx" for both O0 and O1).
> Can you post the asm that you're seeing for this function?
>
> There's a related case to this on armv4t which Iain has a patch for, that I
> think we forgot about... The problem there is that armv4t doesn't have blx at
> all, so should be generating a sequence like: 'mov r0, ...; bx _Ltmp; _Ltmp: bl r0'.
> >
> > Is this because of this patch: [llvm] r214959 - ARM: do not generate BLX
> > instructions on Cortex-M CPUs
> I doubt it. armv5te isn't a cortex-m processor.
>
> Cheers,
>
> Jon
> >
> > Or I am missing something.
> >
> > Thanks,
> >
> > Mayur
>
> --
> Jon Roelofs
> jonathan at codesourcery.com
> CodeSourcery / Mentor Embedded
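
For anyone following the thread, here is a minimal C sketch of why the bx form is safe. It reuses the test case from the original post; the callee forty_two() and the helper run_example() are hypothetical names added only so the snippet can be run, and the prototypes use (void) rather than the empty parentheses in the original. Because the indirect call is the last thing indirect_call() does, a caller observes the same result whether the compiler emits "blx r0" followed by "pop {pc}" or a single "bx r0". Whether a tail call is actually emitted depends on the compiler version and optimisation level.

```c
#include <assert.h>

/* Same shape as the test case in this thread. */
int (*indirect_func)(void);

/* The call is the last thing this function does, so the compiler is
 * free (but not required) to emit it as a tail call: a plain branch
 * that lets the callee return straight to our caller. */
int indirect_call(void)
{
    return indirect_func();
}

/* Hypothetical callee, present only so the sketch can be run. */
int forty_two(void)
{
    return 42;
}

/* Hypothetical helper: returns the value a caller of indirect_call()
 * sees. The value does not depend on whether the call inside
 * indirect_call() was compiled as a call-and-return or as a branch. */
int run_example(void)
{
    indirect_func = forty_two;
    int result = indirect_call();
    assert(result == 42);
    return result;
}
```

Compiling this for the target with -S at -O0 and at -O1 and comparing the body of indirect_call should show the difference discussed above, if the compiler chooses to do the tail call.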