I need to be able to emit .globl for the soft float routines used by
mips16. The routines are called, but no .globl directive is emitted for
them. How can I do this?

Background: I have a strange issue that I encountered with mips16 hard
float.

Part of mips16 hard float is to emit calls to runtime routines with the
same signature as the usual soft float routines, except that they are
implemented using mips32 code, which uses real floating point
instructions (mips16 processor mode has no hardware floating point
instructions).

These routines have the same names as the corresponding softfloat
routines, except with the additional prefix __mips16_. For example,
__mips16_floatsidf.

For these intrinsics (and not others), gcc mips16 emits a .globl.

Without this .globl (which LLVM does not emit), the program runs very
slowly if it is compiled with -fPIC and linked as C++. It seems to be
stuck in the loader (probably doing dynamic binding over and over
again).

I'm trying to understand why this happens, but for now that's not
important; it just works that way.

Thanks in advance,
Reed
Hi Reed,

Still catching up on email, so hope this isn't already covered...

reed kotler <rkotler at mips.com> writes:
> I have a strange issue that I encountered with mips16 hard float.
>
> Part of mips16 hard float is to emit calls to runtime routines with the
> same signature as usual soft float routines, except that they are
> implemented using mips32 code which uses real floating point
> instructions (mips16 processor mode has no hardware floating point
> instructions).
>
> These routines have the same names as the corresponding softfloat
> routines, except with the additional prefix __mips16_ . So for example,
> __mips16_floatsidf.
>
> For these intrinsics, (and not others), gcc mips16 emits a .globl.
>
> Without this .globl ( which llvm does not emit), then the program will
> run very slow if compiled in -fPIC and linked as C++. It seems to be
> stuck in the loader (probably doing dynamic binding over and over again).

This might or might not be related, but I notice that for the attached
testcase, LLVM emits:

            lui     $2, %hi(_gp_disp)
            addiu   $2, $2, %lo(_gp_disp)
            addiu   $sp, $sp, -32
    $tmp2:
            .cfi_def_cfa_offset 32
            sw      $ra, 28($sp)            # 4-byte Folded Spill
            sw      $18, 24($sp)            # 4-byte Folded Spill
            sw      $17, 20($sp)            # 4-byte Folded Spill
            sw      $16, 16($sp)            # 4-byte Folded Spill
    $tmp3:
            .cfi_offset 31, -4
    $tmp4:
            .cfi_offset 18, -8
    $tmp5:
            .cfi_offset 17, -12
    $tmp6:
            .cfi_offset 16, -16
            addu    $16, $2, $25
            move    $17, $4
            lw      $18, %call16(foo)($16)
    $BB0_1:                                 # %loop
                                            # =>This Inner Loop Header: Depth=1
            move    $25, $18
            jalr    $25
            move    $gp, $16
            addiu   $17, $17, -1
            bnez    $17, $BB0_1
            nop
    # BB#2:                                 # %exit
            lw      $16, 16($sp)            # 4-byte Folded Reload
            lw      $17, 20($sp)            # 4-byte Folded Reload
            lw      $18, 24($sp)            # 4-byte Folded Reload
            lw      $ra, 28($sp)            # 4-byte Folded Reload
            jr      $ra
            addiu   $sp, $sp, 32

where the %call16 is hoisted out of the loop. It really needs to be
kept inside the loop and loaded for each iteration.
The same goes for consecutive calls to the same function; the second
call needs to load %call16 separately, after the first call has
finished.

As things stand, if foo() hasn't been bound by the time the function
above is entered, $18 will contain the address of the lazy binding stub,
and so the loop will try to resolve foo on every iteration. That's
usually what's happened for me when a testcase gets bogged down in
the dynamic linker.

Maybe the lack of .globl is preventing the function from being resolved
lazily, and so avoids this kind of problem?

Does removing the .globls from the GCC asm output make any difference?
Or is it just that adding them to LLVM output makes a difference?

Thanks,
Richard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.ll
Type: application/octet-stream
Size: 305 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130902/a31f9544/attachment.obj>
You the man! Nice catch. That makes total sense.

As you said, .global might prevent the symbol from participating in lazy
binding, but I need to investigate this issue thoroughly.

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00975.html

Reed

On 09/02/2013 03:29 AM, Richard Sandiford wrote:
> As things stand, if foo() hasn't been bound by the time the function
> above is entered, $18 will contain the address of the lazy binding stub,
> and so the loop will try to resolve foo on every iteration. That's
> usually what's happened for me when a testcase gets bogged down in
> the dynamic linker.
>
> Maybe the lack of .globl is preventing the function from being resolved
> lazily, and so avoids this kind of problem?