thr3ads.net - llvm dev - [LLVMdev] .globl [Sep 2013]

reed kotler

2013-Aug-29 21:25 UTC

[LLVMdev] .globl

I need to be able to emit .globl for the soft float routines used by mips16.
The routines are called but there is no .globl definition for them.

How can I do this?

Background:

I have a strange issue that I encountered with mips16 hard float.

Part of mips16 hard float is to emit calls to runtime routines with the 
same signature as usual soft float routines, except that they are 
implemented using mips32 code which uses real floating point 
instructions (mips16 processor mode has no hardware floating point 
instructions).

These routines have the same names as the corresponding softfloat 
routines, except with the additional prefix __mips16_ . So for example, 
__mips16_floatsidf.

For these intrinsics (and not others), gcc mips16 emits a .globl.

Without this .globl (which LLVM does not emit), the program runs very
slowly if compiled with -fPIC and linked as C++. It seems to be stuck in
the loader (probably doing dynamic binding over and over again).

I'm trying to understand why this happens but for now that's not 
important because it just works that way.

Tia.

Reed

Richard Sandiford

2013-Sep-02 10:29 UTC

[LLVMdev] .globl

Hi Reed,

Still catching up on email, so hope this isn't already covered...

reed kotler <rkotler at mips.com> writes:
> I have a strange issue that I encountered with mips16 hard float.
>
> Part of mips16 hard float is to emit calls to runtime routines with the 
> same signature as usual soft float routines, except that they are 
> implemented using mips32 code which uses real floating point 
> instructions (mips16 processor mode has no hardware floating point 
> instructions).
>
> These routines have the same names as the corresponding softfloat 
> routines, except with the additional prefix __mips16_ . So for example, 
> __mips16_floatsidf.
>
> For these intrinsics, (and not others), gcc mips16 emits a .globl.
>
> Without this .globl ( which llvm does not emit), then the program will 
> run very slow if compiled in -fPIC and linked as C++. It seems to be 
> stuck in the loader (probably doing dynamic binding over and over again).

This might or might not be related, but I notice that for the attached
testcase, LLVM emits:

	lui	$2, %hi(_gp_disp)
	addiu	$2, $2, %lo(_gp_disp)
	addiu	$sp, $sp, -32
$tmp2:
	.cfi_def_cfa_offset 32
	sw	$ra, 28($sp)            # 4-byte Folded Spill
	sw	$18, 24($sp)            # 4-byte Folded Spill
	sw	$17, 20($sp)            # 4-byte Folded Spill
	sw	$16, 16($sp)            # 4-byte Folded Spill
$tmp3:
	.cfi_offset 31, -4
$tmp4:
	.cfi_offset 18, -8
$tmp5:
	.cfi_offset 17, -12
$tmp6:
	.cfi_offset 16, -16
	addu	$16, $2, $25
	move	$17, $4
	lw	$18, %call16(foo)($16)
$BB0_1:                                 # %loop
                                        # =>This Inner Loop Header: Depth=1
	move	$25, $18
	jalr	$25
	move	$gp, $16
	addiu	$17, $17, -1
	bnez	$17, $BB0_1
	nop
# BB#2:                                 # %exit
	lw	$16, 16($sp)            # 4-byte Folded Reload
	lw	$17, 20($sp)            # 4-byte Folded Reload
	lw	$18, 24($sp)            # 4-byte Folded Reload
	lw	$ra, 28($sp)            # 4-byte Folded Reload
	jr	$ra
	addiu	$sp, $sp, 32

where the %call16 is hoisted out of the loop.  It really needs to be
kept inside the loop and loaded for each iteration.  The same goes for
consecutive calls to the same function; the second call needs to load
%call16 separately, after the first call has finished.

As things stand, if foo() hasn't been bound by the time the function
above is entered, $18 will contain the address of the lazy binding stub,
and so the loop will try to resolve foo on every iteration.  That's
usually what's happened for me when a testcase gets bogged down in
the dynamic linker.
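
To make the failure mode concrete, here is a toy model of it in Python. It is only a sketch, not real dynamic-linker code: `LazyGot`, `hoisted_loop` and `reloading_loop` are hypothetical names invented for illustration, and the "GOT" is a single slot that starts out pointing at a stub. It assumes the stub patches the slot on first use, as lazy binding normally does.

```python
# Toy model of MIPS lazy binding; illustrative only, not real linker code.


class LazyGot:
    """A one-entry GOT whose slot initially points at a lazy-binding stub."""

    def __init__(self, target):
        self._target = target      # the real function, e.g. foo
        self.resolve_count = 0     # how many times the resolver ran
        self.slot = self._stub     # unbound: the slot holds the stub address

    def _stub(self, *args):
        # The stub resolves the symbol, patches the GOT slot,
        # then forwards the call to the real function.
        self.resolve_count += 1
        self.slot = self._target
        return self._target(*args)


def foo():
    return None


def hoisted_loop(got, n):
    """Mirrors the emitted code: the slot is loaded once, before the loop."""
    callee = got.slot              # lw $18, %call16(foo)($16)
    for _ in range(n):
        callee()                   # move $25, $18 ; jalr $25
    return got.resolve_count


def reloading_loop(got, n):
    """Loads the slot on every iteration, so the patched entry is seen."""
    for _ in range(n):
        callee = got.slot          # reload %call16(foo) each time
        callee()
    return got.resolve_count
```

In this model the hoisted version keeps calling the stale stub address, so the resolver runs once per iteration, while the reloading version pays for resolution only on the first call.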

Maybe the lack of .globl is preventing the function from being resolved
lazily, and so avoids this kind of problem?

Does removing the .globls from the GCC asm output make any difference?
Or is it just that adding them to LLVM output makes a difference?

Thanks,
Richard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.ll
Type: application/octet-stream
Size: 305 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130902/a31f9544/attachment.obj>

reed kotler

2013-Sep-03 19:05 UTC

[LLVMdev] .globl

You the man!

Nice catch.

That makes total sense.

As you said, .global might prevent the symbol from participating in lazy 
binding but I need to investigate this issue thoroughly.

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00975.html

Reed
