thr3ads.net - llvm dev - [LLVMdev] FP emulation [Oct 2006]


Roman Levenstein

2006-Oct-09 22:18 UTC

[LLVMdev] FP emulation

Hi,

I'm now ready to implement the FP support for my embedded target. 

My target supports only f64 at the moment.
Question: How can I tell LLVM that float is the same as double on my
target? Maybe by assigning the same register class to both MVT::f32
and MVT::f64?

But FP is supported only in emulated mode, because the target does
not have any hardware support for FP. Therefore each FP operation is
supposed to be converted into a call to an assembler function
implementing the corresponding operation. All these FP operations
implemented in assembler always expect their parameters in specific f64
registers, i.e. %d0 and %d1, and return their results in %d0. The value
of %d1 is clobbered by such calls. (Actually, %dX are pseudo registers;
see below.)

1. Since these FP emulation functions take operands in registers and
produce results in registers without any further side effects, they
look pretty much like real instructions. Thus I have the idea to
represent them in the tblgen instruction descriptions as
pseudo-instructions, where constraints define which specific physical
%dX registers are to be used. This would enforce correct register
allocation.

For example:
def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1 

This seems to work, at least on simple test files. 
            
But I would also need a way to convert such a FSUB64 pseudo-instruction
into the assembler function call, e.g. "call __fsub64". At the moment I
don't quite understand at which stage and how I should do it (lowering,
selection, combining??? ). What would be the easiest way to map it to
such a call instruction?

One issue with the described approach is the rather inefficient code
that results after register allocation. For example, there are a lot
of instructions of the form "mov %d0, %d0", copying a register into
itself. My guess is that the following happens:
 before reg.alloc there are instructions of the form:
 mov %virtual_reg0, %d0 
 mov %virtual_reg1, %d1 
 fsub64
which ensure that the operand constraints of the operation are fulfilled
and that the operands are in the right registers. During allocation, the
register allocator assigns the same physical register to the virtual register.
Therefore the code becomes:
 mov %d0, %d0 
 mov %d1, %d1 
 fsub64

But then there is no call to "useless copies elimination" pass or
peephole pass that would basically remove such copies. 

Question: Is there such a pass available in LLVM? Actually, it is also
interesting to know, why the regalloc does not eliminate such coalesced
moves itself? Wouldn't it make sense?

Does this idea of representing the emulated FP operation calls as
instructions as described above make some sense? Or do you see easier
or more useful ways to do it?

2. In reality, the processor has only 32bit regs. Therefore, any f64
value should be mapped to two 32bit registers. What is the best way to
achieve it? I guess this is a well-known kind of problem.

So far I was thinking about introducing some pseudo f64 registers, i.e.
%dX used above, and working with them in the instruction descriptions.
And then at the later stages, probably after lowering and selection,
expand them into pairs of load or store operations. 

But I'm not quite sure that this is a right way to go. I suspect that
something can be done using some form of EXPAND operation in the
lowering pass. For example, I see that assignments of f64 immediates to
globals is expanded by LLVM automatically into two 32bit stores, which
is very nice. Maybe it is also possible to do the same for 64bit
registers?
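To picture the expansion mentioned above, here is a minimal C++ sketch (a host-side illustration, not LLVM code) of what storing an f64 as two 32-bit words amounts to: the 64-bit bit pattern is cut into a low and a high word. The helper names splitF64 and joinF64 are hypothetical, and the sketch assumes an IEEE 754 double on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helper: splits a double's 64-bit pattern into {low word, high word},
// i.e. the two values that two 32-bit stores would write.
std::pair<std::uint32_t, std::uint32_t> splitF64(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // bit-cast without undefined behavior
    return {static_cast<std::uint32_t>(bits), static_cast<std::uint32_t>(bits >> 32)};
}

// Inverse operation: rebuilds the double from two 32-bit halves (e.g. two 32-bit loads).
double joinF64(std::uint32_t lo, std::uint32_t hi) {
    std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For 1.0 (bit pattern 0x3FF0000000000000) the low word is 0 and the high word is 0x3FF00000. Which half lands at the lower address is a question of target endianness, which the thread does not settle.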

OK, enough questions for today ;)

Thanks for any feedback, 
 Roman


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

Chris Lattner

2006-Oct-09 22:55 UTC


[LLVMdev] FP emulation

On Mon, 9 Oct 2006, Roman Levenstein wrote:
> I'm now ready to implement the FP support for my embedded target.
cool.
> My target supports only f64 at the moment.
> Question: How can I tell LLVM that float is the same as double on my
> target? Maybe by assigning the same register class to both MVT::f32
> and MVT::f64?
Just don't assign a register class for the f32 type.  This is what the X86 
backend does when it is in "floating point stack mode".  This will 
implicitly promote everything to f64.
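A toy model of this suggestion, assuming simplified types: legality hinges only on whether a value type was given a register class, and an f32 without one is widened to f64. VT, Action and ToyTarget are illustrative names, not LLVM's own classes.

```cpp
#include <cassert>
#include <set>

// Simplified value types and legalization actions (illustrative, not LLVM's enums).
enum class VT { f32, f64 };
enum class Action { Legal, Promote };

// Toy target description: only types that were given a register class are legal.
struct ToyTarget {
    std::set<VT> typesWithRegClass;

    // f32 without a register class is widened to f64, mirroring the advice
    // "don't assign a register class for the f32 type".
    Action actionFor(VT vt) const {
        if (typesWithRegClass.count(vt)) return Action::Legal;
        assert(vt == VT::f32 && typesWithRegClass.count(VT::f64) &&
               "this model only promotes f32 to a legal f64");
        return Action::Promote;
    }
};
```

Registering only f64 makes f64 legal and turns every f32 value into a promotion candidate, which is the "float is the same as double" behavior asked about.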
> But FP is supported only in the emulated mode, because the target does
> not have any hardware support for FP. Therefore each FP operation is
> supposed to be converted into a call of an assembler function
> implementing a corresponding operation.
Ok.
> All these FP operations
> implemented in assembler always expect parameters on concrete f64
> registers, i.e. %d0,%d1 and return their results in reg %d0. The value
> of %d1 is clobbered by such calls. (actually %dX are pseudo regs, see
> below).
Ok.
> 1. Since these FP emulation functions take operands on registers and
> produce operands on registers without any further side-effects, they
> look pretty much like real instructions. Thus I have the idea to
> represent them in the tblgen instruction descriptions like
> pseudo-instructions, where constraints define which concrete physical
> %dX registers are to use. This would enforce correct register
> allocation.
>
> For example:
> def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
>           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1
>
> This seems to work, at least on simple test files.
That should be a robust solution.
> But I would also need a way to convert such a FSUB64 pseudo-instruction
> into the assembler function call, e.g. "call __fsub64". At the moment I
> don't quite understand at which stage and how I should do it (lowering,
> selection, combining??? ). What would be the easiest way to map it to
> such a call instruction?
Why not just make the asm string be "call __fsub64"?
> One issue with the described approach is a pretty inefficient code
> resulting after the register allocation. For example, there are a lot
> of instructions of the form "mov %d0, %d0", copying the register into
> itself. My guess is that the following happens:
Make sure to implement TargetInstrInfo::isMoveInstr.  This will allow the 
coalescer to eliminate these.
> before reg.alloc there are instructions of the form:
> mov %virtual_reg0, %d0
> mov %virtual_reg1, %d1
> fsub64
> which ensure that operand constraints of the operation are fulfilled
> and they are on the right registers. During the allocation register
> allocator assigns the same physical register to the virtual register.
> Therefore the code becomes:
> mov %d0, %d0
> mov %d1, %d1
> fsub64
>
> But then there is no call to "useless copies elimination" pass or
> peephole pass that would basically remove such copies.
Yep.
> Question: Is there such a pass available in LLVM? Actually, it is also
> interesting to know, why the regalloc does not eliminate such coalesced
> moves itself? Wouldn't it make sense?
The coalescer does; please implement isMoveInstr.
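The role of isMoveInstr can be sketched as follows. This is a simplified, after-the-fact model, not how LLVM's coalescer is implemented (roughly speaking, the coalescer joins the live ranges of a copy's source and destination before registers are assigned): once the backend can recognize its own copy instruction, any copy whose source and destination end up in the same register is redundant. The MI struct and the isMoveLike predicate are hypothetical stand-ins, loosely modeled on TargetInstrInfo::isMoveInstr.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified machine instruction: opcode plus destination and source register names.
struct MI {
    std::string opcode;
    std::string dst, src;  // empty when the instruction has no explicit operands
};

// Hypothetical copy recognizer: reports source and destination of a plain
// register-to-register move (role loosely like TargetInstrInfo::isMoveInstr).
bool isMoveLike(const MI &mi, std::string &src, std::string &dst) {
    if (mi.opcode != "mov") return false;
    src = mi.src;
    dst = mi.dst;
    return true;
}

// Drops copies whose source and destination were assigned the same register,
// e.g. "mov %d0, %d0"; every other instruction is kept in order.
std::vector<MI> dropIdentityCopies(const std::vector<MI> &code) {
    std::vector<MI> out;
    for (const MI &mi : code) {
        std::string src, dst;
        if (isMoveLike(mi, src, dst) && src == dst) continue;
        out.push_back(mi);
    }
    return out;
}
```

Without a way to recognize "mov" as a copy, no pass can tell that "mov %d0, %d0" is a no-op, which matches the behavior Roman observed.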
> Does this idea of representing the emulated FP operation calls as
> instructions as described above make some sense? Or do you see easier
> or more useful ways to do it?
That is a reasonable way to do it.  Another reasonable way would be to 
lower them in the instruction selector itself through the use of custom 
expanders.  In practice, using instructions with "call foo" in them 
instead of lowering to calls may be simpler.  Also, if you *know* that 
these calls don't clobber the normal set of callee clobbered registers, 
using the asm string is the right way to go.
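The clobber argument can be made concrete with a little set arithmetic. A minimal sketch, assuming made-up register names and clobber lists: with the pseudo-instruction approach, the Imp<> defs of FSUB64 ({d0, d1}) are all the allocator has to treat as clobbered, whereas a call lowered through a generic call path would presumably be charged the target's full call-clobber set. RegSet and mustSaveAcross are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Registers that hold a value live across a call and are clobbered by that call
// must be saved and restored around it (or the value kept elsewhere).
RegSet mustSaveAcross(const RegSet &liveAcross, const RegSet &clobbered) {
    RegSet out;
    for (const std::string &reg : liveAcross)
        if (clobbered.count(reg)) out.insert(reg);
    return out;
}
```

With values live in {d1, r4, r5}, the narrow clobber set {d0, d1} forces only d1 to be preserved, while a hypothetical generic set {d0, d1, r4, r5} forces all three.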
> 2. In reality, the processor has only 32bit regs. Therefore, any f64
> value should be mapped to two 32bit registers. What is the best way to
> achieve it? I guess this is a well-known kind of problem.
Ah, this is trickier. :)  We have a robust solution in the integer side, 
but don't allow the FP side to use it.

For the time being, I'd suggest defining an "fp register set" which just
aliases the integer register set (i.e. say that d0 overlaps r0+r1).
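The effect of declaring that d0 overlaps r0+r1 can be modeled as a conflict check. The sketch assumes a fixed mapping dN -> r(2N), r(2N+1), extending the d0/r0+r1 example: once d0 is allocated, r0 and r1 are no longer free, and vice versa. RegFile and makeRegFile are hypothetical; in the backend this information would live in the target's register .td file, as in X86RegisterInfo.td.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy register file recording which registers overlap which.
struct RegFile {
    std::map<std::string, std::set<std::string>> aliases;

    void addOverlap(const std::string &a, const std::string &b) {
        aliases[a].insert(b);
        aliases[b].insert(a);  // overlap is symmetric
    }

    // Two registers conflict if they are the same register or declared aliases.
    bool conflicts(const std::string &a, const std::string &b) const {
        if (a == b) return true;
        auto it = aliases.find(a);
        return it != aliases.end() && it->second.count(b) > 0;
    }
};

// Builds numPairs FP pseudo-registers, each overlapping two integer registers.
RegFile makeRegFile(int numPairs) {
    assert(numPairs >= 0);
    RegFile rf;
    for (int n = 0; n < numPairs; ++n) {
        std::string d = "d" + std::to_string(n);
        rf.addOverlap(d, "r" + std::to_string(2 * n));
        rf.addOverlap(d, "r" + std::to_string(2 * n + 1));
    }
    return rf;
}
```

An allocator consulting conflicts() would never hand out r0 or r1 while a value lives in d0, which is the "right thing" the aliasing information is meant to produce.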
> So far I was thinking about introducing some pseudo f64 registers, i.e.
> %dX used above, and working with them in the instruction descriptions.
> And then at the later stages, probably after lowering and selection,
> expand them into pairs of load or store operations.
If you tell the register allocator about the "aliases", it should do the
right thing for you.  Take a look at how aliasing in the X86 register set 
is handled in X86RegisterInfo.td.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Roman Levenstein

2006-Oct-10 08:30 UTC


[LLVMdev] FP emulation

Hi,
>> My target supports only f64 at the moment.
>> Question: How can I tell LLVM that float is the same as double on my
>> target? Maybe by assigning the same register class to both MVT::f32
>> and MVT::f64?
>Just don't assign a register class for the f32 type.  This is what the
>X86 backend does when it is in "floating point stack mode".  This will
>implicitly promote everything to f64.
OK. This is even easier than I expected :)

> > 1. Since these FP emulation functions take operands on registers and
> > produce operands on registers without any further side-effects, they
> > look pretty much like real instructions. Thus I have the idea to
> > represent them in the tblgen instruction descriptions like
> > pseudo-instructions, where constraints define which concrete physical
> > %dX registers are to use. This would enforce correct register
> > allocation.
> >
> > For example:
> > def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
> >           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1
> >
> > This seems to work, at least on simple test files.
> 
> That should be a robust solution.
> 
> > But I would also need a way to convert such a FSUB64 pseudo-instruction
> > into the assembler function call, e.g. "call __fsub64". At the moment I
> > don't quite understand at which stage and how I should do it (lowering,
> > selection, combining??? ). What would be the easiest way to map it to
> > such a call instruction?
> 
> Why not just make the asm string be "call __fsub64"?
Well, of course that would be the best solution. But the interesting part
is that I need to generate machine code directly, because for various
reasons using a system assembler is not an option. As a result, I need to
do this conversion in the target backend and later generate object code
directly. So when and how should this "fsub64 insn -> call __fsub64 insn"
conversion be done? What is your advice?
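One possible shape for that conversion, assuming the FSUB64-style pseudo-instructions survive until final code emission: the emitter maps each pseudo to its runtime routine and emits a call whose target field is recorded as a relocation against the routine's symbol (unless the routine's address is already known at that point). Only __fsub64 comes from the thread; the other routine names, the FPPseudo enum, the one-byte call opcode and the Relocation record are placeholders for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pseudo-opcodes for the emulated FP operations.
enum class FPPseudo { FADD64, FSUB64, FMUL64, FDIV64 };

// Runtime routine behind each pseudo. Only __fsub64 appears in the thread;
// the other names are placeholders.
const char *routineFor(FPPseudo op) {
    switch (op) {
    case FPPseudo::FADD64: return "__fadd64";
    case FPPseudo::FSUB64: return "__fsub64";
    case FPPseudo::FMUL64: return "__fmul64";
    case FPPseudo::FDIV64: return "__fdiv64";
    }
    assert(false && "unknown FP pseudo");
    return nullptr;
}

// Relocation the linker must resolve: patch `offset` with the address of `symbol`.
struct Relocation {
    std::size_t offset;
    std::string symbol;
};

// Emits "call <routine>" as a placeholder 1-byte opcode plus a 32-bit target field
// that is left zero and recorded as a relocation.
void emitFPPseudo(FPPseudo op, std::vector<unsigned char> &code,
                  std::vector<Relocation> &relocs) {
    const unsigned char kCallOpcode = 0x01;  // placeholder, not a real encoding
    code.push_back(kCallOpcode);
    relocs.push_back({code.size(), routineFor(op)});
    code.resize(code.size() + 4, 0);  // zeroed 32-bit target field
}
```

Keeping the pseudo-to-routine table in one place means the register constraints stay in the instruction descriptions while only the final encoding step knows the instruction is really a call.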
> > One issue with the described approach is a pretty inefficient code
> > resulting after the register allocation. For example, there are a lot
> > of instructions of the form "mov %d0, %d0", copying the register into
> > itself. My guess is that the following happens:
> Make sure to implement TargetInstrInfo::isMoveInstr.  This will allow
> the coalescer to eliminate these.
Very good point. Now I see how to improve this part.
> > Does this idea of representing the emulated FP operation calls as
> > instructions as described above make some sense? Or do you see easier
> > or more useful ways to do it?
> 
> That is a reasonable way to do it.  Another reasonable way would be
> to lower them in the instruction selector itself through the use of
> custom expanders.  In practice, using instructions with "call foo" in
> them instead of lowering to calls may be simpler.

Hmm, let me see. Just to check that I understand your proposal
correctly:
You mean I don't need to define any FP operations as machine
instructions at all. Instead, I basically declare that I will expand all
FP operations myself, and then I simply expand them into the following
sequence of instructions:
  mov arg1, %d0 // enforce register constraint
  mov arg2, %d1 // enforce register constraint
  call __fsub64

Is that the correct understanding? If yes, how do I express that the
arguments are to be passed in the specific physical registers %d0 and %d1
and that the result comes back in %d0? Do I need to allocate virtual regs
for them and pre-assign physical regs somehow?
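The proposed expansion, written out as a sketch with the result copy made explicit. It assumes the register roles described earlier in the thread (%d0 and %d1 in, result in %d0, %d1 clobbered); expandFPCall, the %vrN names and the string form of the instructions are illustrative only, since a real expander would build machine instructions with physical-register operands.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the sequence an expander would emit for one emulated FP operation.
// Operand order follows the thread's "mov source, destination" notation.
std::vector<std::string> expandFPCall(const std::string &routine,
                                      const std::string &lhs,
                                      const std::string &rhs,
                                      const std::string &result) {
    assert(!routine.empty() && "routine symbol required");
    return {
        "mov " + lhs + ", %d0",  // first operand pinned to %d0
        "mov " + rhs + ", %d1",  // second operand pinned to %d1
        "call " + routine,       // reads %d0/%d1, returns in %d0, clobbers %d1
        "mov %d0, " + result,    // copy the result out of %d0
    };
}
```

The trailing copy is what lets the allocator treat %d0 as free again right after the call; if it assigns the result to %d0 anyway, that copy becomes another identity move for the coalescer to remove.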

Or maybe I have to define a new calling convention that would enforce
it? Actually, how can this be done with LLVM? I mean, if I want to
introduce a new calling convention, what do I need to do in the backend
to define and register it? Is it required to change the frontend to make
it visible at the source program level?
> Also, if you *know* that these calls don't clobber the normal set of
> callee clobbered registers, using the asm string is the right way to go.

> > 2. In reality, the processor has only 32bit regs. Therefore, any
> > f64 value should be mapped to two 32bit registers. What is the best
> > way to achieve it? I guess this is a well-known kind of problem.
> 
> Ah, this is trickier. :)  We have a robust solution in the integer
> side, but don't allow the FP side to use it.
> 
> For the time being, I'd suggest defining an "fp register set" which
> just aliases the integer register set (i.e. say that d0 overlaps
> r0+r1).
OK. I had almost done it this way already. But I introduced two FP register
sets: one for fp32 (for the future) and one for fp64. fp32 aliases the
integer register set. fp64 aliases the fp32 register set, but not the
integer register set explicitly. I thought that aliases are transitive?
Or do I have to mention all aliases explicitly, e.g. for %d0 I need to
say [%s0,%s1,%GR0,%GR1]?
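Whether TableGen derives aliases transitively is not answered in this part of the thread, but the closure being asked about is easy to state. A minimal sketch, assuming the register names from the example above (%dX over %sX over %GRX, written without the % sign): following the declared alias edges from d0 yields exactly the explicit list [s0, s1, GR0, GR1]. AliasMap and aliasClosure are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using AliasMap = std::map<std::string, std::set<std::string>>;

// Collects every register reachable from `reg` through declared alias edges
// (worklist traversal); `reg` itself is not included in the result.
std::set<std::string> aliasClosure(const AliasMap &declared, const std::string &reg) {
    std::set<std::string> seen;
    std::vector<std::string> work{reg};
    while (!work.empty()) {
        std::string cur = work.back();
        work.pop_back();
        auto it = declared.find(cur);
        if (it == declared.end()) continue;
        for (const std::string &next : it->second)
            if (next != reg && seen.insert(next).second) work.push_back(next);
    }
    return seen;
}
```

Spelling the full closure out in the register description sidesteps the question entirely, at the cost of longer alias lists.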

But a more interesting question is this:
The scheme above assumes that there is a "hardwired" mapping between FP
registers and specific pairs of integer registers. In many cases this
is enough, since the emulated operations indeed expect parameters in
predefined pairs of 32bit integer registers. But other uses of FP
registers (mainly for storing some values) do not have the limitation
that a specific pair of integer registers must be used.
Actually, any combination of two 32bit integer registers would do. How
can this be modelled and represented to the register allocator, if at
all? One guess is to define one FP reg for each possible combination of
two integer registers, which would lead to the definition of N*(N-1) FP
registers, where N is the number of integer registers (and I have only 8
integer regs). But that does not seem very elegant to me, does it?
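The size of the "one FP register per pair" idea can be checked directly. A small sketch, assuming the two halves are distinct registers and that (high, low) order matters, as the N*(N-1) count implies; allRegisterPairs is a hypothetical helper.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Enumerates every ordered (high, low) pair of distinct integer registers 0..n-1,
// i.e. the FP pseudo-registers the "one per combination" idea would define.
std::vector<std::pair<int, int>> allRegisterPairs(int n) {
    assert(n >= 0);
    std::vector<std::pair<int, int>> pairs;
    for (int hi = 0; hi < n; ++hi)
        for (int lo = 0; lo < n; ++lo)
            if (hi != lo) pairs.emplace_back(hi, lo);
    return pairs;
}
```

For N = 8 this yields 8 * 7 = 56 pseudo-registers (28 if the order of the halves were irrelevant), and each integer register would be aliased by 14 of them.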
> > So far I was thinking about introducing some pseudo f64 registers, i.e.
> > %dX used above, and working with them in the instruction descriptions.
> > And then at the later stages, probably after lowering and selection,
> > expand them into pairs of load or store operations.
> 
> If you tell the register allocator about the "aliases", it should do
> the right thing for you.  Take a look at how aliasing in the X86
> register set is handled in X86RegisterInfo.td.
Can you elaborate a bit? Does it mean that I don't need to define fp64
loads from memory, fp64 stores to memory, or reg<-reg transfers for
64bit ops, because all of that will be done automatically using pairs of
32bit instructions? So far, I had the impression that I need to use fp64
regs in the instruction descriptions explicitly. But in that case the
selected instructions operate on these 64bit regs, and the problem is
how to expand them into pairs of 32bit instructions.
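If the 64-bit operations do end up being expanded by hand after selection, a register-to-register copy is the simplest case. A sketch, assuming the fixed mapping dN -> (r(2N), r(2N+1)) from the aliasing suggestion and the "mov source, destination" operand order used in the thread; halvesOf and expandMov64 are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Integer halves of FP pseudo-register dN under the fixed mapping dN -> (r2N, r2N+1).
// Treating the first register as the low word is an assumption of this sketch.
std::pair<std::string, std::string> halvesOf(int d) {
    assert(d >= 0);
    return {"r" + std::to_string(2 * d), "r" + std::to_string(2 * d + 1)};
}

// Expands a 64-bit copy dSrc -> dDst into two 32-bit moves ("mov source, destination").
std::vector<std::string> expandMov64(int dst, int src) {
    if (dst == src) return {};  // self-copy: nothing to emit
    const auto d = halvesOf(dst);
    const auto s = halvesOf(src);
    return {"mov " + s.first + ", " + d.first, "mov " + s.second + ", " + d.second};
}
```

With the fixed, disjoint mapping the two moves can be emitted in either order. With arbitrary pairs, as in the N*(N-1) idea, the halves of source and destination could overlap, and the order of the moves (or a temporary) would start to matter.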

-Roman

