thr3ads.net - llvm dev - [LLVMdev] FP emulation [Oct 2006]

Roman Levenstein

2006-Oct-10 08:30 UTC

[LLVMdev] FP emulation

Hi,
>> My target supports only f64 at the moment.
>> Question: How can I tell LLVM that float is the same as double on my
>> target? Maybe by assigning the same register class to both MVT::f32
>> and MVT::f64?
> Just don't assign a register class for the f32 type.  This is what the
> X86 backend does when it is in "floating point stack mode".  This will
> implicitly promote everything to f64.
OK. This is even easier than I expected :)

> > 1. Since these FP emulation functions take operands in registers and
> > produce results in registers without any further side effects, they
> > look pretty much like real instructions. Thus I have the idea to
> > represent them in the tblgen instruction descriptions as
> > pseudo-instructions, where constraints define which concrete physical
> > %dX registers are to be used. This would enforce correct register
> > allocation.
> >
> > For example:
> > def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
> >           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1
> >
> > This seems to work, at least on simple test files.
> 
> That should be a robust solution.
> 
> > But I would also need a way to convert such an FSUB64
> > pseudo-instruction into the assembler function call, e.g.
> > "call __fsub64". At the moment I don't quite understand at which stage
> > and how I should do it (lowering, selection, combining???). What would
> > be the easiest way to map it to such a call instruction?
>
> Why not just make the asm string be "call __fsub64"?
Well, of course that would be the best solution. But the interesting part
is that I need to generate the machine code directly, because for
various reasons using a system assembler is not an option. As a
result, I need to do this conversion in the target backend and later
generate object code directly. But when and how should this "fsub64
insn -> call __fsub64 insn" conversion be done? What is your advice?
> > One issue with the described approach is the rather inefficient code
> > resulting after register allocation. For example, there are a lot
> > of instructions of the form "mov %d0, %d0", copying the register into
> > itself. My guess is that the following happens:
>
> Make sure to implement TargetInstrInfo::isMoveInstr.  This will allow
> the coalescer to eliminate these.
Very good point. Now I see how to improve this part.
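For illustration only, here is a toy C++ model (not LLVM's API) of why isMoveInstr matters: the coalescer can only fold copies that it can recognize as copies. In the real pass the work happens on virtual registers before allocation, by joining the live ranges of a copy's source and destination; this sketch shows only the end effect, where a recognized copy whose source and destination ended up in the same register, such as "mov %d0, %d0", is deleted. ToyInstr, isMoveInstr and removeIdentityCopies here are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy machine instruction after register allocation (not LLVM's MachineInstr).
struct ToyInstr {
  std::string opcode;  // e.g. "mov", "fsub64"
  int dst;             // destination physical register number, -1 if none
  int src;             // source physical register number, -1 if none
};

// Stand-in for an isMoveInstr-style query: reports whether the instruction
// is a plain register-to-register copy and, if so, which registers it uses.
bool isMoveInstr(const ToyInstr &mi, int &srcReg, int &dstReg) {
  if (mi.opcode != "mov" || mi.dst < 0 || mi.src < 0)
    return false;
  srcReg = mi.src;
  dstReg = mi.dst;
  return true;
}

// Drops recognized copies that move a register into itself.
std::vector<ToyInstr> removeIdentityCopies(std::vector<ToyInstr> code) {
  code.erase(std::remove_if(code.begin(), code.end(),
                            [](const ToyInstr &mi) {
                              int s = -1, d = -1;
                              return isMoveInstr(mi, s, d) && s == d;
                            }),
             code.end());
  return code;
}
```

Copies between different registers are kept; only the self-copies disappear.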
> > Does this idea of representing the emulated FP operation calls as
> > instructions as described above make sense? Or do you see easier
> > or more useful ways to do it?
>
> That is a reasonable way to do it.  Another reasonable way would be
> to lower them in the instruction selector itself through the use of
> custom expanders.  In practice, using instructions with "call foo" in
> them instead of lowering to calls may be simpler.

Hmm, let me see. Just to check that I understand your proposal
correctly:
You mean I don't need to define any FP operations as machine
instructions at all. Instead, I basically declare that I will expand all
FP operations myself, and then I simply expand them into the following
sequence of instructions:
  mov arg1, %d0 // enforce register constraint
  mov arg2, %d1 // enforce register constraint
  call __fsub64

Is my understanding correct? If yes, how do I express that arguments
are to be passed in concrete physical registers like %d0 and %d1
and that the result comes back in %d0? Do I need to allocate virtual
regs for them and pre-assign physical regs somehow?

Or maybe I have to define a new calling convention that would enforce
it?
Actually, how can this be done with LLVM? I mean, if I want to
introduce a new calling convention, what do I need to do in the backend
to define and register it? Is it required to change the frontend to make
it visible at the source program level?
> Also, if you *know* that these calls don't clobber the normal set of
> callee clobbered registers, using the asm string is the right way to go.

> > 2. In reality, the processor has only 32bit regs. Therefore, any
> > f64 value should be mapped to two 32bit registers. What is the best
> > way to achieve it? I guess this is a well-known kind of problem.
>
> Ah, this is trickier. :)  We have a robust solution on the integer
> side, but don't allow the FP side to use it.
>
> For the time being, I'd suggest defining an "fp register set" which
> just aliases the integer register set (i.e. say that d0 overlaps
> r0+r1).
OK. I have almost done it this way already. But I introduced two FP
register sets: one for fp32 (for the future) and one for fp64. fp32
aliases the integer register set. fp64 aliases the fp32 register set,
but not the integer register set explicitly. I thought that aliases are
transitive? Or do I have to mention all aliases explicitly, e.g. for %d0
I need to say [%s0,%s1,%GR0,%GR1]?

But a more interesting question is this:
The scheme above assumes that there is a "hardwired" mapping between FP
registers and concrete pairs of integer registers. In many cases this
is enough, since the emulated operations indeed expect parameters in
predefined pairs of 32bit integer registers. But for other uses of FP
registers (mainly for storing some values) there is no such requirement
that a concrete pair of integer registers be used. Actually, any
combination of two 32bit integer registers would do. How can this be
modelled and represented to regalloc, if at all? One guess is to define
one FP reg for each possible combination of two integer registers,
which would lead to defining N*(N-1) FP registers, where N is the number
of integer registers (and I have only 8 integer regs). But that does
not seem very elegant to my taste, does it?
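The N*(N-1) figure counts ordered (high, low) pairs of distinct integer registers, since the two halves of an f64 play different roles. With N = 8 that is 8 * 7 = 56 pseudo registers, compared with only 4 if the pairs are fixed and disjoint (d0 = r0+r1, d1 = r2+r3, ...). A small C++ sketch that enumerates them; the function name is hypothetical:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Enumerates every ordered pair (hi, lo) of distinct 32-bit integer
// registers that could hold the two halves of an f64 value. Each pair would
// need its own pseudo FP register if the allocator only knew fixed pairs.
std::vector<std::pair<int, int>> allOrderedPairs(int numIntRegs) {
  assert(numIntRegs >= 0);
  std::vector<std::pair<int, int>> pairs;
  for (int hi = 0; hi < numIntRegs; ++hi)
    for (int lo = 0; lo < numIntRegs; ++lo)
      if (hi != lo)
        pairs.emplace_back(hi, lo);
  return pairs;
}
```

Each of those 56 pseudo registers would also alias two integer registers, so the alias lists grow along with the register count.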
> > So far I was thinking about introducing some pseudo f64 registers,
> > i.e. %dX used above, and working with them in the instruction
> > descriptions. And then at the later stages, probably after lowering
> > and selection, expand them into pairs of load or store operations.
>
> If you tell the register allocator about the "aliases", it should do
> the right thing for you.  Take a look at how aliasing in the X86
> register set is handled in X86RegisterInfo.td.
Can you elaborate a bit? Does it mean that I don't need to define fp64
loads from memory, fp64 stores to memory and reg<-reg transfers for
64bit ops, because all that will be done automatically using pairs of
32bit instructions? So far, I had the impression that I need to use fp64
regs in the instruction descriptions explicitly. But in that case the
selected instructions operate on these 64bit regs, and the problem is
how to expand them into pairs of 32bit instructions.
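To make the question concrete: expanding an f64 load, store or copy into pairs of 32-bit operations amounts to treating the 64 bits as two 32-bit words. A minimal C++ sketch, assuming IEEE double and that the low word is stored at the lower address (the thread does not say which word order the target or its emulation library uses); all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");

// The two 32-bit halves of an f64 value, as they would live in two
// integer registers on a 32-bit-only target.
struct RegPair {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Reinterprets the double's bits as two 32-bit words (no numeric conversion).
RegPair splitF64(double value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return RegPair{static_cast<std::uint32_t>(bits),
                 static_cast<std::uint32_t>(bits >> 32)};
}

// Reassembles the double from its two halves.
double joinF64(RegPair pair) {
  std::uint64_t bits = (static_cast<std::uint64_t>(pair.hi) << 32) | pair.lo;
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// An "f64 store" expanded into two 32-bit stores (assumed word order:
// low word at the lower address).
void storeF64(std::uint32_t *mem, RegPair pair) {
  mem[0] = pair.lo;
  mem[1] = pair.hi;
}

// The matching "f64 load" expanded into two 32-bit loads.
RegPair loadF64(const std::uint32_t *mem) { return RegPair{mem[0], mem[1]}; }
```

A register-to-register f64 copy is then simply two 32-bit moves, one per half.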

-Roman


Andrew Lenharth

2006-Oct-10 15:57 UTC

> > That is a reasonable way to do it.  Another reasonable way would be
> > to lower them in the instruction selector itself through the use of
> > custom expanders.  In practice, using instructions with "call foo"
> > in them instead of lowering to calls may be simpler.
>
> Hmm, let me see. Just to check that I understand your proposal
> correctly:
> You mean I don't need to define any FP operations as machine
> instructions at all. Instead, I basically declare that I will expand all
> FP operations myself, and then I simply expand them into the following
> sequence of instructions:
>   mov arg1, %d0 // enforce register constraint
>   mov arg2, %d1 // enforce register constraint
>   call __fsub64
>
> Is my understanding correct? If yes, how do I express that arguments
> are to be passed in concrete physical registers like %d0 and %d1
> and that the result comes back in %d0? Do I need to allocate virtual
> regs for them and pre-assign physical regs somehow?
>
> Or maybe I have to define a new calling convention that would enforce
> it?
The Alpha backend does this for division and remainder of integers.
See AlphaISelLowering.cpp:501 for the lowering to a custom call node,
then AlphaISelDAGToDAG.cpp:215 for enforcing the register constraints
(copies into/out of physical registers), then AlphaInstrInfo.td:476
(JSRs) for the call instruction with special register DEF/USE sets to
match the calling convention of the library function.
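The shape of the sequence this approach produces can be modelled in a few lines. This is a toy C++ sketch, not the SelectionDAG API: in the backend the copies would be CopyToReg/CopyFromReg nodes flagged to the call node, and the call would carry the implicit register uses/defs from the .td file. %d0/%d1 and __fsub64 come from the thread; ToyInstr, expandFPOp and the vreg names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy instruction with explicit def/use register lists, standing in for
// the implicit operands a call to an emulation routine carries.
struct ToyInstr {
  std::string opcode;
  std::vector<std::string> defs;  // registers written
  std::vector<std::string> uses;  // registers read
};

// Expands "result = <routine> lhs, rhs" into:
//   copy lhs -> %d0, copy rhs -> %d1, call routine, copy %d0 -> result.
std::vector<ToyInstr> expandFPOp(const std::string &routine,
                                 const std::string &result,
                                 const std::string &lhs,
                                 const std::string &rhs) {
  std::vector<ToyInstr> seq;
  seq.push_back({"copy", {"%d0"}, {lhs}});  // pin first argument
  seq.push_back({"copy", {"%d1"}, {rhs}});  // pin second argument
  // Uses %d0,%d1 and defines (clobbers) both, mirroring the
  // Imp<[d0,d1],[d0,d1]> list on FSUB64.
  seq.push_back({"call " + routine, {"%d0", "%d1"}, {"%d0", "%d1"}});
  seq.push_back({"copy", {result}, {"%d0"}});  // result comes back in %d0
  return seq;
}
```

Because the arguments only live in %d0/%d1 between the copies and the call, the allocator stays free to keep the values anywhere else before and after.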

Hope that helps.

Andrew

Chris Lattner

2006-Oct-10 21:29 UTC

On Tue, 10 Oct 2006, Roman Levenstein wrote:
>>> such a call instruction?
>>
>> Why not just make the asm string be "call __fsub64"?
>
> Well, of course that would be the best solution. But the interesting part
> is that I need to generate the machine code directly, because for
> various reasons using a system assembler is not an option. As a
ok.
> result, I need to do this conversion in the target backend and later
> generate object code directly. But when and how should this "fsub64
> insn -> call __fsub64 insn" conversion be done? What is your advice?
I don't understand.  If you are writing out the .o file directly, you
already know how to encode calls... can't you just encode it as the right
sort of call?  What facilities are you using to emit the machine code, are
you using the llvm machine code emitter generator stuff (like PPC)?
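Emitting a pseudo-instruction "as the right sort of call" can be as simple as a table from pseudo opcode to runtime routine, plus a call encoding with a relocation for the routine's address. A hedged C++ sketch: the opcode byte 0x9A, the 32-bit absolute target field, the enum and the __fadd64 name are all made up for illustration; only __fsub64 appears in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A fixup the object writer must resolve: write the address of `symbol`
// into the 32-bit field starting at `offset`.
struct Relocation {
  std::size_t offset;
  std::string symbol;
};

struct CodeBuffer {
  std::vector<std::uint8_t> bytes;
  std::vector<Relocation> relocs;
};

// Hypothetical pseudo opcodes and the runtime routine each is emitted as.
enum PseudoOp { FADD64, FSUB64 };
const std::map<PseudoOp, std::string> kEmulationRoutine = {
    {FADD64, "__fadd64"}, {FSUB64, "__fsub64"}};

// Hypothetical call encoding: one opcode byte, then a 32-bit target
// left as zero and filled in through a relocation.
constexpr std::uint8_t kCallOpcode = 0x9A;

void emitPseudo(CodeBuffer &out, PseudoOp op) {
  auto it = kEmulationRoutine.find(op);
  assert(it != kEmulationRoutine.end());
  out.bytes.push_back(kCallOpcode);
  out.relocs.push_back({out.bytes.size(), it->second});
  for (int i = 0; i < 4; ++i)
    out.bytes.push_back(0);  // placeholder for the routine's address
}
```

The emitter never sees an "fsub64" instruction in its output: the pseudo opcode only selects which symbol the call is relocated against.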
>>> Does this idea of representing the emulated FP operation calls as
>>> instructions as described above make sense? Or do you see easier
>>> or more useful ways to do it?
>>
>> That is a reasonable way to do it.  Another reasonable way would be
>> to lower them in the instruction selector itself through the use of
>> custom expanders.  In practice, using instructions with "call foo"
>> in them instead of lowering to calls may be simpler.
>
> Hmm, let me see. Just to check that I understand your proposal
> correctly:
> You mean I don't need to define any FP operations as machine
> instructions at all. Instead, I basically declare that I will expand all
> FP operations myself, and then I simply expand them into the following
> sequence of instructions:
>  mov arg1, %d0 // enforce register constraint
>  mov arg2, %d1 // enforce register constraint
>  call __fsub64
>
> Is my understanding correct?
Yes, if you tell the legalizer you want to custom expand everything, you
can do this.  In practice, there may be ones the legalizer doesn't know
how to custom expand yet, but that is an easy addition.
> If yes, how do I express that arguments are to be passed in concrete
> physical registers like %d0 and %d1 and that the result comes back in
> %d0? Do I need to allocate virtual regs for them and pre-assign
> physical regs somehow?
As others have pointed out, you flag copy{to/from}reg nodes to the call.
> Or maybe I have to define a new calling convention that would enforce
> it?
> Actually, how can this be done with LLVM? I mean, if I want to
> introduce a new calling convention, what do I need to do in the backend
> to define and register it? Is it required to change the frontend to make
> it visible at the source program level?
You should be able to handle this in the lowering stuff, you don't need 
anything complex here.
>> For the time being, I'd suggest defining an "fp register set" which
>> just aliases the integer register set (i.e. say that d0 overlaps
>> r0+r1).
>
> OK. I have almost done it this way already. But I introduced two FP
> register sets: one for fp32 (for the future) and one for fp64. fp32
> aliases the integer register set. fp64 aliases the fp32 register set,
> but not the integer register set explicitly. I thought that aliases are
> transitive? Or do I have to mention all aliases explicitly, e.g. for %d0
> I need to say [%s0,%s1,%GR0,%GR1]?
Depending on how you defined the aliases, they aren't necessarily
transitive.  I'd look at the <yourtarget>GenRegisterInfo.inc file, and
see what it lists as the aliases for each reg.
> But a more interesting question is this: The scheme above assumes that
> there is a "hardwired" mapping between FP registers and concrete pairs
> of integer registers. In many cases this is enough, since the emulated
> operations indeed expect parameters in predefined pairs of 32bit integer
> registers. But for other uses of FP registers (mainly for storing some
> values) there is no such requirement that a concrete pair of integer
> registers be used. Actually, any combination of two 32bit integer
> registers would do. How can this be modelled and represented to
> regalloc, if at all? One guess is to define one FP reg for each possible
> combination of two integer registers, which would lead to defining
> N*(N-1) FP registers, where N is the number of integer registers (and I
> have only 8 integer regs). But that does not seem very elegant to my
> taste, does it?
The right way would be to expose the fact that these really are integer 
registers, and just use integer registers for it.  This would be no 
problem except that the legalizer doesn't know how to convert f64 -> 2 x 
i32 registers.  This could be added, but a simpler approach to get you 
running faster is to add the bogus register set.
>>> So far I was thinking about introducing some pseudo f64 registers,
>>> i.e. %dX used above, and working with them in the instruction
>>> descriptions. And then at the later stages, probably after lowering
>>> and selection, expand them into pairs of load or store operations.
>>
>> If you tell the register allocator about the "aliases", it should do
>> the right thing for you.  Take a look at how aliasing in the X86
>> register set is handled in X86RegisterInfo.td.
>
> Can you elaborate a bit? Does it mean that I don't need to define fp64
> loads from memory, fp64 stores to memory and reg<-reg transfers for
> 64bit ops, because all that will be done automatically using pairs of
> 32bit instructions? So far, I had the impression that I need to use fp64
> regs in the instruction descriptions explicitly. But in that case the
> selected instructions operate on these 64bit regs, and the problem is
> how to expand them into pairs of 32bit instructions.
Oh, I'm sorry, I misunderstood.  Yes, you're right, you'll have to
define "f64" loads and stores and copies.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Roman Levenstein

2006-Oct-10 23:45 UTC

Hi Andrew,

--- Andrew Lenharth <andrewl at lenharth.org> wrote:
> > > That is a reasonable way to do it.  Another reasonable way would be
> > > to lower them in the instruction selector itself through the use of
> > > custom expanders.  In practice, using instructions with "call foo"
> > > in them instead of lowering to calls may be simpler.
> >
> > Hmm, let me see. Just to check that I understand your proposal
> > correctly:
> > You mean I don't need to define any FP operations as machine
> > instructions at all. Instead, I basically declare that I will expand all
> > FP operations myself, and then I simply expand them into the following
> > sequence of instructions:
> >   mov arg1, %d0 // enforce register constraint
> >   mov arg2, %d1 // enforce register constraint
> >   call __fsub64
> >
> > Is my understanding correct? If yes, how do I express that arguments
> > are to be passed in concrete physical registers like %d0 and %d1
> > and that the result comes back in %d0? Do I need to allocate virtual
> > regs for them and pre-assign physical regs somehow?
> >
> > Or maybe I have to define a new calling convention that would enforce
> > it?
>
> The Alpha backend does this for division and remainder of integers.
> See AlphaISelLowering.cpp:501 for the lowering to a custom call node,
> then AlphaISelDAGToDAG.cpp:215 for enforcing the register constraints
> (copies into/out of physical registers), then AlphaInstrInfo.td:476
> (JSRs) for the call instruction with special register DEF/USE sets to
> match the calling convention of the library function.
>
> Hope that helps.
Yes, it does. Thank you for the very concrete references. This
implementation in the Alpha backend does exactly what I had in mind for
the emulated FP operations on my target. I'll try to do it the same
way.

Thanks,
 Roman

Roman Levenstein

2006-Oct-11 00:20 UTC

> On Tue, 10 Oct 2006, Roman Levenstein wrote:
> >>> such a call instruction?
> >>
> >> Why not just make the asm string be "call __fsub64"?
> >
> > Well, of course that would be the best solution. But the interesting
> > part is that I need to generate the machine code directly, because for
> > various reasons using a system assembler is not an option. As a
>
> ok.
>
> > result, I need to do this conversion in the target backend and later
> > generate object code directly. But when and how should this "fsub64
> > insn -> call __fsub64 insn" conversion be done? What is your advice?
>
> I don't understand.  If you are writing out the .o file directly, you
> already know how to encode calls... can't you just encode it as the
> right sort of call?
Yes, sure. I simply overlooked it, because it is too simple and obvious
;) I was thinking about doing it at a higher level, but this can be
done as well.

But I think that Andrew's approach, as used in the Alpha backend, will
do the job, and it is rather easy to implement.
> What facilities are you using to emit the machine code, are you using
> the llvm machine code emitter generator stuff (like PPC)?
At the moment, I do not emit real machine code. But I'm planning to do
it. If possible, I'll try to use the code emitter stuff of tblgen. But
I'm not sure if it can handle the instruction encodings of my target.
This target uses a variable-length instruction encoding, where 2 bytes
are used for opcodes, and the encodings of memory references and some
registers are put between these two bytes. Therefore, the bit offsets
are not constant and depend on the type of instruction (e.g. rm, ri and
so on). Do you think it would be easy to express such an encoding for
tblgen?
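The layout described above, with operand bytes sitting between the two opcode bytes, is straightforward to produce in a hand-written emitter; what changes per instruction form is where the second opcode byte lands. A toy C++ sketch under that reading of the description (byte values and operand-field sizes are invented); it does not answer whether tblgen's emitter generator can express the same thing:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Encoded {
  std::vector<std::uint8_t> bytes;
  std::size_t secondOpcodeOffset;  // depends on the operand field's length
};

// Hypothetical layout: first opcode byte, then a form-dependent operand
// field (memory reference and/or registers), then the second opcode byte.
Encoded encode(std::uint8_t firstOpcode, std::uint8_t secondOpcode,
               const std::vector<std::uint8_t> &operandField) {
  Encoded e;
  e.bytes.push_back(firstOpcode);
  e.bytes.insert(e.bytes.end(), operandField.begin(), operandField.end());
  e.secondOpcodeOffset = e.bytes.size();
  e.bytes.push_back(secondOpcode);
  return e;
}
```

With a one-byte register field the second opcode byte sits at offset 2; with a three-byte memory reference it moves to offset 4, which is exactly the "bit offsets are not constant" problem.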
> >>> Does this idea of representing the emulated FP operation calls as
> >>> instructions as described above make sense? Or do you see easier
> >>> or more useful ways to do it?
> >>
> >> That is a reasonable way to do it.  Another reasonable way would be
> >> to lower them in the instruction selector itself through the use of
> >> custom expanders.  In practice, using instructions with "call foo"
> >> in them instead of lowering to calls may be simpler.
> >
> > Hmm, let me see. Just to check that I understand your proposal
> > correctly:
> > You mean I don't need to define any FP operations as machine
> > instructions at all. Instead, I basically declare that I will expand all
> > FP operations myself, and then I simply expand them into the following
> > sequence of instructions:
> >  mov arg1, %d0 // enforce register constraint
> >  mov arg2, %d1 // enforce register constraint
> >  call __fsub64
> >
> > Is my understanding correct?
>
> Yes, if you tell the legalizer you want to custom expand everything,
> you can do this.  In practice, there may be ones the legalizer doesn't
> know how to custom expand yet, but that is an easy addition.
>
> > If yes, how do I express that arguments are to be passed in concrete
> > physical registers like %d0 and %d1 and that the result comes back in
> > %d0? Do I need to allocate virtual regs for them and pre-assign
> > physical regs somehow?
>
> As others have pointed out, you flag copy{to/from}reg nodes to the
> call.
OK. Andrew has explained how to do it. I'll give it a try.
> > Or maybe I have to define a new calling convention that would enforce
> > it?
> > Actually, how can this be done with LLVM? I mean, if I want to
> > introduce a new calling convention, what do I need to do in the backend
> > to define and register it? Is it required to change the frontend to make
> > it visible at the source program level?
>
> You should be able to handle this in the lowering stuff, you don't
> need anything complex here.
OK, I see. But I'd like to know how to introduce a new calling
convention so that it is visible at the source level. Basically, there
are some existing drivers for this system and I'd like to be able to
call some functions defined there. But these drivers use a very
unusual, custom calling convention. I thought that declaring functions
as follows could be the most appropriate solution:

extern __MySpecialDriverAttribute int read_from_device(int devid, int channel);

But for doing this I need to define a custom attribute or a new calling
convention. Or do you see any other option?
> >> For the time being, I'd suggest defining an "fp register set" which
> >> just aliases the integer register set (i.e. say that d0 overlaps
> >> r0+r1).
> >
> > OK. I have almost done it this way already. But I introduced two FP
> > register sets: one for fp32 (for the future) and one for fp64. fp32
> > aliases the integer register set. fp64 aliases the fp32 register set,
> > but not the integer register set explicitly. I thought that aliases are
> > transitive? Or do I have to mention all aliases explicitly, e.g. for %d0
> > I need to say [%s0,%s1,%GR0,%GR1]?
>
> Depending on how you defined the aliases, they aren't necessarily
> transitive.  I'd look at the <yourtarget>GenRegisterInfo.inc file, and
> see what it lists as the aliases for each reg.
Done. And I looked into the tblgen code. Transitivity is not ensured by
tblgen in any form, since it does not compute it. What it ensures is
the symmetry of aliases, i.e. if A aliases B, then B aliases A. I
think it would make sense if tblgen computed a transitive closure of the
alias sets automatically, because I can hardly imagine non-transitive
aliasing semantics. If you think that this makes sense, I could
probably write a patch for tblgen to do that.
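A caveat worth illustrating: register overlap is not transitive in general. With %d0 overlapping both %s0 and %s1, a plain transitive closure of the symmetric alias lists would chain %s0 -> %d0 -> %s1 and wrongly report that %s0 and %s1 overlap. A safer way to derive complete alias lists is to describe each register by the 32-bit units it occupies and treat two registers as aliases exactly when those sets intersect. A C++ sketch, assuming %s0/%s1 sit on GR0/GR1 (the thread does not spell out that mapping); the names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Each register is described by the 32-bit integer registers ("units") it
// occupies, e.g. d0 -> {GR0, GR1}.
using UnitMap = std::map<std::string, std::set<std::string>>;
using AliasMap = std::map<std::string, std::set<std::string>>;

// Two distinct registers alias exactly when their unit sets intersect.
AliasMap computeAliases(const UnitMap &units) {
  AliasMap aliases;
  for (const auto &a : units) {
    aliases[a.first];  // every register gets an entry, even with no aliases
    for (const auto &b : units) {
      if (a.first == b.first)
        continue;
      bool overlap = std::any_of(
          a.second.begin(), a.second.end(),
          [&](const std::string &unit) { return b.second.count(unit) != 0; });
      if (overlap)
        aliases[a.first].insert(b.first);
    }
  }
  return aliases;
}
```

This yields %d0 -> {GR0, GR1, s0, s1}, which is the full list asked about above, while %s0 and %s1 stay unrelated.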
> > But a more interesting question is this: The scheme above assumes that
> > there is a "hardwired" mapping between FP registers and concrete pairs
> > of integer registers. In many cases this is enough, since the emulated
> > operations indeed expect parameters in predefined pairs of 32bit integer
> > registers. But for other uses of FP registers (mainly for storing some
> > values) there is no such requirement that a concrete pair of integer
> > registers be used. Actually, any combination of two 32bit integer
> > registers would do. How can this be modelled and represented to
> > regalloc, if at all? One guess is to define one FP reg for each possible
> > combination of two integer registers, which would lead to defining
> > N*(N-1) FP registers, where N is the number of integer registers (and I
> > have only 8 integer regs). But that does not seem very elegant to my
> > taste, does it?
>
> The right way would be to expose the fact that these really are
> integer registers, and just use integer registers for it.
How and where can this fact be exposed? In the register set
descriptions? Or maybe by telling LLVM to use the i32 register class
when we assign the register class to f64 values?
> This would be no problem except that the legalizer doesn't know how to
> convert f64 -> 2 x i32 registers.  This could be added,
Can you elaborate a bit more on how this can be added? Do you mean
that the legalizer would always create two new virtual i32 registers
for each such f64 value, copy the parts of the f64 into them and let
the register allocator later allocate some physical registers for them?
Would it require adaptations only in the target-specific legalizer, or
do you think that some changes in the common part (in the Target
directory) of the legalizer are required?
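Roughly what that would mean, in the spirit of how the integer side already splits i64 values on 32-bit targets: each f64 virtual register is mapped to two fresh i32 virtual registers, and because both halves are ordinary i32 vregs, the allocator is free to pick any two integer registers rather than a hardwired pair. A toy C++ sketch of just that bookkeeping; F64Splitter and the vreg numbering are hypothetical and not LLVM's legalizer interface:

```cpp
#include <cassert>
#include <map>
#include <utility>

// Maps each f64 virtual register to a (lo, hi) pair of fresh i32 virtual
// registers, creating the pair on first use and reusing it afterwards so
// every use of the f64 value refers to the same two halves.
class F64Splitter {
 public:
  explicit F64Splitter(unsigned firstFreeVReg) : nextVReg_(firstFreeVReg) {}

  std::pair<unsigned, unsigned> halvesOf(unsigned f64VReg) {
    auto it = halves_.find(f64VReg);
    if (it != halves_.end())
      return it->second;
    std::pair<unsigned, unsigned> fresh{nextVReg_, nextVReg_ + 1};
    nextVReg_ += 2;
    halves_.emplace(f64VReg, fresh);
    return fresh;
  }

 private:
  unsigned nextVReg_;
  std::map<unsigned, std::pair<unsigned, unsigned>> halves_;
};
```

Only the emulation calls would then need fixed registers, via copies into the pairs their calling convention expects.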
> but a simpler approach to get you running faster is to add the bogus
> register set.
True. To get something that works as soon as possible, this is simpler.
But to produce faster code, the more complex approach described above
could be a (big?) win.

- Roman


