thr3ads.net - llvm dev - [LLVMdev] FP emulation [Oct 2006]

Roman Levenstein

2006-Oct-11 00:20 UTC

[LLVMdev] FP emulation

> On Tue, 10 Oct 2006, Roman Levenstein wrote:
> >>> such a call instruction?
> >>
> >> Why not just make the asm string be "call __fsub64"?
> >
> > Well, of course it would be the best solution. But the interesting part
> > is that I need to generate the machine code directly because, for
> > various reasons, using a system assembler is not an option. As a
>
> ok.
>
> > result, I need to do this conversion in the target backend and later
> > generate object code directly. But when and how should this
> > "fsub64 insn -> call __fsub64 insn" conversion be done? What is your
> > advice?
>
> I don't understand.  If you are writing out the .o file directly, you
> already know how to encode calls... can't you just encode it as the
> right sort of call?
Yes, sure. I simply overlooked it because it is too simple and obvious ;)
I was thinking about doing it at a higher level, but this can be done as
well.

But I think that Andrew's approach as used in the Alpha backend will do
the job and it is rather easy to implement.
> What facilities are you using to emit the machine
> code, are  you using the llvm machine code emitter generator stuff
> (like PPC)?
At the moment, I do not emit real machine code. But I'm planning to do
it. If possible, I'll try to use the code emitter stuff of tblgen. But
I'm not sure if it can handle the instruction encodings of my target.
This target uses variable-length instruction encoding, where 2 bytes
are used for opcodes and encodings of memory references and some
registers are put between these two bytes. Therefore, the bit offsets
are not constant and depend on the type of instruction (e.g. rm, ri and
so on). Do you think it would be easy to express such an encoding in
tblgen?
> >>> Does this idea of representing the emulated FP operation calls as
> >>> instructions as described above make some sense? Or do you see
> >>> easier or more useful ways to do it?
> >>
> >> That is a reasonable way to do it.  Another reasonable way would be
> >> to lower them in the instruction selector itself through the use of
> >> custom expanders.  In practice, using instructions with "call foo"
> >> in them instead of lowering to calls may be simpler.
> >
> > Hmm, let me see. Just to check that I understand your proposal
> > correctly: you mean I don't need to define any FP operations as
> > machine instructions at all. Instead, I basically declare that I will
> > expand all FP operations myself, and then I simply expand them into
> > the following sequence of instructions:
> >  mov arg1, %d0 // enforce register constraint
> >  mov arg2, %d1 // enforce register constraint
> >  call __fsub64
> >
> > Is my understanding correct?
> 
> Yes, if you tell the legalizer you want to custom expand everything,
> you can do this.  In practice, there may be ones the legalizer doesn't
> know how to custom expand yet, but that is an easy addition.
> > If yes, how do I specify that the arguments are to be passed in
> > concrete physical registers like %d0 and %d1 and that the result comes
> > back in %d0? Do I need to allocate virtual regs for them and pre-assign
> > physical regs somehow?
>
> As others have pointed out, you flag copy{to/from}reg nodes to the
> call.
OK. Andrew has explained how to do it. I'll give it a try.
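As a rough illustration of the custom-expansion idea, the C sketch below only models, as text, the instruction sequence that such an expansion has to produce for each emulated operation. It is not LLVM's legalizer API: the opcode enum, the `fp_libcall` table, `expand_fp_op()` and every routine name except `__fsub64` are hypothetical. In the real backend the two `mov` instructions correspond to the CopyToReg nodes flagged to the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes for the emulated f64 operations. */
enum FpOp { FADD64, FSUB64, FMUL64, FDIV64, NUM_FP_OPS };

/* Runtime routine per operation. Only __fsub64 is named in the thread;
   the other three names are placeholders. */
static const char *const fp_libcall[NUM_FP_OPS] = {
    [FADD64] = "__fadd64",
    [FSUB64] = "__fsub64",
    [FMUL64] = "__fmul64",
    [FDIV64] = "__fdiv64",
};

/* Render the sequence a custom expansion must produce: copy both operands
   into the fixed argument registers %d0 and %d1, then call the routine,
   which returns its result in %d0. Returns snprintf's result. */
static int expand_fp_op(enum FpOp op, const char *arg1, const char *arg2,
                        char *buf, size_t len) {
    assert(op < NUM_FP_OPS);
    return snprintf(buf, len,
                    "mov %s, %%d0\n"
                    "mov %s, %%d1\n"
                    "call %s\n",
                    arg1, arg2, fp_libcall[op]);
}
```

Keeping the routine names in one table means adding another emulated operation only touches the enum and the table, not the expansion logic.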
> > Or maybe I have to define a new calling convention that would enforce
> > it?
> > Actually, how can this be done with LLVM? I mean, if I want to
> > introduce a new calling convention, what do I need to do in the
> > backend to define and register it? Is it required to change the
> > frontend to make it visible at the source program level?
>
> You should be able to handle this in the lowering stuff, you don't
> need anything complex here.
OK, I see. But I'd like to know how to introduce a new calling
convention so that it is visible at the source level. Basically, there
are some existing drivers for this system and I'd like to be able to
call some functions defined there. But these drivers use a very
custom calling convention. I thought that declaring functions as
follows could be the most appropriate solution:

extern __MySpecialDriverAttribute int read_from_device(int devid, int channel);

But for doing this I need to define a custom attribute or a new calling
convention. Or do you see any other option?
> >> For the time being, I'd suggest defining an "fp register set" which
> >> just aliases the integer register set (i.e. say that d0 overlaps
> >> r0+r1).
> >
> > OK. I almost did it this way already. But I introduced two FP register
> > sets: one for fp32 (for the future) and one for fp64. fp32 aliases the
> > integer register set. fp64 aliases the fp32 register set, but not the
> > integer register set explicitly. I thought that aliases are
> > transitive? Or do I have to mention all aliases explicitly, e.g. for
> > %d0 I need to say [%s0,%s1,%GR0,%GR1]?
> 
> Depending on how you defined the aliases, they aren't necessarily
> transitive.  I'd look at the <yourtarget>GenRegisterInfo.inc file, and
> see what it lists as the aliases for each reg.
Done. And I looked into the tblgen code. Transitivity is not ensured by
tblgen in any form, since it does not compute it. What it ensures is
the symmetry of aliases, i.e. if A aliases B, then B aliases A. I
think it would make sense if tblgen computed a transitive closure
automatically for alias sets, because I can hardly imagine
non-transitive aliasing semantics. If you think that this makes sense,
I could probably write a patch for tblgen to do that.
> > But a more interesting question is this: the scheme above assumes that
> > there is a "hardwired" mapping between FP registers and concrete pairs
> > of integer registers. In many cases this is enough, since the emulated
> > operations indeed expect parameters in predefined pairs of 32-bit
> > integer registers. But when it comes to other uses of FP registers
> > (mainly for storing values), there is no such requirement that a
> > concrete pair of integer registers be used. Actually, any combination
> > of two 32-bit integer registers would do. How can this be modelled and
> > represented to regalloc, if at all? One guess is to define one FP reg
> > for each possible combination of two integer registers, which would
> > lead to the definition of N*(N-1) FP registers, where N is the number
> > of integer registers (and I have only 8 integer regs). But it does not
> > seem very elegant to my taste, does it?
> 
> The right way would be to expose the fact that these really are
> integer registers, and just use integer registers for it.
How and where can this fact be exposed? In the register set descriptions?
Or maybe by telling it to use the i32 register class when we assign the
register class to f64 values?
> This would be no problem except that the legalizer doesn't know how to
> convert f64 -> 2 x i32 registers.  This could be added,
Can you elaborate a bit more about how this can be added? Do you mean
that the legalizer would always create two new virtual i32 registers for
each such f64 value, copy the parts of the f64 into them and let the
register allocator later allocate some physical registers for them?
Would it require adaptations only in the target-specific legalizer, or
do you think that some changes in the common part (in the Target
directory) of the legalizer are required?
> but a simpler approach to get you running faster is to add the bogus
> register set.
True. To get something that works as soon as possible, this is simpler.
But to produce faster code, the more complex approach described above
could be a (big?) win.

- Roman



Roman Levenstein

2006-Oct-14 21:58 UTC

[LLVMdev] Implicit defs

Hi,

Is it possible to dynamically define implicit defs for some
instructions?
Concretely, I'd like to define the register for the return value of a
call dynamically, instead of using the current static approach, which
looks like:
 let Defs = [R0] in 
 def CALLimm :  I<...>;

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return
values. Therefore I cannot describe it by one static set of regs, as
shown above.

By looking into the LLVM code, I've found how to force some arguments of
calls to be placed in the required registers and how to copy the result
of the call from certain (return) registers. But if I try to copy from a
return register that is not in the static Defs set of the CALL machine
instruction, then the register allocator complains that the interval for
the return value of the CALL does not exist. This means that the
register allocator does not understand that this register is implicitly
defined by the CALL insn.

One obvious solution is to define several machine instructions for a
CALL, each defining its own set of implicitly defined registers. But it
is not very elegant in my opinion. Are there any other ways to achieve
the same result? Maybe it can be solved more simply?

Thanks,
 -Roman



Roman Levenstein

2006-Oct-14 22:04 UTC

[LLVMdev] Setting the set of implicit defs dynamically

Hi,

Is it possible to dynamically define implicit defs for some
instructions?
And if possible, already in the target lowering pass, e.g. in
LowerCALL()?

Concretely, I'd like to define the register for the return value of a
call dynamically, instead of using the current static approach, which
looks like:
 let Defs = [R0] in 
 def CALLimm :  I<...>;

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return
values. Therefore I cannot describe it by one static set of regs, as
shown above.

By looking into the LLVM code, in particular inside LowerCALL() and the
like, I've found how to force some arguments of calls to be placed in
the required registers and how to copy the result of the call from
certain (return) registers. But if I try to copy from a return register
that is not in the static Defs set of the CALL machine instruction, then
the register allocator complains that the interval for the return value
of the CALL does not exist. This means that the register allocator does
not understand that this register is implicitly defined by the CALL insn.

One obvious solution is to define several machine instructions for a
CALL, each defining its own set of implicitly defined registers. But it
is not very elegant in my opinion. Are there any other ways to achieve
the same result? Maybe it can be solved more simply?

Thanks,
 -Roman



Chris Lattner

2006-Oct-15 00:37 UTC

[LLVMdev] Implicit defs

On Sat, 14 Oct 2006, Roman Levenstein wrote:
> Is it possible to dynamically define implicit defs for some
> instructions?
Yes!  This is what explicit operands are :).  Specifically, if you want to 
vary on a per-opcode basis what registers are used/def'd by the 
instruction, you can just add those registers as explicit use/def operands 
in the machine instruction with the physical registers directly filled in.
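To make the explicit-operand suggestion concrete without guessing at LLVM's MachineInstr API, here is a toy C model: a single CALL opcode whose return registers are appended per call site as explicit def operands, chosen by calling convention. All types and names, and the R2/R3 return registers of the "driver" convention, are hypothetical; only R0 comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physical registers and calling conventions. */
enum PhysReg { R0, R1, R2, R3 };
enum CallConv { CC_DEFAULT, CC_DRIVER };

enum OperandKind { OPND_CALLEE, OPND_DEF_PHYSREG };

struct Operand {
    enum OperandKind kind;
    int value; /* callee id, or a physical register number for defs */
};

#define MAX_OPERANDS 8

/* Toy stand-in for a machine instruction: a single CALL opcode whose
   operand list is filled in per call site. */
struct CallInsn {
    size_t num_operands;
    struct Operand operands[MAX_OPERANDS];
};

static void add_operand(struct CallInsn *mi, enum OperandKind kind, int value) {
    assert(mi->num_operands < MAX_OPERANDS);
    mi->operands[mi->num_operands].kind = kind;
    mi->operands[mi->num_operands].value = value;
    mi->num_operands++;
}

/* Append the calling convention's return registers as explicit def
   operands with the physical registers filled in, instead of relying on
   a static Defs list attached to the opcode. */
static void build_call(struct CallInsn *mi, int callee, enum CallConv cc) {
    static const enum PhysReg default_ret[] = { R0 };
    static const enum PhysReg driver_ret[] = { R2, R3 }; /* made-up convention */
    const enum PhysReg *ret = (cc == CC_DRIVER) ? driver_ret : default_ret;
    size_t n = (cc == CC_DRIVER) ? sizeof driver_ret / sizeof driver_ret[0]
                                 : sizeof default_ret / sizeof default_ret[0];

    mi->num_operands = 0;
    add_operand(mi, OPND_CALLEE, callee);
    for (size_t i = 0; i < n; i++)
        add_operand(mi, OPND_DEF_PHYSREG, (int)ret[i]);
}
```

The alternative Chris mentions, one CALL opcode per register set, would replace the per-call-site operand list with a lookup from convention to opcode; that stays simpler only while the number of variants is small.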
> The reason for this wish is that some of the calling conventions on my
> target use different sets of physical registers for their return
> values. Therefore I cannot describe it by one static set of regs, as
> shown above.
Right.
> One obvious solution is to define several machine instructions for a
> CALL, each defining its own set of implicitly defined registers. But it
> is not very elegant in my opinion. Are there any other ways to achieve
> the same result? Maybe it can be solved more simply?
This is another solution which works great if there are only a few 
variants.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Chris Lattner

2006-Oct-16 02:31 UTC

[LLVMdev] FP emulation

On Tue, 10 Oct 2006, Roman Levenstein wrote:
>> I don't understand.  If you are writing out the .o file directly, you
>> already know how to encode calls... can't you just encode it as the
>> right sort of call?
>
> Yes, sure. I simply overlooked it, because it is too simple and obvious
> ;) I was thinking about doing it at a higher level, but this can be
> done as well.
> But I think that Andrew's approach as used in the Alpha backend will do
> the job and it is rather easy to implement.
ok
>> What facilities are you using to emit the machine
>> code, are  you using the llvm machine code emitter generator stuff
>> (like PPC)?
>
> At the moment, I do not emit real machine code. But I'm planning to do
> it. If possible, I'll try to use the code emitter stuff of tblgen. But
ok.
> I'm not sure if it can handle the instruction encodings of my target.
> This target uses variable-length instruction encoding, where 2 bytes
> are used for opcodes and encodings of memory references and some
> registers are put between these two bytes. Therefore, the bit offsets
> are not constant and depend on the type of instruction (e.g. rm, ri and
> so on). Do you think it would be easy to express such an encoding in
> tblgen?
No.  Unfortunately, tblgen can only handle targets with 32-bit wide 
instruction words at the moment (alpha, ppc, sparc, etc).  You'll have to 
write a custom code emitter like the X86 backend has.
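A hand-written emitter for the layout described above could look roughly like the following C sketch. The three forms, the field widths and the byte order are assumptions made up for illustration; the thread only says that memory-reference and register encodings sit between the two opcode bytes, so the second opcode byte lands at a different offset for each form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand forms; the thread does not give the real formats. */
enum Form { FORM_RR, FORM_RI, FORM_RM };

struct Insn {
    enum Form form;
    uint8_t opc_first, opc_second; /* the two opcode bytes */
    uint8_t reg_a, reg_b;          /* 4-bit register numbers */
    uint16_t payload;              /* 8-bit immediate (RI) or 16-bit displacement (RM) */
};

/* Emit the first opcode byte, a form-dependent run of operand bytes, then
   the second opcode byte. Returns the instruction length. The position of
   opc_second differs per form, so it has no fixed bit offset. */
static size_t encode(const struct Insn *in, uint8_t *out) {
    size_t n = 0;
    assert(in->reg_a < 16 && in->reg_b < 16);
    out[n++] = in->opc_first;
    out[n++] = (uint8_t)((in->reg_a << 4) | in->reg_b); /* register byte */
    switch (in->form) {
    case FORM_RR:
        break; /* nothing else between the opcode bytes */
    case FORM_RI:
        assert(in->payload <= 0xFF);
        out[n++] = (uint8_t)in->payload;
        break;
    case FORM_RM:
        out[n++] = (uint8_t)(in->payload >> 8); /* displacement, high byte first */
        out[n++] = (uint8_t)(in->payload & 0xFF);
        break;
    }
    out[n++] = in->opc_second;
    return n;
}
```

With these made-up formats an RR instruction is 3 bytes, RI is 4 and RM is 5, which is exactly the kind of variation a 32-bit-word description cannot express.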
>> You should be able to handle this in the lowering stuff, you don't
>> need anything complex here.
>
> OK, I see. But I'd like to know how to introduce a new calling
> convention so that it is visible at the source level. Basically, there
> are some existing drivers for this system and I'd like to be able to
> call some functions defined there. But these drivers use a very
> custom calling convention. I thought that declaring functions as
> follows could be the most appropriate solution:
>
> extern __MySpecialDriverAttribute int read_from_device(int devid, int channel);
>
> But for doing this I need to define a custom attribute or a new calling
> convention. Or do you see any other option?
I would suggest following the model of stdcall/fastcall in the windows x86 
world.  Specifically, this would require modifying llvm-gcc to recognize 
attributes on your target, then passing that on to llvm through its 
calling convention representation.  This will let you define stuff like:

__attribute__((mycall)) void foo() {
}

You can then use a #define to hide the attribute syntax.
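Hiding the attribute behind a macro might look like the sketch below. `MY_TARGET_SUPPORTS_MYCALL`, `DRIVER_CALL` and the host-side stub are hypothetical; `mycall` is the placeholder attribute name from the reply above and would only mean something to an llvm-gcc modified to recognize it. On any other compiler the macro expands to nothing, so the same header still builds.

```c
#include <assert.h>

/* Hypothetical feature macro: only a modified llvm-gcc for the target
   would define it and understand the "mycall" attribute. */
#if defined(MY_TARGET_SUPPORTS_MYCALL)
#define DRIVER_CALL __attribute__((mycall))
#else
#define DRIVER_CALL /* plain C calling convention everywhere else */
#endif

/* Declaration as it would appear in the driver's header. */
extern DRIVER_CALL int read_from_device(int devid, int channel);

/* Host-side stand-in so the header can be exercised off-target;
   its behaviour is made up. */
DRIVER_CALL int read_from_device(int devid, int channel) {
    assert(devid >= 0 && channel >= 0);
    return devid * 100 + channel;
}
```

The attribute is placed among the declaration specifiers, which GCC accepts both on declarations and on function definitions.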
>> Depending on how you defined the aliases, they aren't necessarily
>> transitive.  I'd look at the <yourtarget>GenRegisterInfo.inc file, and
>> see what it lists as the aliases for each reg.
>
> Done. And I looked into the tblgen code. Transitivity is not ensured by
> tblgen in any form, since it does not compute it. What it ensures is
> the symmetry of aliases, i.e. if A aliases B, then B aliases A. I
> think it would make sense if tblgen computed a transitive closure
> automatically for alias sets, because I can hardly imagine
> non-transitive aliasing semantics. If you think that this makes sense,
> I could probably write a patch for tblgen to do that.
Be careful.  On X86, "AL aliases AX" and "AH aliases AX" but "AL does
not alias AH".
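The caveat can be checked with a small C sketch. It uses X86-style AL/AH/AX purely as sample data, runs a Warshall-style transitive closure over the direct (already symmetric) alias relation, and shows that the closure wrongly makes AL alias AH. The d0/s0/s1 layout described earlier has the same problem: s0 aliases d0 and d0 aliases s1, so a closure would make s0 alias s1. The alternative shown, describing each register by the storage units it covers and treating two registers as aliases when those sets intersect, is only an illustration of one way to get the right answer, not a description of tblgen's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sample data modelled on X86: AL and AH are the two halves of AX. */
enum Reg { AL, AH, AX, NREGS };

/* Direct aliases as a .td file would list them (already symmetric). */
static bool direct[NREGS][NREGS];

static void init_direct(void) {
    direct[AL][AX] = direct[AX][AL] = true;
    direct[AH][AX] = direct[AX][AH] = true;
}

/* Warshall-style transitive closure of the direct relation. */
static void transitive_closure(bool out[NREGS][NREGS]) {
    memcpy(out, direct, sizeof direct);
    for (int k = 0; k < NREGS; k++)
        for (int i = 0; i < NREGS; i++)
            for (int j = 0; j < NREGS; j++)
                if (out[i][k] && out[k][j])
                    out[i][j] = true;
}

/* Alternative: record which storage units each register covers
   (bit 0 = low byte, bit 1 = high byte); two distinct registers alias
   exactly when their unit sets intersect. */
static const uint32_t units[NREGS] = { [AL] = 0x1, [AH] = 0x2, [AX] = 0x3 };

static bool aliases_by_units(enum Reg a, enum Reg b) {
    assert(a < NREGS && b < NREGS);
    return a != b && (units[a] & units[b]) != 0;
}
```

The unit-based check gives AL/AX and AH/AX as aliases while keeping AL and AH apart, which the closure cannot do.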
>> The right way would be to expose the fact that these really are
>> integer  registers, and just use integer registers for it.
>
> How and where can this fact be exposed? In the register set descriptions?
> Or maybe by telling it to use the i32 register class when we assign the
> register class to f64 values?
It currently cannot be done without changes to the legalize pass.
>> This
>> would be no problem except that the legalizer doesn't know how to
>> convert f64 -> 2 x  i32 registers.  This could be added,
>
> Can you elaborate a bit more about how this can be added? Do you mean
> that the legalizer would always create two new virtual i32 registers for
> each such f64 value, copy the parts of the f64 into them and let the
> register allocator later allocate some physical registers for them?
Yes.
> Would it require adaptations only in the target-specific legalizer, or
> do you think that some changes in the common part (in the Target
> directory) of the legalizer are required?
The target independent parts would need to know how to do this. 
Specifically it would need to know how to "expand" f64 to 2x i32.
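At the value level, "expanding" f64 into 2x i32 just means treating the 64-bit pattern as a high word and a low word that can then live in any two i32 registers. The C sketch below assumes the host `double` is IEEE-754 binary64; which half goes into which register is a target-specific choice not settled in the thread, and the legalizer change itself would operate on SelectionDAG nodes rather than on C values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The two i32 halves of an f64 value. */
struct I32Pair { uint32_t hi, lo; };

/* Split the bit pattern of an f64 into its high and low 32-bit words. */
static struct I32Pair expand_f64(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* reinterpret without aliasing issues */
    struct I32Pair p = { (uint32_t)(bits >> 32), (uint32_t)bits };
    return p;
}

/* Reassemble the f64 from its two halves. */
static double join_f64(struct I32Pair p) {
    uint64_t bits = ((uint64_t)p.hi << 32) | p.lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, 1.0 splits into hi = 0x3FF00000 and lo = 0, and joining the halves gives back the original value bit for bit.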
>> but a simpler approach to get you running faster is to add the bogus 
>> register set.
>
> True. To get something that works as soon as possible, this is simpler.
> But to produce faster code, the more complex approach described above
> could be a (big?) win.
That is possible.  I would try implementing the simple approach and seeing 
what cases are being missed.

-Chris

Roman Levenstein

2006-Nov-17 09:29 UTC

[LLVMdev] FP emulation (continued)

Hi,

I still have some questions about FP emulation for my embedded target.

To recap a bit:
My target only has integer registers and no hardware support for FP. FP
is supported only via emulation. Only f64 is supported. All FP
operations should be implemented to use i32 registers.

Based on the fruitful discussions on this list I was already able to
implement mapping of the FP operations to special library calls. 

I also implemented a simple version of the register mapping, where I
introduced a bogus f64 register set and used it during the code
selection and register allocation. After register allocation a special
post-RA pass just converts instructions using f64 operands into
multiple instructions using i32 operands. This seems to work, but has
one disadvantage. Since it is a post-RA pass, it uses a fixed mapping
between physical f64 registers and a pair of physical i32 registers,
e.g. D0:f64 -> i1:i32 x i2:i32. This leads to a non-optimal register
allocation. But anyway, I have an almost working compiler with integer
and FP support for my rather specific embedded target! This shows the
very impressive quality of the LLVM compiler.
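To see why the fixed mapping limits the allocator, here is a C sketch of the kind of rewrite such a post-RA pass performs. The register names i0..i7, the extension of the D0 -> i1:i2 mapping to other D registers, and the textual instruction form (source operand first, as in "mov arg1, %d0" earlier in the thread) are all assumptions. Because every Dk is bound to one specific pair, the allocator can never place an f64 value in, say, i3:i6, even when those registers are free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NUM_INT_REGS 8 /* assumed names i0..i7 */

struct IntPair { int first, second; };

/* D0 -> (i1, i2) is the mapping quoted in the thread; extending it as
   Dk -> (i(2k+1), i(2k+2)) is an assumption, which only leaves room for
   D0, D1 and D2 when there are 8 integer registers. */
static struct IntPair fixed_pair(int d) {
    struct IntPair p = { 2 * d + 1, 2 * d + 2 };
    assert(d >= 0 && p.second < NUM_INT_REGS);
    return p;
}

/* Post-RA rewrite of an f64 register-to-register move into two i32
   moves, written source first. */
static int rewrite_fmov(int src_d, int dst_d, char *buf, size_t len) {
    struct IntPair src = fixed_pair(src_d), dst = fixed_pair(dst_d);
    return snprintf(buf, len, "mov i%d, i%d\nmov i%d, i%d\n",
                    src.first, dst.first, src.second, dst.second);
}
```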

Another opportunity, as Chris indicated in his previous mails (see
below), would be to expose the fact that f64 regs really are integer
registers. 
> >> The right way would be to expose the fact that these really are
> >> integer registers, and just use integer registers for it.
> >
> > How and where can this fact be exposed? In the register set
> > descriptions? Or maybe by telling it to use the i32 register class
> > when we assign the register class to f64 values?
>
> It currently cannot be done without changes to the legalize pass.
>
> >> This would be no problem except that the legalizer doesn't know how
> >> to convert f64 -> 2 x i32 registers.  This could be added,
> >
> > Can you elaborate a bit more about how this can be added? Do you mean
> > that the legalizer would always create two new virtual i32 registers
> > for each such f64 value, copy the parts of the f64 into them and let
> > the register allocator later allocate some physical registers for
> > them?
>
> Yes.
>
> > Would it require adaptations only in the target-specific legalizer,
> > or do you think that some changes in the common part (in the Target
> > directory) of the legalizer are required?
>
> The target independent parts would need to know how to do this.
> Specifically it would need to know how to "expand" f64 to 2x i32.
I tried to implement it, but I still have some trouble with it.
In my understanding, the code in TargetLowering.cpp and also in
SelectionDAGISel.cpp should be altered. For example, I tried to modify
computeRegisterProperties to say that f64 is actually represented
as 2xi32. I also added some code to the function
FunctionLoweringInfo::CreateRegForValue to allocate this pair of i32
regs for f64 values. But it does not seem to help. From what I can see,
the problem is that emitNode() still looks at the machine instruction
descriptions. And since I still have some insns for loads and stores of
f64 values (do I still need them if I do the mapping?), it basically
allocates f64 registers without being affected in any way by the
modifications described above, because it does not use any information
prepared there.

So, I'm a bit lost now. I don't quite understand what should be done to
tell CodeGen how to map virtual f64 regs to pairs of virtual i32 regs.
Maybe I'm doing something wrong? Maybe I need to tell the codegen that
f64 is a packed type consisting of 2xi32, or a vector of i32? Chris,
could you elaborate a bit more on this? What needs to be explained to
the codegen/legalizer, and where?

Another thing I have in mind is:
It looks like the easiest way of all would be to have a special pass
after the assignment of virtual registers, but before a real register
allocation pass. This pass could define the mapping for each virtual
f64 register and then rewrite the machine insns to use the
corresponding i32 regs. The problem with this approach is that I don't
quite understand how to insert such a pass before physical register
allocation pass and if it can be done at all. Also, it worries me a bit
that it would eventually require modifying PHI nodes and introducing
new ones in those cases where f64 regs were used in PHI nodes, since a
pair of PHI nodes would then be required for each of them.
Since I don't have experience with PHI-node handling in LLVM, I'd like
to avoid this complexity, unless you say it is actually pretty easy to
do. What do you think of this approach? Does it make sense? Is it
easier than the previous one, which requires changes in the code
selector/legalizer?
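The bookkeeping such a pre-allocation pass would need is fairly small. The C sketch below shows it on toy data structures, not on LLVM's machine-function API: each f64 virtual register gets a fresh pair of i32 virtual registers the first time it is seen, and a two-input f64 PHI becomes two i32 PHIs, one over the low halves and one over the high halves. The names, the fixed-size table and the two-input PHI layout are assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_F64_VREGS 64

/* Maps each f64 virtual register (used as an index) to a pair of fresh
   i32 virtual registers, created on first use. */
struct SplitMap {
    int next_vreg; /* next unused i32 virtual register number */
    int has_pair[MAX_F64_VREGS];
    int lo[MAX_F64_VREGS];
    int hi[MAX_F64_VREGS];
};

static void split_map_init(struct SplitMap *m, int first_free_vreg) {
    memset(m, 0, sizeof *m);
    m->next_vreg = first_free_vreg;
}

static void get_pair(struct SplitMap *m, int f64_vreg, int *lo, int *hi) {
    assert(f64_vreg >= 0 && f64_vreg < MAX_F64_VREGS);
    if (!m->has_pair[f64_vreg]) {
        m->lo[f64_vreg] = m->next_vreg++;
        m->hi[f64_vreg] = m->next_vreg++;
        m->has_pair[f64_vreg] = 1;
    }
    *lo = m->lo[f64_vreg];
    *hi = m->hi[f64_vreg];
}

/* A PHI with two incoming values; the predecessor blocks are omitted
   because they are the same for both halves. */
struct Phi { int dst; int in[2]; };

/* One f64 PHI becomes two i32 PHIs: one over the low halves, one over
   the high halves. */
static void split_phi(struct SplitMap *m, const struct Phi *phi,
                      struct Phi *lo_phi, struct Phi *hi_phi) {
    get_pair(m, phi->dst, &lo_phi->dst, &hi_phi->dst);
    for (int i = 0; i < 2; i++)
        get_pair(m, phi->in[i], &lo_phi->in[i], &hi_phi->in[i]);
}
```

Because the map hands out the same pair every time a given f64 register is seen, ordinary instructions and PHIs that mention that register stay consistent without any extra bookkeeping.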

Thanks,
  Roman




 