thr3ads.net - llvm dev - [LLVMdev] FP emulation (continued) [Nov 2006]

Roman Levenstein

2006-Nov-17 09:29 UTC

[LLVMdev] FP emulation (continued)

Hi,

I still have some questions about FP emulation for my embedded target.

To recap a bit:
My target only has integer registers and no hardware support for FP. FP
is supported only via emulation. Only f64 is supported. All FP
operations should be implemented to use i32 registers.

Based on the fruitful discussions on this list I was already able to
implement mapping of the FP operations to special library calls. 

I also implemented a simple version of the register mapping, where I
introduced a bogus f64 register set and used it during the code
selection and register allocation. After register allocation a special
post-RA pass just converts instructions using f64 operands into
multiple instructions using i32 operands. This seems to work, but has
one disadvantage. Since it is a post-RA pass, it uses a fixed mapping
between physical f64 registers and a pair of physical i32 registers,
e.g. D0:f64 -> i1:i32 x i2:i32. This leads to a non-optimal register
allocation. But anyway, I have an almost working compiler with integer
and FP support for my rather specific embedded target! This shows a
very impressive quality of the LLVM compiler.

Another opportunity, as Chris indicated in his previous mails (see
below), would be to expose the fact that f64 regs really are integer
registers. 
> >> The right way would be to expose the fact that these really are
> >> integer registers, and just use integer registers for it.
> >
> > How and where can this fact be exposed? In register set
> > descriptions? Or may be telling to use i32 register class when we
> > assign the register class to f64 values?
>
> It currently cannot be done without changes to the legalize pass.
>
> >> This would be no problem except that the legalize doesn't know how
> >> to convert f64 -> 2 x i32 registers.  This could be added,
> >
> > Can you elaborate a bit more about how this can be added? Do you
> > mean that legalize would always create two new virtual i32 registers
> > for each such f64 value, copy the parts of f64 into it and let the
> > register allocator later allocate some physical registers for it?
>
> Yes.
>
> > Would it require adaptations only in the target-specific legalize
> > or do you think that some changes in the common part (in Target
> > directory) of the legalize are required?
>
> The target independent parts would need to know how to do this.
> Specifically it would need to know how to "expand" f64 to 2x i32.
I tried to implement it, but I still have some trouble with that.
In my understanding, the code in TargetLowering.cpp and also in
SelectionDAGISel.cpp should be altered. For example, I tried to modify
computeRegisterProperties to say that f64 is actually represented
as 2xi32. I also added some code to the function
FunctionLoweringInfo::CreateRegForValue to allocate this pair of i32
regs for f64 values. But it does not seem to help. From what I can
see, the problem is that emitNode() still looks at the machine
instruction descriptions. And since I still have some insns for
loads and stores of f64 values (do I still need to have them, if I do
the mapping?), it basically allocates f64 registers without being
affected in any way by the modifications described above, because it
does not use any of the information prepared there.

So, I'm a bit lost now. I don't quite understand what should be done to
explain to the CodeGen how to map virtual f64 regs to pairs of virtual
i32 regs. Maybe I'm doing something wrong? Maybe I need to explain to
the codegen that f64 is a packed type consisting of 2xi32 or a vector
of i32? Chris, could you elaborate a bit more about this? What needs
to be explained to the codegen/legalizer and where?

Another thing I have in mind is:
It looks like the easiest way of all would be to have a special pass
after the assignment of virtual registers, but before the real register
allocation pass. This pass could define the mapping for each virtual
f64 register and then rewrite the machine insns to use the
corresponding i32 regs. The problem with this approach is that I don't
quite understand how to insert such a pass before the physical register
allocation pass, or whether it can be done at all. It also worries me a
bit that it would eventually require modifying PHI nodes and
introducing new ones wherever f64 regs were used in PHI nodes, since a
pair of PHI nodes would then be required for each. Since I don't have
experience with PHI node handling in LLVM, I'd like to avoid this
complexity, unless you say it is actually pretty easy to do. What do
you think of this approach? Does it make sense? Is it easier than the
previous one, which requires changes in the code selector/legalizer?

Thanks,
  Roman




 
Chris Lattner

2006-Nov-20 19:14 UTC

[LLVMdev] FP emulation (continued)

On Fri, 17 Nov 2006, Roman Levenstein wrote:
> I still have some questions about FP emulation for my embedded target.
> To recap a bit:
> My target only has integer registers and no hardware support for FP. FP
> is supported only via emulation. Only f64 is supported. All FP
> operations should be implemented to use i32 registers.
ok
> allocation. But anyway, I have an almost working compiler with integer
> and FP support for my rather specific embedded target! This shows a
> very impressive quality of the LLVM compiler.
Great!
> Another opportunity, as Chris indicated in his previous mails (see
> below), would be to expose the fact that f64 regs really are integer
> registers.
Right.
>> The target independent parts would need to know how to do this.
>> Specifically it would need to know how to "expand" f64 to 2x i32.
>
> I tried to implement it, but I still have some troubles with that.
> In my understanding, the code in TargetLowering.cpp and also in
> SelectionDAGISel.cpp should be altered. I tried for example to modify
> the computeRegisterProperties to tell that f64 is actually represented
> as 2xi32.
Good, this is the first step.  Your goal is to get 
TLI.getTypeAction(MVT::f64) to return 'expand' and to get 
TLI.getTypeToTransformTo(f64) to return i32.
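
To make that goal concrete, here is a minimal standalone sketch of the kind of per-type table being described. It is a toy model only: the struct `ToyTargetLowering`, its members and `getNumRegisters` are hypothetical names chosen to echo the thread, and none of this is the actual LLVM TargetLowering code.

```cpp
#include <cassert>
#include <map>

// Toy model only: the names echo those used in the thread, but this is
// NOT the LLVM TargetLowering API.
enum class ValueType { i32, f64 };
enum class TypeAction { Legal, Promote, Expand };

struct ToyTargetLowering {
  std::map<ValueType, TypeAction> action;
  std::map<ValueType, ValueType> transformTo;
  std::map<ValueType, unsigned> numRegs;

  ToyTargetLowering() {
    // i32 is the only type with a real register class on this target.
    action[ValueType::i32] = TypeAction::Legal;
    transformTo[ValueType::i32] = ValueType::i32;
    numRegs[ValueType::i32] = 1;
    // f64 has no registers of its own: expand it into two i32 pieces.
    action[ValueType::f64] = TypeAction::Expand;
    transformTo[ValueType::f64] = ValueType::i32;
    numRegs[ValueType::f64] = 2;
  }

  TypeAction getTypeAction(ValueType vt) const { return action.at(vt); }

  ValueType getTypeToTransformTo(ValueType vt) const {
    return transformTo.at(vt);
  }

  // Hypothetical helper: how many registers a value of this type occupies.
  unsigned getNumRegisters(ValueType vt) const {
    unsigned n = numRegs.at(vt);
    assert(n >= 1 && "every value type needs at least one register");
    return n;
  }
};
```

With such a table, a query for f64 yields "Expand" with i32 as the target type and two registers per value, which is the state the legalizer needs to see before it can split anything.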
> I also added some code into the function
> FunctionLoweringInfo::CreateRegForValue for allocating this pair of i32
> regs for f64 values. But it does not seem to help.
Ok.
> From what I can see, the problem is that emitNode() still looks at the
> machine instruction descriptions. And since I still have some insns for
> load and stores of f64 values (do I still need to have them, if I do
> the mapping?), it basically allocates f64 registers without even being
> affected in any form by the modifications described above, because it
> does not use any information prepared there.
If you get here, something is wrong.  The code generator basically works 
like this:

1. Convert LLVM to naive dag
2. Optimize dag
3. Legalize
4. Optimize
5. Select
6. Schedule and emit.

If you properly mark f64 as expand, f64 values should only exist in stages 
1/2/3.  After legalization, they should be gone: only legal types (i32) 
should exist in the dag.
> So, I'm a bit lost now. I don't quite understand what should be done to
> explain the CodeGen how to map virtual f64 regs to the pairs of virtual
> i32 regs? May be I'm doing something wrong? May be I need to explain
> the codegen that f64 is a packed type consisting of 2xi32 or a vector
> of i32???  Chris could you elaborate a bit more about this? What needs
> to be explained to the codegen/legalizer and where?
The first step is to get something simple like this working:

void %foo(double* %P) {
   store double 0.0, double* %P
   ret void
}

This will require the legalizer to turn the double 0.0 into two integer 
zeros, and the store into two integer stores.
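
The effect of that transformation can be sketched outside the compiler in plain C++. The sketch assumes an IEEE-754 64-bit double and a layout with the low word at the lower address; the names `F64Halves`, `splitF64` and `storeF64AsTwoI32` are made up for illustration and are not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The two 32-bit halves of a double's bit pattern.
struct F64Halves {
  std::uint32_t lo;  // bits 0..31
  std::uint32_t hi;  // bits 32..63
};

// Split the bit pattern of a double into two 32-bit integers.
inline F64Halves splitF64(double value) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch expects a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return {static_cast<std::uint32_t>(bits),
          static_cast<std::uint32_t>(bits >> 32)};
}

// Emulates "store double V, double* P" as two 32-bit integer stores.
// Assumption: the low word goes to the lower address.
inline void storeF64AsTwoI32(double value, unsigned char *p) {
  assert(p != nullptr);
  F64Halves h = splitF64(value);
  std::memcpy(p, &h.lo, sizeof h.lo);      // first integer store
  std::memcpy(p + 4, &h.hi, sizeof h.hi);  // second integer store
}
```

For the constant 0.0 both halves are the integer zero, so the store in %foo becomes two stores of an i32 zero.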
> Another thing I have in mind is:
> It looks like the easiest way at all would be to have a special pass
> after the assignment of virtual registers, but before a real register
> allocation pass. This pass could define the mapping for each virtual
> f64 register and then rewrite the machine insns to use the
> corresponding i32 regs. The problem with this approach is that I don't
> quite understand how to insert such a pass before physical register
> allocation pass and if it can be done at all. Also, it worries me a bit
> that it would eventually require modifications of PHI-nodes and
> introduction of new ones in those cases, where f64 regs were used in
> the PHI nodes. Now a pair of PHI-nodes would be required for that.
> Since I don't have experience with PHI-nodes handling in LLVM, I'd like
> to avoid this complexity, unless you say it is actually pretty easy to
> do. What do you think of this approach? Does it make sense? Is it
> easier than the previous one, which requires changes in the code
> selector/legalizer?
The best approach is to make the legalizer do this transformation.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Roman Levenstein

2006-Nov-20 23:06 UTC

[LLVMdev] FP emulation (continued)

Hi Chris,

Thank you very much for your answer! It helps me to move in the right
direction. When you explain it, it sounds rather easy. But I still
have some tricky issues. This is either because I'm not so familiar
with LLVM or because it is somewhat underestimated how much the LLVM
legalizer/expander relies on expandable types being integers (see my
explanations below).

--- Chris Lattner <sabre at nondot.org> wrote:
> > Another opportunity, as Chris indicated in his previous mails (see
> > below), would be to expose the fact that f64 regs really are
> > integer registers.
>
> Right.
>
> >> The target independent parts would need to know how to do this.
> >> Specifically it would need to know how to "expand" f64 to 2x i32.
> >
> > I tried to implement it, but I still have some troubles with that.
> > In my understanding, the code in TargetLowering.cpp and also in
> > SelectionDAGISel.cpp should be altered. I tried for example to
> > modify the computeRegisterProperties to tell that f64 is actually
> > represented as 2xi32.
>
> Good, this is the first step.  Your goal is to get
> TLI.getTypeAction(MVT::f64) to return 'expand' and to get
> TLI.getTypeToTransformTo(f64) to return i32.
After I sent a mail to the mailing list, I figured out that I need to
do this, so I added exactly what you describe and it helped. 
> > I also added some code into the function
> > FunctionLoweringInfo::CreateRegForValue for allocating this pair of
> > i32 regs for f64 values. But it does not seem to help.
>
> Ok.
>
> > From what I can see, the problem is that emitNode() still looks at
> > the machine instruction descriptions. And since I still have some
> > insns for load and stores of f64 values (do I still need to have
> > them, if I do the mapping?), it basically allocates f64 registers
> > without even being affected in any form by the modifications
> > described above, because it does not use any information prepared
> > there.
OK. After the changes mentioned above, pairs of virtual i32 regs
are used in most situations, and in most cases this happens exactly
as intended.
> If you get here, something is wrong.  The code generator basically
> works like this:
>
> 1. Convert LLVM to naive dag
> 2. Optimize dag
> 3. Legalize
> 4. Optimize
> 5. Select
> 6. Schedule and emit.
>
> If you properly mark f64 as expand, f64 values should only exist in
> stages 1/2/3.  After legalization, they should be gone: only legal
> types (i32) should exist in the dag.
>
> > So, I'm a bit lost now. I don't quite understand what should be
> > done to explain the CodeGen how to map virtual f64 regs to the pairs
> > of virtual i32 regs? May be I'm doing something wrong? May be I need
> > to explain the codegen that f64 is a packed type consisting of 2xi32
> > or a vector of i32???  Chris could you elaborate a bit more about
> > this? What needs to be explained to the codegen/legalizer and where?
>
> The first step is to get something simple like this working:
>
> void %foo(double* %P) {
>    store double 0.0, double* %P
>    ret void
> }
>
> This will require the legalizer to turn the double 0.0 into two
> integer zeros, and the store into two integer stores.
Sample code like this, i.e. simple stores, loads or even some
arithmetic operations, works fine now. No problems.

But there are big issues with correct legalization and expansion, i.e.
with ExpandOp() and LegalizeOp(). I don't know how to explain it
properly, but basically these functions assume in many places that
whenever an MVT requires more than one register, that MVT is always an
integer type. There are some assertions checking for it, and there are
quite a few places where it is simply assumed. Moreover, since
getTypeAction(MVT::f64) now returns Expand, the legalizer tries to
expand too much, and it does not check getOperationAction or
anything like that in this case. For example, it also tries to expand
operations like ADD, SUB, etc. into operations on the halves of the
f64 (probably because it thinks it is an integer ;-) even though I do
not need any expansion for such operations, since they are
implemented as library functions.
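
To illustrate the distinction, here is a small standalone C++ sketch (not LLVM code). A 64-bit integer add really can be computed half by half with a carry, but applying the same recipe to the bits of an f64 gives a wrong result, so an f64 add has to become a call that merely receives and returns the two halves. All names here (`Pair32`, `lowerAdd`, `softAddF64`, `OpAction`) are hypothetical, and `softAddF64` uses the host's floating point as a stand-in for the target's emulation library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Pair32 {
  std::uint32_t lo;
  std::uint32_t hi;
};

inline Pair32 toPair(double d) {
  static_assert(sizeof(double) == 8, "this sketch expects a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return {static_cast<std::uint32_t>(bits),
          static_cast<std::uint32_t>(bits >> 32)};
}

inline double fromPair(Pair32 p) {
  std::uint64_t bits = (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Valid expansion for a 64-bit INTEGER add: add the halves, propagate carry.
inline Pair32 addI64ByHalves(Pair32 a, Pair32 b) {
  Pair32 r;
  r.lo = a.lo + b.lo;
  std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
  r.hi = a.hi + b.hi + carry;
  return r;
}

// Stand-in for a soft-float library routine: operands and result travel as
// pairs of 32-bit values, but the arithmetic itself is not split.
inline Pair32 softAddF64(Pair32 a, Pair32 b) {
  return toPair(fromPair(a) + fromPair(b));
}

enum class OpAction { ExpandIntoHalves, LibCall };

// The per-operation decision that a type-level "Expand" alone cannot make.
inline Pair32 lowerAdd(OpAction action, Pair32 a, Pair32 b) {
  switch (action) {
  case OpAction::ExpandIntoHalves:
    return addI64ByHalves(a, b);
  case OpAction::LibCall:
    return softAddF64(a, b);
  }
  assert(false && "unknown action");
  return a;
}
```

In this model the type action only says how the operands are represented (two i32 values), while a separate per-operation action decides whether the operation itself is split or turned into a call.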

For most of the places that assume an integer type is being expanded, I
inserted some code to explicitly check whether MVT::f64 is being
expanded. This worked for most of the cases, but not for all. In
particular, I cannot solve the expansion of SELECT_CC on f64. It
generates a target-specific SELECT_CC node that correctly contains
pairs of i32 for the TrueValue and FalseValue. But when the value of
this operation is used later, the expander tries to expand its result.
And it cannot do that, since it seems to have a problem with
EXTRACT_ELEMENT applied to the SELECT_CC mentioned above. The problem
is probably that it cannot extract the corresponding halves from the
target-specific SELECT_CC node (whereas it can do so without problems
for the usual integer-based ISD::SELECT_CC nodes). This is where I got
stuck, since I do not see how I can overcome it.
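
For reference, the property that makes the integer case easy can be shown in a few lines of standalone C++ (a sketch, not LLVM code): a select between two 64-bit values is equivalent to two independent selects on the halves, so each half of the result is available directly and nothing has to be extracted from a combined node. The names `Pair32` and `selectPair` are hypothetical, and this says nothing about how the target-specific node should actually be handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Pair32 {
  std::uint32_t lo;
  std::uint32_t hi;
};

inline Pair32 toPair(double d) {
  static_assert(sizeof(double) == 8, "this sketch expects a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return {static_cast<std::uint32_t>(bits),
          static_cast<std::uint32_t>(bits >> 32)};
}

inline double fromPair(Pair32 p) {
  std::uint64_t bits = (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// One select on a two-register value becomes two selects that share the
// same condition: one for the low halves, one for the high halves.
inline Pair32 selectPair(bool cond, Pair32 ifTrue, Pair32 ifFalse) {
  Pair32 r;
  r.lo = cond ? ifTrue.lo : ifFalse.lo;
  r.hi = cond ? ifTrue.hi : ifFalse.hi;
  return r;
}
```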

Overall, changing the legalizer to support the expansion of
MVT::f64 proves to be more complicated than I initially expected. It
also seems to be a bit of overkill. Therefore I was thinking about the
special pass after code selection, but before register allocation.
After all, I just want to do a transformation on all instructions that
read from or write to virtual f64 regs.

  load/store vregf64, val 
->  
  load/store vregi32_1, val_low  
  load/store vregi32_2, val_high  

My subjective feeling is that it can be done more easily in a separate
pass than by changing the legalizer all over the place in a rather
inelegant way.
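
The rewrite described above can be sketched on a toy instruction list in standalone C++. This is only a model under simplifying assumptions: `ToyFunction`, `Inst` and `splitF64VRegs` are hypothetical and unrelated to LLVM's MachineInstr classes, every instruction touching an f64 vreg is simply duplicated into a "_LO" and a "_HI" copy, and operands that are already i32 are repeated in both copies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class RegClass { I32, F64 };

struct Inst {
  std::string opcode;          // e.g. "LOAD", "STORE", "PHI"
  std::vector<unsigned> regs;  // virtual register operands
};

struct ToyFunction {
  std::vector<RegClass> vregClass;  // indexed by virtual register number
  std::vector<Inst> insts;

  unsigned createVReg(RegClass rc) {
    vregClass.push_back(rc);
    return static_cast<unsigned>(vregClass.size() - 1);
  }
};

// Rewrites every instruction that touches an f64 vreg into a LO/HI pair
// operating on fresh i32 vregs. The same f64 vreg always maps to the same
// pair, so defs and uses (including PHI operands) stay connected.
inline void splitF64VRegs(ToyFunction &f) {
  std::map<unsigned, std::pair<unsigned, unsigned>> halves;
  auto halvesOf = [&](unsigned r) {
    auto it = halves.find(r);
    if (it == halves.end()) {
      unsigned lo = f.createVReg(RegClass::I32);
      unsigned hi = f.createVReg(RegClass::I32);
      it = halves.emplace(r, std::make_pair(lo, hi)).first;
    }
    return it->second;
  };

  std::vector<Inst> out;
  for (const Inst &inst : f.insts) {
    bool usesF64 = false;
    for (unsigned r : inst.regs) {
      assert(r < f.vregClass.size() && "unknown virtual register");
      if (f.vregClass[r] == RegClass::F64) usesF64 = true;
    }
    if (!usesF64) {
      out.push_back(inst);  // untouched: no f64 operand
      continue;
    }
    Inst lo{inst.opcode + "_LO", {}};
    Inst hi{inst.opcode + "_HI", {}};
    for (unsigned r : inst.regs) {
      if (f.vregClass[r] == RegClass::F64) {
        std::pair<unsigned, unsigned> h = halvesOf(r);
        lo.regs.push_back(h.first);
        hi.regs.push_back(h.second);
      } else {
        lo.regs.push_back(r);
        hi.regs.push_back(r);
      }
    }
    out.push_back(lo);
    out.push_back(hi);
  }
  f.insts = out;
}
```

In this model a PHI over f64 vregs needs no special treatment: it falls out as one PHI over the low halves and one over the high halves. Real machine code would of course need correct address offsets and opcodes for the two halves.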
> > Another thing I have in mind is:
> > It looks like the easiest way at all would be to have a special
> > pass after the assignment of virtual registers, but before a real
> > register allocation pass. This pass could define the mapping for
> > each virtual f64 register and then rewrite the machine insns to use
> > the corresponding i32 regs. The problem with this approach is that I
> > don't quite understand how to insert such a pass before physical
> > register allocation pass and if it can be done at all. Also, it
> > worries me a bit that it would eventually require modifications of
> > PHI-nodes and introduction of new ones in those cases, where f64
> > regs were used in the PHI nodes. Now a pair of PHI-nodes would be
> > required for that. Since I don't have experience with PHI-nodes
> > handling in LLVM, I'd like to avoid this complexity, unless you say
> > it is actually pretty easy to do. What do you think of this
> > approach? Does it make sense? Is it easier than the previous one,
> > which requires changes in the code selector/legalizer?
>
> The best approach is to make the legalizer do this transformation.
I believe you, since you certainly know it better than I do. But I
experienced quite a few problems, as I described above. Now, let us
assume for a second that the approach with a separate pass makes some
sense. I'm just curious how I could insert a new pass after code
selection, but before any other passes, including register allocation.
I have not found any easy way to do it yet. For a post-RA pass it is
very easy and supported, but for a pre-RA or post-code-selection pass
it is not obvious.
I was thinking about two possibilities:
1) Mark all f64 load/store/move target insns as
usesCustomDAGSchedInserter = 1 and then intercept their expansion in
InsertAtEndOfBasicBlock(). This should be fine, since
at this stage machine insns are still using virtual registers and
it happens before register allocation. Then this function could expand
them into pairs of insns operating on i32 virtual regs. The problem
here is that InsertAtEndOfBasicBlock() is not called for all of the
emitted insns. Ironically enough, it is not called for ISD::CopyToReg
and ISD::CopyFromReg, which are the load and store insns. BTW, is this
intended or was it simply overlooked? What would happen if instructions
produced for these nodes were marked usesCustomDAGSchedInserter?
Shouldn't they then be passed to the custom target MI expander, as is
done for all other instructions? Would it make sense to always check
during the insertion of an MI into a BB whether it is marked
usesCustomDAGSchedInserter and, if so, call a target-specific
expander for it?

2) Introduce a fake register allocation pass and make it require an
f64toi32 pass as a pre-requisite. And basically call an existing
register allocator like in this code?

namespace {

  static RegisterRegAlloc
    TargetXRegAlloc("targetx", "  targetx register allocator",
                    createTargetXRegisterAllocator);

  struct VISIBILITY_HIDDEN RA : public MachineFunctionPass {
  private:

    // The wrapped allocator that does the actual work.
    MachineFunctionPass *RealRegAlloc;

  public:

    RA()
    {
      // Instantiate a real allocator to do the job!
      RealRegAlloc =
        (MachineFunctionPass*)(createLinearScanRegisterAllocator());
    }

    virtual const char* getPassName() const {
      return "TargetX Register Allocator";
    }

    virtual void getAnalysisUsage(AnalysisUsage &AU) const {

        // Add target specific pass as a requirement
        AU.addRequired<f64toi32pass>();

        // Reuse all requirements from the real allocator
        RealRegAlloc->getAnalysisUsage(AU);
    }

    /// runOnMachineFunction - register allocate the whole function
    bool runOnMachineFunction(MachineFunction&);
  };
}

bool RA::runOnMachineFunction(MachineFunction &fn) {
  // Simply delegate to the wrapped allocator.
  return RealRegAlloc->runOnMachineFunction(fn);
}

FunctionPass* llvm::createTargetXRegisterAllocator() {
  return new RA();
}

It looks fine and pretty obvious, but it does not work. When
runOnMachineFunction is invoked, I get the following error, which I
don't quite understand. Why do I get it at all?

AnalysisType& llvm::Pass::getAnalysis() const [with AnalysisType =
llvm::LiveIntervals]: Assertion `Resolver && "Pass has not been
inserted into a PassManager object!"' failed.

OK. These are my current problems with f64 to 2xi32 conversion. So far
I cannot solve it using any of the mentioned methods :(
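
One plausible reading of that assertion, sketched as a toy model in standalone C++: a pass can only look up analyses through a resolver that the pass manager hands to it when the pass is added, and a pass that is created privately inside another pass never receives one. Everything below (`ToyPass`, `ToyPassManager`, `ToyResolver`, the two allocator structs) is hypothetical and heavily simplified; it is not the LLVM pass manager.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Stands in for whatever gives a pass access to analysis results.
struct ToyResolver {
  int liveIntervals = 42;
};

struct ToyPass {
  ToyResolver *resolver = nullptr;  // set only when a manager adds the pass
  virtual ~ToyPass() = default;
  virtual bool run() = 0;

  int getAnalysis() const {
    if (!resolver)
      throw std::logic_error(
          "Pass has not been inserted into a PassManager object!");
    return resolver->liveIntervals;
  }
};

struct ToyPassManager {
  ToyResolver resolver;
  std::vector<ToyPass *> passes;

  void add(ToyPass *p) {
    assert(p != nullptr);
    p->resolver = &resolver;  // only passes added here get a resolver
    passes.push_back(p);
  }

  bool runAll() {
    bool changed = false;
    for (ToyPass *p : passes) changed |= p->run();
    return changed;
  }
};

// A pass that needs an analysis, like the real allocator needs LiveIntervals.
struct InnerAllocator : ToyPass {
  bool run() override { return getAnalysis() > 0; }
};

// Mirrors the wrapper idea above: the inner pass is created privately and
// never added to the manager, so its resolver stays null.
struct WrapperAllocator : ToyPass {
  InnerAllocator inner;
  bool run() override { return inner.run(); }
};
```

In this model, adding a `WrapperAllocator` to the manager and running it fails inside the inner pass, while adding an `InnerAllocator` directly works. If the real situation is analogous, reusing the inner allocator's getAnalysisUsage() is not enough on its own, because the inner pass itself was never registered with the manager.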

Any further help and advice are very welcome!

Thanks,
 Roman

P.S. A minor off-topic question: Is it possible to explain to the LLVM
backend that "float" is the same type as "double" on my target? I
managed to explain it for immediates and also told it to promote f32 to
f64. But it does not work for float variables or parameters, because
LLVM considers them to be float in any case and to have a 32-bit
representation in memory. Or do I need to handle this equivalence in
the front-end only?
