> On Tue, 10 Oct 2006, Roman Levenstein wrote:
> >>> such a call instruction?
> >>
> >> Why not just make the asm string be "call __fsub64"?
> >
> > Well, of course it would be the best solution. But the interesting
> > part is that I need to generate the machine code directly because for
> > different reasons use of a system assembler is not an option. As a
>
> ok.
>
> > result, I need to do this conversion in the target backend and later
> > generate object code directly. But when and how this conversion
> > "fsub64 insn -> call __fsub64" insn should be done? What is your
> > advice?
>
> I don't understand. If you are writing out the .o file directly, you
> already know how to encode calls... can't you just encode it as the
> right sort of call?

Yes, sure. I simply overlooked it, because it is too simple and obvious ;)
I was thinking about doing it at a higher level, but this can be done as
well. But I think that Andrew's approach as used in the Alpha backend will
do the job, and it is rather easy to implement.

> What facilities are you using to emit the machine code, are you using
> the llvm machine code emitter generator stuff (like PPC)?

At the moment, I do not emit real machine code, but I'm planning to do it.
If possible, I'll try to use the code emitter stuff of tblgen. But I'm not
sure if it can handle the instruction encodings of my target. This target
uses a variable-length instruction encoding, where 2 bytes are used for
opcodes, and the encodings of memory references and some registers are put
between these two bytes. Therefore, the bit offsets are not constant and
depend on the type of instruction (e.g. rm, ri and so on). Do you think it
would be easy to express such an encoding for tblgen?

> >>> Does this idea of representing the emulated FP operation calls as
> >>> instructions as described above make some sense? Or do you see
> >>> easier or more useful ways to do it?
> >>
> >> That is a reasonable way to do it. Another reasonable way would be
> >> to lower them in the instruction selector itself through the use of
> >> custom expanders. In practice, using instructions with "call foo"
> >> in them instead of lowering to calls may be simpler.
> >
> > Hmm, let me see. Just to check that I understand your proposal
> > correctly:
> > You mean I don't need to define any FP operations as machine
> > instructions at all. Instead, I basically tell that I will expand all
> > FP operations myself and then I simply expand them into the following
> > sequence of instructions:
> >   mov arg1, %d0 // enforce register constraint
> >   mov arg2, %d1 // enforce register constraint
> >   call __fsub64
> >
> > Is it correct understanding?
>
> Yes, if you tell the legalizer you want to custom expand everything,
> you can do this. In practice, there may be ones the legalizer doesn't
> know how to custom expand yet, but that is an easy addition.

> > If yes, how do I explain that arguments are to be passed on the
> > concrete physical registers like %d0 and %d1 and result comes on %d0?
> > Do I need to allocate virtual regs for them and pre-assign physical
> > regs somehow?
>
> As others have pointed out, you flag copy{to/from}reg nodes to the
> call.

OK. Andrew has explained how to do it. I'll give it a try.

> > Or maybe I have to define a new calling convention that would
> > enforce it?
> > Actually, how can this be done with LLVM? I mean, if I want to
> > introduce a new calling convention, what do I need to do in backend
> > to define and register it? Is it required to change the frontend to
> > make it visible at the source program level?
>
> You should be able to handle this in the lowering stuff, you don't
> need anything complex here.

OK, I see. But I'd like to know how to introduce a new calling convention
so that it is visible at the source level. Basically, there are some
existing drivers for this system and I'd like to be able to call some
functions defined there. But these drivers use a very custom calling
convention. I thought that declaring functions as follows could be the
most appropriate solution:

extern __MySpecialDriverAttribute int read_from_device(int devid, int channel);

But for doing this I need to define a custom attribute or a new calling
convention. Or do you see any other opportunity?

> >> For the time being, I'd suggest defining an "fp register set" which
> >> just aliases the integer register set (i.e. say that d0 overlaps
> >> r0+r1).
> >
> > OK. I almost did this way already. But I introduced two FP register
> > sets. One for fp32 (for the future) and one for fp64. fp32 aliases
> > the integer register set. fp64 aliases the fp32 register set, but not
> > the integer register set explicitly. I thought that aliases are
> > transitive? Or do I have to mention all aliases explicitly, e.g. for
> > %d0 I need to say [%s0,%s1,%GR0,%GR1]?
>
> Depending on how you defined the aliases, they aren't necessarily
> transitive. I'd look at the <yourtarget>GenRegisterInfo.inc file, and
> see what it lists as the aliases for each reg.

Done. And I looked into the tblgen code. Transitivity is not ensured by
tblgen in any form, since it does not compute it. What it ensures is the
commutativity of aliases, i.e. if A aliases B, then B aliases A. I think
it would make sense if tblgen computed a transitive closure automatically
for alias sets, because I can hardly imagine non-transitive aliasing
semantics. If you think that this makes sense, I could probably write a
patch for tblgen to do that.

> > But a more interesting question is this: The scheme above assumes
> > that there is a "hardwired" mapping between FP registers and concrete
> > pairs of integer registers. In many cases this is enough, since the
> > emulated operations indeed expect parameters on predefined pairs of
> > 32bit integer registers. But when it comes to other uses of FP
> > registers (mainly for storing some values) there is no such
> > limitation that a concrete pair of integer registers should be used.
> > Actually, any combination of two 32bit integer registers would do.
> > How can this be modelled and represented to regalloc, if at all? One
> > guess is to define one FP reg for each possible combination of two
> > integer registers, which would lead to the definition of N*(N-1) FP
> > registers, where N is the number of integer registers (and I have
> > only 8 integer regs). But it seems to be not very elegant for my
> > taste, or?
>
> The right way would be to expose the fact that these really are
> integer registers, and just use integer registers for it.

How and where can this fact be exposed? In register set descriptions? Or
maybe by telling the codegen to use the i32 register class when we assign
the register class to f64 values?

> This would be no problem except that the legalizer doesn't know how to
> convert f64 -> 2 x i32 registers. This could be added,

Can you elaborate a bit more about how this can be added? Do you mean that
the legalizer would always create two new virtual i32 registers for each
such f64 value, copy the parts of the f64 into them and let the register
allocator later allocate some physical registers for them? Would it
require adaptations only in the target-specific legalizer, or do you think
that some changes in the common part (in the Target directory) of the
legalizer are required?

> but a simpler approach to get you running faster is to add the bogus
> register set.

True. To get something that works as soon as possible, this is simpler.
But to produce faster code, the more complex approach described above
could be a (big?) win.

- Roman
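To make the N*(N-1) figure from the message above concrete, here is a minimal C++ sketch that enumerates every ordered pair of distinct integer registers that could back one f64 value. The helper name enumeratePairRegs and the idea of passing register names as strings are assumptions for illustration only; nothing here is LLVM or tblgen API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: list every ordered (high, low) pair of distinct
// integer registers. Each pair would need its own bogus FP register in the
// "one FP reg per combination" scheme.
std::vector<std::pair<std::string, std::string>>
enumeratePairRegs(const std::vector<std::string> &IntRegs) {
  std::vector<std::pair<std::string, std::string>> Pairs;
  for (std::size_t Hi = 0; Hi < IntRegs.size(); ++Hi)
    for (std::size_t Lo = 0; Lo < IntRegs.size(); ++Lo)
      if (Hi != Lo)
        Pairs.emplace_back(IntRegs[Hi], IntRegs[Lo]);
  // N registers give N*(N-1) ordered pairs.
  assert(Pairs.size() == IntRegs.size() * (IntRegs.size() - 1));
  return Pairs;
}
```

With 8 integer registers this gives 8*7 = 56 pairs, which is the number of extra FP registers that scheme would have to define.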
Hi,

Is it possible to dynamically define implicit defs for some instructions?

Concretely, I'd like to define a register for a return value of a call in
a dynamic way, instead of using the current static approach, which looks
like:

let Defs = [R0] in
def CALLimm : I<...>;

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return values.
Therefore I cannot describe it by one static set of regs, as shown above.

By looking into the LLVM code, I've found how to force some arguments of
calls to be placed in required registers and how to copy the result of the
call from certain (return) registers. But if I try to copy from a return
register that is not in the static Defs set of the CALL machine
instruction, then the register allocator complains that the interval for
the return value of the CALL does not exist. This means that the register
allocator does not understand that this register is implicitly defined by
the CALL insn.

One obvious solution is to define several machine instructions for a CALL,
each defining its own set of implicitly defined registers. But it is not
very elegant in my opinion. Are there any other ways to achieve the same
result? Maybe it can be solved in a simpler way?

Thanks,
-Roman
Roman Levenstein
2006-Oct-14 22:04 UTC
[LLVMdev] Setting the set of implicit defs dynamically
Hi,

Is it possible to dynamically define implicit defs for some instructions?
And if possible, already in the target lowering pass, e.g. in LowerCALL()?

Concretely, I'd like to define a register for a return value of a call in
a dynamic way, instead of using the current static approach, which looks
like:

let Defs = [R0] in
def CALLimm : I<...>;

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return values.
Therefore I cannot describe it by one static set of regs, as shown above.

By looking into the LLVM code, in particular inside LowerCALL() and the
like, I've found how to force some arguments of calls to be placed in
required registers and how to copy the result of the call from certain
(return) registers. But if I try to copy from a return register that is
not in the static Defs set of the CALL machine instruction, then the
register allocator complains that the interval for the return value of the
CALL does not exist. This means that the register allocator does not
understand that this register is implicitly defined by the CALL insn.

One obvious solution is to define several machine instructions for a CALL,
each defining its own set of implicitly defined registers. But it is not
very elegant in my opinion. Are there any other ways to achieve the same
result? Maybe it can be solved in a simpler way?

Thanks,
-Roman
On Sat, 14 Oct 2006, Roman Levenstein wrote:
> Is it possible to dynamically define implicit defs for some
> instructions?

Yes! This is what explicit operands are :). Specifically, if you want to
vary on a per-opcode basis what registers are used/def'd by the
instruction, you can just add those registers as explicit use/def operands
in the machine instruction with the physical registers directly filled in.

> The reason for this wish is that some of the calling conventions on my
> target use different sets of physical registers for their return
> values. Therefore I cannot describe it by one static set of regs, as
> shown above.

Right.

> One obvious solution is to define several machine instructions for a
> CALL, each defining its own set of implicitly defined registers. But it
> is not very elegant in my opinion. Are there any other ways to achieve
> the same result? Maybe it can be solved simpler?

This is another solution which works great if there are only a few
variants.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
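A toy C++ model of the suggestion in the reply above, written only to illustrate the idea: a call carries its return registers as explicit def operands, so the set of defined registers can differ per call while the opcode keeps an empty static Defs list. The types ToyInstr and Operand and the functions definedRegs and makeCall are hypothetical and are not LLVM classes or functions. The register names R4 and R5 are made up as well; only R0 appears in the original Defs example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: these types are NOT LLVM classes.
struct Operand {
  std::string PhysReg; // physical register named directly in the operand
  bool IsDef;          // true if the instruction writes this register
};

struct ToyInstr {
  std::string Opcode;
  std::vector<std::string> StaticImplicitDefs; // like "let Defs = [R0]"
  std::vector<Operand> ExplicitOps;            // chosen per instruction
};

// Registers an analysis would see as defined by this instruction: the
// static implicit defs plus every explicit def operand.
std::set<std::string> definedRegs(const ToyInstr &MI) {
  std::set<std::string> Defs(MI.StaticImplicitDefs.begin(),
                             MI.StaticImplicitDefs.end());
  for (const Operand &Op : MI.ExplicitOps)
    if (Op.IsDef)
      Defs.insert(Op.PhysReg);
  return Defs;
}

// Build a call whose return registers depend on the calling convention.
ToyInstr makeCall(const std::vector<std::string> &ReturnRegs) {
  ToyInstr Call{"CALLimm", {}, {}};
  for (const std::string &Reg : ReturnRegs) {
    assert(!Reg.empty() && "return register must be named");
    Call.ExplicitOps.push_back(Operand{Reg, /*IsDef=*/true});
  }
  return Call;
}
```

In this model, makeCall({"R0"}) and makeCall({"R4", "R5"}) share one opcode but report different defined registers, which is the effect the question asks for without adding one CALL instruction per convention.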
On Tue, 10 Oct 2006, Roman Levenstein wrote:
>> I don't understand. If you are writing out the .o file directly, you
>> already know how to encode calls... can't you just encode it as the
>> right sort of call?
>
> Yes, sure. I simply overlooked it, because it is too simple and obvious
> ;) I was thinking about doing it at a higher level, but this can be
> done as well.
> But I think that Andrew's approach as used in the Alpha backend will do
> the job and it is rather easy to implement.

ok

>> What facilities are you using to emit the machine
>> code, are you using the llvm machine code emitter generator stuff
>> (like PPC)?
>
> At the moment, I do not emit real machine code. But I'm planning to do
> it. If possible, I'll try to use the code emitter stuff of tblgen. But

ok.

> I'm not sure if it can handle the instruction encodings of my target.
> This target uses variable length instruction encoding, where 2 bytes
> are used for opcodes and encodings of memory references and some
> registers are put between these two bytes. Therefore, the bit offsets
> are not constant and depend on the type of instruction (e.g. rm, ri and
> so on). Do you think it would be easy to express such encoding for
> tblgen?

No. Unfortunately, tblgen can only handle targets with 32-bit wide
instruction words at the moment (alpha, ppc, sparc, etc). You'll have to
write a custom code emitter like the X86 backend has.

>> You should be able to handle this in the lowering stuff, you don't
>> need anything complex here.
>
> OK, I see. But I'd like to know how to introduce a new calling
> convention so that it is visible at the source level. Basically, there
> are some drivers existing for this system and I'd like to be able to
> call some functions defined there. But these drivers are using very
> custom calling convention. I thought that declaring functions like
> follows could be the most appropriate solution:
>
> extern __MySpecialDriverAttribute int read_from_device(int devid, int
> channel);
>
> But for doing this I need to define a custom attribute or a new calling
> convention. Or do you see any other opportunity?

I would suggest following the model of stdcall/fastcall in the windows x86
world. Specifically, this would require modifying llvm-gcc to recognize
attributes on your target, then passing that on to llvm through its
calling convention representation. This will let you define stuff like:

void foo() __attribute__((mycall)) {
}

You can then use a #define to hide the attribute syntax.

>> Depending on how you defined the aliases, they aren't necessarily
>> transitive. I'd look at the <yourtarget>GenRegisterInfo.inc file,
>> and see what it lists as the aliases for each reg.
>
> Done. And I looked into the tblgen code. Transitivity is not ensured by
> tblgen in any form, since it does not compute it. What it ensures is
> the commutativity of aliases, i.e. if A aliases B, then B aliases A. I
> think it would make sense if tblgen would compute a transitive closure
> automatically for alias sets, because I can hardly imagine
> non-transitive aliasing semantics. If you think that this makes sense,
> I could probably write a patch for tblgen to do that.

Be careful. On X86, "AL aliases AX" and "AH aliases AX" but "AL does not
alias AH".

>> The right way would be to expose the fact that these really are
>> integer registers, and just use integer registers for it.
>
> How and where can this fact be exposed? In register set descriptions?
> Or may be telling to use i32 register class when we assign the register
> class to f64 values?

It currently cannot be done without changes to the legalize pass.

>> This would be no problem except that the legalizer doesn't know how to
>> convert f64 -> 2 x i32 registers. This could be added,
>
> Can you elaborate a bit more about how this can be added? Do you mean
> that legalizer would always create two new virtual i32 registers for
> each such f64 value, copy the parts of f64 into it and let the register
> allocator later allocate some physical registers for it?

Yes.

> Would it require adaptations only in the target-specific legalizer or
> do you think that some changes in the common part (in Target directory)
> of the legalizer are required?

The target independent parts would need to know how to do this.
Specifically it would need to know how to "expand" f64 to 2x i32.

>> but a simpler approach to get you running faster is to add the bogus
>> register set.
>
> True. To get something that works as soon as possible, this is simpler.
> But to produce a faster code, a more complex approach described above
> could be a (big?) win.

That is possible. I would try implementing the simple approach and seeing
what cases are being missed.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
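The AL/AH/AX caveat in the reply above can be checked with a small self-contained C++ sketch. It stores aliases as a map from register name to alias set, makes the relation symmetric (A aliases B implies B aliases A, the property tblgen is said to ensure), and then applies a naive transitive closure to show where that goes wrong. AliasMap, symmetricClosure, transitiveClosure, aliases, x86ByteRegs and checkX86Example are hypothetical names for this illustration; this is not tblgen code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using AliasMap = std::map<std::string, std::set<std::string>>;

// If A lists B as an alias, also record A as an alias of B.
AliasMap symmetricClosure(const AliasMap &In) {
  AliasMap Out = In;
  for (const auto &Entry : In)
    for (const std::string &A : Entry.second)
      Out[A].insert(Entry.first);
  return Out;
}

// Naive transitive closure: if R aliases A and A aliases B, add R aliases B.
AliasMap transitiveClosure(AliasMap M) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (auto &Entry : M) {
      std::set<std::string> ToAdd;
      for (const std::string &A : Entry.second) {
        auto It = M.find(A);
        if (It == M.end())
          continue;
        for (const std::string &B : It->second)
          if (B != Entry.first && Entry.second.count(B) == 0)
            ToAdd.insert(B);
      }
      if (!ToAdd.empty()) {
        Entry.second.insert(ToAdd.begin(), ToAdd.end());
        Changed = true;
      }
    }
  }
  return M;
}

bool aliases(const AliasMap &M, const std::string &A, const std::string &B) {
  auto It = M.find(A);
  return It != M.end() && It->second.count(B) != 0;
}

// The X86 example from the message: AX overlaps AL and AH.
AliasMap x86ByteRegs() {
  AliasMap M;
  M["AX"].insert("AL");
  M["AX"].insert("AH");
  return symmetricClosure(M);
}

// Aborts if any expectation about the example fails.
void checkX86Example() {
  AliasMap Sym = x86ByteRegs();
  assert(aliases(Sym, "AL", "AX") && aliases(Sym, "AH", "AX"));
  assert(!aliases(Sym, "AL", "AH")); // correct: the two bytes are disjoint
  assert(aliases(transitiveClosure(Sym), "AL", "AH")); // closure gets it wrong
}
```

The symmetric map keeps AL and AH independent, while the transitive closure marks them as aliasing each other through AX. That would make a register allocator treat two disjoint byte registers as conflicting.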
Hi,

I still have some questions about FP emulation for my embedded target.

To recap a bit: My target only has integer registers and no hardware
support for FP. FP is supported only via emulation. Only f64 is supported.
All FP operations should be implemented to use i32 registers.

Based on the fruitful discussions on this list I was already able to
implement the mapping of the FP operations to special library calls. I
also implemented a simple version of the register mapping, where I
introduced a bogus f64 register set and used it during code selection and
register allocation. After register allocation, a special post-RA pass
just converts instructions using f64 operands into multiple instructions
using i32 operands.

This seems to work, but has one disadvantage. Since it is a post-RA pass,
it uses a fixed mapping between physical f64 registers and pairs of
physical i32 registers, e.g. D0:f64 -> i1:i32 x i2:i32. This leads to a
non-optimal register allocation. But anyway, I have an almost working
compiler with integer and FP support for my rather specific embedded
target! This shows the very impressive quality of the LLVM compiler.

Another opportunity, as Chris indicated in his previous mails (see below),
would be to expose the fact that f64 regs really are integer registers.

> >> The right way would be to expose the fact that these really are
> >> integer registers, and just use integer registers for it.
> >
> > How and where can this fact be exposed? In register set
> > descriptions? Or may be telling to use i32 register class when we
> > assign the register class to f64 values?
>
> It currently cannot be done without changes to the legalize pass.
>
> >> This would be no problem except that the legalizer doesn't know how
> >> to convert f64 -> 2 x i32 registers. This could be added,
> >
> > Can you elaborate a bit more about how this can be added? Do you mean
> > that legalizer would always create two new virtual i32 registers for
> > each such f64 value, copy the parts of f64 into it and let the
> > register allocator later allocate some physical registers for it?
>
> Yes.
>
> > Would it require adaptations only in the target-specific legalizer
> > or do you think that some changes in the common part (in Target
> > directory) of the legalizer are required?
>
> The target independent parts would need to know how to do this.
> Specifically it would need to know how to "expand" f64 to 2x i32.

I tried to implement it, but I still have some trouble with that. In my
understanding, the code in TargetLowering.cpp and also in
SelectionDAGISel.cpp should be altered. I tried, for example, to modify
computeRegisterProperties to tell that f64 is actually represented as
2 x i32. I also added some code to the function
FunctionLoweringInfo::CreateRegForValue for allocating this pair of i32
regs for f64 values. But it does not seem to help.

From what I can see, the problem is that emitNode() still looks at the
machine instruction descriptions. And since I still have some insns for
loads and stores of f64 values (do I still need to have them, if I do the
mapping?), it basically allocates f64 registers without being affected in
any form by the modifications described above, because it does not use any
information prepared there.

So, I'm a bit lost now. I don't quite understand what should be done to
explain to the CodeGen how to map virtual f64 regs to pairs of virtual i32
regs. Maybe I'm doing something wrong? Maybe I need to explain to the
codegen that f64 is a packed type consisting of 2 x i32, or a vector of
i32? Chris, could you elaborate a bit more about this? What needs to be
explained to the codegen/legalizer, and where?

Another thing I have in mind is: It looks like the easiest way of all
would be to have a special pass after the assignment of virtual registers,
but before the real register allocation pass. This pass could define the
mapping for each virtual f64 register and then rewrite the machine insns
to use the corresponding i32 regs. The problem with this approach is that
I don't quite understand how to insert such a pass before the physical
register allocation pass, and whether it can be done at all. Also, it
worries me a bit that it would eventually require modifications of PHI
nodes and the introduction of new ones in those cases where f64 regs were
used in PHI nodes. A pair of PHI nodes would then be required for each of
them. Since I don't have experience with PHI node handling in LLVM, I'd
like to avoid this complexity, unless you say it is actually pretty easy
to do.

What do you think of this approach? Does it make sense? Is it easier than
the previous one, which requires changes in the code selector/legalizer?

Thanks,
Roman
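The bookkeeping for the pre-RA pass proposed above can be sketched as a toy C++ model: each virtual f64 register is assigned a fresh pair of virtual i32 registers on first use, and the same pair is returned on every later lookup, so all uses and defs are rewritten consistently. F64PairMapper and halvesFor are hypothetical names, virtual registers are modelled as plain unsigned numbers, and none of this is LLVM API. How to schedule such a pass before register allocation is exactly the open question in the message and is not addressed here.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy model only (not LLVM API): remembers which pair of i32 virtual
// registers replaces each f64 virtual register.
class F64PairMapper {
public:
  // FirstFreeVReg is assumed to be larger than every existing vreg number.
  explicit F64PairMapper(unsigned FirstFreeVReg)
      : FirstFree(FirstFreeVReg), NextVReg(FirstFreeVReg) {}

  // Returns the (low, high) i32 vregs for F64VReg, creating them on first
  // use and reusing them afterwards.
  std::pair<unsigned, unsigned> halvesFor(unsigned F64VReg) {
    assert(F64VReg < FirstFree && "f64 vreg must predate the fresh range");
    auto It = Halves.find(F64VReg);
    if (It != Halves.end())
      return It->second;
    std::pair<unsigned, unsigned> Fresh(NextVReg, NextVReg + 1);
    NextVReg += 2;
    Halves.emplace(F64VReg, Fresh);
    return Fresh;
  }

private:
  unsigned FirstFree;
  unsigned NextVReg;
  std::map<unsigned, std::pair<unsigned, unsigned>> Halves;
};
```

Under this model, a PHI node on an f64 vreg would be rewritten into two PHI nodes, one over the low halves and one over the high halves of its operands, which is the extra work the message is concerned about.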