Walter Zambotti via llvm-dev
2020-Mar-12  07:22 UTC
[llvm-dev] Getting up to speed with llvm backends. Machine Instruction operands.
Welcome to all Questions from veteran programmer with no LLVM backend experience evaluating llvm for creating a Hitachi 6309 backend. This post is about finding out more about machine instruction operands. The documentation I have read so far includes: - the online manuals - Building an LLVM Backend. Fraser Cormack Pierre-André Saulais - The Design of a Custom 32-bit RISC CPU and LLVM Compiler Backend. Connor Jan Goldberg - Design and Implementation of a TriCore Backend for the LLVM Compiler Framework. Christoph Erhardt I have also cloned llvm 9.0.1 and started looking at some of the targets. A little overwhelming! At this point I'm at information overload!>From the "The LLVM Target-Independent Code Generator"The MachineInstr class The operands of a machine instruction can be of several different types: a register reference, a constant integer, a basic block reference, etc. Where are these operand types defined or documented (especially the etcs)? How do these operand types relate to the operands specified in the instruction selection and selection patterns? A concern I have is raised in "Design and Implementation of a TriCore Backend for the LLVM Compiler" where the instruction set is non orthogonal (contains special purpose address registers) The strict distinction between pointers and integers is highly problematic because LLVMs code generator implicitly converts all pointers to integers of the same width ... upon construction of the SelectionDAG. . . . As mentioned above, LLVMs agnosticism regarding pointers initially makes it impos- sible to comply with the EABI as there is no way to tell whether an integer argument should go into an address register or a data register. However this document is dated circa 2008/2009 and I ask if this situation still remains the same today. I ask because the backend I would like to target the Hitachi/Motorola 6309/6809 which too provides dedicated indexing (addressing) registers. In fact in all binary operations the second operand is either immediate or some kind of a memory reference via a index/address register. The syntax being: {[}{OffsetReg | Disp{5,8,16}},{- | --}IndexReg{+ | ++ | ]} OffsetReg can be 8bit or 16bit accumulator (so only certain regs allowed) Displacment can be 5, 8 or 16 bit signed IndexReg can only be special index registers or PC or stack + ++ is post increment by 1, 2 repsectively - -- is pre decrement by 1, 2 respectively [ ] the entire effective address is a pointer to pointer [] and any incrementors/decrementors are mutally exclusive So given the machine instruction : add d ,x # to the d register add what the x register points at further examples of the second arguement are: ,x+ # what register x points to and post inc x ie. *x++ 10,y # what register y + 10 pointer to ie. *(y+10) [20,u] # what register u + 20 pointer to pointer to ie. **(u+20) w,y # what register y + register w points to ie. *(y+w) Is there a way to pattern match these kinds of operands? In MachineOperand.h I see this operand type. I assume I can match to it?!?!? MO_TargetIndex, ///< Target-dependent index+offset operand. At https://llvm.org/docs/CodeGenerator.html#x86-addressing-mode The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following expression directly in integer instructions (which use ModR/M addressing): SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32 In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means that the load form of mov has the following MachineOperands in this order: Index: 0 | 1 2 3 4 5 Meaning: DestReg, | BaseReg, Scale, IndexReg, Displacement Segment OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm PhysReg Stores, and all other instructions, treat the four memory operands in the same way and in the same order. If the segment register is unspecified (regno = 0), then no segment override is generated. Lea operations do not have a segment register specified, so they only have 4 operands for their memory reference. I then went and looked at the files in target/x86 and I have to admit I got lost trying to find where and how this is implemented. At this (learning) stage I would appreciate any input or pointers including any other documentation or tutorials that might help in relation to how I can implement indexed memory addressing operands. So appreciate comments. Walter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200312/7816f30e/attachment.html>
David Greene via llvm-dev
2020-Mar-12  16:05 UTC
[llvm-dev] Getting up to speed with llvm backends. Machine Instruction operands.
Walter Zambotti via llvm-dev <llvm-dev at lists.llvm.org> writes:> I have also cloned llvm 9.0.1 and started looking at some of the targets. A > little overwhelming!I would work off master if possible. LLVM APIs are a moving target.> The operands of a machine instruction can be of several different types: a > register reference, a constant integer, a basic block reference, etc. > > Where are these operand types defined or documented (especially the etcs)?Look in include/llvm/CodeGen/MachineOperand.h> How do these operand types relate to the operands specified in the > instruction selection and selection patterns?That is entirely up to the target to define. You write a combination of DAG patterns to match (using TableGen) and custom C++ lowering code to map SelectionDAG nodes to MachineInstrs (which reference MachineOperands). Most of the time these are one-to-one mappings (a SelectionDAG node maps to an instruction defining a virtual register, a SelectionDAG constant maps to a MachineOperand constant, SelectionDAG operands map to virtual register references, etc.)> As mentioned above, LLVM’s agnosticism regarding pointers initially > makes it impossible to comply with the EABI as there is no way to tell > whether an integer argument should go into an address register or a > data register. > > However this document is dated circa 2008/2009 and I ask if this situation > still remains the same today.Yes. There is no machine-level "pointer type."> I ask because the backend I would like to target the Hitachi/Motorola > 6309/6809 which too provides dedicated indexing (addressing) > registers. In fact in all binary operations the second operand is > either immediate or some kind of a memory reference via a > index/address register.I'm not familiar with this architecture, but I will try to answer question as best I can.> So given the machine instruction : > > > > add d ,x # to the d register add what the x register > points at > > further examples of the second arguement are: > > > > ,x+ # what register x points to and post inc x ie. *x++ > > 10,y # what register y + 10 pointer to ie. *(y+10) > > [20,u] # what register u + 20 pointer to pointer to ie. **(u+20) > > w,y # what register y + register w points to ie. *(y+w) > > Is there a way to pattern match these kinds of operands?Yes. The AArch64 backend might be a good guide as it supports pre- and post-increment. I don't know if any existing target has pointer-to-pointer operands (that's kind of a strange thing as it requires two memory operands in one instruction) but I don't think it would be super difficult to add. The X86 backend has special matching code to construct its more complex addressing modes.> In MachineOperand.h I see this operand type. I assume I can match to > it?!?!? > > MO_TargetIndex, ///< Target-dependent index+offset operand.I think the interpretation of this operand type is up to the target (hence, "Target-dependent"). So yes, I think you could use it.> The x86 has a very flexible way of accessing memory. It is capable of > forming memory addresses of the following > > I then went and looked at the files in target/x86 and I have to admit > I got lost trying to find where and how this is implemented.For X86, the magic happens with the "addr" TableGen class in X86InstrInfo.td which is used to type address operands of Load/Store nodes in the DAG patterns. This ends up calling info custom matching code "selectAddr" which lives in X86ISelDAGToDAG.cpp. There are similar variants of "addr" in the TableGen files each with different requirements (for example alignment restrictions for vector instructions). Look at the instruction patterns in X86InstrArithmetic.td to see how "addr" is used. Hopefully that will get you started.> At this (learning) stage I would appreciate any input or pointers > including any other documentation or tutorials that might help in > relation to how I can implement indexed memory addressing operands.For backend work it is absolutely necessary to understand TableGen. http://llvm.org/docs/TableGen/index.html http://llvm.org/docs/TableGen/LangIntro.html http://llvm.org/docs/TableGen/LangRef.html http://llvm.org/docs/TableGen/BackEnds.html The way existing code generators work will make much more sense after reading at least the first three above. The fourth is mostly brief summaries of the different things TableGen is used to generate. It started out mostly as a code generator generator but has become much more over the years. -David
Anton Korobeynikov via llvm-dev
2020-Mar-12  16:26 UTC
[llvm-dev] Getting up to speed with llvm backends. Machine Instruction operands.
> Yes. The AArch64 backend might be a good guide as it supports pre- and > post-increment. I don't know if any existing target has > pointer-to-pointer operands (that's kind of a strange thing as it > requires two memory operands in one instruction) but I don't think it > would be super difficult to add. The X86 backend has special matching > code to construct its more complex addressing modes.MSP430 has memory-memory instructions. As well as post-increments. Probably the smallest backend to check :) -- With best regards, Anton Korobeynikov Department of Statistical Modelling, Saint Petersburg State University
David Greene via llvm-dev
2020-Mar-12  19:56 UTC
[llvm-dev] Getting up to speed with llvm backends. Machine Instruction operands.
David Greene via llvm-dev <llvm-dev at lists.llvm.org> writes:>> I ask because the backend I would like to target the Hitachi/Motorola >> 6309/6809 which too provides dedicated indexing (addressing) >> registers. In fact in all binary operations the second operand is >> either immediate or some kind of a memory reference via a >> index/address register. > > I'm not familiar with this architecture, but I will try to answer > question as best I can.I missed this the first time around so I'll add it now. Special index/address registers are really no problem. They're just a different register class. There are target hooks you would redefine to implement copying between registers of different classes when the code generator realizes the value it needs is in the "wrong" register class. It *should* mostly just fall out given correct register class definitions and typing the TableGen pattern inputs correctly. I think the concern raised by the thesis paper is very specific to calling convention. The TriCore ABI apparently requires that arguments/return values of pointer and integer type be passed in different register classes. Does your target have a similar requirement? If not, then I don't think you have anything to worry about. -David