Hi Chris, Thanks a lot for your answer! Chris Lattner wrote:>> 1. Why does X86 instruction set description provide different >> descriptions for the same instructions, which differ only in thesize>> of operands? >> E.g. >> >> def MOV8rm : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src), >> "mov{b} {$src, $dst|$dst, $src}", >> [(set GR8:$dst, (load addr:$src))]>; >> def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src), >> "mov{w} {$src, $dst|$dst, $src}", >> [(set GR16:$dst, (load addr:$src))]>, OpSize; >> def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src), >> "mov{l} {$src, $dst|$dst, $src}", >> [(set GR32:$dst, (load addr:$src))]>; > > We must do this, because they perform different operations.Specifically,> the loads all read a different number of bytes (1/2/4).OK.>> For my target processor, only the types of operands (e.g. rm) are >> really important. The size is encoded into the opcode, but otherwiseit>> does not affect anything except for constraints on the sizes of >> operands. And constraint is basically that the size of both operands >> should be the same. > > Ok. > >> Wouldn't it be possible and even more clean to have just one >> description like (I use a pseudo-description here): >> >> def MOVrr : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst, >> (i8mem|i16mem|i32mem):$src), >> "mov{b} {$src, $dst|$dst, $src}", []>,isSameSize($dst,>> $src); > > We already have something like this, but it's a little more general.The> X86 backend hasn't been converted to use it.Any plans to do it, to make things simpler?> This is the 'multiclass' > facility in tblgen: > http://llvm.org/docs/TableGenFundamentals.html#multiclassThis is very interesting.> Basically this lets you use one definition to implement multipledifferent> instructions. For example, most instructions in the sparc targetcome in> "reg,reg" and "reg,imm" forms. As such, it defines: > > multiclass F3_12<string OpcStr, bits<6> Op3Val, SDNode OpNode> { > def rr : F3_1<2, Op3Val, > (ops IntRegs:$dst, IntRegs:$b, IntRegs:$c), > !strconcat(OpcStr, " $b, $c, $dst"), > [(set IntRegs:$dst, (OpNode IntRegs:$b,IntRegs:$c))]>;> def ri : F3_2<2, Op3Val, > (ops IntRegs:$dst, IntRegs:$b, i32imm:$c), > !strconcat(OpcStr, " $b, $c, $dst"), > [(set IntRegs:$dst, (OpNode IntRegs:$b,simm13:$c))]>;> } > > which allows it to use instructions like: > > defm AND : F3_12<"and" , 0b000001, and>; > defm OR : F3_12<"or" , 0b000010, or>; > defm XOR : F3_12<"xor" , 0b000011, xor>; > defm SLL : F3_12<"sll" , 0b100101, shl>; > defm SRL : F3_12<"srl" , 0b100110, srl>; > defm SRA : F3_12<"sra" , 0b100111, sra>; > defm ADD : F3_12<"add" , 0b000000, add>; > defm ADDCC : F3_12<"addcc", 0b010000, addc>; > defm ADDX : F3_12<"addx" , 0b001000, adde>; > defm SUB : F3_12<"sub" , 0b000100, sub>; > defm SUBX : F3_12<"subx" , 0b001100, sube>; > defm SUBCC : F3_12<"subcc", 0b010100, SPcmpicc>; > ... > > Each of these 'defm's expand into two instructions. > >> The semantic of such a description would mean that $dst should beone>> of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the >> additional constraint that the sizes of both operands are the same >> (this is checked by isSameSize predicate). > > Sure. It would be straight-forward to define a multiclass for yourarch,> in which each use of defm makes three instructions: one for eachwidth. Beautiful! This saved my day! Multiclass is an extremely useful construct. Thanks a lot for explanations. Multiclasses seem to provide almost everything (if not all) of the features that I was asking for. I'll try to use it for writing a short and concise definition of instructions for my target.>> 2. Another related question is about instruciton costs. InBURG-based>> selectors there is usually a possibility to describe costs for the >> instructions so that a least-cost cover can be found during the >> instruction selection process. I have not seen any such cost >> descriptions in TableGen files. Does it mean that it is notsupported?> > LLVM currently uses assumes that all instructions have cost 1, unlesstold> otherwise ('Pat' patterns which produce multiple instructions havecost> equal to the sum of their result). Given this, it attempts to matchthe> largest patterns possible, which reduces cost. > > The 'largest' metric is somewhat intricate, but you can directlyaffect it> in your .td files with the 'AddedComplexity'. If you increase it, itwill> make an instruction more likely to be matched. However, you really > shouldn't need to do this, except in extreme cases. Are you seeingcases> where the selector is missing an optimal match, or are you just usedto> other tools which aren't as smart at inferring cost as tblgen?I guess, I'm just used to other more BURG-like tools. And yes, they are not too intelligent compared to tblgen;) So far, I haven't seen a real case where an optimal match is missed by tblgen. But I have not completely converted the old BURG-spec into tblgen yet. When I'm ready, I'll try to compare the results and check some corner cases (for example, I was using tree grammar to automatically select a best addressing mode, which tblgen doesn't support yet as far as I understand from the documentation; there were also interesting tree patterns for C operations of the form X OP= Y and increment operators) - let's see how tblgen will perform.>> Why? As far as I understand, LLVM uses BURG/IBURG for instruction >> selection and both of these tools support costs, AFAIK. > > Nope, it uses neither. It uses custom code built into tblgen.OK. Now I understand. Probably I thought that tblgen is BURG-based because there is a Burg subdirectory in the llvm/utils. And in older versions of llvm it was even non-empty, giving the impression that it is used. BTW, does it use the same theory as BURG/IBURG when it generates the selector? I.e. dynamic programming + precomputed costs,etc? Does it implicitly build a tree grammar or something like that? Does it try to merge/share patterns as much as possible to speedup the recognition phase as it is done by BURMs? I have the impression (by looking at the produced C++ selector code) that tblgen-based selectors are not as "precomputed" and optimal as BURG-based ones. Do you have an idea or figures about how it compares to BURG/IBURG selectors? Thanks again, Roman __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Mon, 2 Oct 2006, Roman Levenstein wrote:>>> Wouldn't it be possible and even more clean to have just one >>> description like (I use a pseudo-description here): >>> >>> def MOVrr : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst, >>> (i8mem|i16mem|i32mem):$src), >>> "mov{b} {$src, $dst|$dst, $src}", []>, > isSameSize($dst, >>> $src); >> >> We already have something like this, but it's a little more general. >> The X86 backend hasn't been converted to use it. > > Any plans to do it, to make things simpler?It would be nice, but noone has stepped up to do it yet. It's a mostly mechanical change, so it should be straight-forward.>> Sure. It would be straight-forward to define a multiclass for your > arch, >> in which each use of defm makes three instructions: one for each > width. > > Beautiful! This saved my day! Multiclass is an extremely useful > construct. Thanks a lot for explanations. Multiclasses seem to provide > almost everything (if not all) of the features that I was asking for. > I'll try to use it for writing a short and concise definition of > instructions for my target.Yay :)>> shouldn't need to do this, except in extreme cases. Are you seeing > cases >> where the selector is missing an optimal match, or are you just used > to >> other tools which aren't as smart at inferring cost as tblgen? > > I guess, I'm just used to other more BURG-like tools. And yes, they are > not too intelligent compared to tblgen;)ok.> So far, I haven't seen a real case where an optimal match is missed by > tblgen. But I have not completely converted the old BURG-spec into > tblgen yet. When I'm ready, I'll try to compare the results and check > some corner cases (for example, I was using tree grammar to > automatically select a best addressing mode, which tblgen doesn't > support yet as far as I understand from the documentation;Right. Currently we end up handling this with ComplexPattern matchers, which are written in C++. In practice, I found that this was simpler than writing patterns in the .td files anyway, at least for architectures with really complex addressing modes (x86). For example, on x86, we want to match "(A+1)*8" as "A*8 + 8" and many other similar things. Handling these with patterns is quite difficult when there are many orthogonal cases that need to be handled.> there were also interesting tree patterns for C operations of the form X > OP= Y and increment operators) - let's see how tblgen will perform.tblgen treats "updating" or "two-address" instructions as if they were three-address instructions, then uses the register allocator to pin the first two operands to the same register. See use of 'isTwoAddress' in the X86 or PPC backends for examples.>>> Why? As far as I understand, LLVM uses BURG/IBURG for instruction >>> selection and both of these tools support costs, AFAIK. >> >> Nope, it uses neither. It uses custom code built into tblgen. > > OK. Now I understand. Probably I thought that tblgen is BURG-based > because there is a Burg subdirectory in the llvm/utils. And in older > versions of llvm it was even non-empty, giving the impression that it > is used.Yep, it used to be used for the SparcV9 backend.> BTW, does it use the same theory as BURG/IBURG when it generates the > selector? I.e. dynamic programming + precomputed costs,etc? Does it > implicitly build a tree grammar or something like that?Currently it uses a very simple greedy bottom-up iterative matcher. We've found that most of the complexity of instruction selection is providing the DAG in a form that can easily be matched, rather than having really aggressive matching techniques. That said, we hope to move to a dynamic programming two pass matcher at some point.> Does it try to merge/share patterns as much as possible to speedup the > recognition phase as it is done by BURMs?Yes. This is also important to reduce code size.> I have the impression (by looking at the produced C++ selector code) > that tblgen-based selectors are not as "precomputed" and optimal as > BURG-based ones. Do you have an idea or figures about how it compares to > BURG/IBURG selectors?No measurements have been made, because noone has adapted burg/iburg to work with the LLVM DAGs. It would be an interesting project if you're interested though! Given a choice, I'd prefer that the matcher allow dynamic costs to be associated with the pattern. This is really important when subtargets need to be supported (e.g. pentium vs athlon vs core vs ...) -Chris -- http://nondot.org/sabre/ http://llvm.org/
Roman Levenstein
2006-Oct-03 21:37 UTC
[LLVMdev] Questions about instruction selection and instruction definitions
Hi, Few more questions that I found while trying to develop a new backend. And sorry if I ask too many questions. 1) My target (embedded processor, which is a "not so direct" successor of Z80 family of processors) does not support SELECT, so I was looking for a workaround. First I was thinking about expanding it into conditional flow with branching, but then I have found that there exists a pass called LowerSelect already. I added in the getAnalysisUsage of my DAGtoDAGSel class (derived from SelectionDAGSel) as a required analysis. After that, there are no more SELECT instructions passed to my code selector, which is what I wanted. Is it a correct to do it this way? Anyhow, I have a very strong impression that after I did it, something goes wrong with the code generations of copies from PHI nodes. Sometimes this code is not generated at all. Can it be that LowerSelect breaks some information like PHIs or some CFG information and this is the reason why I see these side-effects? May be I have to call some other pass after LowerSelect to recompute some properties and to bring the code again into the correct form? Any ideas? BTW, which pass(es) does transformation of PHIs into copies during code selection? 2) In the X86 target, I've seen the special register classes GR16_ and GR32_, which are subsclasses of GR16 and GR32. These two classes are used just in a few places inside the instructions definition file. There they are only used to define the fact that normal regs from GR32 can be moved into GR32_ regs. This is not quite obvious for me why this classes are introduced at all, if they are almost not used anywhere in the descriptions ... On my target, there are many instructions that have certain constraints using register subclasses, for example: loads from memory are only allowed to the GR_ regs copies between GR and GR_ regs are allowed stores to memory are only possible from GR_ regs How this can be expressed in the definitions of instructions? I tried to use GR_ register classes at appropriate places, when describing instructions operands. Is it a right method? So far I was not very successful, since after I did it as I described, tblgen started to produce errors like "Pattern x is impossible to select". Why do I have it and what should it mean? May be I'm doing something wrong. Besides what I described, my target has additionally support for banks if registers. Access to the registers in the non-current bank is more expensive than access to regs from the current bank. And many operations cannot have such regs from other banks as first operand. How something like this can be expressed? Is there any support for such kind of constraints. I can roughly imagine how instruction descriptions could look like. But I guess also a register allocator should take this into account. Otherwise too many copies between regs from current bank and other banks will be generated (e.g. virtual reg was allocated to a physical register from another bank, but later it is required in an instruction that cannot take this register as operand. So, it must be copied into a register in the current bank...). And decisions of register allocator need to be based on minimization of total costs, probably. Sounds very complex. I just wonder if LLVM already supports something like this in any form. 3) My target has only 2 addr instructions, i.e. each instruction has at most 2 operands. Do I still need to use 3 operands for such operations like OR, AND, ADD, SUB in my instruction descriptions, but mark them as isTwoAddrress=1??? Or do I need to call a special pass that converts everything into 2-operand instructions? 4) Following Chris advice regarding multiclasses, I defined several multiclasses, each containing 6 defs. But I noticed that I cannot use "let isTwoAddress=1" or "let isCommutable=1" inside the multiclass definition, which would be very handy. Is it really impossible or may be there is a special syntax to be used? As a workaround, I can use "let ..." with defm definitions using this multiclass. 5) Can definitionss inside a multiclass use another multiclass? I.e. is it possible to use defm inside multiclass definitions? Thanks, Roman __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Possibly Parallel Threads
- [LLVMdev] Questions about instruction selection and instruction definitions
- [LLVMdev] Instruction descriptions question
- [LLVMdev] Instruction descriptions question
- [LLVMdev] Questions about instruction selection and instruction definitions
- [LLVMdev] Patch for transform dependencies