Min-Yih Hsu via llvm-dev
2021-Dec-06 03:34 UTC
[llvm-dev] [RFC] A new CodeEmitterGen infrastructure for variable-length instructions
(This is a long proposal. If you prefer, here is the web version: https://gist.github.com/mshockwave/66e98d099256deefc062633909bb7b5b <https://gist.github.com/mshockwave/66e98d099256deefc062633909bb7b5b>) ## Background CodeEmitterGen is a TableGen backend that generates instruction encoder functions for `MCCodeEmitter` from a concise TableGen syntax. It is, however, almost exclusively designed for targets that use fixed-length instructions. It's nearly impossible to use this infrastructure to describe instruction encoding scheme for ISAs with variable-length instructions, like X86 and M68k. To have a better understanding on this problem, let's look at an example. For a fixed-length instruction ISA, developers write the following TableGen syntax to describe an instruction encoding: ``` class MyInst<(outs GR64:$dst), (ins GR64, i16imm:$imm)> : Instruction { bits<32> Inst; bits<4> dst; bits<16> imm; let Inst{31-28} = 0b1011; ... let Inst{19-16} = dst; let Inst{15-0} = imm; } ``` The `Inst` field tells us the length of an instruction -- 32 bits in this case. Each bit in this field describes the encoded value, which is either a concrete value or symbolic one like `dst` and `imm` in the example above. The `dst` and `imm` variables correspond to the output operand (`$dst`) and the second input operand (`$imm`), respectively. Meaning, the encoder function (generated by CodeEmitterGen) will eventually insert the encoding for these two operands into the right bit ranges (bit 19\~16 for `dst` and 15\~0 for `imm`). Though this TableGen syntax fits well for fixed-length instructions, it imposes some difficulties to instructions with variable length and memory poerands with complex addressing modes: 1. The bit width of the `Inst` field is fixed. Though we can declare the field with maximum instruction size in the ISA, it requires extra code to adjust the final instruction size. 2. Operand encoding can only be placed at fixed bit positions. However, the size of an operand in a variable-length instruction might vary. 3. In the situation where a single logical operand is consisting of multiple MachineOperand-s / MCOperand-s, the current syntax cannot reference a sub-operand. Which means we can only reference the entire logical operand at places where we actually should put sub-operands. Making the TG code less readable and bring more burden to the operand encoding functions (because they don't know which sub-operand to encode). In short, we need a more flexible CodeEmitterGen infrastructure for variable-length instructions: describe the instruction encoding in a "position independent" fashion and be able to reference sub-operands with ease. ## Proposal We propose a new infrastructure, called VarLenCodeEmitterGen, to solve the aforementioned shortcomings. It is consisting of new TableGen syntax and some modifications to the existing CodeEmitterGen TableGen backend. Suppose we are dealing with an instruction `MyVarInst`: ``` class MyMemOperand<dag sub_ops> : Operand<iPTR> { let MIOperandInfo = sub_ops; } class MyVarInst<MyMemOperand memory_op> : Instruction { let OutOperandList = (outs GR64:$dst); let InOperandList = (ins memory_operand:$src); } ``` It has the following encoding format: ``` 15 8 0 ---------------------------------------------------- | Fixed bits | Sub-operand 0 in source operand | ---------------------------------------------------- X 16 ---------------------------------------------------- | Sub-operand 1 in source operand | ---------------------------------------------------- X + 4 X + 1 ------------------------------------ | Destination register | ------------------------------------ ``` We have two different kinds of memory operands: ``` def MemOp16 : MyMemOperand<(ops GR64:$reg, i16imm:$offset)>; def MemOp16 : MyMemOperand<(ops GR64:$reg, i32imm:$offset)>; def FOO16 : MyVarInst<MemOp16>; def FOO32 : MyVarInst<MemOp32>; ``` So the size of `FOO16` and `FOO32` will be 36 and 52 bits, respectively. To express the encoding, first, we modify `MyVarInst` and `MyMemOperand`: ``` class MyMemOperand<dag sub_ops> : Operand<iPTR> { let MIOperandInfo = sub_ops; dag Base; dag Extension; } class MyVarInst<MyMemOperand memory_op> : Instruction { dag Inst; let OutOperandList = (outs GR64:$dst); let InOperandList = (ins memory_op:$src); let Inst = (seq (seq:$dec /*Fixed bits*/0b10110111, memory_op.Base), memory_op.Extension, // Destination register (operand "$dst", 4) ); } ``` Then, we use a slightly different representation for `MemOp16` and `MemOp32`: ``` class MemOp16<string op_name> : MyMemOperand<(ops GR64:$reg, i16imm:$offset)> { let Base = (operand "$"#op_name#".reg", 8); let Extension = (operand "$"#op_name#".offset", 16); } class MemOp32<string op_name> : MyMemOperand<(ops GR64:$reg, i32imm:$offset)> { let Base = (operand "$"#op_name#".reg", 8); let Extension = (operand "$"#op_name#".offset", 32); } def FOO16 : MyVarInst<MemOp16<"src">>; def FOO32 : MyVarInst<MemOp32<"src">>; ``` This new TableGen syntax uses `dag` rather than `bits<N>` for the `Inst` field. Allowing instructions to place their operand (and sub-operand) encodings without worrying about the actual bit positions. The new syntax is underpinned by two new DAG operators: `seq` and `operand`. The `seq` operator sequentially places its arguments -- fragments of encoding -- from LSB to MSB. If the operator is "tagged" by `$dec`, it goes from MSB to LSB instead. The `operand` operator references the encoding of an operand. Its first DAG argument is a string referencing the name of an operand in either `InOperandList` or `OutOperandList` of an instruction. We can also reference an sub-operand using syntax like `$<operand name>.<sub-operand name>`. The second DAG argument for `operand` is the bit width of the encoded operand. The other variant of `operand` is having two arguments instead of one that follow the operand referencing string. More specifically: ``` (operand "$src.reg", 8, 4) ``` In this case, 8 and 4 represents a bit range -- high bit and low bit, respectively -- to the encoded `$src.reg` operand. Finally, a new sub-component added to the existing CodeEmitterGen TableGen backend, VarLenCodeEmitterGen, will turn the above syntax into a C++ encoder function -- `MCCodeEmitter::getBinaryCodeForInstr` -- that uses the same mechanism as the fixed-length instruction version (except few details, like it always uses APInt to store the result). We think the proposed solution has the following advantages: - Flexible and versatile in terms of expressing instruction encodings. - The TableGen syntax is easy to read, write and understand. - Only adds a few new TableGen syntax. - Tightly integrated with the existing CodeEmitterGen. ### Previous approaches Both X86 and M68k -- the only two LLVM targets with variable-length instructions -- are using custom instruction encoders. X86 leverages TSFlags in `MCInst` to carry encoding info. Simply speaking, X86 enumerates and numbers every possible combinations of operands and stores the corresponding index into a segment of TSFlags for an instruction. This approach, of course, requires none trivial amount of workforce to maintain. M68k, on the other hand, uses an obscured infrastructure called code beads. It is conceptually similar to the VarLenCodeEmitterGen we're proposing here -- concatenating encoding fragments. Except that the syntax is bulky and it uses too many specialized TableGen infrastructures, including a separate TableGen backend, that make the maintainence really really hard. ## Patches TableGen modifications: https://reviews.llvm.org/D115128 ## FAQ - Do I need to toggle some flags -- either a command line flag or a TableGen bit field -- to use the new code emitter scheme? - No, having a `dag` type `Inst` field will automatically opt-in this new code emitter scheme. - Can I adopt this for fixed-length instructions? - Absolutely yes. But it's not recommended because CodeEmitterGen can generate more optimal encoder functions for fixed-length instructions. The TableGen syntax of CodeEmitterGen makes more sense for fixed-length instructions, too. - Can X86 adopt this infrastructure? - Theoritically, yes (In practice? I dunno). - What about the disassembler? Can we TableGen-enerate the corresponding disassembling functions? - Since we have a structural description of the encoded instruction, it's probably easier to create a disassembler from the new TableGen syntax. But I haven't worked on that yet. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211206/6e86c47d/attachment.html>
Ricky Taylor via llvm-dev
2021-Dec-07 10:07 UTC
[llvm-dev] [RFC] A new CodeEmitterGen infrastructure for variable-length instructions
>From an M68k point-of-view, I like this. I'm a little uneasy about the`operand` node though. Is it possible that there will ever be an operand type which needs multiple encodings? I also wonder whether $dec mode and slicing inputs might be better handled by separate nodes. Would something like `(seq (flip 0b0101010) (slice my_operand.Base, 4, 8))`, be possible? As you say, I suspect that implementing an interpreter-style disassembler generator like the fixed length one would be fairly straight-forward (and much better than the M68k disassembler implementation I provided). Ricky, On Mon, 6 Dec 2021 at 03:34, Min-Yih Hsu <minyihh at uci.edu> wrote:> (This is a long proposal. If you prefer, here is the web version: > https://gist.github.com/mshockwave/66e98d099256deefc062633909bb7b5b) > > ## Background > CodeEmitterGen is a TableGen backend that generates instruction encoder > functions for `MCCodeEmitter` from a concise TableGen syntax. It is, > however, almost exclusively designed for targets that use fixed-length > instructions. It's nearly impossible to use this infrastructure to describe > instruction encoding scheme for ISAs with variable-length instructions, > like X86 and M68k. > > To have a better understanding on this problem, let's look at an example. > For a fixed-length instruction ISA, developers write the following TableGen > syntax to describe an instruction encoding: > ``` > class MyInst<(outs GR64:$dst), (ins GR64, i16imm:$imm)> : Instruction { > bits<32> Inst; > > bits<4> dst; > bits<16> imm; > let Inst{31-28} = 0b1011; > ... > let Inst{19-16} = dst; > let Inst{15-0} = imm; > } > ``` > The `Inst` field tells us the length of an instruction -- 32 bits in this > case. Each bit in this field describes the encoded value, which is either a > concrete value or symbolic one like `dst` and `imm` in the example above. > The `dst` and `imm` variables correspond to the output operand (`$dst`) and > the second input operand (`$imm`), respectively. Meaning, the encoder > function (generated by CodeEmitterGen) will eventually insert the encoding > for these two operands into the right bit ranges (bit 19\~16 for `dst` and > 15\~0 for `imm`). > > Though this TableGen syntax fits well for fixed-length instructions, it > imposes some difficulties to instructions with variable length and memory > poerands with complex addressing modes: > 1. The bit width of the `Inst` field is fixed. Though we can declare the > field with maximum instruction size in the ISA, it requires extra code to > adjust the final instruction size. > 2. Operand encoding can only be placed at fixed bit positions. However, > the size of an operand in a variable-length instruction might vary. > 3. In the situation where a single logical operand is consisting of > multiple MachineOperand-s / MCOperand-s, the current syntax cannot > reference a sub-operand. Which means we can only reference the entire > logical operand at places where we actually should put sub-operands. Making > the TG code less readable and bring more burden to the operand encoding > functions (because they don't know which sub-operand to encode). > > In short, we need a more flexible CodeEmitterGen infrastructure for > variable-length instructions: describe the instruction encoding in a > "position independent" fashion and be able to reference sub-operands with > ease. > > ## Proposal > We propose a new infrastructure, called VarLenCodeEmitterGen, to solve the > aforementioned shortcomings. It is consisting of new TableGen syntax and > some modifications to the existing CodeEmitterGen TableGen backend. > > Suppose we are dealing with an instruction `MyVarInst`: > ``` > class MyMemOperand<dag sub_ops> : Operand<iPTR> { > let MIOperandInfo = sub_ops; > } > > class MyVarInst<MyMemOperand memory_op> : Instruction { > let OutOperandList = (outs GR64:$dst); > let InOperandList = (ins memory_operand:$src); > } > ``` > It has the following encoding format: > ``` > 15 8 0 > ---------------------------------------------------- > | Fixed bits | Sub-operand 0 in source operand | > ---------------------------------------------------- > X 16 > ---------------------------------------------------- > | Sub-operand 1 in source operand | > ---------------------------------------------------- > X + 4 X + 1 > ------------------------------------ > | Destination register | > ------------------------------------ > ``` > We have two different kinds of memory operands: > ``` > def MemOp16 : MyMemOperand<(ops GR64:$reg, i16imm:$offset)>; > def MemOp16 : MyMemOperand<(ops GR64:$reg, i32imm:$offset)>; > > def FOO16 : MyVarInst<MemOp16>; > def FOO32 : MyVarInst<MemOp32>; > ``` > So the size of `FOO16` and `FOO32` will be 36 and 52 bits, respectively. > > To express the encoding, first, we modify `MyVarInst` and `MyMemOperand`: > ``` > class MyMemOperand<dag sub_ops> : Operand<iPTR> { > let MIOperandInfo = sub_ops; > dag Base; > dag Extension; > } > > class MyVarInst<MyMemOperand memory_op> : Instruction { > dag Inst; > > let OutOperandList = (outs GR64:$dst); > let InOperandList = (ins memory_op:$src); > > let Inst = (seq > (seq:$dec /*Fixed bits*/0b10110111, memory_op.Base), > memory_op.Extension, > // Destination register > (operand "$dst", 4) > ); > } > ``` > Then, we use a slightly different representation > for `MemOp16` and `MemOp32`: > ``` > class MemOp16<string op_name> : MyMemOperand<(ops GR64:$reg, > i16imm:$offset)> { > let Base = (operand "$"#op_name#".reg", 8); > let Extension = (operand "$"#op_name#".offset", 16); > } > > class MemOp32<string op_name> : MyMemOperand<(ops GR64:$reg, > i32imm:$offset)> { > let Base = (operand "$"#op_name#".reg", 8); > let Extension = (operand "$"#op_name#".offset", 32); > } > > def FOO16 : MyVarInst<MemOp16<"src">>; > def FOO32 : MyVarInst<MemOp32<"src">>; > ``` > > This new TableGen syntax uses `dag` rather than `bits<N>` for > the `Inst` field. Allowing instructions to place their operand (and > sub-operand) encodings without worrying about the actual bit positions. The > new syntax is underpinned by two new DAG operators: `seq` and `operand`. > > The `seq` operator sequentially places its arguments -- fragments of > encoding -- from LSB to MSB. If the operator is "tagged" by `$dec`, it goes > from MSB to LSB instead. The `operand` operator references the encoding of > an operand. Its first DAG argument is a string referencing the name of an > operand in either `InOperandList` or `OutOperandList` of an instruction. We > can also reference an sub-operand using syntax like `$<operand > name>.<sub-operand name>`. The second DAG argument for `operand` is the bit > width of the encoded operand. The other variant of `operand` is having two > arguments instead of one that follow the operand referencing string. More > specifically: > ``` > (operand "$src.reg", 8, 4) > ``` > In this case, 8 and 4 represents a bit range -- high bit and low bit, > respectively -- to the encoded `$src.reg` operand. > > Finally, a new sub-component added to the existing CodeEmitterGen TableGen > backend, VarLenCodeEmitterGen, will turn the above syntax into a C++ > encoder function -- `MCCodeEmitter::getBinaryCodeForInstr` -- that uses the > same mechanism as the fixed-length instruction version (except few details, > like it always uses APInt to store the result). > > We think the proposed solution has the following advantages: > - Flexible and versatile in terms of expressing instruction encodings. > - The TableGen syntax is easy to read, write and understand. > - Only adds a few new TableGen syntax. > - Tightly integrated with the existing CodeEmitterGen. > > ### Previous approaches > Both X86 and M68k -- the only two LLVM targets with variable-length > instructions -- are using custom instruction encoders. X86 leverages > TSFlags in `MCInst` to carry encoding info. Simply speaking, X86 enumerates > and numbers every possible combinations of operands and stores the > corresponding index into a segment of TSFlags for an instruction. This > approach, of course, requires none trivial amount of workforce to maintain. > > M68k, on the other hand, uses an obscured infrastructure called code > beads. It is conceptually similar to the VarLenCodeEmitterGen we're > proposing here -- concatenating encoding fragments. Except that the syntax > is bulky and it uses too many specialized TableGen infrastructures, > including a separate TableGen backend, that make the maintainence really > really hard. > > ## Patches > TableGen modifications: https://reviews.llvm.org/D115128 > > ## FAQ > - Do I need to toggle some flags -- either a command line flag or a > TableGen bit field -- to use the new code emitter scheme? > - No, having a `dag` type `Inst` field will automatically opt-in this > new code emitter scheme. > - Can I adopt this for fixed-length instructions? > - Absolutely yes. But it's not recommended because CodeEmitterGen can > generate more optimal encoder functions for fixed-length instructions. The > TableGen syntax of CodeEmitterGen makes more sense for fixed-length > instructions, too. > - Can X86 adopt this infrastructure? > - Theoritically, yes (In practice? I dunno). > - What about the disassembler? Can we TableGen-enerate the corresponding > disassembling functions? > - Since we have a structural description of the encoded instruction, > it's probably easier to create a disassembler from the new TableGen syntax. > But I haven't worked on that yet. > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211207/5ee00468/attachment.html>
Min-Yih Hsu via llvm-dev
2021-Dec-09 14:04 UTC
[llvm-dev] [RFC] A new CodeEmitterGen infrastructure for variable-length instructions
FYI: As a preview for this new CodeEmitterGen component, I’ve refactored some of the M68k instructions using VarLenCodeEmitterGen: https://reviews.llvm.org/D115234 <https://reviews.llvm.org/D115234> -Min> On Dec 6, 2021, at 11:34 AM, Min-Yih Hsu <minyihh at uci.edu> wrote: > > (This is a long proposal. If you prefer, here is the web version: https://gist.github.com/mshockwave/66e98d099256deefc062633909bb7b5b <https://gist.github.com/mshockwave/66e98d099256deefc062633909bb7b5b>) > > ## Background > CodeEmitterGen is a TableGen backend that generates instruction encoder functions for `MCCodeEmitter` from a concise TableGen syntax. It is, however, almost exclusively designed for targets that use fixed-length instructions. It's nearly impossible to use this infrastructure to describe instruction encoding scheme for ISAs with variable-length instructions, like X86 and M68k. > > To have a better understanding on this problem, let's look at an example. For a fixed-length instruction ISA, developers write the following TableGen syntax to describe an instruction encoding: > ``` > class MyInst<(outs GR64:$dst), (ins GR64, i16imm:$imm)> : Instruction { > bits<32> Inst; > > bits<4> dst; > bits<16> imm; > let Inst{31-28} = 0b1011; > ... > let Inst{19-16} = dst; > let Inst{15-0} = imm; > } > ``` > The `Inst` field tells us the length of an instruction -- 32 bits in this case. Each bit in this field describes the encoded value, which is either a concrete value or symbolic one like `dst` and `imm` in the example above. The `dst` and `imm` variables correspond to the output operand (`$dst`) and the second input operand (`$imm`), respectively. Meaning, the encoder function (generated by CodeEmitterGen) will eventually insert the encoding for these two operands into the right bit ranges (bit 19\~16 for `dst` and 15\~0 for `imm`). > > Though this TableGen syntax fits well for fixed-length instructions, it imposes some difficulties to instructions with variable length and memory poerands with complex addressing modes: > 1. The bit width of the `Inst` field is fixed. Though we can declare the field with maximum instruction size in the ISA, it requires extra code to adjust the final instruction size. > 2. Operand encoding can only be placed at fixed bit positions. However, the size of an operand in a variable-length instruction might vary. > 3. In the situation where a single logical operand is consisting of multiple MachineOperand-s / MCOperand-s, the current syntax cannot reference a sub-operand. Which means we can only reference the entire logical operand at places where we actually should put sub-operands. Making the TG code less readable and bring more burden to the operand encoding functions (because they don't know which sub-operand to encode). > > In short, we need a more flexible CodeEmitterGen infrastructure for variable-length instructions: describe the instruction encoding in a "position independent" fashion and be able to reference sub-operands with ease. > > ## Proposal > We propose a new infrastructure, called VarLenCodeEmitterGen, to solve the aforementioned shortcomings. It is consisting of new TableGen syntax and some modifications to the existing CodeEmitterGen TableGen backend. > > Suppose we are dealing with an instruction `MyVarInst`: > ``` > class MyMemOperand<dag sub_ops> : Operand<iPTR> { > let MIOperandInfo = sub_ops; > } > > class MyVarInst<MyMemOperand memory_op> : Instruction { > let OutOperandList = (outs GR64:$dst); > let InOperandList = (ins memory_operand:$src); > } > ``` > It has the following encoding format: > ``` > 15 8 0 > ---------------------------------------------------- > | Fixed bits | Sub-operand 0 in source operand | > ---------------------------------------------------- > X 16 > ---------------------------------------------------- > | Sub-operand 1 in source operand | > ---------------------------------------------------- > X + 4 X + 1 > ------------------------------------ > | Destination register | > ------------------------------------ > ``` > We have two different kinds of memory operands: > ``` > def MemOp16 : MyMemOperand<(ops GR64:$reg, i16imm:$offset)>; > def MemOp16 : MyMemOperand<(ops GR64:$reg, i32imm:$offset)>; > > def FOO16 : MyVarInst<MemOp16>; > def FOO32 : MyVarInst<MemOp32>; > ``` > So the size of `FOO16` and `FOO32` will be 36 and 52 bits, respectively. > > To express the encoding, first, we modify `MyVarInst` and `MyMemOperand`: > ``` > class MyMemOperand<dag sub_ops> : Operand<iPTR> { > let MIOperandInfo = sub_ops; > dag Base; > dag Extension; > } > > class MyVarInst<MyMemOperand memory_op> : Instruction { > dag Inst; > > let OutOperandList = (outs GR64:$dst); > let InOperandList = (ins memory_op:$src); > > let Inst = (seq > (seq:$dec /*Fixed bits*/0b10110111, memory_op.Base), > memory_op.Extension, > // Destination register > (operand "$dst", 4) > ); > } > ``` > Then, we use a slightly different representation for `MemOp16` and `MemOp32`: > ``` > class MemOp16<string op_name> : MyMemOperand<(ops GR64:$reg, i16imm:$offset)> { > let Base = (operand "$"#op_name#".reg", 8); > let Extension = (operand "$"#op_name#".offset", 16); > } > > class MemOp32<string op_name> : MyMemOperand<(ops GR64:$reg, i32imm:$offset)> { > let Base = (operand "$"#op_name#".reg", 8); > let Extension = (operand "$"#op_name#".offset", 32); > } > > def FOO16 : MyVarInst<MemOp16<"src">>; > def FOO32 : MyVarInst<MemOp32<"src">>; > ``` > > This new TableGen syntax uses `dag` rather than `bits<N>` for the `Inst` field. Allowing instructions to place their operand (and sub-operand) encodings without worrying about the actual bit positions. The new syntax is underpinned by two new DAG operators: `seq` and `operand`. > > The `seq` operator sequentially places its arguments -- fragments of encoding -- from LSB to MSB. If the operator is "tagged" by `$dec`, it goes from MSB to LSB instead. The `operand` operator references the encoding of an operand. Its first DAG argument is a string referencing the name of an operand in either `InOperandList` or `OutOperandList` of an instruction. We can also reference an sub-operand using syntax like `$<operand name>.<sub-operand name>`. The second DAG argument for `operand` is the bit width of the encoded operand. The other variant of `operand` is having two arguments instead of one that follow the operand referencing string. More specifically: > ``` > (operand "$src.reg", 8, 4) > ``` > In this case, 8 and 4 represents a bit range -- high bit and low bit, respectively -- to the encoded `$src.reg` operand. > > Finally, a new sub-component added to the existing CodeEmitterGen TableGen backend, VarLenCodeEmitterGen, will turn the above syntax into a C++ encoder function -- `MCCodeEmitter::getBinaryCodeForInstr` -- that uses the same mechanism as the fixed-length instruction version (except few details, like it always uses APInt to store the result). > > We think the proposed solution has the following advantages: > - Flexible and versatile in terms of expressing instruction encodings. > - The TableGen syntax is easy to read, write and understand. > - Only adds a few new TableGen syntax. > - Tightly integrated with the existing CodeEmitterGen. > > ### Previous approaches > Both X86 and M68k -- the only two LLVM targets with variable-length instructions -- are using custom instruction encoders. X86 leverages TSFlags in `MCInst` to carry encoding info. Simply speaking, X86 enumerates and numbers every possible combinations of operands and stores the corresponding index into a segment of TSFlags for an instruction. This approach, of course, requires none trivial amount of workforce to maintain. > > M68k, on the other hand, uses an obscured infrastructure called code beads. It is conceptually similar to the VarLenCodeEmitterGen we're proposing here -- concatenating encoding fragments. Except that the syntax is bulky and it uses too many specialized TableGen infrastructures, including a separate TableGen backend, that make the maintainence really really hard. > > ## Patches > TableGen modifications: https://reviews.llvm.org/D115128 <https://reviews.llvm.org/D115128> > > ## FAQ > - Do I need to toggle some flags -- either a command line flag or a TableGen bit field -- to use the new code emitter scheme? > - No, having a `dag` type `Inst` field will automatically opt-in this new code emitter scheme. > - Can I adopt this for fixed-length instructions? > - Absolutely yes. But it's not recommended because CodeEmitterGen can generate more optimal encoder functions for fixed-length instructions. The TableGen syntax of CodeEmitterGen makes more sense for fixed-length instructions, too. > - Can X86 adopt this infrastructure? > - Theoritically, yes (In practice? I dunno). > - What about the disassembler? Can we TableGen-enerate the corresponding disassembling functions? > - Since we have a structural description of the encoded instruction, it's probably easier to create a disassembler from the new TableGen syntax. But I haven't worked on that yet. >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211209/a7b14895/attachment-0001.html>