Mark R V Murray via llvm-dev
2019-Mar-25 16:18 UTC
[llvm-dev] Printing PC-relative offsets - how to get the instruction length?
Hi

In my MC6809 backend, in llvm/lib/Target/MC6809/InstPrinter/MC6809InstPrinter.cpp, I have the routine

    void MC6809InstPrinter::printPCRelImmOperand(const MCInst *MI, unsigned OpNo,
                                                 raw_ostream &O) {
      const MCOperand &Op = MI->getOperand(OpNo);
      if (Op.isImm()) {
        int64_t Imm = Op.getImm() + 2;    <<<=======================
        O << "$";
        if (Imm >= 0)
          O << '+';
        O << Imm;
      } else {
        assert(Op.isExpr() && "unknown pcrel immediate operand");
        Op.getExpr()->print(O, &MAI);
      }
    }

This works well enough except for the constant 2 that I've arrowed - it needs to be the length of the binary instruction in bytes. The MC6809 has a *LOT* of variability here, so a case statement would be a right pain to maintain.

An answer is tantalisingly close:

    $ bin/llvm-mc -triple mc6809 -show-inst-operands -show-inst -show-encoding <<< "lda 0,pc"
            .text
    <stdin>:1:1: note: parsed instruction: ['lda', 0, <register 13>]
    lda 0,pc
    ^
            lda     $+2,pc          ; encoding: [0xa6,0x8c,0x00]    <<==========
                                    ; <MCInst #1849 LDAi8oPC
                                    ;  <MCOperand Imm:0>
                                    ;  <MCOperand Imm:0>>

The "encoding:" knows that I have a three-byte instruction, but that is generated by another chunk of code miles away. I suppose I could replicate that, but it seems wasteful. Is there a better way, not involving nasty layering violations, to get the length of an instruction in bytes in the context of llvm/lib/Target/*/InstPrinter/*InstPrinter.cpp?

Also, both 8 and 16-bit variants are possible. The instruction picked is LDAi8oPC, which is the 8-bit offset version. If I supply a bigger offset:

    $ bin/llvm-mc -triple mc6809 -show-inst-operands -show-inst -show-encoding <<< "lda 1000,pc"
            .text
    <stdin>:1:1: note: parsed instruction: ['lda', 1000, <register 13>]
    lda 1000,pc
    ^
            lda     $+1002,pc       ; encoding: [0xa6,0x8c,0xe8]
                                    ; <MCInst #1849 LDAi8oPC
                                    ;  <MCOperand Imm:0>
                                    ;  <MCOperand Imm:1000>>

I still get the 8-bit variant instead of LDAi16oPC, and the operand is truncated.

The TableGen-generated .inc file has

    { 444 /* lda */, MC6809::LDAi8oPC, Convert__imm_95_0__Imm81_0, AMFBS_None, { MCK_Imm8, MCK_PC }, },
    { 444 /* lda */, MC6809::LDAi16oPC, Convert__imm_95_0__Imm161_0, AMFBS_None, { MCK_Imm16, MCK_PC }, },

... so how do I get the 16-bit variant with MCK_Imm16 selected instead?

The instructions are defined as

    def LDAi8oPC : MC6809LoadIndexed_i8oPC_P1<
        (outs GR8:$dst8),
        (ins pcoffset8:$offset),
        !strconcat("lda", "\t", "${offset}", ",", "pc"),
        0x00,
        0xA6,
        []>
    {
      let Inst{23-16} = offset{7-0};
      let Inst{15} = 0b1;
      let Inst{14-13} = 0b00;
      let Inst{12-8} = 0b01100;
      let Inst{7-0} = opcode;
    }

    def LDAi16oPC : MC6809LoadIndexed_i16oPC_P1<
        (outs GR8:$dst8),
        (ins pcoffset16:$offset),
        !strconcat("lda", "\t", "${offset}", ",", "pc"),
        0x00,
        0xA6,
        []>
    {
      let Inst{31-24} = offset{7-0};
      let Inst{23-16} = offset{15-8};
      let Inst{15} = 0b1;
      let Inst{14-13} = 0b00;
      let Inst{12-8} = 0b01101;
      let Inst{7-0} = opcode;
    }

and I have

    def pcoffset8 : Operand<i8>, ImmLeaf<i8, [{ return Immediate >= -128 && Immediate <= 127; }]> {
      let PrintMethod = "printPCRelImmOperand";
      let MIOperandInfo = (ops i8imm);
      let ParserMatchClass = ImmediateAsmOperand<"Imm8">;
      let EncoderMethod = "getMemOpValue";
      let DecoderMethod = "DecodeMemOperand";
    }

    def pcoffset16 : Operand<i16>, ImmLeaf<i16, [{ return Immediate >= -32768 && Immediate <= 32767; }]> {
      let PrintMethod = "printPCRelImmOperand";
      let MIOperandInfo = (ops i16imm);
      let ParserMatchClass = ImmediateAsmOperand<"Imm16">;
      let EncoderMethod = "getMemOpValue";
      let DecoderMethod = "DecodeMemOperand";
    }

M
--
Mark R V Murray
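The truncation reported above can be reproduced with plain arithmetic. The following is a minimal standalone sketch, not code from the backend: the helper names `lowByte` and `asSignedOffset` are hypothetical, and it assumes the 8-bit encoder simply keeps `offset{7-0}`, as the `LDAi8oPC` definition suggests.

```cpp
#include <cstdint>

// Hypothetical helper: the low 8 bits of a value, which is all that an
// encoding field defined as offset{7-0} can hold.
inline std::uint8_t lowByte(std::int64_t Value) {
  return static_cast<std::uint8_t>(Value & 0xFF);
}

// Hypothetical helper: the same byte read back as a signed 8-bit offset
// (two's complement), returned as an int to avoid narrowing surprises.
inline int asSignedOffset(std::uint8_t Byte) {
  return Byte >= 0x80 ? static_cast<int>(Byte) - 256 : static_cast<int>(Byte);
}
```

With an offset of 1000 (0x03E8), `lowByte` yields 0xE8, which matches the last encoding byte in the llvm-mc output, and read back as a signed 8-bit value that byte is -24 rather than 1000.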
Oliver Stannard via llvm-dev
2019-Mar-27 14:56 UTC
[llvm-dev] Printing PC-relative offsets - how to get the instruction length?
Hi Mark,

For your first question, the MCInstPrinter has a reference to the MCInstrInfo object for your target, so something like this should give you the instruction encoding size in bytes:

    MII.get(MI->getOpcode()).getSize()

For your second question, it looks like the MCK_Imm8 operand class is matching the immediate even when it is out of range. This should be checked by a function in your assembly parser. The ImmediateAsmOperand<"Imm8"> record (which you didn't show the definition of, so I'm guessing a bit here) should have a PredicateMethod value giving the name of that function. If that's not specified, the default function name is based on the tablegen class name, which won't be correct for both Imm8 and Imm16.

Note that the ImmLeaf in the code snippet you posted is only used for code generation from IR, not by the assembler.

Oliver
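Both suggestions can be modelled outside LLVM. The sketch below is a standalone approximation under stated assumptions, not the backend's code: `InstrDesc`, `InstrInfo`, `printPCRelImm`, `fitsSigned`, `isImm8`, `isImm16` and `selectLdaPC` are hypothetical stand-ins. In a real backend the size would come from `MCInstrDesc::getSize()`, which only reports a useful value if the instruction's TableGen definition sets its `Size` field, and the range checks would be the predicate methods on the parsed-operand class. The 4-byte size for `LDAi16oPC` is inferred from its 32-bit `Inst` field and is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for MCInstrDesc: only the encoding size is modelled.
struct InstrDesc {
  unsigned Size; // encoding length in bytes
};

enum Opcode : unsigned { LDAi8oPC, LDAi16oPC };

// Hypothetical stand-in for MCInstrInfo: maps an opcode to its descriptor.
class InstrInfo {
  std::map<unsigned, InstrDesc> Table;

public:
  InstrInfo() {
    Table.emplace(LDAi8oPC, InstrDesc{3});  // 3 bytes, per the llvm-mc output
    Table.emplace(LDAi16oPC, InstrDesc{4}); // assumed from the 32-bit Inst field
  }
  const InstrDesc &get(unsigned Opc) const { return Table.at(Opc); }
};

// Prints "$+N" or "$-N", adding the looked-up instruction size instead of a
// hard-coded constant.
std::string printPCRelImm(const InstrInfo &MII, unsigned Opc, std::int64_t Imm) {
  const std::int64_t Target = Imm + MII.get(Opc).Size;
  std::ostringstream OS;
  OS << '$';
  if (Target >= 0)
    OS << '+';
  OS << Target;
  return OS.str();
}

// Signed range check for an N-bit field.
template <unsigned Bits> bool fitsSigned(std::int64_t V) {
  static_assert(Bits > 0 && Bits < 64, "unsupported field width");
  const std::int64_t Lo = -(std::int64_t{1} << (Bits - 1));
  const std::int64_t Hi = (std::int64_t{1} << (Bits - 1)) - 1;
  return V >= Lo && V <= Hi;
}

// Per-class predicates: each operand class needs its own range check.
bool isImm8(std::int64_t V) { return fitsSigned<8>(V); }
bool isImm16(std::int64_t V) { return fitsSigned<16>(V); }

// Tries the narrow variant first, then the wide one; returns false if the
// offset fits neither.
bool selectLdaPC(std::int64_t Offset, Opcode &Out) {
  if (isImm8(Offset)) {
    Out = LDAi8oPC;
    return true;
  }
  if (isImm16(Offset)) {
    Out = LDAi16oPC;
    return true;
  }
  return false;
}
```

In this model an offset of 1000 fails `isImm8` and falls through to `LDAi16oPC`, which is the selection the original question was after. Inside LLVM itself, `isInt<N>()` from `llvm/Support/MathExtras.h` performs the same signed range test as `fitsSigned`.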
Mark R V Murray via llvm-dev
2019-Mar-28 09:25 UTC
[llvm-dev] Printing PC-relative offsets - how to get the instruction length?
Hi Oliver,

Thanks! Both your answers got me on the right track!

Regarding the second, I'm now correctly parsing an immediate using an MCExpr if it is not an actual number. When does the MCExpr get resolved to an actual number? During assembly time? Or is it a Link/Fixup thing?

If I have a snippet of code like (e.g.):

    foo     equ     12
            lda     foo,x

... for a constant offset off the X index register. When and by what will the foo get resolved to 12 for the LDA instruction?

M
--
Mark R V Murray