David Greene via llvm-dev
2021-Jul-07 23:27 UTC
[llvm-dev] Encoding instructions in packets
Hello,

I am pretty inexperienced with the MC layer and am looking at encoding instructions for a machine with an unusual encoding scheme. Each group of three instructions is encoded in a 64-bit packet along with a handful of extra control bits. These are not VLIW packets but simply the way individual instructions are encoded. Packets may straddle basic block boundaries and control may exit or enter a packet at any of its instructions. In other words, it is purely an encoding scheme.

To throw an even bigger wrench into things, each instruction is not a byte-multiple number of bits and the three instructions aren't encoded in order. The first and third instructions are contiguous but the third instruction is inserted in the middle of the second instruction such that it splits it. So it looks something like this:

    .-------------------------.
    | CTL | Instr 2 | Instr 1 |  Word 0
    +-------------------------+
    | CTL | Instr 2 | Instr 3 |  Word 1
    '-------------------------'

It seems that MCCodeEmitter::encodeInstruction assumes that a single instruction is encoded. But here I can't encode Instr 2 with that interface since its encoding isn't contiguous. Moreover, the CTL bits depend on the contents of all three instructions, so they can't be set until we've seen all three.

Originally I had planned to "queue up" three instructions before having encodeInstruction actually write any bits, but I think the MC layer assumes encodeInstruction always writes something out. Is that true, or can I actually get away with encodeInstruction not emitting anything? There is the pesky detail of handling the last packet, and I don't see any kind of "end function" interface to pad the final packet if needed.

Assuming that won't work, my next thought was to encode each instruction sequentially, such that each MCFragment has a single instruction, with separate fragments for the control bits. By carefully controlling how the bits are emitted I *think* (hope?) I can keep things such that MCFixup offsets remain valid. The fixup offset is from the start of the section, yes? So as long as things stay encoded that way until after fixup/relaxation/etc. it should be fine?

Then in either MCAsmBackend::finishLayout (as overridden in a target class) or MCELFStreamer::finishImpl (as overridden in a target class) I could run through all the MCFragments, combining every three fragments (plus the control fragments) into a single MCFragment encoded as a packet, and discarding the separate fragments I no longer need. I'm assuming here that as long as the fragments for every three instructions total 64 bits, the fixup addresses will remain correct even after re-encoding into packets. True?

I have no idea if either of these options is a viable path. I don't want to use MCInstrBundle to represent a packet because those can't straddle basic block boundaries, so I'd need to insert NOPs in every basic block whose instruction count is not a multiple of three.

Is there another option I'm overlooking? If not, is either of the paths above viable?

Thanks for your help!

-David
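
To make the packet layout and the "queue up three instructions" idea concrete, here is a minimal standalone sketch. It does not use any LLVM API. The post gives no field widths, so everything numeric is an assumption chosen only so the fields sum to 64 bits: 20-bit instructions, 4 CTL bits split two per word, the low half of Instr 2 in Word 0 and the high half in Word 1, and a NOP encoding of 0. The names Packet, packWord, packPacket, unpackInstr2 and Packetizer are hypothetical, and the control-bit computation is passed in as a callback because the post does not describe it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// All widths and bit positions are illustrative assumptions, not the real ISA.
constexpr unsigned InstrBits = 20;           // assumed width of one instruction
constexpr unsigned HalfBits = InstrBits / 2; // Instr 2 is split into two halves
constexpr uint32_t InstrMask = (1u << InstrBits) - 1;
constexpr uint32_t HalfMask = (1u << HalfBits) - 1;
constexpr uint32_t NopEncoding = 0;          // hypothetical NOP used for padding

struct Packet {
  uint32_t Word0; // CTL[1:0] | Instr2[9:0]   | Instr1
  uint32_t Word1; // CTL[3:2] | Instr2[19:10] | Instr3
};

// Assumed word layout, high to low: 2 CTL bits, 10 bits of Instr 2, 20-bit instr.
inline uint32_t packWord(uint32_t Ctl2, uint32_t I2Half, uint32_t Instr) {
  return ((Ctl2 & 0x3u) << 30) | ((I2Half & HalfMask) << InstrBits) |
         (Instr & InstrMask);
}

// Build one 64-bit packet; Instr 2 ends up non-contiguous across the two words.
inline Packet packPacket(uint32_t I1, uint32_t I2, uint32_t I3, uint32_t Ctl) {
  return {packWord(Ctl, I2, I1), packWord(Ctl >> 2, I2 >> HalfBits, I3)};
}

// Reassemble Instr 2 from its two halves (useful for checking the packing).
inline uint32_t unpackInstr2(const Packet &P) {
  uint32_t Lo = (P.Word0 >> InstrBits) & HalfMask;
  uint32_t Hi = (P.Word1 >> InstrBits) & HalfMask;
  return (Hi << HalfBits) | Lo;
}

// Buffers instructions and emits a packet once three are pending.
class Packetizer {
public:
  using CtlFn = uint32_t (*)(uint32_t, uint32_t, uint32_t);

  explicit Packetizer(CtlFn Fn) : ComputeCtl(Fn) {}

  void add(uint32_t Instr) {
    assert((Instr & ~InstrMask) == 0 && "wider than the assumed 20 bits");
    Pending.push_back(Instr);
    if (Pending.size() == 3)
      flush();
  }

  // The "end function" step: pad a partial final packet with NOPs.
  void finish() {
    if (Pending.empty())
      return;
    while (Pending.size() < 3)
      Pending.push_back(NopEncoding);
    flush();
  }

  const std::vector<Packet> &packets() const { return Out; }

private:
  void flush() {
    // CTL depends on all three instructions, so it is computed only here.
    uint32_t Ctl = ComputeCtl(Pending[0], Pending[1], Pending[2]);
    Out.push_back(packPacket(Pending[0], Pending[1], Pending[2], Ctl));
    Pending.clear();
  }

  std::vector<uint32_t> Pending;
  std::vector<Packet> Out;
  CtlFn ComputeCtl;
};
```

In this sketch add() plays the role of a deferred encodeInstruction and finish() is the hook that pads the last packet. Whether the MC layer tolerates an encodeInstruction call that emits no bytes is exactly the open question above, so this only shows the bit manipulation and buffering, not how it would be wired into MCCodeEmitter or MCAsmBackend.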