Craig Topper via llvm-dev
2018-Jun-24 00:45 UTC
[llvm-dev] MachineFunction Instructions Pass using Segment Registers
More specifically, there is no instruction that can add to or subtract from segment registers. They can only be updated by the mov segment register instructions, opcodes 0x8c and 0x8e in x86 assembly.

I suggest you write the text version of the assembly you want to generate and assemble it with llvm-mc. This will tell you whether it's even valid. After that you can use -show-inst to print the names of the instructions that X86 uses, which you can give to BuildMI.

~Craig

On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <craig.topper at gmail.com> wrote:

> The SUB32ri instruction can't operate on segment registers. It
> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded, only 3 or 4 bits
> of the register value make it into the binary encoding. Objdump just
> extracts those 3 or 4 bits back out and prints whichever of the
> EAX/EBX/EDX/ECX/EBP registers those bits correspond to.
>
> ~Craig
>
> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Dear All,
>>
>> Currently I am trying to inject custom x86-64 assembly into a
>> function's entry basic block. More specifically, I am trying to build
>> assembly in a machine function pass from scratch.
>>
>> While the dumped machine function instruction info displays that %gs
>> will be used, when I perform objdump -d on my executable I see that
>> %gs is replaced by %ebp. Why is this happening?
>>
>> I know it probably has something to do with me not specifying operands
>> properly, but I cannot find enough documentation on this besides
>> looking through code comments such as X86BaseInfo.cpp. I feel there
>> isn't enough for me to be able to connect the dots.
>>
>> Below I have sample code: %gs holds a base address to a memory
>> location where I am trying to store information. I am trying to update
>> the %gs register pointer location before saving more values, etc.
>>
>> LLVM C++ Machine Function pass code:
>>     MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
>>                                       TII->get(X86::SUB32ri), X86::GS)
>>                                   .addReg(X86::GS)
>>                                   .addImm(0x8);
>>
>> Machine function pass dump:
>>     %gs = SUB32ri %gs, 8, implicit-def %eflags
>>
>> Objdump -d assembly from executable:
>>     400510: 81 ed 04 00 00 00    sub $0x8,%ebp
>>
>> TLDR: I am trying to create custom assembly via BuildMI() and manipulate
>> segment registers via a MachineFunctionPass.
>>
>> I have looked at LLVM's safestack implementation, but it takes a
>> fairly complicated hybrid approach: an IR function pass with
>> backend support. I would like to stay with a single MachineFunction
>> pass.
>>
>> Believe me, I would do this at the IR level if I didn't need to
>> specifically use the segment registers.
>>
>> Thanks for the help in advance!
>>
>> Sincerely,
>>
>> Christopher Jelesnianski
>> Graduate Research Assistant
>> Virginia Tech
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
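
To make the "only 3 or 4 bits" explanation concrete, here is a minimal sketch of how the ModRM byte for a register-direct `sub $imm32, %reg` (opcode 0x81 with the /5 extension) is formed. The helper names and constants are hypothetical and are not LLVM API; the encoding values (EBP = 5 among the 32-bit general-purpose registers, GS = 5 among the segment registers) are taken from the x86 instruction encoding tables. The sketch assumes the encoder simply keeps the low three bits of whatever register number it is handed, and it ignores the fourth bit that a REX prefix would carry.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hardware register numbers (hypothetical constant names).
constexpr std::uint8_t kEncEBP = 5;  // EBP among 32-bit general-purpose registers
constexpr std::uint8_t kEncGS = 5;   // GS among segment registers
constexpr std::uint8_t kSubExt = 5;  // "/5" selects SUB in the 0x81 opcode group

// Build a register-direct ModRM byte: mod = 0b11, reg = opcode extension,
// rm = low three bits of the register number. The register class is lost.
constexpr std::uint8_t modRMRegisterDirect(std::uint8_t regField,
                                           std::uint8_t rmField) {
  return static_cast<std::uint8_t>(0xC0 | ((regField & 7) << 3) | (rmField & 7));
}

// What a disassembler does for this opcode: read rm back as a 32-bit GPR.
std::string gpr32NameForRM(std::uint8_t modrm) {
  static const char *const kNames[8] = {"eax", "ecx", "edx", "ebx",
                                        "esp", "ebp", "esi", "edi"};
  return kNames[modrm & 7];
}
```

Under these assumptions, passing GS's number where a general-purpose register is expected yields the ModRM byte 0xED, the same byte as for EBP, which matches the `81 ed` at the start of the objdump output above. The instruction encoding has no way to say "segment register" here, so the disassembler reports %ebp.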
K Jelesnianski via llvm-dev
2018-Jun-24 02:55 UTC
[llvm-dev] MachineFunction Instructions Pass using Segment Registers
Dear Craig,

Thank you super much for the quick reply! Yes, I'm still new to working on the back-end, and that sounds great. I already have the raw assembly of what I want to accomplish, so this is perfect. I just tried it and yes, I will have to break my assembly down even further into simpler operations. You're right about my assembly dealing with segment registers, as I'm getting the following error:

    "error: unknown use of instruction mnemonic without a size suffix"

Just curious, what does it mean by size suffix?

It's super cool to see the equivalent with "-show-inst"! Thank you so much for this help!

Last note: I know that the definitions (e.g. def SUB32ri) of the various instructions can be found in the various ****.td files, but is there documentation where the meaning of, or a quick reference for, every X86::XXXXXX LLVM instruction macro can be found, so I can quickly pick and choose which macro I need to use, to "work forwards" rather than working backwards by writing the assembly first and using llvm-mc -show-inst?

Thanks super much again.

Sincerely,

Chris Jelesnianski
Graduate Research Assistant
Virginia Tech
Craig Topper via llvm-dev
2018-Jun-24 03:32 UTC
[llvm-dev] MachineFunction Instructions Pass using Segment Registers
The size suffix thing is a weird quirk in our assembler that I should look into fixing. Instructions in AT&T syntax usually have a size suffix that is often optional. For example:

    add %ax, %bx

and

    addw %ax, %bx

are equivalent because the register name indicates the size. But for an instruction like this:

    addw $1, (%ax)

there is nothing to infer the size from, so an explicit suffix is required.

So for an instruction like "add %ax, %bx" from above, we try to guess the size suffix from the register. In your case, you used a segment register, which we couldn't guess the size from. And then we printed a bad error message.

There's no quick reference as such for the meaning of the various X86::XXXXXX names. But the complete list of them is in lib/Target/X86/X86GenInstrInfo.inc in your build area.

The names are meant to be fairly straightforward to understand. The first part of the name should almost always be the instruction name from the Intel/AMD manuals. The lower case letters at the end sort of convey operand types, but often not the number of operands, even though it looks that way. The most common letters are 'r' for register, 'm' for memory and 'i' for immediate. Numbers after 'i' specify the size of the immediate if it's important to distinguish it from other sizes or if it differs from the size of the instruction. The lower case letters are most useful for distinguishing different instructions from each other. So, for example, if two instructions differ only in the lower case letters and one says "rr" and one says "rm", the first is the register form and the second is the memory form of the same instruction.

~Craig
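
As a rough illustration of the naming convention described in this message, the sketch below splits an opcode name such as SUB32ri into its mnemonic, operand-size digits, and lower case operand letters. The struct and function names are hypothetical and not part of LLVM, and the split is only a heuristic: it assumes the mnemonic is upper case, that any digits directly before the first lower case letter are the operand size, and that everything from the first lower case letter onward describes operand kinds. Plenty of real opcode names will not fit this pattern.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type: the three pieces of an X86 opcode name.
struct OpcodeNameParts {
  std::string mnemonic;  // e.g. "SUB"
  std::string size;      // e.g. "32" (may be empty)
  std::string operands;  // e.g. "ri": 'r' register, 'm' memory, 'i' immediate
};

OpcodeNameParts splitOpcodeName(const std::string &name) {
  // Find where the lower case operand letters start.
  std::size_t firstLower = 0;
  while (firstLower < name.size() &&
         !std::islower(static_cast<unsigned char>(name[firstLower])))
    ++firstLower;

  // Walk back over the digits that sit directly before the operand letters.
  std::size_t sizeBegin = firstLower;
  while (sizeBegin > 0 &&
         std::isdigit(static_cast<unsigned char>(name[sizeBegin - 1])))
    --sizeBegin;

  return {name.substr(0, sizeBegin),
          name.substr(sizeBegin, firstLower - sizeBegin),
          name.substr(firstLower)};
}
```

For SUB32ri this gives the mnemonic "SUB", the size "32", and the operand letters "ri", read as a register destination and an immediate source. A trailing digit after 'i', as in "ri8", stays with the operand letters, in line with the note that numbers after 'i' give the immediate's size.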