Craig Topper via llvm-dev
2018-Jun-26 20:13 UTC
[llvm-dev] MachineFunction Instructions Pass using Segment Registers
This shouldn't have parsed.

    movq (%gs), %r14

That's trying to use %gs as a base register, which isn't valid. The GNU assembler rejects it, and coincidentally llvm-mc started rejecting it on trunk late last week. That's probably why it printed as %ebp.

I don't know if there is an instruction to read the base of %gs directly. Maybe rdgsbase, but that's only available on Ivy Bridge and later CPUs. But using %gs as part of the memory address for any other instruction is automatically relative to the base of %gs.

~Craig

On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <kjski at vt.edu> wrote:
> Dear Craig,
>
> Thanks for the help so far. I have rewritten my assembly to comply
> with user-land not being able to directly modify the segment registers
> %GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM
> instruction + operands. Now I am working backwards to actually code
> this assembly into my MachineFunctionPass and got the easy assembly
> implemented, however my more complicated asm is still struggling as I
> am still seeing 0x0(%rbp) instead of (%gs) or errors.
> Core question here being: how do I properly create BuildMI statements
> for assembly dealing with offsets?
>
> ------------------------------------------------------------
> Assembly I want to translate:
> mov (%gs), %r14          //get value off %GS base address
> mov %r15, %gs:0x0(%r14)  //put value in R15 into R14:(%GS) [ (%GS) + R14 ]
>
> ------------------------------------------------------------
> LLVM-MC -show inst gives:
> movq (%gs), %r14         # <MCInst #1810 MOV64rm
>                          #  <MCOperand Reg:117>
>                          #  <MCOperand Reg:33>
>                          #  <MCOperand Imm:1>
>                          #  <MCOperand Reg:0>
>                          #  <MCOperand Imm:0>
>                          #  <MCOperand Reg:0>>
> movq %r15, %gs:(%r14)    # <MCInst #1803 MOV64mr
>                          #  <MCOperand Reg:117>
>                          #  <MCOperand Imm:1>
>                          #  <MCOperand Reg:0>
>                          #  <MCOperand Imm:0>
>                          #  <MCOperand Reg:33>
>                          #  <MCOperand Reg:118>>
>
> ------------------------------------------------------------
> I'll be honest and say I don't really know how to add the operands
> properly to BuildMI. I figured out the following so far
> MachineInstrBuilder thing = BuildMI(MachineBB, Position in MBB ,
> DebugLoc(not sure what this accomplishes), TII->get( X86 instruction I
> want), where instruction result goes)
>
> this has .add(MachineOperand)
> .addReg(X86::a reg macro)
> .addIMM(a constant like 0x8)
> and a few more I dont think apply to me.
>
> but I am not sure I must follow a specific order? I am assuming yes
> and it has something to do with X86InstrInfo.td definitions, but not
> sure.
>
> ------------------------------------------------------------
> LLVM C++ code I tried to translate this to:
> /* 1 mov (%gs), %r14 */
> MachineInstrBuilder e1 =
> BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
> .addReg(X86::GS);
> /* 2 mov %r15, %gs:0x0(%r14) */
> MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
> MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
> MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
> MachineOperand disp = MachineOperand::CreateImm(0x0);
>
> BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
> .add(baseReg)
> .add(scaleAmt)
> .add(indexReg);
>
> /* both instructions give the following error
>
> clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
> T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
> >::operator[](llvm::SmallVectorTemplateCommon<T,
> <template-parameter-1-2> >::size_type) const [with T =
> llvm::MCOperand; <template-parameter-1-2> = void;
> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
> >::const_reference = const llvm::MCOperand&;
> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
> >::size_type = long unsigned int]: Assertion `idx < size()' failed.
>
> I saw this function in the code base but not sure what it does
> "addDirectMem(MachineInstructionBuilder_thing, register you want to
> use);"
>
>
> This is the last bit of information I think I need to finish up
> this implementation. Thanks again for your help!
>
> Sincerely,
>
> Chris Jelesnianski
>
> On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <craig.topper at gmail.com>
> wrote:
> > The size suffix thing is a weird quirk in our assembler I should look into
> > fixing. Instructions in at&t syntax usually have a size suffix that is often
> > optional
> >
> > For example:
> >   add %ax, %bx
> > and
> >   addw %ax, %bx
> >
> > Are equivalent because the register name indicates the size.
> >
> > but for an instruction like this
> >   addw $1, (%ax)
> >
> > There is nothing to infer the size from so an explicit suffix is required.
> >
> > So for an instruction like "add %ax, %bx" from above, we try to guess the
> > size suffix from the register. In your case, you used a segment register
> > which we couldn't guess the size from. And then we printed a bad error
> > message.
> >
> > There's no quick reference as such for the meaning of the various
> > X86::XXXXXX names. But the complete list of them is in
> > lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are meant
> > to be fairly straight forward to understand. The first part of the name
> > should almost always be the instruction name from the Intel/AMD manuals. The
> > lower case letters at the end sort of convey operand types, but often not
> > the number of operands even though it looks that way. The most common
> > letters are 'r' for register, 'm' for memory and 'i' for immediate. Numbers
> > after 'i' specify the size of the immediate if its important to distinguish
> > from other sizes or different than the size of the instruction. The lower
> > case letters are most useful to distinguish different instructions from each
> > other. So for example, if two instructions only differ in the lower case
> > letters and one says "rr" and one says "rm", the first is the register form
> > and the second is the memory form of the same instruction.
> >
> > ~Craig
> >
> >
> > On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <kjski at vt.edu> wrote:
> >>
> >> Dear Craig,
> >>
> >> Thank you super much for the quick reply! Yea I'm still new to working
> >> on the back-end and that sounds great. I already have the raw assembly
> >> of what I want to accomplish so this is perfect. I just tried it and
> >> yea, I will have to break down my assembly even further to more
> >> simpler operations.
> >> You're right about my assembly dealing with
> >> segment registers as I'm getting the following error:
> >> "error: unknown use of instruction mnemonic without a size suffix"
> >>
> >> Just curious, what does it mean by size suffix??
> >>
> >> It's super cool to see the equivalent with "-show-inst"!!! Thank you
> >> so much for this help!
> >>
> >> Last note, I know that the definitions (e.g. def SUB32ri) of the
> >> various instructions can be found in the various ****.td, but is there
> >> documentation where the meaning or quick reference of every
> >> X86::XXXXXX llvm instruction macro can be found, so I can quickly pick
> >> and choose which actual macro I need to use, to "work forwards" rather
> >> than working backwards by writing the assembly first and using llvm-mc
> >> -show-inst ??
> >>
> >> Thanks super much again.
> >>
> >> Sincerely,
> >>
> >> Chris Jelesnianski
> >> Graduate Research Assistant
> >> Virginia Tech
> >>
> >> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <craig.topper at gmail.com>
> >> wrote:
> >> > More specifically there is no instruction that can add/subtract segment
> >> > registers. They can only be updated by the mov segment register
> >> > instructions, opcodes 0x8c and 0x8e in x86 assembly.
> >> >
> >> > I suggest you write the text version of the assembly you want to generate
> >> > and assemble it with llvm-mc. This will tell you if it's even valid. After
> >> > that you can use -show-inst to print the names of the instructions that X86
> >> > uses that you can give to BuildMI.
> >> >
> >> > ~Craig
> >> >
> >> >
> >> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <craig.topper at gmail.com>
> >> > wrote:
> >> >>
> >> >> The SUB32ri instruction can't operate on segment registers. It
> >> >> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded only 3 or 4
> >> >> bits of the register value make it into the binary encoding.
> >> >> Objdump just
> >> >> extracts those 3 or 4 bits back out and prints one of the
> >> >> EAX/EBX/EDX/ECX/EBP registers that those bits correspond to.
> >> >>
> >> >> ~Craig
> >> >>
> >> >>
> >> >> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev
> >> >> <llvm-dev at lists.llvm.org> wrote:
> >> >>>
> >> >>> Dear All,
> >> >>>
> >> >>> Currently I am trying to inject custom x86-64 assembly into a
> >> >>> function's entry basic block. More specifically, I am trying to build
> >> >>> assembly in a machine function pass from scratch.
> >> >>>
> >> >>> While the dumped machine function instruction info displays that %gs
> >> >>> will be used, when I perform objdump -d on my executable I see that
> >> >>> %gs is replaced by %ebp? Why is this happening?
> >> >>>
> >> >>> I know it probably has something to do with me not specifying operands
> >> >>> properly, but I cannot find enough documentation on this besides
> >> >>> looking through code comments such as X86BaseInfo.cpp. I feel there
> >> >>> isn't enough for me to be able to connect the dots.
> >> >>>
> >> >>> Below I have sample code: %gs holds a base address to a memory
> >> >>> location where I am trying to store information. I am trying to update
> >> >>> the %gs register pointer location before saving more values, etc.
> >> >>>
> >> >>> LLVM C++ Machine Function pass code:
> >> >>> MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
> >> >>> TII->get(X86::SUB32ri),X86::GS)
> >> >>> .addReg(X86::GS)
> >> >>> .addImm(0x8);
> >> >>>
> >> >>> machine function pass dump:
> >> >>> %gs = SUB32ri %gs, 8, implicit-def %eflags
> >> >>>
> >> >>> Objdump -d assembly from executable
> >> >>> 400510: 81 ed 04 00 00 00 sub $0x8,%ebp
> >> >>>
> >> >>>
> >> >>> TLDR: I am trying to create custom assembly via BuildMI() and
> >> >>> manipulate segment registers via a MachineFunctionPass.
> >> >>>
> >> >>> I have looked at LLVMs safestack implementation, but they are taking a
> >> >>> fairly complicated hybrid approach between an IR Function pass with
> >> >>> Backend support. I would like to stay as a single machinefunction
> >> >>> pass.
> >> >>>
> >> >>> Believe me I would do this at the IR level if I didnt need to
> >> >>> specifically use the segment registers.
> >> >>>
> >> >>> Thanks for the help in advance!
> >> >>>
> >> >>> Sincerely,
> >> >>>
> >> >>> Christopher Jelesnianski
> >> >>> Graduate Research Assistant
> >> >>> Virginia Tech
> >> >>> _______________________________________________
> >> >>> LLVM Developers mailing list
> >> >>> llvm-dev at lists.llvm.org
> >> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
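
The %gs-to-%ebp substitution discussed in this message can be reproduced with a few lines of standalone C++. This is only a sketch of the ModRM packing, not LLVM code: the names modrm, decodeRm32 and SegEncoding are made up for illustration, and the encoding numbers are the hardware ones from the x86 manuals rather than LLVM's internal register enum values.

```cpp
#include <cassert>
#include <cstdint>

// Hardware encoding numbers from the x86 manuals (NOT LLVM's register enum).
// Segment registers and general-purpose registers are numbered separately,
// so the same small number names a different register in each family.
enum SegEncoding : std::uint8_t { ES = 0, CS = 1, SS = 2, DS = 3, FS = 4, GS = 5 };

// 32-bit general-purpose registers indexed by their 3-bit encoding.
constexpr const char *Gpr32Names[8] = {"eax", "ecx", "edx", "ebx",
                                       "esp", "ebp", "esi", "edi"};

// Pack a ModRM byte: mod (2 bits), reg or opcode extension (3 bits), rm (3 bits).
// Anything that does not fit in the field is silently masked away.
constexpr std::uint8_t modrm(unsigned Mod, unsigned RegOrExt, unsigned Rm) {
  return static_cast<std::uint8_t>(((Mod & 3u) << 6) | ((RegOrExt & 7u) << 3) |
                                   (Rm & 7u));
}

// What a disassembler does with the rm field of a register-direct 32-bit
// operand: it can only ever answer with a general-purpose register name.
constexpr const char *decodeRm32(std::uint8_t ModRM) {
  return Gpr32Names[ModRM & 7u];
}
```

SUB r/m32, imm32 is opcode 0x81 with opcode extension /5, so modrm(3, 5, GS) yields 0xED, which matches the "81 ed" bytes in the objdump output above, and decodeRm32(0xED) names ebp. The encoder has no slot in which to say "this was a segment register", so the information is lost before objdump ever sees it.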
Matthias Braun via llvm-dev
2018-Jun-26 20:37 UTC
[llvm-dev] MachineFunction Instructions Pass using Segment Registers
BTW: If you work at the MI level, I recommend using a debug build of LLVM and passing -verify-machineinstrs to llc. It should catch uses of registers that are not part of the instruction's register classes.

- Matthias
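
The kind of check Matthias refers to can be pictured as a set-membership test. The following is a simplified standalone model, not the verifier's actual code: Reg, RegClass, gr32 and isLegalOperand are hypothetical names, and the register class shown is a reduced stand-in for the real 32-bit GPR class (which also contains r8d through r15d).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, much-reduced stand-in for LLVM's X86 register enum.
enum class Reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, FS, GS };

// Simplified model of a register class: the registers an operand may use.
using RegClass = std::vector<Reg>;

// Reduced 32-bit general-purpose class; segment registers are not members.
inline RegClass gr32() {
  return {Reg::EAX, Reg::ECX, Reg::EDX, Reg::EBX,
          Reg::ESP, Reg::EBP, Reg::ESI, Reg::EDI};
}

// The essence of the check: is this physical register a member of the
// class the instruction's operand is declared to accept?
inline bool isLegalOperand(const RegClass &RC, Reg R) {
  return std::find(RC.begin(), RC.end(), R) != RC.end();
}
```

Under this model isLegalOperand(gr32(), Reg::GS) is false, which is the situation in the original SUB32ri attempt. Without a verifier run nothing rejects the operand, and the encoder simply emits whatever low bits the register number happens to have.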
K Jelesnianski via llvm-dev
2018-Jun-27 00:49 UTC
[llvm-dev] MachineFunction Instructions Pass using Segment Registers
Dear Craig,

Whoops, you're right. That's still what I theoretically want to do, though. I replaced it with the following:

    movq %gs:0x0, %r14

This doesn't get any complaints from gnu-as or llvm-mc. llvm-mc -show-inst gives the following:

    movq %gs:0, %r14    # <MCInst #1810 MOV64rm
                        #  <MCOperand Reg:117>   //Destination
                        #  <MCOperand Reg:0>     //Base Reg
                        #  <MCOperand Imm:1>     //Scale
                        #  <MCOperand Reg:0>     //Index Reg
                        #  <MCOperand Imm:0>     //Displacement
                        #  <MCOperand Reg:33>>   //Segment Reg

This looks better, as 33 (%gs) is now in the Segment slot of the MCOperands instead of the BaseReg slot, according to http://llvm.org/doxygen/X86BaseInfo_8h_source.html

The only weird behavior I could not figure out was for an XOR instruction. llvm-mc -show-inst:

    xorq %r15, %r15     # <MCInst #15401 XOR64rr
                        #  <MCOperand Reg:118>
                        #  <MCOperand Reg:118>
                        #  <MCOperand Reg:118>>

I had to use .addDef instead of .addReg. My C++ code:

    BuildMI(MBB,MBB.end(),DL,TII->get(X86::XOR64rr),X86::R15)
        .addDef(X86::R15)
        .addReg(X86::R15);

I got the same error until I replaced the first instance of .addReg with .addDef. Why do I need to use .addDef here??

Nonetheless, my MachineFunctionPass now compiles with no errors with this edit. I'm only taking the llvm-mc -show-inst information (from above) as a cue to what my C++ code should look like. I finally got LLVM to compile my BuildMI instructions. Thanks again for the help!

------------------------------------

Dear Matthias,

Thanks for the tip! Both of your responses helped in my debugging.

Sincerely,

Chris Jelesnianski
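
The five-slot memory operand order annotated in the -show-inst dump (base, scale, index, displacement, segment) can be modelled without LLVM. The sketch below is an illustration only: Operand, reg, imm, MemSlot and formatMem are invented names, registers are plain strings with "" standing in for Reg:0, and the out-of-range exception merely imitates the `idx < size()` assertion, which most likely fired because the instruction was built with fewer operands than the encoder expected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for an MCOperand: a register name ("" means no
// register, like Reg:0 in the dump) or an immediate value.
struct Operand {
  std::string Reg;
  std::int64_t Imm;
};

inline Operand reg(const std::string &Name) { Operand O; O.Reg = Name; O.Imm = 0; return O; }
inline Operand imm(std::int64_t Value) { Operand O; O.Imm = Value; return O; }

// Slot order of an x86 memory reference, as annotated in the dump above.
enum MemSlot { BaseSlot = 0, ScaleSlot = 1, IndexSlot = 2, DispSlot = 3, SegmentSlot = 4 };

// Render the five memory operands that start at Ops[First] in AT&T syntax.
// at() throws std::out_of_range if any of the five slots is missing.
inline std::string formatMem(const std::vector<Operand> &Ops, std::size_t First) {
  const Operand &Base = Ops.at(First + BaseSlot);
  const Operand &Scale = Ops.at(First + ScaleSlot);
  const Operand &Index = Ops.at(First + IndexSlot);
  const Operand &Disp = Ops.at(First + DispSlot);
  const Operand &Seg = Ops.at(First + SegmentSlot);

  std::string Out;
  if (!Seg.Reg.empty())
    Out += "%" + Seg.Reg + ":";
  const bool HasRegs = !Base.Reg.empty() || !Index.Reg.empty();
  if (Disp.Imm != 0 || !HasRegs)
    Out += std::to_string(Disp.Imm);
  if (HasRegs) {
    Out += "(";
    if (!Base.Reg.empty())
      Out += "%" + Base.Reg;
    if (!Index.Reg.empty())
      Out += ",%" + Index.Reg + "," + std::to_string(Scale.Imm);
    Out += ")";
  }
  return Out;
}
```

For the load, the operand list {r14, "", 1, "", 0, gs} with the memory reference starting at index 1 renders as %gs:0. For a store such as movq %r15, %gs:(%r14) the memory reference comes first and the source register last. In a real pass the same order would presumably be expressed as a chain like .addReg(0).addImm(1).addReg(0).addImm(0).addReg(X86::GS) after the BuildMI call that names the destination; that chain is an assumption based on the slot order above and was not compiled against LLVM here.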
> > movq (%gs), %r14 > > That's trying to use%gs as a base register which isn't valid. GNU assembler > rejects it. And coincidentally llvm-mc started rejecting it on trunk late > last week. That's probably why it printed as %ebp. > > I don't know if there is an instruction to read the base of %gs directly. > Maybe rdgsbase, but that's only available on Ivy Bridge and later CPUs.. But > ussing %gs as part of the memory address for any other instruction is > automatically relative to the base of %gs. > > > ~Craig > > > On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <kjski at vt.edu> wrote: >> >> Dear Craig, >> >> Thanks for the help so far. I have rewritten my assembly to comply >> with user-land not being able to directly modify the segment registers >> %GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM >> instruction + operands. Now I am working backwards to actually code >> this assembly into my MachineFunctionPass and got the easy assembly >> implemented, however my more complicated asm is still struggling as I >> am still seeing 0x0(%rbp) instead of (%gs) or errors. >> Core question here being: how do I properly create BuildMI statements >> for assembly dealing with offsets? 
>> >> ------------------------------------------------------------------------------------------------- >> Assembly I want to translate: >> mov (%gs), %r14 //get value off %GS base addresss >> mov %r15, %gs:0x0(%r14) //put value in R15 into R14:(%GS) [ (%GS) + >> R14 ] >> >> -------------------------------------------------------------------------------------------------- >> LLVM-MC -show inst gives: >> movq (%gs), %r14 # <MCInst #1810 MOV64rm >> # <MCOperand Reg:117> >> # <MCOperand Reg:33> >> # <MCOperand Imm:1> >> # <MCOperand Reg:0> >> # <MCOperand Imm:0> >> # <MCOperand Reg:0>> >> movq %r15, %gs:(%r14) # <MCInst #1803 MOV64mr >> # <MCOperand Reg:117> >> # <MCOperand Imm:1> >> # <MCOperand Reg:0> >> # <MCOperand Imm:0> >> # <MCOperand Reg:33> >> # <MCOperand Reg:118>> >> >> ------------------------------------------------------------------------------------------------------- >> I'll be honest and say I don't really know how to add the operands >> properly to BuildMI. I figured out the following so far >> MachineInstrBuilder thing = BuildMI(MachineBB, Position in MBB , >> DebugLoc(not sure what this accomplishes), TII->get( X86 instruction I >> want), where instruction result goes) >> >> this has .add(MachineOperand) >> .addReg(X86::a reg macro) >> .addIMM(a constant like 0x8) >> and a few more I dont think apply to me. >> >> but I am not sure I must follow a specific order? I am assuming yes >> and it has something to do with X86InstrInfo.td definitions, but not >> sure. 
>> >> -------------------------------------------------------------------------------------------------------- >> LLVM C++ code I tried to translate this to: >> /* 1 mov (%gs), %r14 */ >> MachineInstrBuilder e1 >> BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14) >> .addReg(X86::GS); >> /* 2 mov %r15, %gs:0x0(%r14) */ >> MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false); >> MachineOperand scaleAmt = MachineOperand::CreateImm(0x1); >> MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false); >> MachineOperand disp = MachineOperand::CreateImm(0x0); >> >> BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr)) >> .add(baseReg) >> .add(scaleAmt) >> .add(indexReg); >> >> /* both instructions give the following error >> >> clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const >> T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >> >::operator[](llvm::SmallVectorTemplateCommon<T, >> <template-parameter-1-2> >::size_type) const [with T >> llvm::MCOperand; <template-parameter-1-2> = void; >> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >> >::const_reference = const llvm::MCOperand&; >> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >> >::size_type = long unsigned int]: Assertion `idx < size()' failed. >> >> I saw this function in the code base but not sure what it does >> "addDirectMem(MachineInstructionBuilder_thing, register you want to >> use);" >> >> >> This is be the last bit of information I think I need to finish up >> this implementation. Thanks again for your help! >> >> Sincerely, >> >> Chris Jelesnianski >> >> On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <craig.topper at gmail.com> >> wrote: >> > The size suffix thing is a weird quirk in our assembler I should look >> > into >> > fixing. 
Instructions in at&t syntax usually have a size suffix that is >> > often >> > optional >> > >> > For example: >> > add %ax, %bx >> > and >> > addw %ax, %bx >> > >> > Are equivalent because the register name indicates the size. >> > >> > but for an instruction like this >> > addw $1, (%ax) >> > >> > There is nothing to infer the size from so an explicit suffix is >> > required. >> > >> > So for an instruction like "add %ax, %bx" from above, we try to guess >> > the >> > size suffix from the register. In your case, you used a segment register >> > which we couldn't guess the size from. And then we printed a bad error >> > message. >> > >> > There's no quick reference as such for the meaning of the various >> > X86::XXXXXX names. But the complete list of them is in >> > lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are >> > meant >> > to be fairly straight forward to understand. The first part of the name >> > should almost always be the instruction name from the Intel/AMD manuals. >> > The >> > lower case letters at the end sort of convey operand types, but often >> > not >> > the number of operands even though it looks that way. The most common >> > letters are 'r' for register, 'm' for memory and 'i' for immediate. >> > Numbers >> > after 'i' specify the size of the immediate if its important to >> > distinguish >> > from other sizes or different than the size of the instruction. The >> > lower >> > case letters are most useful to distinguish different instructions from >> > each >> > other. So for example, if two instructions only differ in the lower case >> > letters and one says "rr" and one says "rm", the first is the register >> > form >> > and the second is the memory form of the same instruction. >> > >> > ~Craig >> > >> > >> > On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <kjski at vt.edu> wrote: >> >> >> >> Dear Craig, >> >> >> >> Thank you super much for the quick reply! 
>> >> Yea I'm still new to working on the back-end and that sounds great. I already have the raw assembly of what I want to accomplish so this is perfect. I just tried it and yea, I will have to break down my assembly even further into simpler operations. You're right about my assembly dealing with segment registers as I'm getting the following error:
>> >> "error: unknown use of instruction mnemonic without a size suffix"
>> >>
>> >> Just curious, what does it mean by size suffix??
>> >>
>> >> It's super cool to see the equivalent with "-show-inst"!!! Thank you so much for this help!
>> >>
>> >> Last note, I know that the definitions (e.g. def SUB32ri) of the various instructions can be found in the various ****.td, but is there documentation where the meaning or quick reference of every X86::XXXXXX llvm instruction macro can be found, so I can quickly pick and choose which actual macro I need to use, to "work forwards" rather than working backwards by writing the assembly first and using llvm-mc -show-inst ??
>> >>
>> >> Thanks super much again.
>> >>
>> >> Sincerely,
>> >>
>> >> Chris Jelesnianski
>> >> Graduate Research Assistant
>> >> Virginia Tech
>> >>
>> >> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <craig.topper at gmail.com> wrote:
>> >> > More specifically there is no instruction that can add/subtract segment registers. They can only be updated by the mov segment register instructions, opcodes 0x8c and 0x8e in x86 assembly.
>> >> >
>> >> > I suggest you write the text version of the assembly you want to generate and assemble it with llvm-mc. This will tell you if it's even valid. After that you can use -show-inst to print the names of the instructions that X86 uses that you can give to BuildMI.
>> >> >
>> >> > ~Craig
>> >> >
>> >> >
>> >> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <craig.topper at gmail.com> wrote:
>> >> >>
>> >> >> The SUB32ri instruction can't operate on segment registers. It operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded only 3 or 4 bits of the register value make it into the binary encoding. Objdump just extracts those 3 or 4 bits back out and prints one of the EAX/EBX/EDX/ECX/EBP registers that those bits correspond to.
>> >> >>
>> >> >> ~Craig
>> >> >>
>> >> >>
>> >> >> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >> >>>
>> >> >>> Dear All,
>> >> >>>
>> >> >>> Currently I am trying to inject custom x86-64 assembly into a function's entry basic block. More specifically, I am trying to build assembly in a machine function pass from scratch.
>> >> >>>
>> >> >>> While the dumped machine function instruction info displays that %gs will be used, when I perform objdump -d on my executable I am seeing that %gs is replaced by %ebp? Why is this happening?
>> >> >>>
>> >> >>> I know it probably has something to do with me not specifying operands properly, but I cannot find enough documentation on this besides looking through code comments such as X86BaseInfo.cpp. I feel there isn't enough for me to be able to connect the dots.
>> >> >>>
>> >> >>> Below I have sample code: %gs holds a base address to a memory location where I am trying to store information. I am trying to update the %gs register pointer location before saving more values, etc.
>> >> >>>
>> >> >>> LLVM C++ Machine Function pass code:
>> >> >>> MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
>> >> >>>     TII->get(X86::SUB32ri), X86::GS)
>> >> >>>     .addReg(X86::GS)
>> >> >>>     .addImm(0x8);
>> >> >>>
>> >> >>> machine function pass dump:
>> >> >>> %gs = SUB32ri %gs, 8, implicit-def %eflags
>> >> >>>
>> >> >>> Objdump -d assembly from executable
>> >> >>> 400510: 81 ed 04 00 00 00 sub $0x8,%ebp
>> >> >>>
>> >> >>>
>> >> >>> TLDR: I am trying to create custom assembly via BuildMI() and manipulate segment registers via a MachineFunctionPass.
>> >> >>>
>> >> >>> I have looked at LLVM's safestack implementation, but they are taking a fairly complicated hybrid approach between an IR Function pass with Backend support. I would like to stay as a single MachineFunction pass.
>> >> >>>
>> >> >>> Believe me I would do this at the IR level if I didn't need to specifically use the segment registers.
>> >> >>>
>> >> >>> Thanks for the help in advance!
>> >> >>>
>> >> >>> Sincerely,
>> >> >>>
>> >> >>> Christopher Jelesnianski
>> >> >>> Graduate Research Assistant
>> >> >>> Virginia Tech
>> >> >>> _______________________________________________
>> >> >>> LLVM Developers mailing list
>> >> >>> llvm-dev at lists.llvm.org
>> >> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev