thr3ads.net - llvm dev - [llvm-dev] MachineFunction Instructions Pass using Segment Registers [Jun 2018]


Craig Topper via llvm-dev

2018-Jun-26 20:13 UTC

[llvm-dev] MachineFunction Instructions Pass using Segment Registers

This shouldn't have parsed.

movq    (%gs), %r14

That's trying to use %gs as a base register, which isn't valid. The GNU
assembler rejects it, and coincidentally llvm-mc started rejecting it on
trunk late last week. That's probably why it printed as %ebp.

I don't know if there is an instruction to read the base of %gs directly.
Maybe rdgsbase, but that's only available on Ivy Bridge and later CPUs.
But using %gs as part of the memory address for any other instruction is
automatically relative to the base of %gs.
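
To make the last point concrete, here is a minimal sketch of how a segment-relative address is formed. It is a plain C++ model, not LLVM code; MemOperand and linearAddress are hypothetical names invented for this illustration, and the segment base is treated as an ordinary number even though real code cannot read it without something like rdgsbase.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one x86-64 memory operand: seg:disp(base,index,scale).
// Absent registers are modelled as the value 0.
struct MemOperand {
    std::uint64_t segBase; // hidden base of the segment register (e.g. the %gs base)
    std::uint64_t base;    // value held in the base register
    std::uint64_t index;   // value held in the index register
    std::uint8_t scale;    // 1, 2, 4 or 8
    std::int32_t disp;     // signed displacement
};

// The segment base is added on top of the usual base + index * scale + disp.
// The displacement is sign-extended before the (wrapping) addition.
inline std::uint64_t linearAddress(const MemOperand &m) {
    const auto disp64 = static_cast<std::uint64_t>(static_cast<std::int64_t>(m.disp));
    return m.segBase + m.base + m.index * m.scale + disp64;
}
```

Under this model, %gs:0x0 with a %gs base of 0x7000 resolves to linearAddress({0x7000, 0, 0, 1, 0}), that is 0x7000, which is why %gs belongs in the segment position of the operand rather than the base-register position.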


~Craig


On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <kjski at vt.edu> wrote:
> Dear Craig,
>
> Thanks for the help so far. I have rewritten my assembly to comply
> with user-land not being able to directly modify the segment registers
> %GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM
> instruction + operands. Now I am working backwards to actually code
> this assembly into my MachineFunctionPass and got the easy assembly
> implemented, however my more complicated asm is still struggling as I
> am still seeing 0x0(%rbp) instead of (%gs) or errors.
> Core question here being: how do I properly create BuildMI statements
> for assembly dealing with offsets?
>
>
> -------------------------------------------------------------------------------------------------
> Assembly I want to translate:
> mov   (%gs), %r14                  //get value off %GS base address
> mov %r15, %gs:0x0(%r14)     //put value in R15 into R14:(%GS)  [ (%GS) + R14 ]
>
>
> --------------------------------------------------------------------------------------------------
> llvm-mc -show-inst gives:
> movq    (%gs), %r14          # <MCInst #1810 MOV64rm
>                                         #  <MCOperand Reg:117>
>                                         #  <MCOperand Reg:33>
>                                         #  <MCOperand Imm:1>
>                                         #  <MCOperand Reg:0>
>                                         #  <MCOperand Imm:0>
>                                         #  <MCOperand Reg:0>>
> movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
>                                         #  <MCOperand Reg:117>
>                                         #  <MCOperand Imm:1>
>                                         #  <MCOperand Reg:0>
>                                         #  <MCOperand Imm:0>
>                                         #  <MCOperand Reg:33>
>                                         #  <MCOperand Reg:118>>
>
>
> -------------------------------------------------------------------------------------------------------
> I'll be honest and say I don't really know how to add the operands
> properly to BuildMI. I figured out the following so far:
> MachineInstrBuilder thing = BuildMI(MachineBB, Position in MBB,
> DebugLoc (not sure what this accomplishes), TII->get(X86 instruction I
> want), where instruction result goes)
>
> this has .add(MachineOperand)
>             .addReg(X86::a reg macro)
>             .addImm(a constant like 0x8)
>             and a few more I don't think apply to me.
>
> But I am not sure whether I must follow a specific order. I am assuming
> yes, and that it has something to do with the X86InstrInfo.td
> definitions, but I am not sure.
>
>
> --------------------------------------------------------------------------------------------------------
> LLVM C++ code I tried to translate this to:
> /* 1 mov   (%gs), %r14 */
>     MachineInstrBuilder e1 =
>         BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
>        .addReg(X86::GS);
> /* 2 mov %r15, %gs:0x0(%r14) */
>     MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
>     MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
>     MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
>     MachineOperand disp = MachineOperand::CreateImm(0x0);
>
>     BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
>       .add(baseReg)
>       .add(scaleAmt)
>       .add(indexReg);
>
> /* both instructions give the following error */
>
> clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
> T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
> >::operator[](llvm::SmallVectorTemplateCommon<T,
> <template-parameter-1-2> >::size_type) const [with T =
> llvm::MCOperand; <template-parameter-1-2> = void;
> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
> >::const_reference = const llvm::MCOperand&;
> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
> >::size_type = long unsigned int]: Assertion `idx < size()' failed.
>
> I saw this function in the code base but not sure what it does
> "addDirectMem(MachineInstructionBuilder_thing, register you want to
> use);"
>
>
> This should be the last bit of information I need to finish up
> this implementation. Thanks again for your help!
>
> Sincerely,
>
> Chris Jelesnianski
>
> On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <craig.topper at gmail.com>
> wrote:
> > The size suffix thing is a weird quirk in our assembler I should look into
> > fixing. Instructions in AT&T syntax usually have a size suffix that is
> > often optional.
> >
> > For example:
> >   add %ax, %bx
> > and
> >   addw %ax, %bx
> >
> > are equivalent because the register name indicates the size.
> >
> > But for an instruction like this
> >   addw $1, (%ax)
> >
> > there is nothing to infer the size from, so an explicit suffix is required.
> >
> > So for an instruction like "add %ax, %bx" from above, we try to guess the
> > size suffix from the register. In your case, you used a segment register
> > which we couldn't guess the size from. And then we printed a bad error
> > message.
> >
> > There's no quick reference as such for the meaning of the various
> > X86::XXXXXX names. But the complete list of them is in
> > lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are meant
> > to be fairly straightforward to understand. The first part of the name
> > should almost always be the instruction name from the Intel/AMD manuals.
> > The lower case letters at the end sort of convey operand types, but often
> > not the number of operands even though it looks that way. The most common
> > letters are 'r' for register, 'm' for memory and 'i' for immediate. Numbers
> > after 'i' specify the size of the immediate if it's important to
> > distinguish from other sizes or different than the size of the
> > instruction. The lower case letters are most useful to distinguish
> > different instructions from each other. So for example, if two
> > instructions only differ in the lower case letters and one says "rr" and
> > one says "rm", the first is the register form and the second is the memory
> > form of the same instruction.
> >
> > ~Craig
> >
> >
> > On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <kjski at vt.edu> wrote:
> >>
> >> Dear Craig,
> >>
> >> Thank you super much for the quick reply! Yeah, I'm still new to working
> >> on the back-end and that sounds great. I already have the raw assembly
> >> of what I want to accomplish so this is perfect. I just tried it and
> >> yeah, I will have to break down my assembly even further into simpler
> >> operations. You're right about my assembly dealing with
> >> segment registers as I'm getting the following error:
> >> "error: unknown use of instruction mnemonic without a size suffix"
> >>
> >> Just curious, what does it mean by size suffix?
> >>
> >> It's super cool to see the equivalent with "-show-inst"! Thank you
> >> so much for this help!
> >>
> >> Last note, I know that the definitions (e.g. def SUB32ri) of the
> >> various instructions can be found in the various ****.td, but is there
> >> documentation where the meaning or quick reference of every
> >> X86::XXXXXX llvm instruction macro can be found, so I can quickly pick
> >> and choose which actual macro I need to use, to "work forwards" rather
> >> than working backwards by writing the assembly first and using llvm-mc
> >> -show-inst?
> >>
> >> Thanks super much again.
> >>
> >> Sincerely,
> >>
> >> Chris Jelesnianski
> >> Graduate Research Assistant
> >> Virginia Tech
> >>
> >> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <craig.topper at gmail.com>
> >> wrote:
> >> > More specifically, there is no instruction that can add/subtract
> >> > segment registers. They can only be updated by the mov segment register
> >> > instructions, opcodes 0x8c and 0x8e in x86 assembly.
> >> >
> >> > I suggest you write the text version of the assembly you want to
> >> > generate and assemble it with llvm-mc. This will tell you if it's even
> >> > valid. After that you can use -show-inst to print the names of the
> >> > instructions that X86 uses that you can give to BuildMI.
> >> >
> >> > ~Craig
> >> >
> >> >
> >> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <craig.topper at gmail.com>
> >> > wrote:
> >> >>
> >> >> The SUB32ri instruction can't operate on segment registers. It
> >> >> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded, only 3 or
> >> >> 4 bits of the register value make it into the binary encoding.
> >> >> Objdump just extracts those 3 or 4 bits back out and prints one of the
> >> >> EAX/EBX/EDX/ECX/EBP registers that those bits correspond to.
> >> >>
> >> >> ~Craig
> >> >>
> >> >>
> >> >> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev
> >> >> <llvm-dev at lists.llvm.org> wrote:
> >> >>>
> >> >>> Dear All,
> >> >>>
> >> >>> Currently I am trying to inject custom x86-64 assembly into a
> >> >>> function's entry basic block. More specifically, I am trying to build
> >> >>> assembly in a machine function pass from scratch.
> >> >>>
> >> >>> While the dumped machine function instruction info displays that %gs
> >> >>> will be used, when I perform objdump -d on my executable I am seeing
> >> >>> that %gs is replaced by %ebp. Why is this happening?
> >> >>>
> >> >>> I know it probably has something to do with me not specifying operands
> >> >>> properly, but I cannot find enough documentation on this besides
> >> >>> looking through code comments such as X86BaseInfo.cpp. I feel there
> >> >>> isn't enough for me to be able to connect the dots.
> >> >>>
> >> >>> Below I have sample code: %gs holds a base address to a memory
> >> >>> location where I am trying to store information. I am trying to update
> >> >>> the %gs register pointer location before saving more values, etc.
> >> >>>
> >> >>> LLVM C++ Machine Function pass code:
> >> >>> MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
> >> >>> TII->get(X86::SUB32ri),X86::GS)
> >> >>>                     .addReg(X86::GS)
> >> >>>                     .addImm(0x8);
> >> >>>
> >> >>> machine function pass dump:
> >> >>>  %gs = SUB32ri %gs, 8, implicit-def %eflags
> >> >>>
> >> >>> Objdump -d assembly from executable
> >> >>>   400510:   81 ed 04 00 00 00       sub    $0x8,%ebp
> >> >>>
> >> >>>
> >> >>> TLDR: I am trying to create custom assembly via BuildMI() and
> >> >>> manipulate segment registers via a MachineFunctionPass.
> >> >>>
> >> >>> I have looked at LLVM's SafeStack implementation, but they are taking
> >> >>> a fairly complicated hybrid approach between an IR Function pass with
> >> >>> backend support. I would like to stay as a single MachineFunction
> >> >>> pass.
> >> >>>
> >> >>> Believe me, I would do this at the IR level if I didn't need to
> >> >>> specifically use the segment registers.
> >> >>>
> >> >>> Thanks for the help in advance!
> >> >>>
> >> >>> Sincerely,
> >> >>>
> >> >>> Christopher Jelesnianski
> >> >>> Graduate Research Assistant
> >> >>> Virginia Tech
> >> >>> _______________________________________________
> >> >>> LLVM Developers mailing list
> >> >>> llvm-dev at lists.llvm.org
> >> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Matthias Braun via llvm-dev

2018-Jun-26 20:37 UTC


[llvm-dev] MachineFunction Instructions Pass using Segment Registers

BTW: If you work on the MI level, I recommend using a debug build of LLVM
and passing -verify-machineinstrs to llc. It should catch you using registers
that are not part of the instruction's register classes.

- Matthias

K Jelesnianski via llvm-dev

2018-Jun-27 00:49 UTC


[llvm-dev] MachineFunction Instructions Pass using Segment Registers

Dear Craig,

Whoops, you're right. That's still what I theoretically want to do,
though. I replaced it with the following:

movq %gs:0x0, %r14   - this doesn't get any complaints from GNU as or
llvm-mc

Got the following from llvm-mc -show-inst:
movq    %gs:0, %r14             # <MCInst #1810 MOV64rm
                                        #  <MCOperand Reg:117>    //Destination
                                        #  <MCOperand Reg:0>      //Base Reg
                                        #  <MCOperand Imm:1>      //Scale
                                        #  <MCOperand Reg:0>      //Index Reg
                                        #  <MCOperand Imm:0>      //Displacement
                                        #  <MCOperand Reg:33>>    //Segment Reg

This looks better: 33 (%gs) is now in the Segment slot of the MCOperands
instead of the BaseReg slot, according to
http://llvm.org/doxygen/X86BaseInfo_8h_source.html
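
For reference, the slot order in the -show-inst output follows the X86::AddrBaseReg (0), AddrScaleAmt (1), AddrIndexReg (2), AddrDisp (3) and AddrSegmentReg (4) offsets in X86BaseInfo.h. The sketch below is a plain C++ model of those five slots, with no LLVM headers; X86MemRef and toATT are hypothetical names made up for this illustration, and registers are modelled as strings, with "" standing in for register 0 (none).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the five operands that make up one X86 memory
// reference. Field order mirrors base, scale, index, displacement, segment.
struct X86MemRef {
    std::string base;    // "" means no base register (register 0)
    int scale;           // 1, 2, 4 or 8
    std::string index;   // "" means no index register
    long disp;           // displacement
    std::string segment; // "" means no segment override
};

// Render the reference in AT&T syntax: %seg:disp(%base,%index,scale).
std::string toATT(const X86MemRef &m) {
    std::string out;
    if (!m.segment.empty())
        out += "%" + m.segment + ":";
    const bool hasRegs = !m.base.empty() || !m.index.empty();
    // A zero displacement is only printed when nothing else follows.
    if (m.disp != 0 || !hasRegs)
        out += std::to_string(m.disp);
    if (hasRegs) {
        out += "(";
        if (!m.base.empty())
            out += "%" + m.base;
        if (!m.index.empty())
            out += ",%" + m.index + "," + std::to_string(m.scale);
        out += ")";
    }
    return out;
}
```

With this model, toATT({"", 1, "", 0, "gs"}) gives "%gs:0" and toATT({"r14", 1, "", 0, "gs"}) gives "%gs:(%r14)", matching the two llvm-mc renderings in this thread. In a pass, the same five slots would presumably be filled, after the destination register, with .addReg(0).addImm(1).addReg(0).addImm(0).addReg(X86::GS); that chain is an untested sketch, not something confirmed in the thread.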

The only weird behavior I could not figure out was for an XOR instruction.
llvm-mc -show-inst:
xorq    %r15, %r15              # <MCInst #15401 XOR64rr
                                        #  <MCOperand Reg:118>
                                        #  <MCOperand Reg:118>
                                        #  <MCOperand Reg:118>>

I had to use .addDef instead of .addReg.
My C++ code:
    BuildMI(MBB,MBB.end(),DL,TII->get(X86::XOR64rr),X86::R15)
      .addDef(X86::R15)
      .addReg(X86::R15);

I got the same error until I replaced the first instance of .addReg with
.addDef. Why do I need to use .addDef here?
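
One possible explanation, not confirmed anywhere in this thread: the -show-inst output lists three register operands for XOR64rr (one destination plus two sources, the first source being tied to the destination), and the BuildMI overload that takes a destination register already supplies the first of them. If so, what matters is that two more register operands follow, and the usual form would be two .addReg calls rather than .addDef followed by .addReg. The toy model below only counts operands; Operand, Instr and buildWithDest are hypothetical names and not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified operand record; this is not LLVM's MachineOperand.
struct Operand {
    unsigned reg;
    bool isDef;
};

// Hypothetical instruction under construction with a fixed expected operand count.
struct Instr {
    std::size_t expectedOperands; // 3 for XOR64rr, per the -show-inst output
    std::vector<Operand> ops;

    Instr &addDef(unsigned r) { ops.push_back({r, true}); return *this; }
    Instr &addUse(unsigned r) { ops.push_back({r, false}); return *this; }

    // True once every operand the description expects has been supplied.
    bool complete() const { return ops.size() == expectedOperands; }
};

// Models a builder that is handed the destination register up front,
// so the def is already present before any chained calls.
inline Instr buildWithDest(std::size_t expected, unsigned dest) {
    Instr i{expected, {}};
    i.addDef(dest);
    return i;
}
```

In this model, buildWithDest(3, 118).addUse(118).addUse(118).complete() is true, while a chain with only one addUse is one operand short, which would be consistent with an out-of-range operand index assertion like the one quoted earlier.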

Nonetheless, with this edit my MachineFunctionPass now compiles with no errors.
I'm only taking the llvm-mc -show-inst information (from above) as a
cue to what my C++ code should look like. I finally got LLVM to compile my
BuildMI instructions. Thanks again for the help!
------------------------------------
Dear Matthias,

Thanks for the tip! Both of your responses helped in my debugging.

Sincerely,

Chris Jelesnianski

On Tue, Jun 26, 2018 at 4:37 PM, Matthias Braun <mbraun at apple.com>
wrote:> BTW: If you work on the MI level, then I recommend to use a debug build of
> llvm and to pass -verify-machineinstrs to llc and it should catch you using
> registers that are not part of the instructions register classes.
>
> - Matthias
>
>
> On Jun 26, 2018, at 1:13 PM, Craig Topper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> This shouldn't have parsed.
>
> movq    (%gs), %r14
>
> That's trying to use%gs as a base register which isn't valid. GNU
assembler
> rejects it. And coincidentally llvm-mc started rejecting it on trunk late
> last week.  That's probably why it printed as %ebp.
>
> I don't know if there is an instruction to read the base of %gs
directly.
> Maybe rdgsbase, but that's only available on Ivy Bridge and later
CPUs.. But
> ussing %gs as part of the memory address for any other instruction is
> automatically relative to the base of %gs.
>
>
> ~Craig
>
>
> On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <kjski at vt.edu>
wrote:
>>
>> Dear Craig,
>>
>> Thanks for the help so far. I have rewritten my assembly to comply
>> with user-land not being able to directly modify the segment registers
>> %GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM
>> instruction + operands. Now I am working backwards to actually code
>> this assembly into my MachineFunctionPass and got the easy assembly
>> implemented, however my more complicated asm is still struggling as I
>> am still seeing 0x0(%rbp) instead of (%gs) or errors.
>> Core question here being: how do I properly create BuildMI statements
>> for assembly dealing with offsets?
>>
>>
-------------------------------------------------------------------------------------------------
>> Assembly I want to translate:
>> mov   (%gs), %r14                  //get value off %GS base addresss
>> mov %r15, %gs:0x0(%r14)     //put value in R15 into R14:(%GS)  [ (%GS)
+
>> R14 ]
>>
>>
--------------------------------------------------------------------------------------------------
>> LLVM-MC -show inst gives:
>> movq    (%gs), %r14          # <MCInst #1810 MOV64rm
>>                                         #  <MCOperand Reg:117>
>>                                         #  <MCOperand Reg:33>
>>                                         #  <MCOperand Imm:1>
>>                                         #  <MCOperand Reg:0>
>>                                         #  <MCOperand Imm:0>
>>                                         #  <MCOperand Reg:0>>
>> movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
>>                                         #  <MCOperand Reg:117>
>>                                         #  <MCOperand Imm:1>
>>                                         #  <MCOperand Reg:0>
>>                                         #  <MCOperand Imm:0>
>>                                         #  <MCOperand Reg:33>
>>                                         #  <MCOperand
Reg:118>>
>>
>>
-------------------------------------------------------------------------------------------------------
>> I'll be honest and say I don't really know how to add the
operands
>> properly to BuildMI. I figured out the following so far
>> MachineInstrBuilder thing = BuildMI(MachineBB, Position in MBB ,
>> DebugLoc(not sure what this accomplishes), TII->get( X86 instruction
I
>> want), where instruction result goes)
>>
>> this has .add(MachineOperand)
>>             .addReg(X86::a reg macro)
>>             .addIMM(a constant like 0x8)
>>             and a few more I dont think apply to me.
>>
>> but I am not sure I must follow a specific order? I am assuming yes
>> and it has something to do with X86InstrInfo.td definitions, but not
>> sure.
>>
>>
--------------------------------------------------------------------------------------------------------
>> LLVM C++ code I tried to translate this to:
>> /* 1 mov   (%gs), %r14 */
>>     MachineInstrBuilder e1 >>
BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
>>        .addReg(X86::GS);
>> /* 2 mov %r15, %gs:0x0(%r14) */
>>     MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
>>     MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
>>     MachineOperand indexReg =
MachineOperand::CreateReg(X86::R14,false);
>>     MachineOperand disp = MachineOperand::CreateImm(0x0);
>>
>>     BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
>>       .add(baseReg)
>>       .add(scaleAmt)
>>       .add(indexReg);
>>
>> /* both instructions give the following error
>>
>> clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
>> T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::operator[](llvm::SmallVectorTemplateCommon<T,
>> <template-parameter-1-2> >::size_type) const [with T =
>> llvm::MCOperand; <template-parameter-1-2> = void;
>> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::const_reference = const llvm::MCOperand&;
>> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::size_type = long unsigned int]: Assertion `idx < size()' failed.
>> */
>>
>> I saw this function in the code base but not sure what it does
>> "addDirectMem(MachineInstructionBuilder_thing, register you want to
>> use);"
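The `idx < size()` assertion is consistent with an instruction that carries fewer operands than its opcode expects. Below is a sketch with a mock builder (`MockInst` and `addFullAddress` are hypothetical and are not LLVM's MachineInstrBuilder), assuming an X86 memory reference always takes five operands in the order base, scale, index, displacement, segment. The register numbers are placeholders copied from the -show-inst dump in this thread. As far as I know, the real `addDirectMem` helper in X86InstrBuilder.h appends such a five-operand reference for a plain `[Reg]` address with no segment register, so it would not select %gs by itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mock types for illustration only; these are NOT LLVM classes.
struct MockOperand {
    bool isReg;
    std::int64_t value;  // register number or immediate value
};

struct MockInst {
    std::vector<MockOperand> ops;
    MockInst &addReg(unsigned r) {
        ops.push_back({true, static_cast<std::int64_t>(r)});
        return *this;
    }
    MockInst &addImm(std::int64_t i) {
        ops.push_back({false, i});
        return *this;
    }
};

constexpr unsigned kNoReg = 0;            // "no register" placeholder
constexpr std::size_t kAddrOperands = 5;  // base, scale, index, disp, segment

// Placeholder register numbers taken from the dump quoted in this thread.
constexpr unsigned kR14 = 117, kR15 = 118, kGS = 33;

// Append all five address operands in the assumed order.
inline MockInst &addFullAddress(MockInst &mi, unsigned base, std::int64_t scale,
                                unsigned index, std::int64_t disp,
                                unsigned segment) {
    return mi.addReg(base).addImm(scale).addReg(index).addImm(disp)
             .addReg(segment);
}

// mov %r15, %gs:0x0(%r14): five address operands, then the source register.
inline MockInst buildStore() {
    MockInst mi;
    addFullAddress(mi, kR14, 1, kNoReg, 0, kGS).addReg(kR15);
    return mi;
}

// The shape attempted in the thread: only base, scale and index are added.
inline MockInst buildIncompleteStore() {
    MockInst mi;
    mi.addReg(kGS).addImm(1).addReg(kR14);
    return mi;
}

// A store form needs the full address plus one source register.
inline bool hasStoreOperandCount(const MockInst &mi) {
    return mi.ops.size() == kAddrOperands + 1;
}
```

In this model the three-operand version falls short by the displacement, the segment and the source register, so any code that indexes operand 3 or later would run off the end of the list.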
>>
>>
>> This should be the last bit of information I need to finish up
>> this implementation. Thanks again for your help!
>>
>> Sincerely,
>>
>> Chris Jelesnianski
>>
>> On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>> > The size suffix thing is a weird quirk in our assembler I should look
>> > into fixing. Instructions in AT&T syntax usually have a size suffix
>> > that is often optional.
>> >
>> > For example:
>> >   add %ax, %bx
>> > and
>> >   addw %ax, %bx
>> >
>> > are equivalent because the register name indicates the size.
>> >
>> > But for an instruction like this
>> >   addw $1, (%ax)
>> >
>> > there is nothing to infer the size from, so an explicit suffix is
>> > required.
>> >
>> > So for an instruction like "add %ax, %bx" from above, we try to guess
>> > the size suffix from the register. In your case, you used a segment
>> > register which we couldn't guess the size from. And then we printed a
>> > bad error message.
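The suffix inference described above can be pictured as a lookup from register name to operand width. This is a toy model only, not llvm-mc's actual matcher: `inferSuffix` and its small register table are hypothetical and cover just a handful of registers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Return the AT&T size suffix implied by a register name, or '\0' when the
// name gives no usable width (segment registers, unknown names).
inline char inferSuffix(const std::string &reg) {
    static const std::unordered_map<std::string, char> table = {
        {"%al", 'b'},  {"%bl", 'b'},                  // 8-bit
        {"%ax", 'w'},  {"%bx", 'w'},                  // 16-bit
        {"%eax", 'l'}, {"%ebx", 'l'},                 // 32-bit
        {"%rax", 'q'}, {"%r14", 'q'}, {"%r15", 'q'},  // 64-bit
    };
    auto it = table.find(reg);
    return it == table.end() ? '\0' : it->second;
}
```

With this model, `add %ax, %bx` resolves to `addw`, while an operand such as `%gs` yields no suffix, which is the situation that produced the confusing error.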
>> >
>> > There's no quick reference as such for the meaning of the various
>> > X86::XXXXXX names. But the complete list of them is in
>> > lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are
>> > meant to be fairly straightforward to understand. The first part of
>> > the name should almost always be the instruction name from the
>> > Intel/AMD manuals. The lower case letters at the end sort of convey
>> > operand types, but often not the number of operands even though it
>> > looks that way. The most common letters are 'r' for register, 'm' for
>> > memory and 'i' for immediate. Numbers after 'i' specify the size of
>> > the immediate if it's important to distinguish from other sizes or
>> > different than the size of the instruction. The lower case letters
>> > are most useful to distinguish different instructions from each
>> > other. So for example, if two instructions only differ in the lower
>> > case letters and one says "rr" and one says "rm", the first is the
>> > register form and the second is the memory form of the same
>> > instruction.
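The naming convention above can be sketched as a small splitter that separates an opcode name into mnemonic, width and operand-form letters. `splitOpcodeName` is a hypothetical helper and only a heuristic: it handles simple names such as MOV64rm or SUB32ri, and plenty of real opcode names will not fit this pattern.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct OpcodeParts {
    std::string mnemonic;  // e.g. "MOV"
    std::string width;     // e.g. "64" (may be empty)
    std::string form;      // e.g. "rm": operand-type letters
};

inline OpcodeParts splitOpcodeName(const std::string &name) {
    // The operand-form suffix starts at the first lower case letter.
    std::size_t formStart = 0;
    while (formStart < name.size() &&
           !std::islower(static_cast<unsigned char>(name[formStart])))
        ++formStart;

    // Digits immediately before the suffix are treated as the width.
    std::size_t widthStart = formStart;
    while (widthStart > 0 &&
           std::isdigit(static_cast<unsigned char>(name[widthStart - 1])))
        --widthStart;

    return {name.substr(0, widthStart),
            name.substr(widthStart, formStart - widthStart),
            name.substr(formStart)};
}
```

For instance, MOV64rm and MOV64mr split into the same mnemonic and width and differ only in the form letters, which is the load-versus-store distinction that matters when picking an opcode for BuildMI.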
>> >
>> > ~Craig
>> >
>> >
>> > On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <kjski at vt.edu> wrote:
>> >>
>> >> Dear Craig,
>> >>
>> >> Thank you super much for the quick reply! Yea I'm still new to working
>> >> on the back-end and that sounds great. I already have the raw assembly
>> >> of what I want to accomplish so this is perfect. I just tried it and
>> >> yea, I will have to break down my assembly even further into simpler
>> >> operations. You're right about my assembly dealing with segment
>> >> registers, as I'm getting the following error:
>> >> "error: unknown use of instruction mnemonic without a size suffix"
>> >>
>> >> Just curious, what does it mean by size suffix??
>> >>
>> >> It's super cool to see the equivalent with "-show-inst"!!! Thank you
>> >> so much for this help!
>> >>
>> >> Last note, I know that the definitions (e.g. def SUB32ri) of the
>> >> various instructions can be found in the various ****.td, but is there
>> >> documentation where the meaning or quick reference of every
>> >> X86::XXXXXX llvm instruction macro can be found, so I can quickly pick
>> >> and choose which actual macro I need to use, to "work forwards" rather
>> >> than working backwards by writing the assembly first and using llvm-mc
>> >> -show-inst  ??
>> >>
>> >> Thanks super much again.
>> >>
>> >> Sincerely,
>> >>
>> >> Chris Jelesnianski
>> >> Graduate Research Assistant
>> >> Virginia Tech
>> >>
>> >> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <craig.topper at gmail.com>
>> >> wrote:
>> >> > More specifically, there is no instruction that can add/subtract
>> >> > segment registers. They can only be updated by the mov segment
>> >> > register instructions, opcodes 0x8c and 0x8e in x86 assembly.
>> >> >
>> >> > I suggest you write the text version of the assembly you want to
>> >> > generate and assemble it with llvm-mc. This will tell you if it's
>> >> > even valid. After that you can use -show-inst to print the names of
>> >> > the instructions that X86 uses that you can give to BuildMI.
>> >> >
>> >> > ~Craig
>> >> >
>> >> >
>> >> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <craig.topper at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> The SUB32ri instruction can't operate on segment registers. It
>> >> >> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded, only 3
>> >> >> or 4 bits of the register value make it into the binary encoding.
>> >> >> Objdump just extracts those 3 or 4 bits back out and prints one of
>> >> >> the EAX/EBX/EDX/ECX/EBP registers that those bits correspond to.
>> >> >>
>> >> >> ~Craig
>> >> >>
>> >> >>
>> >> >> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev
>> >> >> <llvm-dev at lists.llvm.org> wrote:
>> >> >>>
>> >> >>> Dear All,
>> >> >>>
>> >> >>> Currently I am trying to inject custom x86-64 assembly into a
>> >> >>> function's entry basic block. More specifically, I am trying to
>> >> >>> build assembly in a machine function pass from scratch.
>> >> >>>
>> >> >>> While the dumped machine function instruction info displays that
>> >> >>> %gs will be used, when I perform objdump -d on my executable I see
>> >> >>> that %gs is replaced by %ebp. Why is this happening?
>> >> >>>
>> >> >>> I know it probably has something to do with me not specifying
>> >> >>> operands properly, but I cannot find enough documentation on this
>> >> >>> besides looking through code comments such as X86BaseInfo.cpp. I
>> >> >>> feel there isn't enough for me to be able to connect the dots.
>> >> >>>
>> >> >>> Below I have sample code: %gs holds a base address to a memory
>> >> >>> location where I am trying to store information. I am trying to
>> >> >>> update the %gs register pointer location before saving more
>> >> >>> values, etc.
>> >> >>>
>> >> >>> LLVM C++ MachineFunction pass code:
>> >> >>> MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
>> >> >>> TII->get(X86::SUB32ri),X86::GS)
>> >> >>>                     .addReg(X86::GS)
>> >> >>>                     .addImm(0x8);
>> >> >>>
>> >> >>> machine function pass dump:
>> >> >>>  %gs = SUB32ri %gs, 8, implicit-def %eflags
>> >> >>>
>> >> >>> Objdump -d assembly from executable
>> >> >>>   400510:   81 ed 04 00 00 00       sub    $0x8,%ebp
>> >> >>>
>> >> >>>
>> >> >>> TLDR: I am trying to create custom assembly via BuildMI() and
>> >> >>> manipulate segment registers via a MachineFunctionPass.
>> >> >>>
>> >> >>> I have looked at LLVM's safestack implementation, but they are
>> >> >>> taking a fairly complicated hybrid approach between an IR Function
>> >> >>> pass with Backend support. I would like to stay as a single
>> >> >>> MachineFunction pass.
>> >> >>>
>> >> >>> Believe me, I would do this at the IR level if I didn't need to
>> >> >>> specifically use the segment registers.
>> >> >>>
>> >> >>> Thanks for the help in advance!
>> >> >>>
>> >> >>> Sincerely,
>> >> >>>
>> >> >>> Christopher Jelesnianski
>> >> >>> Graduate Research Assistant
>> >> >>> Virginia Tech
>> >> >>> _______________________________________________
>> >> >>> LLVM Developers mailing list
>> >> >>> llvm-dev at lists.llvm.org
>> >> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
