thr3ads.net - llvm dev - [llvm-dev] Caller callee calling convention enforcement in C++ bin. code [Jul 2017]


Paul Muntean via llvm-dev

2017-Jul-08 07:37 UTC

[llvm-dev] Caller callee calling convention enforcement in C++ bin. code

On Sat, Jul 8, 2017 at 9:36 AM, Paul Muntean <paulmuntean at gmail.com> wrote:
> Hi Reid,
>
> please find some clarifications below.
> Thank you for your answer. It provided a lot of helpful information!
> I've included some follow-up questions and would really appreciate
> your answers!
> Further help and suggestions are highly welcome.
>
> The technique we use:
> I infer the ranges of the callsites from the order in which my
> MachineFunctionPass is invoked. As far as I can see, this order has to be
> the same as the order in which the AsmPrinter is invoked, and therefore the
> order in which data is written to the ELF .text section. Since the code is
> laid out in memory relative to the start of the section, this order is
> well defined inside a single section.
>
> From there, I'm currently writing code that emits EH labels (which mark
> MachineBasicBlock edges). All I need to do now is to store references to
> these labels in .rodata (or a similar/custom ELF section). Then the loader
> will relocate the addresses for us, and we can check the ranges with load
> instructions on the read-only data.
> Some advice on how to add the relocations in a clean way would be
> amazing :) But I think I can also figure this out myself.
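A minimal C sketch of the range check described above, assuming the callsite ranges are already available at runtime as a sorted, non-overlapping array of half-open [start, end) address pairs. The `callsite_range` type and the `in_callsite_ranges` function are hypothetical names used only for illustration; they are not LLVM APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end) covering callsites. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} callsite_range;

/* Returns true if addr lies inside one of the n ranges.
 * Assumes the ranges are sorted by start and do not overlap,
 * so a binary search over the array is sufficient. */
static bool in_callsite_ranges(uintptr_t addr, const callsite_range *ranges, size_t n) {
    assert(ranges != NULL || n == 0);
    size_t lo = 0, hi = n;  /* current search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < ranges[mid].start) {
            hi = mid;            /* addr is left of this range */
        } else if (addr >= ranges[mid].end) {
            lo = mid + 1;        /* addr is right of this range */
        } else {
            return true;         /* start <= addr < end */
        }
    }
    return false;
}
```

A callee could, for example, pass `(uintptr_t)__builtin_return_address(0)` (a GCC/Clang builtin) as `addr`; the binary search keeps each check logarithmic in the number of ranges.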
>
> What do you mean by "the return address "VA" (I think, in ELF parlance)"?
>
> Here are our comments on your post.
>
> > Is it enough to compute the set of all possible return addresses, or do
> > you need to limit the set to only C++ method calls? If you just need the
> > full set of return addresses for a given DSO, I'd recommend disassembling
> > the object after linking, scraping the output for "callq" instructions, and
> > taking the address of the next instruction. This will give you the return
> > address "VA" (I think, in ELF parlance), which is the address of the
> > instruction assuming the ELF binary is loaded at the address listed in its
> > program headers. You can compute the possible return addresses at runtime
> > by adding the difference between the on-disk p_vaddr values and the actual
> > addresses that the loader used at runtime. You can probably discover the
> > load addresses with dl_iterate_phdr.
>
> We've made modifications to the LLVM x86 backend that allow us to find and
> filter the call instructions at the MachineInstr level, i.e., the set of
> calls we are interested in is known to us in the backend.
> Right now I assume that the order in which functions are written to the
> ELF file is determined solely by the order in which the X86AsmPrinter
> MachineFunctionPass processes them.
> Are we correct to assume this, and additionally that this order is
> consistent across all MachineFunctionPasses added in the backend?
> To get actual addresses relative to the image base of the ELF file, we
> would probably have to parse (and maybe fully disassemble) the file,
> exactly as you said.
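To illustrate the post-link route, here is a hedged C sketch that scans text in the layout of GNU `objdump -d` output and, for every line whose mnemonic is `call` or `callq`, records the address of the next instruction line as the return-address VA. It assumes the default layout with raw bytes shown (`  addr:<TAB>bytes<TAB>mnemonic operands`); lines with only one tab, such as the byte-continuation lines of long encodings, are skipped. The function names are hypothetical, and prefixed forms such as `notrack call` are not handled:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns nonzero if the mnemonic field starts with "call" or "callq". */
static int is_call_mnemonic(const char *m) {
    if (strncmp(m, "call", 4) != 0) return 0;
    m += 4;
    if (*m == 'q') m++;
    return *m == '\0' || isspace((unsigned char)*m);
}

/* Scans objdump -d style text. For every call instruction, writes the address
 * of the following instruction line (the return-address VA) to out[], up to
 * max entries. Returns the number of entries written. */
static size_t scan_return_addresses(const char *text, uint64_t *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t count = 0;
    int pending_call = 0;  /* previous instruction line was a call */
    const char *line = text;
    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[512];
        if (len >= sizeof buf) len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Instruction lines look like "  addr:\tbytes\tmnemonic operands".
         * Headers, symbol labels ("addr <name>:") and continuation lines
         * ("addr:\tbytes") fail one of the checks below and are ignored. */
        char *end;
        uint64_t addr = strtoull(buf, &end, 16);
        char *bytes_tab = (end != buf && *end == ':') ? strchr(end, '\t') : NULL;
        char *insn_tab = bytes_tab ? strchr(bytes_tab + 1, '\t') : NULL;
        if (insn_tab) {
            if (pending_call && count < max) out[count++] = addr;
            pending_call = is_call_mnemonic(insn_tab + 1);
        }
        line = eol ? eol + 1 : NULL;
    }
    return count;
}
```

Feeding it the text produced by `objdump -d` on the linked binary gives on-disk VAs; note that a call at the very end of a function (for example to a noreturn function) yields the first address of whatever follows it.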
>
> > If you need only some specific annotated list of return addresses, you
> > will probably have to make complicated changes to LLVM that insert labels
> > after certain CALL instructions and emit some object file section with
> > relocations against those labels. This is doable but complicated. You can
> > follow the EH label machinery to see how to insert labels into the
> > instruction stream and create relocations against them from read-only data
> > sections.
>
> After looking at how EH labels are generated, I fully agree with you:
> combined with relocations, this would be the cleaner, but also considerably
> more complicated, solution.
> Do you think that for this approach it would be better to patch an additional
> read-only section into the binary using an external program, or to add the
> relocations to the .rodata section emitted by LLVM?
>
>
> On Thu, Jul 6, 2017 at 5:53 PM, Reid Kleckner <rnk at google.com> wrote:
>
>> Is it enough to compute the set of all possible return addresses, or do
>> you need to limit the set to only C++ method calls? If you just need the
>> full set of return addresses for a given DSO, I'd recommend disassembling
>> the object after linking, scraping the output for "callq" instructions, and
>> taking the address of the next instruction. This will give you the return
>> address "VA" (I think, in ELF parlance), which is the address of the
>> instruction assuming the ELF binary is loaded at the address listed in its
>> program headers. You can compute the possible return addresses at runtime
>> by adding the difference between the on-disk p_vaddr values and the actual
>> addresses that the loader used at runtime. You can probably discover the
>> load addresses with dl_iterate_phdr.
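The runtime translation step could look roughly like this sketch for Linux with glibc (`_GNU_SOURCE`). For each loaded object, `dl_iterate_phdr` reports `dlpi_addr`, and the runtime address of a segment is `dlpi_addr + p_vaddr`, so an on-disk VA `va` maps to `va + dlpi_addr`. The sketch only looks at the first object visited, which the `dl_iterate_phdr(3)` man page documents as the main program; the `load_info` type and helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Load information for one object, as reported by dl_iterate_phdr. */
typedef struct {
    ElfW(Addr) bias;          /* runtime address minus on-disk p_vaddr */
    const ElfW(Phdr) *phdr;   /* program headers of the object */
    ElfW(Half) phnum;         /* number of program headers */
} load_info;

/* Callback: record the first object visited (the main program), then stop. */
static int first_object_cb(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    load_info *li = data;
    li->bias = info->dlpi_addr;
    li->phdr = info->dlpi_phdr;
    li->phnum = info->dlpi_phnum;
    return 1;  /* a nonzero return value stops the iteration */
}

/* Returns the load information of the main executable. */
static load_info main_program_load_info(void) {
    load_info li = {0, NULL, 0};
    dl_iterate_phdr(first_object_cb, &li);
    assert(li.phdr != NULL);
    return li;
}

/* On-disk VA (e.g. a return address from the disassembly) -> runtime address. */
static uintptr_t runtime_address(const load_info *li, uintptr_t disk_va) {
    return disk_va + (uintptr_t)li->bias;
}

/* Runtime address -> on-disk VA. */
static uintptr_t disk_address(const load_info *li, uintptr_t runtime_addr) {
    return runtime_addr - (uintptr_t)li->bias;
}

/* Returns nonzero if the on-disk VA lies inside a PT_LOAD segment. */
static int in_load_segment(const load_info *li, uintptr_t disk_va) {
    for (ElfW(Half) i = 0; i < li->phnum; ++i) {
        const ElfW(Phdr) *ph = &li->phdr[i];
        if (ph->p_type == PT_LOAD && disk_va >= ph->p_vaddr &&
            disk_va - ph->p_vaddr < ph->p_memsz) {
            return 1;
        }
    }
    return 0;
}
```

For a non-PIE executable the bias is 0, so the on-disk VAs are already the runtime addresses; for shared objects one would instead pick the entry whose `dlpi_name` matches the DSO of interest.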
>>
>> If you need only some specific annotated list of return addresses, you
>> will probably have to make complicated changes to LLVM that insert labels
>> after certain CALL instructions and emit some object file section with
>> relocations against those labels. This is doable but complicated. You can
>> follow the EH label machinery to see how to insert labels into the
>> instruction stream and create relocations against them from read-only data
>> sections.
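The general mechanism of storing addresses in a dedicated section and letting the linker and loader relocate them can be tried out in plain C with the GCC/Clang `section` and `used` attributes. This is only a sketch under several assumptions: it records function addresses as stand-ins for the post-call labels (plain C cannot name a return address), it uses a writable section for simplicity rather than a read-only one, and it relies on GNU ld, gold and lld synthesizing `__start_<name>`/`__stop_<name>` symbols for sections whose names are valid C identifiers. The section name `retaddr_table` and all helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*table_entry)(void);

/* Two example functions with distinct bodies, so they keep distinct addresses. */
static volatile int sink;
static void example_callee_a(void) { sink = 1; }
static void example_callee_b(void) { sink = 2; }

/* Place a pointer to `sym` in the custom section "retaddr_table". The compiler
 * emits a relocation against `sym` for each entry; in a PIE or shared object
 * the dynamic loader applies it at load time, so the table holds runtime
 * addresses. `used` keeps the otherwise unreferenced entry from being dropped. */
#define RECORD_ADDRESS(var, sym) \
    static table_entry var __attribute__((section("retaddr_table"), used)) = (sym)

RECORD_ADDRESS(entry_a, example_callee_a);
RECORD_ADDRESS(entry_b, example_callee_b);

/* Section bounds synthesized by the linker. */
extern table_entry __start_retaddr_table[];
extern table_entry __stop_retaddr_table[];

/* Number of recorded entries. */
static size_t table_size(void) {
    assert(__stop_retaddr_table >= __start_retaddr_table);
    return (size_t)(__stop_retaddr_table - __start_retaddr_table);
}

/* Returns nonzero if `addr` was recorded in the table (linear scan). */
static int table_contains(table_entry addr) {
    for (table_entry *p = __start_retaddr_table; p != __stop_retaddr_table; ++p) {
        if (*p == addr) {
            return 1;
        }
    }
    return 0;
}
```

In the LLVM-based approach, the back end would emit the equivalent entries itself, as relocations against labels placed after the selected CALL instructions; the order of entries within the section is not something this sketch relies on.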
>>
>> On Wed, Jul 5, 2017 at 9:22 AM, Paul Muntean via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi guys,
>>>
>>> maybe you can help me with an issue I have.
>>>
>>> For a C++ program compiled with Clang/LLVM on an x86_64 Ubuntu system, I
>>> want to recover the addresses of all call instructions (C++ object
>>> dispatches), or directly the return addresses, which are simply the
>>> addresses immediately following the call instructions.
>>>
>>> I think that this information is not obtainable at link time, since at
>>> that moment we only have IR code. Please correct me if I am wrong.
>>> So my assumption is that the addresses of the call instructions, and of
>>> the instructions following them, only become available in the compiler
>>> back end, after the IR code is lowered to machine code.
>>>
>>> Does anybody have a suggestion where in the compiler I should look?
>>>
>>> Since I am new to this topic, suggestions or solutions are highly welcome.
>>>
>>> -Paul
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Paul Muntean
>
>
>
>

-- 
Best regards,

Paul Muntean
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170708/3130081a/attachment.html>

