Paul Muntean via llvm-dev
2017-Jul-08 07:37 UTC
[llvm-dev] Caller callee calling convention enforcement in C++ bin. code
On Sat, Jul 8, 2017 at 9:36 AM, Paul Muntean <paulmuntean at gmail.com> wrote:
> Hi Reid,
>
> Please see some clarification below.
> Thank you for your answer. It provided a lot of helpful information!
> I've included some follow-up questions below and would really appreciate
> your answers. Further help and suggestions are highly welcome.
>
> The technique we use:
> I infer the ranges of the call sites from the order in which my
> MachineFunctionPass is invoked. As far as I can see, this order has to be
> the same as the order in which the AsmPrinter is invoked, and therefore the
> order in which data is written to the ELF .text section. Since the code is
> laid out in memory relative to the start of the section, this order is
> well defined inside a single section.
>
> From there, I'm currently writing code that emits EH labels (which mark
> machine basic block edges). All I need to do now is store symbols for
> these labels in .rodata (or a similar/custom ELF section). Then the loader
> will relocate the addresses for us, and we check the ranges with load
> instructions on the read-only data.
> Some advice on how to add the relocations in a clean way would be
> amazing :) But I think I can also figure this out myself.
>
> What do you mean by "the return address "VA" (I think, in ELF parlance)"?
>
> Here are our comments on your post.
>
> > Is it enough to compute the set of all possible return addresses, or do
> > you need to limit the set to only C++ method calls? If you just need the
> > full set of return addresses for a given DSO, I'd recommend disassembling
> > the object after linking, scraping the output for "callq" instructions, and
> > taking the address of the next instruction. This will give you the return
> > address "VA" (I think, in ELF parlance), which is the address of the
> > instruction assuming the ELF binary is loaded at the address listed in its
> > program headers. You can compute the possible return addresses at runtime
> > by adding the difference between the on-disk p_vaddr values and the actual
> > addresses that the loader used at runtime. You can probably discover the
> > load addresses with dl_iterate_phdr.
>
> We've made modifications to the LLVM x86 backend that allow us to find and
> filter the call instructions at the MachineInstr level, i.e. the set of
> calls we are interested in is known to us in the backend.
> Right now I assume that the order in which functions are written to the
> ELF file is based only on the order in which the X86AsmPrinter
> MachineFunctionPass processes them.
> Are we correct to assume this, and additionally that this order is
> consistent across all MachineFunctionPasses added in the backend?
> To get actual addresses relative to the image base of the ELF file, we
> would probably have to parse (and maybe fully disassemble) the file,
> exactly as you said.
>
> > If you need only some specific annotated list of return addresses, you
> > will probably have to make complicated changes to LLVM that insert labels
> > after certain CALL instructions and emit some object file section with
> > relocations against those labels. This is doable but complicated. You can
> > follow the EH label machinery to see how to insert labels into the
> > instruction stream and create relocations against them from read-only data
> > sections.
>
> After looking at how EH labels are generated, I fully agree with you:
> combined with relocations, this would be the cleaner, but also considerably
> more complicated, solution.
> Do you think that for this approach it would be better to patch an
> additional read-only section using an external program, or to add the
> relocations to the .rodata section emitted by LLVM?
>
> On Thu, Jul 6, 2017 at 5:53 PM, Reid Kleckner <rnk at google.com> wrote:
>
>> Is it enough to compute the set of all possible return addresses, or do
>> you need to limit the set to only C++ method calls? If you just need the
>> full set of return addresses for a given DSO, I'd recommend disassembling
>> the object after linking, scraping the output for "callq" instructions, and
>> taking the address of the next instruction. This will give you the return
>> address "VA" (I think, in ELF parlance), which is the address of the
>> instruction assuming the ELF binary is loaded at the address listed in its
>> program headers. You can compute the possible return addresses at runtime
>> by adding the difference between the on-disk p_vaddr values and the actual
>> addresses that the loader used at runtime. You can probably discover the
>> load addresses with dl_iterate_phdr.
>>
>> If you need only some specific annotated list of return addresses, you
>> will probably have to make complicated changes to LLVM that insert labels
>> after certain CALL instructions and emit some object file section with
>> relocations against those labels. This is doable but complicated. You can
>> follow the EH label machinery to see how to insert labels into the
>> instruction stream and create relocations against them from read-only data
>> sections.
>>
>> On Wed, Jul 5, 2017 at 9:22 AM, Paul Muntean via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi guys,
>>>
>>> Maybe you can help with an issue I have.
>>>
>>> For a C++ program compiled with Clang/LLVM on an x86_64 Ubuntu system,
>>> I want to recover the addresses of all call instructions (C++ object
>>> dispatches), or directly the return addresses, which are just the
>>> addresses immediately following each call instruction.
>>>
>>> I think that this information is not obtainable at link time, since
>>> at that point we have only IR code. Please correct me if I am wrong.
>>> So my assumption is that the addresses of the call instructions, and
>>> the addresses following them, become available in the compiler back
>>> end, after the IR code is lowered to machine code.
>>>
>>> Does anybody have a suggestion for where in the compiler I should
>>> look?
>>>
>>> Since I am new to this topic, suggestions or solutions are highly welcome.
>>>
>>> -Paul
>
> --
> Best regards,
>
> Paul Muntean

--
Best regards,
Paul Muntean
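
Editorial sketch: the post-link approach Reid describes in the quoted text (scrape the disassembly for call instructions, take the address of the following instruction, then rebase by the load address) could look roughly like the Python below. Everything here is an assumption for illustration, not code from the thread: the names return_addresses and rebase are hypothetical, the input is assumed to be GNU "objdump -d" output in its default tab-separated layout, and the load address is passed in as a plain number. In a real process it would have to be discovered at runtime, for example from the dlpi_addr field reported by dl_iterate_phdr.

```python
import re

# One line of `objdump -d` output: "  401127:<TAB>e8 04 ff ff ff   <TAB>callq  401030 <puts@plt>".
# Long instructions wrap onto continuation lines that carry bytes but no mnemonic.
LINE = re.compile(r"^\s*([0-9a-f]+):\t((?:[0-9a-f]{2} ?)+) *(?:\t(\S+))?")

def return_addresses(disassembly: str) -> list[int]:
    """Return the link-time VA of the instruction following each call."""
    results = []
    pending = None  # end address of a call whose bytes may still be wrapping
    for line in disassembly.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # section headers, symbol labels, blank lines
        addr = int(m.group(1), 16)
        nbytes = len(m.group(2).split())
        mnemonic = m.group(3)
        if mnemonic is None:
            # Continuation line: extend the pending call if it is contiguous.
            if pending is not None and addr == pending:
                pending += nbytes
            continue
        if pending is not None:
            results.append(pending)
            pending = None
        if mnemonic.startswith("call"):  # matches both "call" and "callq"
            pending = addr + nbytes
    if pending is not None:
        results.append(pending)
    return results

def rebase(vas: list[int], p_vaddr: int, loaded_at: int) -> list[int]:
    """Shift link-time VAs by the difference between where the segment
    was actually loaded and its on-disk p_vaddr."""
    bias = loaded_at - p_vaddr
    return [va + bias for va in vas]

# Hand-written sample in objdump's layout (not taken from a real binary).
SAMPLE = (
    "0000000000401126 <main>:\n"
    "  401126:\t55                   \tpush   %rbp\n"
    "  401127:\te8 04 ff ff ff       \tcallq  401030 <puts@plt>\n"
    "  40112c:\tff 50 08             \tcallq  *0x8(%rax)\n"
    "  40112f:\tc3                   \tretq\n"
)

if __name__ == "__main__":
    for va in return_addresses(SAMPLE):
        print(hex(va))
```

This only covers the "full set of return addresses" case. It does not distinguish C++ method calls from other calls, which is the part that would need the label-and-relocation approach inside LLVM, and it does not handle call instructions that objdump prints behind a separate prefix mnemonic.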
Paul Muntean via llvm-dev
2017-Jul-08 07:42 UTC
[llvm-dev] Caller callee calling convention enforcement in C++ bin. code