Lele Ma via llvm-dev
2019-Nov-27 05:50 UTC
[llvm-dev] Writing a Pass in LLVM MC (Machine Code) level to Analyze Assembly Code
Hi All,

A self-follow-up and rephrasing of my previous question, with an updated subject:

What I want to do is analyze hand-written assembly code in 'full detail', so that the semantics of each instruction can be known in LLVM passes. Many such instructions, such as 'syscall' and 'iret', have no corresponding counterparts in IR/MIR form. At the MC level, such assembly code can easily be translated to MCInst, since this level is closest to the assembly code. Therefore, I am thinking of writing a pass at the MC level instead of at IR/MIR.

However, while searching for material on MC-level passes, I cannot find any related classes in the LLVM infrastructure (such as FunctionPass at the IR level, or MachineFunctionPass at the MIR level). Could anyone point me to where I should start writing an MC-level pass?

Best Regards,
Lele

On Mon, Nov 25, 2019 at 5:24 PM Lele Ma <lelema.cn at gmail.com> wrote:
> Thank you for the instructions, Aaron and Nicolai!
>
> Raising a binary to LLVM IR, or raising it to MIR, is a reasonable solution
> for me. However, given Nicolai's information that not all target-specific
> instructions are representable in MIR, I have two questions that need your
> help:
>
> 1. Why does MIR not necessarily represent all target-specific instructions
> for certain hardware? If someone added that support, would this violate
> some design principle of MIR?
>
> 2. Instead of IR/MIR raising, I am wondering whether a third path is
> possible for analyzing assembly code:
>   - write a simple LLVM pass in the `MC` layer to process information not
>     available in MIR/IR, and
>   - pass analysis results from an IR/MIR pass to the MC-layer pass, where
>     we can enhance them with the missing representations.
> So my second question is: is it possible to write passes directly in the
> MC layer? If so, is there any documentation or example for that?
>
> Thank you in advance!
>
> Best Regards,
> Lele
>
> On Mon, Nov 25, 2019 at 9:15 AM Aaron Smith <aaron.lee.smith at gmail.com> wrote:
>> llvm-mctoll will raise a binary back to LLVM IR.
>> Not exactly what you want, but it might be something you can leverage.
>>
>> https://github.com/microsoft/llvm-mctoll
>>
>> On Mon, Nov 25, 2019 at 1:19 PM Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> On Thu, Nov 21, 2019 at 3:37 AM Lele Ma via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> > My goal is to write LLVM Machine IR (MIR) passes to analyze the
>>> > assembly source code. But it seems I need to find a way to translate the
>>> > handwritten assembly code into MIR format first.
>>> >
>>> > Are there any materials, or any modules in the LLVM source code, that can
>>> > help translate assembly code into LLVM MIR for analysis?
>>> >
>>> > Or is there an easier way to analyze assembly code in MIR passes
>>> > without translating it?
>>>
>>> MachineIR is designed for code generation, not for general assembly
>>> representation. MIR is not even necessarily able to represent all
>>> assembly instructions that a target's hardware supports. The
>>> disassembler produces MCInsts, and if you wanted to go from there back
>>> to MachineIR, you'd have to write your own target-specific code to do
>>> so.
>>>
>>> Cheers,
>>> Nicolai
>>>
>>> > Best Regards,
>>> > Lele Ma
>>> >
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>> --
>>> Learn how the world really is,
>>> but never forget how it ought to be.
Aaron Smith via llvm-dev
2019-Nov-27 07:00 UTC
[llvm-dev] Writing a Pass in LLVM MC (Machine Code) level to Analyze Assembly Code
The MC layer doesn't have passes. There is a method called emitInstruction() that is called for each instruction, one at a time, to emit its MCInst.

In the past I have accomplished what you'd like by overriding the methods in MCObjectStreamer to buffer all the MCInsts for a function, and then running the analysis on the buffered instructions.

Here's a link about how instructions are lowered, which might shed some light on how all this works:

https://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm
Lele Ma via llvm-dev
2019-Nov-28 01:50 UTC
[llvm-dev] Writing a Pass in LLVM MC (Machine Code) level to Analyze Assembly Code
Thank you so much! That is very helpful.

Best,
Lele