Aaron Smith via llvm-dev
2019-Nov-25 14:15 UTC
[llvm-dev] [Machine IR] Analyzing Assembly Source Code in MIR passes
llvm-mctoll will raise a binary back to LLVM IR. It's not exactly what you want, but it might be something you can leverage.

https://github.com/microsoft/llvm-mctoll

On Mon, Nov 25, 2019 at 1:19 PM Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Nov 21, 2019 at 3:37 AM Lele Ma via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > My goal is to write LLVM Machine IR (MIR) passes to analyze the assembly source code. But it seems I first need to find a way to translate the handwritten assembly code into MIR format.
> >
> > Are there any materials, or any modules in the LLVM source code, that can help translate assembly code into LLVM MIR for analysis?
> >
> > Or are there any easier ways to analyze assembly code in MIR passes without translating it?
>
> MachineIR is designed for code generation, not for general assembly representation. MIR is not even necessarily able to represent all assembly instructions that a target's hardware supports. The disassembler produces MCInsts, and if you wanted to go from there back to MachineIR, you'd have to write your own target-specific code to do so.
>
> Cheers,
> Nicolai
>
> > Best Regards,
> > Lele Ma
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Learn how the world really is,
> but never forget how it ought to be.
Lele Ma via llvm-dev
2019-Nov-25 22:24 UTC
[llvm-dev] [Machine IR] Analyzing Assembly Source Code in MIR passes
Thank you for the instructions, Aaron and Nicolai!

Raising a binary to LLVM IR, or raising it to MIR, is a reasonable solution for me. However, given Nicolai's point that not all target-specific instructions are representable in MIR, I have two questions I need your help with:

1. Why does MIR not necessarily represent all target-specific instructions for a given piece of hardware? If someone added that support, would it violate some design principle of MIR?

2. Instead of raising to IR/MIR, I am wondering whether a third path is possible for analyzing assembly code:
   - write a simple LLVM pass in the `MC` layer to process the information that is not available in MIR/IR, and
   - pass the analysis results from an IR/MIR pass to the MC-layer pass, where they can be enhanced with the missing representations.

   So my second question is: is it possible to write passes directly in the MC layer? If so, is there any documentation or example for that?

Thank you in advance!

Best Regards,
Lele
Lele Ma via llvm-dev
2019-Nov-27 05:50 UTC
[llvm-dev] Writing a Pass in LLVM MC (Machine Code) level to Analyze Assembly Code
Hi All,

This is a follow-up to, and a rephrasing of, my previous question, with an updated subject:

What I want to do is analyze hand-written assembly code in 'full detail', so that the semantics of each instruction are known in LLVM passes. Many such instructions, such as 'syscall' and 'iret', have no corresponding counterparts in IR/MIR form. At the MC level, such assembly code can easily be translated to MCInsts, since this level is closest to the assembly code. Therefore, I am thinking of writing a pass at the MC level instead of at the IR/MIR level.

However, when searching for how to write MC-level passes, I cannot find any related classes in the LLVM infrastructure (such as FunctionPass at the IR level, or MachineFunctionPass at the MIR level). Could anyone point me to where I should start to write an MC-level pass?

Best Regards,
Lele
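Assuming, as the question notes, that LLVM offers no MC-level analogue of `FunctionPass`, here is a minimal and purely hypothetical sketch of the two-part idea from the earlier message. The "MC pass" is an ordinary class. It visits instructions that are assumed to have been decoded already (for example, by a disassembler into MCInsts), and it seeds its results from a prior IR/MIR analysis. `DecodedInst`, `MIRResult`, `FunctionSummary`, and `SystemInstPass` are invented names, not LLVM APIs, and the mnemonic list is illustrative only.

```cpp
// Hypothetical sketch, not an LLVM API: an "MC pass" modeled as a plain class
// that visits instructions assumed to be decoded already (e.g. from MCInsts).
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an llvm::MCInst: owning function plus mnemonic.
struct DecodedInst {
  std::string Function;
  std::string Mnemonic;
};

// Hypothetical result of an earlier IR/MIR pass: the functions it analyzed.
using MIRResult = std::set<std::string>;

// Per-function facts the MC-level pass adds on top of the IR/MIR result.
struct FunctionSummary {
  bool SeenByMIR = false;      // seeded from the IR/MIR result
  std::size_t SystemCount = 0; // e.g. syscall/iret, which have no LLVM IR instruction
};

class SystemInstPass {
public:
  // Second bullet of the proposal: start from what the IR/MIR pass already knows.
  explicit SystemInstPass(const MIRResult &Prior) {
    for (const std::string &F : Prior)
      Summaries[F].SeenByMIR = true;
  }

  // First bullet: visit every decoded instruction once and record facts that
  // IR/MIR did not provide. Returns true if any system instruction was found.
  bool run(const std::vector<DecodedInst> &Insts) {
    // Illustrative mnemonic list only; real spellings vary by syntax/target.
    static const std::set<std::string> SystemMnemonics = {
        "syscall", "sysret", "iret", "iretq"};
    bool Found = false;
    for (const DecodedInst &I : Insts) {
      assert(!I.Function.empty() && "instruction must belong to a function");
      FunctionSummary &S = Summaries[I.Function];
      if (SystemMnemonics.count(I.Mnemonic) != 0) {
        ++S.SystemCount;
        Found = true;
      }
    }
    return Found;
  }

  // Enhanced result: IR/MIR facts merged with MC-only facts.
  std::map<std::string, FunctionSummary> Summaries;
};
```

In a real tool, the instruction stream would come from the MC layer itself (e.g. the disassembler Nicolai mentions) rather than from a hand-built vector. Keeping the "pass" as a plain class avoids depending on any pass-manager API that the MC layer does not provide.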