This might be a very beginner question, but I'm looking for an example of something I have never done.

Suppose I want to attach actions to the lifted semantics of CPU instructions in an intermediate representation, BAP IL or LLVM IR. How might I go about providing a Backus-Naur Form specification and then dynamically interpreting those lifted instructions, specifying an action to be performed for each IL/IR primitive? I'm looking for any library that allows me to express BNF terms and actions on them.

For example, say I lift push rbp to BAP IL (here's a JSON representation from their live development branch):

    {
      "move": {
        "lvar": { "name": "t", "id": 107, "typ": { "imm": 64 } },
        "rexp": { "var": { "name": "RBP", "id": 30, "typ": { "imm": 64 } } }
      }
    },
    {
      "move": {
        "lvar": { "name": "RSP", "id": 32, "typ": { "imm": 64 } },
        "rexp": {
          "binop": {
            "op": "minus",
            "lexp": {
              "var": { "name": "RSP", "id": 32, "typ": { "imm": 64 } }
            },
            "rexp": { "inte": { "int": "MHg4OjY0" } }
          }
        }
      }
    },
    {
      "move": {
        "lvar": {
          "name": "mem64",
          "id": 58,
          "typ": {
            "mem": {
              "index_type": { "r64": true },
              "element_type": { "r8": true }
            }
          }
        },
        "rexp": {
          "store": {
            "memory": {
              "var": {
                "name": "mem64",
                "id": 58,
                "typ": {
                  "mem": {
                    "index_type": { "r64": true },
                    "element_type": { "r8": true }
                  }
                }
              }
            },
            "address": {
              "var": { "name": "RSP", "id": 32, "typ": { "imm": 64 } }
            },
            "value": {
              "var": { "name": "t", "id": 107, "typ": { "imm": 64 } }
            },
            "endian": "little_endian",
            "size": { "r64": true }
          }
        }
      }
    }

Then, for move, say, my interpreter could specify some reasonable action that captures those semantics: allocate a 64-bit space in which to store the value, and also record an SSA name for the value of RSP at that point. In the same way, I could specify other things, such as symbolic interpretation of specific memory regions, so that a solver can find the constraints and limitations on code blocks. Then, after some segments of code are lifted and interpreted, I would provide meaningful context in terms of state, registers and memory, and the resulting representation could be executed in order to reach interesting path and state combinations.

But I've never written a language before, and I'm afraid I'm new to this. I'm very interested and want to learn, so I'm looking to use infrastructure that's already there and learn how to construct this properly.
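The idea of attaching an action to each IL primitive can be sketched as a small tree-walking interpreter over the JSON above. This is only an illustration under stated assumptions, not BAP's own API: the function names (decode_int, eval_exp, exec_stmt, run) and the initial register values are hypothetical. The literal "MHg4OjY0" is base64 for the text "0x8:64"; reading that as value:width, and reading the "size" key "r64" as a bit count, are assumptions. The program is rebuilt with small helpers only to keep the listing short.

```python
import base64

MASK64 = (1 << 64) - 1
IMM64 = {"imm": 64}
MEM_T = {"mem": {"index_type": {"r64": True}, "element_type": {"r8": True}}}

def decode_int(b64_text):
    """Decode e.g. "MHg4OjY0" (base64 of "0x8:64") into (value, width)."""
    value, width = base64.b64decode(b64_text).decode("ascii").split(":")
    return int(value, 0), int(width)

def eval_exp(exp, state):
    """Action for each expression primitive; returns a concrete value."""
    (kind, body), = exp.items()          # each node is a one-key object
    if kind == "var":
        return state[body["name"]]
    if kind == "inte":
        return decode_int(body["int"])[0]
    if kind == "binop":
        lhs = eval_exp(body["lexp"], state)
        rhs = eval_exp(body["rexp"], state)
        if body["op"] == "minus":
            return (lhs - rhs) & MASK64  # wrap to 64 bits
        raise NotImplementedError(body["op"])
    if kind == "store":
        memory = dict(eval_exp(body["memory"], state))  # copy, then update
        address = eval_exp(body["address"], state)
        value = eval_exp(body["value"], state)
        size_key = next(iter(body["size"]))             # assumed form: "r64"
        nbytes = int(size_key[1:]) // 8
        order = "little" if body["endian"] == "little_endian" else "big"
        for offset, byte in enumerate(value.to_bytes(nbytes, order)):
            memory[(address + offset) & MASK64] = byte
        return memory
    raise NotImplementedError(kind)

def exec_stmt(stmt, state):
    """Action for each statement primitive; only "move" appears above."""
    (kind, body), = stmt.items()
    if kind != "move":
        raise NotImplementedError(kind)
    state[body["lvar"]["name"]] = eval_exp(body["rexp"], state)

def run(program, state):
    for stmt in program:
        exec_stmt(stmt, state)
    return state

def var(name, ident, typ=IMM64):
    return {"name": name, "id": ident, "typ": typ}

T, RBP, RSP = var("t", 107), var("RBP", 30), var("RSP", 32)
MEM = var("mem64", 58, MEM_T)

# The three lifted statements from the message, in the same shape.
PUSH = [
    {"move": {"lvar": T, "rexp": {"var": RBP}}},
    {"move": {"lvar": RSP, "rexp": {"binop": {
        "op": "minus", "lexp": {"var": RSP},
        "rexp": {"inte": {"int": "MHg4OjY0"}}}}}},
    {"move": {"lvar": MEM, "rexp": {"store": {
        "memory": {"var": MEM}, "address": {"var": RSP},
        "value": {"var": T}, "endian": "little_endian",
        "size": {"r64": True}}}}},
]

# Hypothetical starting context: register values and an empty memory.
state = run(PUSH, {"RBP": 0x1122334455667788, "RSP": 0x7000, "mem64": {}})
```

Because every primitive is handled in exactly one branch, a symbolic variant could in principle return solver terms from those branches instead of integers while leaving the traversal unchanged.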
I apologize, I found Yacc and Bison. I have my reading cut out for me.

On Thu, Apr 9, 2015 at 2:37 PM, Kenneth Adam Miller <kennethadammiller at gmail.com> wrote:
> I'm looking for any library that allows me to express BNF terms and actions on them.
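In Yacc and Bison, each grammar rule can carry an action that runs when the rule is recognized. The same separation of grammar from actions can be sketched without a generator, as a hand-written recursive-descent parser that takes its actions as a table of callables. Everything here is hypothetical and for illustration only: the toy textual syntax (RSP := RSP - 8) is not BAP's or LLVM's, and the names tokenize, Parser, eval_actions and tree_actions are made up.

```python
import re

# Grammar being parsed (BNF):
#   stmt ::= IDENT ":=" expr
#   expr ::= term | expr "+" term | expr "-" term
#   term ::= IDENT | NUMBER | "(" expr ")"
TOKEN = re.compile(
    r"\s*(?:(?P<NUMBER>0x[0-9a-fA-F]+|\d+)"
    r"|(?P<IDENT>[A-Za-z_]\w*)"
    r"|(?P<OP>:=|[-+()]))"
)

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens, actions):
        self.tokens, self.pos, self.actions = tokens, 0, actions

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def take(self, kind, text=None):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def stmt(self):
        name = self.take("IDENT")
        self.take("OP", ":=")
        value = self.expr()
        if self.peek() != (None, None):
            raise SyntaxError("trailing input")
        return self.actions["move"](name, value)

    def expr(self):
        # The left-recursive rule is handled with a loop.
        left = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take("OP")
            left = self.actions["binop"](op, left, self.term())
        return left

    def term(self):
        kind, text = self.peek()
        if kind == "NUMBER":
            self.take("NUMBER")
            number = int(text, 16) if text.startswith("0x") else int(text)
            return self.actions["int"](number)
        if kind == "IDENT":
            return self.actions["var"](self.take("IDENT"))
        self.take("OP", "(")
        inner = self.expr()
        self.take("OP", ")")
        return inner

def eval_actions(env):
    """Actions that evaluate concretely against a register environment."""
    def move(name, value):
        env[name] = value
        return env
    return {
        "int": lambda n: n,
        "var": lambda name: env[name],
        "binop": lambda op, a, b: a + b if op == "+" else a - b,
        "move": move,
    }

# Actions that build a tree instead of computing a value.
tree_actions = {
    "int": lambda n: ("int", n),
    "var": lambda name: ("var", name),
    "binop": lambda op, a, b: ("binop", op, a, b),
    "move": lambda name, value: ("move", name, value),
}

def parse(text, actions):
    return Parser(tokenize(text), actions).stmt()

env = parse("RSP := RSP - 8", eval_actions({"RSP": 0x7000}))
tree = parse("RSP := RSP - 8", tree_actions)
```

The same grammar yields an updated environment with one action table and a syntax tree with the other, which is the property that makes grammar actions attractive for swapping concrete interpretation for another kind.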
How come no one pointed me toward the LLVM Kaleidoscope project?

On Thu, Apr 9, 2015 at 3:10 PM, Kenneth Adam Miller <kennethadammiller at gmail.com> wrote:
> I apologize, I found Yacc and Bison. I have my reading cut out for me.