Hello,

I’m a newbie here, working on a project to enforce Control Flow Integrity (CFI) on programs compiled with LLVM. We’re using LLVM 3.3 so we can leverage poolalloc's DSA analysis. Ideally this will be as target-independent as possible, but our primary target is ARM. One of our passes requires inserting different i32 IDs at various points into the code we’re compiling. As far as I can tell, it’s impossible to do this with just LLVM IR, so we’re looking into ways of getting these IDs through the CodeGen.

One thing that looked promising is the function “prefix” value in LLVM 3.4, which is able to emit a global value into the asm. This is the right idea, except we need it at arbitrary points in code. We then looked at defining a custom intrinsic function (@llvm.cfiid) that we can insert into the IR and then lower to assembly. It didn’t seem like this was exactly what we wanted either, because the asm that is generated has to be target dependent. We’ve checked out the poolalloc/safecode projects and there are some helpful analysis tools, but we didn’t find anything relevant to ID lowering.

Our current thrust is to define a custom target intrinsic function (@llvm.arm.cfiid) that we can insert into the IR and lower using a definition in the ARMInstrInfo.td file. Right now, I’m trying to define the pattern and instruction in that file. At first, I just inserted a pattern to lower our intrinsic into a “trap” instruction, which worked fine:

    /* Code in IR/IntrinsicsARM.td */
    /* Note, I’m not positive that IntrNoReturn is correct here, but the
       IntrNoMem type wouldn’t lower to an SDNode because of lack of “results” */
    def int_arm_cfiid : Intrinsic<[], [llvm_i32_ty], [IntrNoReturn]>;
    …

    /* Code in Target/ARM/ARMInstrInfo.td */
    def : Pat<(int_arm_cfiid (i32 imm)),
              (TRAP)>;
    ...

Next, I’m trying to create my own “AXI” definition based on the TRAP definition, and then put that into the pattern. I admit that I don’t fully grok the tablegen syntax, so a lot of what I’ve been doing is trial and error, based on examples in other *.td files.

Here’s what I think I’m shooting for...

    /* Code in Target/ARM/ARMInstrInfo.td */
    def ARMCFIID : AXI<(outs), (ins i32imm:$opt), MiscFrm, NoItinerary,
                       "cfiid", "\t$opt", [(int_arm_cfiid i32imm:$opt)]>,
                   Requires<[IsARM]> {
      bits<32> opt;
      let Inst{31-0} = opt;
    }
    ...

I realize this is very wrong, but it should give you an idea of what I’m trying to do: basically, take the i32 param of the intrinsic and encode it as raw bytes. Obviously, this is broken…

TL;DR:

  - What’s the best way to lower an IR i32 into code as raw bytes?
  - If an Intrinsic is the answer, can it be done entirely in the TableGen files, or do I need to do some SDNode stuff as well?
  - If a TargetIntrinsic is the answer, what’s the proper syntax to define an ARM Instruction and match it with my intrinsic pattern?

Sorry if this is pretty basic stuff… I’ve been looking at the archives and couldn’t find any other threads that worked for me.

Also, I noticed that there is an llvm-devs google group as well. Is it a faux pas to cross-post to that list as well, or are these lists disjoint enough that it wouldn’t be spammy?

Thanks,
Joe

--
Joseph Battaglia
M.S. Information Security '14
Information Networking Institute
Carnegie Mellon University
jabat at cmu.edu
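For readers following the thread, here is a minimal sketch of what "encode it as raw bytes" amounts to at the byte level. It assumes a little-endian ARM target, and `encodeIdWord` / `decodeIdWord` are hypothetical helper names for illustration only, not LLVM APIs. The sketch only shows the byte layout a 32-bit ID would have in the code stream; it does not show how LLVM's code emitter produces it.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not an LLVM API): lay out a 32-bit CFI ID as the four
// bytes that would sit in the instruction stream, least significant byte
// first. ASSUMPTION: the target is little-endian ARM.
inline std::array<std::uint8_t, 4> encodeIdWord(std::uint32_t id) {
  return {{static_cast<std::uint8_t>(id & 0xFF),
           static_cast<std::uint8_t>((id >> 8) & 0xFF),
           static_cast<std::uint8_t>((id >> 16) & 0xFF),
           static_cast<std::uint8_t>((id >> 24) & 0xFF)}};
}

// Hypothetical inverse: rebuild the ID from the four bytes, as a checker
// reading the word back out of the code stream would.
inline std::uint32_t decodeIdWord(const std::array<std::uint8_t, 4> &bytes) {
  return static_cast<std::uint32_t>(bytes[0]) |
         (static_cast<std::uint32_t>(bytes[1]) << 8) |
         (static_cast<std::uint32_t>(bytes[2]) << 16) |
         (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

Because every 32-bit pattern round-trips through these two functions, an ID stored this way can collide with the encoding of a real instruction, which is one reason the placement of the word matters.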
I may be missing something obvious here, but why isn't a regular immediate sufficient? Do you want to ensure that the value is in a particular BB with movw/movt and doesn't get optimized out, or in a specific constant pool?

On Mon, Mar 3, 2014 at 11:06 AM, Joseph Battaglia <jbattagl at andrew.cmu.edu> wrote:

> Hello,
>
> I’m a newbie here, working on a project to enforce Control Flow Integrity
> (CFI) on programs compiled with LLVM. We’re using LLVM 3.3 so we can
> leverage poolalloc's dsa analysis. Ideally this will be as
> target-independent as possible, but our primary target is ARM. One of our
> passes requires inserting different i32 IDs at various points into the code
> we’re compiling. As far as I can tell, it’s impossible to do this with just
> LLVM IR, so we’re looking into ways of getting these IDs through the CodeGen.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
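To make the movw/movt suggestion concrete, the following sketch splits a 32-bit ID into two 16-bit halves and builds the corresponding ARM-mode (A32) instruction words with the "always" condition. It assumes a core that has MOVW/MOVT (ARMv6T2 or later); `MovPair`, `encodeMov16`, and `materializeId` are hypothetical names for illustration, not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two instruction words.
struct MovPair {
  std::uint32_t movw; // movw rd, #(id & 0xFFFF)
  std::uint32_t movt; // movt rd, #(id >> 16)
};

// A32 layout shared by MOVW and MOVT:
//   cond(31:28) | opcode bits(27:20) | imm4(19:16) | Rd(15:12) | imm12(11:0)
// where the 16-bit immediate is split as imm4:imm12.
inline std::uint32_t encodeMov16(std::uint32_t opcodeBase, unsigned rd,
                                 std::uint16_t imm16) {
  assert(rd < 16 && "ARM has 16 core registers");
  const std::uint32_t imm4 = (imm16 >> 12) & 0xFu;
  const std::uint32_t imm12 = imm16 & 0xFFFu;
  return opcodeBase | (imm4 << 16) | (static_cast<std::uint32_t>(rd) << 12) |
         imm12;
}

// Build the pair that loads a full 32-bit ID into register rd.
inline MovPair materializeId(std::uint32_t id, unsigned rd) {
  const std::uint32_t kMovwAL = 0xE3000000u; // cond=AL, MOVW (immediate)
  const std::uint32_t kMovtAL = 0xE3400000u; // cond=AL, MOVT
  return MovPair{
      encodeMov16(kMovwAL, rd, static_cast<std::uint16_t>(id & 0xFFFFu)),
      encodeMov16(kMovtAL, rd, static_cast<std::uint16_t>(id >> 16))};
}
```

With this approach the ID ends up as immediate fields spread over two valid instructions rather than as one contiguous word, and a destination register is overwritten.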
That’s definitely one way we could lower it, and I suppose the main reason we haven’t gone in that direction is that we’d like the IDs to be at least 32 bits. I’m not sure about the optimization issue, but presumably with -O0 it won’t be a problem? Another thing to consider is clobbering registers… we have to insert IDs both at function sites and at indirect branch/call return sites, and don’t want to worry about messing with regs. That being said, just lowering it as a mov does save us from having to jump over the ID, which is a problem on its own. We aren’t set one way or the other, and would love to hear your thoughts.

Anyway, I’ve made some progress in the lowering. It’s rather hacky, and the codegen compiles with a handful of warnings… but the resulting obj has the integer correctly encoded.

    /* Code in Target/ARM/ARMInstrInfo.td */
    def ImmMaxAsmOperand: ImmAsmOperand { let Name = "ImmMax"; }
    def immMax : Operand<i32>, ImmLeaf<i32, [{
      return Imm >= -2147483648 && Imm < 2147483648;
    }]> {
      let ParserMatchClass = ImmMaxAsmOperand;
    }

    def CFITRAP : AXI<(outs), (ins immMax:$id), MiscFrm, NoItinerary,
                      "cfiid\t$id", [(int_arm_cfiid immMax:$id)]>,
                  Requires<[IsARM]> {
      bits<32> id;
      let Inst = id;
    }
    …

Is there a better way to do this for 32-bit IDs?

On Mar 3, 2014, at 4:49 PM, JF Bastien <jfb at google.com> wrote:

> I may be missing something obvious here, but why isn't a regular immediate sufficient? Do you want to ensure that the value is in a particular BB with movw/movt and doesn't get optimized out, or in a specific constant pool?
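The "jump over the ID" problem and the operand range check can be sketched as follows. This is only an illustration under stated assumptions: ARM mode (matching the `Requires<[IsARM]>` above), an unconditional `b` placed directly before the ID word, and hypothetical helper names (`fitsInSigned32`, `emitGuardedId`, `branchTargetWordIndex`) that are not LLVM APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// A32 "b" with cond=AL and imm24=0. In ARM mode the PC reads as the branch
// address + 8, so a zero offset lands two words ahead and skips exactly the
// one word that follows the branch.
const std::uint32_t kBranchOverOneWord = 0xEA000000u;

// Same predicate as the ImmLeaf range check, written against int32_t limits
// instead of spelled-out decimal literals.
inline bool fitsInSigned32(std::int64_t imm) {
  return imm >= std::numeric_limits<std::int32_t>::min() &&
         imm <= std::numeric_limits<std::int32_t>::max();
}

// Emit the two-word sequence: branch over the ID, then the raw ID itself.
inline std::vector<std::uint32_t> emitGuardedId(std::uint32_t id) {
  return {kBranchOverOneWord, id};
}

// Decode where an A32 "b" at word index branchIndex transfers control,
// expressed as a word index: sign-extend imm24 and add the PC bias (+2 words).
inline std::int64_t branchTargetWordIndex(std::size_t branchIndex,
                                          std::uint32_t insn) {
  std::int64_t offsetWords = insn & 0x00FFFFFFu;
  if (offsetWords & 0x00800000)
    offsetWords -= 0x01000000; // negative offset
  return static_cast<std::int64_t>(branchIndex) + 2 + offsetWords;
}
```

Compared with the movw/movt route, this keeps the ID as one contiguous 32-bit word and touches no registers, at the cost of one extra branch on every fall-through path that reaches the ID.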