Hello,
I’m a newbie here, working on a project to enforce Control Flow Integrity (CFI)
on programs compiled with LLVM. We’re using LLVM 3.3 so we can leverage
poolalloc's dsa analysis. Ideally this will be as target-independent as
possible, but our primary target is ARM. One of our passes requires inserting
different i32 IDs at various points into the code we’re compiling. As far as I
can tell, it’s impossible to do with just LLVM IR, so we’re looking into ways of
getting these IDs through the CodeGen.
One thing that looked promising is the function “prefix” value in LLVM 3.4,
which is able to emit a global value into the asm. This is the right idea except
we need it at arbitrary points in code. We then looked at defining a custom
intrinsic function (@llvm.cfiid) that we can insert into the IR and then lower
to assembly. It didn’t seem like this was exactly what we wanted either, because
the asm that is generated has to be target dependent. We’ve checked out the
poolalloc/safecode projects and there are some helpful analysis tools, but didn’t
find anything relevant to ID lowering.
Our current thrust is to define a custom target intrinsic function
(@llvm.arm.cfiid) that we can insert into the IR and lower using a definition in
the ARMInstrInfo.td file. Right now, I’m trying to define the pattern and
instruction in that file. At first, I just inserted a pattern to lower our
intrinsic into a “trap” instruction, which worked fine:
/* Code in IR/IntrinsicsARM.td */
/* Note, I’m not positive that IntrNoReturn is correct here, but IntrNoMem type
wouldn’t lower to an SDNode because of lack of “results” */
def int_arm_cfiid : Intrinsic<[], [llvm_i32_ty], [IntrNoReturn]>;
…
/* Code in Target/ARM/ARMInstrInfo.td */
def : Pat<(int_arm_cfiid (i32 imm)),
          (TRAP)>;
...
Next, I’m trying to create my own “AXI” definition based on the TRAP definition,
and then put that into the pattern. I admit that I don’t fully grok the tablegen
syntax, so a lot of what I’ve been doing is trial and error, and based on
examples in other *.td files.
Here’s what I think I’m shooting for...
/* Code in Target/ARM/ARMInstrInfo.td */
def ARMCFIID : AXI<(outs), (ins i32imm:$opt), MiscFrm, NoItinerary,
                   "cfiid", "\t$opt", [(int_arm_cfiid i32imm:$opt)]>,
               Requires<[IsARM]> {
  bits<32> opt;
  let Inst{31-0} = opt;
}
...
I realize this is very wrong, but just to give you an idea of what I’m trying to
do… basically take the i32 param of the intrinsic and encode it as raw bytes.
Obviously, this is broken…
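
To make "encode it as raw bytes" concrete, here is a minimal standalone sketch (plain C++, not LLVM code) of the four bytes a 32-bit ID would occupy in the code stream. It assumes a little-endian ARM target; on a big-endian target the order would be reversed. The function name idToLittleEndianBytes is hypothetical and exists only for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit ID into the bytes it would occupy
// in a little-endian code stream (lowest-order byte at the lowest address).
inline std::array<std::uint8_t, 4> idToLittleEndianBytes(std::uint32_t id) {
    std::array<std::uint8_t, 4> bytes{};
    bytes[0] = static_cast<std::uint8_t>(id & 0xFF);
    bytes[1] = static_cast<std::uint8_t>((id >> 8) & 0xFF);
    bytes[2] = static_cast<std::uint8_t>((id >> 16) & 0xFF);
    bytes[3] = static_cast<std::uint8_t>((id >> 24) & 0xFF);
    return bytes;
}
```

An instruction definition whose whole 32-bit Inst field is set to the operand should yield this same layout in the object file, provided the target's code emitter writes the instruction word unchanged.
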
TL;DR:
- What’s the best way to lower an IR i32 into code as raw bytes?
- If an Intrinsic is the answer, can it be done entirely in the TableGen files or
  do I need to do some SDNode stuff as well?
- If a TargetIntrinsic is the answer, what’s the proper syntax to define an ARM
  Instruction and matching it with my intrinsic pattern?
Sorry if this is pretty basic stuff… I’ve been looking at the archives and
couldn’t find any other threads that worked for me.
Also, I noticed that there is an llvm-devs google group as well. Is it a faux pas
to cross-post to that list as well, or are these lists disjoint enough that it
wouldn’t be spammy?
Thanks,
Joe
--
Joseph Battaglia
M.S. Information Security '14
Information Networking Institute
Carnegie Mellon University
jabat at cmu.edu

On Mon, Mar 3, 2014 at 11:06 AM, Joseph Battaglia <jbattagl at andrew.cmu.edu> wrote:
> One of our passes requires inserting different i32 IDs at various points
> into the code we’re compiling.

I may be missing something obvious here, but why isn't a regular immediate
sufficient? Do you want to ensure that the value is in a particular BB with
movw/movt and doesn't get optimized out, or in a specific constant pool?
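
For reference, a movw/movt pair carries a 32-bit value as two 16-bit halves: movw writes the low half and clears the upper half of the register, and movt then writes the upper half. The standalone C++ sketch below only illustrates that split; it is not LLVM code, and the names MovwMovtHalves, splitForMovwMovt and rejoinHalves are hypothetical.

```cpp
#include <cstdint>

// Hypothetical illustration of how a 32-bit ID would be divided between
// a movw (low 16 bits) and a movt (high 16 bits) immediate.
struct MovwMovtHalves {
    std::uint16_t low;   // immediate for movw
    std::uint16_t high;  // immediate for movt
};

constexpr MovwMovtHalves splitForMovwMovt(std::uint32_t id) {
    return {static_cast<std::uint16_t>(id & 0xFFFF),
            static_cast<std::uint16_t>(id >> 16)};
}

// Recombine the halves the way the register ends up after movw then movt.
constexpr std::uint32_t rejoinHalves(MovwMovtHalves h) {
    return (static_cast<std::uint32_t>(h.high) << 16) | h.low;
}
```

With this approach the ID ends up spread across the immediate fields of two ordinary instructions instead of sitting in the code as one contiguous 32-bit word.
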
That’s definitely one way we could lower it, and I suppose the main reason we
haven’t gone in that direction is that we’d like the IDs to be at least 32
bits. I’m not sure about the optimization issue, but presumably it won’t be a
problem at -O0? Another thing to consider is clobbering registers… we
have to insert IDs both at function sites and at indirect branch/call return
sites, and don’t want to worry about messing with regs. That being said, just
lowering it as a mov does save us from having to jump over the ID, which is a
problem of its own. We aren’t set one way or the other, and would love to hear
your thoughts.
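
To illustrate the "jump over the ID" problem: if the ID sits inline as a raw word and execution can fall through into it, a branch placed just before it has to skip it. Below is a minimal standalone C++ sketch of the A32 unconditional B encoding for that case. It assumes ARM (not Thumb) mode, does no range checking, and the function name encodeArmBranch is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: encode an unconditional A32 "B" located at
// branchAddr that jumps to target.
// Encoding: cond = 0b1110 (always), bits 27-24 = 0b1010, then imm24,
// where target = branchAddr + 8 + (sign-extended imm24 << 2).
// Assumes both addresses are 4-byte aligned and the offset is in range.
inline std::uint32_t encodeArmBranch(std::uint32_t branchAddr,
                                     std::uint32_t target) {
    // Unsigned wrap-around gives the two's-complement offset.
    std::uint32_t offset = target - (branchAddr + 8);
    return 0xEA000000u | ((offset >> 2) & 0x00FFFFFFu);
}
```

Because the PC reads as the branch address plus 8 in ARM mode, skipping exactly one inline word needs an imm24 of zero, so encodeArmBranch(0x1000, 0x1008) gives 0xEA000000.
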
Anyway, I’ve made some progress on the lowering. It’s rather hacky, and the
codegen compiles with a handful of warnings… but the resulting obj has the
integer correctly encoded.
/* Code in Target/ARM/ARMInstrInfo.td */
def ImmMaxAsmOperand: ImmAsmOperand { let Name = "ImmMax"; }
def immMax : Operand<i32>, ImmLeaf<i32, [{
  return Imm >= -2147483648 && Imm < 2147483648;
}]> {
  let ParserMatchClass = ImmMaxAsmOperand;
}
def CFITRAP : AXI<(outs), (ins immMax:$id), MiscFrm, NoItinerary,
                  "cfiid\t$id", [(int_arm_cfiid immMax:$id)]>,
              Requires<[IsARM]> {
  bits<32> id;
  let Inst = id;
}
…
Is there a better way to do this for 32-bit IDs?
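
For what it's worth, the predicate in immMax amounts to a signed 32-bit range check on a 64-bit value. The standalone C++ sketch below shows that check without the bare -2147483648 literal, which may be one source of the warnings, though that is a guess. It assumes Imm reaches the predicate as a sign-extended 64-bit integer, which should be confirmed against the ImmLeaf definition in the LLVM 3.3 tree; the name fitsInSigned32 is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the immMax predicate: true when a 64-bit
// value is representable as a signed 32-bit integer.
constexpr bool fitsInSigned32(std::int64_t imm) {
    return imm >= std::numeric_limits<std::int32_t>::min() &&
           imm <= std::numeric_limits<std::int32_t>::max();
}
```

Under that assumption, an ID with the top bit set (0xDEADBEEF, say) shows up as a negative number and still passes, so the check accepts every possible i32 value.
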
On Mar 3, 2014, at 4:49 PM, JF Bastien <jfb at google.com> wrote:
> I may be missing something obvious here, but why isn't a regular
> immediate sufficient? Do you want to ensure that the value is in a particular BB
> with movw/movt and doesn't get optimized out, or in a specific constant
> pool?