thr3ads.net - llvm dev - [LLVMdev] [RFC] Simple control-flow integrity [Apr 2014]

If this information is useful, please help other people find it:
Share via:

Peter Collingbourne

2014-Mar-21 20:46 UTC

[LLVMdev] [RFC] Simple control-flow integrity

On Fri, Mar 21, 2014 at 12:54:07PM -0700, Tom Roeder
wrote:> On Fri, Mar 21, 2014 at 12:15 PM, Peter Collingbourne <peter at
pcc.me.uk> wrote:
> >> The way I've implemented it (see the patch I sent to
llvm-commits
> >> yesterday), it's not just metadata: the intrinsic lowers to
the
> >> jumptable entry code given above. The CFI pass then generates a
> >> function for each jump table; the function consists solely of
these
> >> intrinsic calls.
> >
> > Well, the intrinsic you proposed has no effect on the caller and has
> > non-local effects on other specified functions. I'm not aware of
any other
> > intrinsic with similar behavior.
> 
> I agree that it's not very similar to other intrinsics. But I don't
> exactly follow these statements. There are definitely intrinsics that
> have no effect on the caller, like llvm.var.annotation.
Yes but the purpose of such intrinsics is to communicate information about
a specific value that may have an effect on analysis, optimization or code
generation for that caller. On the other hand, the intrinsic you are proposing
has nothing to do with the caller.
> And AFAIK,
> there is no non-local behavior: all the intrinsic does is lower to the
> labeled jump instruction; the changes to address-taken functions are
> done separately by the CFI pass. Note that in the patch I sent, the
> intrinsic only takes one argument: the function to jump to. Are there
> other effects in this case?
The non-local effect is that the intrinsic describes the definition of a
function in the global scope. Normally such definitions come from top-level
entities.
> So, maybe it would be better to call it something like
> @llvm.unconditional.jump(i8*)? I could then make it only lower to the
> jump and add an intrinsic that lowered to a function label as well.
I'd imagine that might present more problems. For example, if you used
either intrinsic in the middle of a regular function the behavior would not
necessarily be well defined. It would be necessary to carefully document
where and how these intrinsics may be used.

Thanks,
-- 
Peter

Tom Roeder

2014-Mar-25 19:36 UTC

head link

[LLVMdev] [RFC] Simple control-flow integrity

On Fri, Mar 21, 2014 at 1:46 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:> On Fri, Mar 21, 2014 at 12:54:07PM -0700, Tom Roeder wrote:
>> On Fri, Mar 21, 2014 at 12:15 PM, Peter Collingbourne <peter at
pcc.me.uk> wrote:
>> >> The way I've implemented it (see the patch I sent to
llvm-commits
>> >> yesterday), it's not just metadata: the intrinsic lowers
to the
>> >> jumptable entry code given above. The CFI pass then generates
a
>> >> function for each jump table; the function consists solely of
these
>> >> intrinsic calls.
>> >
>> > Well, the intrinsic you proposed has no effect on the caller and
has
>> > non-local effects on other specified functions. I'm not aware
of any other
>> > intrinsic with similar behavior.
>>
>> I agree that it's not very similar to other intrinsics. But I
don't
>> exactly follow these statements. There are definitely intrinsics that
>> have no effect on the caller, like llvm.var.annotation.
>
> Yes but the purpose of such intrinsics is to communicate information about
> a specific value that may have an effect on analysis, optimization or code
> generation for that caller. On the other hand, the intrinsic you are
proposing
> has nothing to do with the caller.
>
Thanks for the clarification. I started working on doing this with a
function attribute, but then I realized that I probably don't even
need that: I don't ever need the information about the jump tables to
persist to modules in bitcode form since the CFI pass is only meant to
be run at LTO time, when all the modules have already been collected
and merged. I just need to pass the mapping from address-taken
function to newly created jump function down to CodeGen so that the
table can be emitted at the right time, probably at the beginning of
the Module.

So, I think it would be even simpler for the CFI pass to declare the
jump functions and do the replacement of address-taken functions like
it currently does, and record this mapping in the Module. At
AsmPrinter time, the Module can then emit the appropriate jump
table(s) given this map, using target-overridden calls.

What do you think about this?

Thanks,

Tom

Tom Roeder

2014-Apr-03 00:28 UTC

head link

[LLVMdev] [RFC] Simple control-flow integrity

On Fri, Mar 21, 2014 at 1:46 PM, Peter Collingbourne <peter at
pcc.me.uk>wrote:
> On Fri, Mar 21, 2014 at 12:54:07PM -0700, Tom Roeder wrote:
> > On Fri, Mar 21, 2014 at 12:15 PM, Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
> > >> The way I've implemented it (see the patch I sent to
llvm-commits
> > >> yesterday), it's not just metadata: the intrinsic lowers
to the
> > >> jumptable entry code given above. The CFI pass then generates
a
> > >> function for each jump table; the function consists solely of
these
> > >> intrinsic calls.
> > >
> > > Well, the intrinsic you proposed has no effect on the caller and
has
> > > non-local effects on other specified functions. I'm not aware
of any
> other
> > > intrinsic with similar behavior.
> >
> > I agree that it's not very similar to other intrinsics. But I
don't
> > exactly follow these statements. There are definitely intrinsics that
> > have no effect on the caller, like llvm.var.annotation.
>
> Yes but the purpose of such intrinsics is to communicate information about
> a specific value that may have an effect on analysis, optimization or code
> generation for that caller. On the other hand, the intrinsic you are
> proposing
> has nothing to do with the caller.
>
> > And AFAIK,
> > there is no non-local behavior: all the intrinsic does is lower to the
> > labeled jump instruction; the changes to address-taken functions are
> > done separately by the CFI pass. Note that in the patch I sent, the
> > intrinsic only takes one argument: the function to jump to. Are there
> > other effects in this case?
>
> The non-local effect is that the intrinsic describes the definition of a
> function in the global scope. Normally such definitions come from top-level
> entities.
>
> > So, maybe it would be better to call it something like
> > @llvm.unconditional.jump(i8*)? I could then make it only lower to the
> > jump and add an intrinsic that lowered to a function label as well.
>
> I'd imagine that might present more problems. For example, if you used
> either intrinsic in the middle of a regular function the behavior would not
> necessarily be well defined. It would be necessary to carefully document
> where and how these intrinsics may be used.
>
Taking these considerations into account, I propose this:

Jump functions created for the jump table are actual functions, marked with
a new jumptable attribute, as well as naked and noreturn and optnone. Each
table is put in a special section using the section attribute, and
alignment of these functions is done using the align attribute. The
combination of these attributes means that the function has no preamble or
postamble, so it consists of exactly a global function label and its
instructions.

Then I think it would make sense to create an intrinsic like
llvm.jumptable.instr(i8*) that would be a placeholder for an unconditional
jump. I can add code to the verifier that insists on two conditions:

1. functions marked as jumptable must have exactly two instructions: an
llvm.jumptable.instr followed by a unreachable.
2. llvm.jumptable.instr can only occur in a jumptable function.

I think this handles the problem of using llvm.jumptable.instr in normal
functions: that isn't allowed. And I think it deals with the problems of
non-local behavior by making functions out of the things that really are
functions. Each entry in the jump table really is a function, albeit a
rather strange one, so it should look like a function at the IR level.

In other words, this change would add a new attribute that marks a very
specialized kind of function and an intrinsic that can only occur in this
kind of function.

What do you think?

Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140402/3108634e/attachment.html>

Peter Collingbourne

2014-Apr-04 01:28 UTC

head link

[LLVMdev] [RFC] Simple control-flow integrity

On Wed, Apr 02, 2014 at 05:28:04PM -0700, Tom Roeder
wrote:> On Fri, Mar 21, 2014 at 1:46 PM, Peter Collingbourne <peter at
pcc.me.uk>wrote:
> 
> > On Fri, Mar 21, 2014 at 12:54:07PM -0700, Tom Roeder wrote:
> > > On Fri, Mar 21, 2014 at 12:15 PM, Peter Collingbourne <peter
at pcc.me.uk>
> > wrote:
> > > >> The way I've implemented it (see the patch I sent to
llvm-commits
> > > >> yesterday), it's not just metadata: the intrinsic
lowers to the
> > > >> jumptable entry code given above. The CFI pass then
generates a
> > > >> function for each jump table; the function consists
solely of these
> > > >> intrinsic calls.
> > > >
> > > > Well, the intrinsic you proposed has no effect on the caller
and has
> > > > non-local effects on other specified functions. I'm not
aware of any
> > other
> > > > intrinsic with similar behavior.
> > >
> > > I agree that it's not very similar to other intrinsics. But I
don't
> > > exactly follow these statements. There are definitely intrinsics
that
> > > have no effect on the caller, like llvm.var.annotation.
> >
> > Yes but the purpose of such intrinsics is to communicate information
about
> > a specific value that may have an effect on analysis, optimization or
code
> > generation for that caller. On the other hand, the intrinsic you are
> > proposing
> > has nothing to do with the caller.
> >
> > > And AFAIK,
> > > there is no non-local behavior: all the intrinsic does is lower
to the
> > > labeled jump instruction; the changes to address-taken functions
are
> > > done separately by the CFI pass. Note that in the patch I sent,
the
> > > intrinsic only takes one argument: the function to jump to. Are
there
> > > other effects in this case?
> >
> > The non-local effect is that the intrinsic describes the definition of
a
> > function in the global scope. Normally such definitions come from
top-level
> > entities.
> >
> > > So, maybe it would be better to call it something like
> > > @llvm.unconditional.jump(i8*)? I could then make it only lower to
the
> > > jump and add an intrinsic that lowered to a function label as
well.
> >
> > I'd imagine that might present more problems. For example, if you
used
> > either intrinsic in the middle of a regular function the behavior
would not
> > necessarily be well defined. It would be necessary to carefully
document
> > where and how these intrinsics may be used.
> >
> 
> Taking these considerations into account, I propose this:
> 
> Jump functions created for the jump table are actual functions, marked with
> a new jumptable attribute, as well as naked and noreturn and optnone. Each
> table is put in a special section using the section attribute, and
> alignment of these functions is done using the align attribute. The
> combination of these attributes means that the function has no preamble or
> postamble, so it consists of exactly a global function label and its
> instructions.
> 
> Then I think it would make sense to create an intrinsic like
> llvm.jumptable.instr(i8*) that would be a placeholder for an unconditional
> jump. I can add code to the verifier that insists on two conditions:
> 
> 1. functions marked as jumptable must have exactly two instructions: an
> llvm.jumptable.instr followed by a unreachable.
> 2. llvm.jumptable.instr can only occur in a jumptable function.
> 
> I think this handles the problem of using llvm.jumptable.instr in normal
> functions: that isn't allowed. And I think it deals with the problems
of
> non-local behavior by making functions out of the things that really are
> functions. Each entry in the jump table really is a function, albeit a
> rather strange one, so it should look like a function at the IR level.
> 
> In other words, this change would add a new attribute that marks a very
> specialized kind of function and an intrinsic that can only occur in this
> kind of function.
> 
> What do you think?
I think you might be close.

The llvm.jumptable.instr intrinsic you have proposed is similar to the
'musttail' call marker that Reid (cc'd) is working on in [1]. That
marker will
also have verifier support. I think it might be reasonable to require the
code generator to emit the body of a function containing only a
'musttail'
function call as a single branch instruction. Then you could emit such
functions in the jump table section.

Reid, what do you think?

Thanks,
-- 
Peter

[1] http://llvm-reviews.chandlerc.com/D3240

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Apr 2014 - [LLVMdev] [RFC] Simple control-flow integrity

[LLVMdev] [RFC] Simple control-flow integrity

[LLVMdev] [RFC] Simple control-flow integrity

[LLVMdev] [RFC] Simple control-flow integrity

[LLVMdev] [RFC] Simple control-flow integrity

Apparently Analagous Threads