LeMay, Michael via llvm-dev
2017-Feb-08 00:05 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
Hi,

I previously posted about using 32-bit X86 segmentation to harden SafeStack: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100346.html  That involves lowering the limits of the DS and ES segments that are used for ordinary data accesses while leaving the limit for SS, the stack segment, set to its maximum value. The safe stacks were clustered above the limits of DS and ES. Thus, by directing individual memory operands to either DS/ES or SS, stray pointer writes that could otherwise corrupt the safe stack would be blocked by the segmentation checks. My proposed compiler modifications inspect memory operands to determine whether the compiler (or, more specifically, the SafeStack pass) intends that they be allowed to access the safe stack, and they insert segment override prefixes and related instructions as necessary.

Today I submitted patches that implement an analogous idea in 64-bit mode using Intel MPX. MPX can enforce both fine-grained per-object bounds and coarse-grained bounds; my patches use it only for the latter purpose, so they make no use of the table-related instructions in MPX. The runtime library [1] simply initializes one bounds register, BND0, with an upper bound that lies below all safe stacks and above all ordinary data. A pre-isel patch instruments stores that are not authorized to access the safe stack by preceding each such store with a BNDCU instruction, which checks that the following store accesses memory entirely below the upper bound in BND0 [2]. Loads are not instrumented, since the purpose of the checks is only to help prevent corruption of the safe stacks. Authorized safe stack accesses are likewise not instrumented, since the SafeStack pass is responsible for verifying that such accesses do not corrupt the safe stack. When a bound check fails, the default handler runs; on the systems where I have tested, this terminates the program.
To reduce the performance and size overhead of the instrumentation, both the pre-isel patch and a pre-emit patch elide various checks [2, 3]. The pre-isel patch uses techniques derived from the BoundsChecking pass to statically verify that some stores are safe, so the checks for those stores can be elided. The pre-emit patch compares the bound checks within each basic block and combines those that are redundant. The contents of BND0 are static, so a successful check of a higher address implies that any check of a lower address will also succeed. Thus, if a check of a higher address precedes a check of a lower address in a basic block, the latter check can be erased. Conversely, if a check of a lower address precedes a check of a higher address, the latter check can still be erased, but the remaining check must then use the higher address.

However, my pass can only statically compare certain addresses, which limits the checks that can be combined. For example, if two addresses use the same base and index registers and scale, along with a simple displacement, then my pass may be able to compare them. But if either the base or the index register is redefined by an instruction between the two checks, the pass is currently unable to compare the two addresses. Incidentally, the pre-emit pass uses the getAddressFromInstr routine, which needs to be patched to properly handle certain global-variable addresses [7].

The pre-emit pass also erases checks for addresses that specify neither a base nor an index register, as well as those that specify a RIP-relative offset with no index register. I think the source code would need to be quite malformed to corrupt safe stacks through such address types. Additional optimizations may be possible in the future, such as hoisting checks out of loops or otherwise performing inter-basic-block analysis to identify more redundant checks.
The pre-emit pass also erases bound checks for accesses relative to a non-default segment, such as thread-local accesses relative to FS. Linear addresses for thread-local accesses are computed with a non-zero segment base address, so checking them would require a bounds register whose upper bound is adjusted down to account for that base, rather than the bounds register used for other accesses. Moreover, negative offsets are sometimes used for thread-local accesses; these are treated as very large unsigned effective addresses, so checking them would first require adding them to the base of the thread-local storage segment.

Developers can use the -mseparate-stack-seg flag to enable instrumentation of functions that have the SafeStack attribute [4, 6]. That flag also causes the runtime library to be linked [5].

Because BND0 is treated as per-thread state, the runtime library picks an initial BND0 upper bound at program start that is arbitrarily set 256 MiB below the base of the initial (safe) stack. If and when that 256 MiB region becomes overfilled with safe stacks, the program will crash due to a failing CHECK_GE statement in the runtime library. Without this check, an adversary might be able to modify a variable in the runtime library that records the address of the most recently allocated safe stack, causing safe stacks to be allocated in vulnerable locations. One alternative that avoids this limitation would be to store that variable above the bound checked by the instrumented code. That would still help prevent adversaries from forcing safe stacks into vulnerable locations while allowing the program to keep running even when its safe stacks protrude below the bound; of course, the protruding portions of the safe stacks would be vulnerable. Another alternative would be to treat the MPX bounds registers as per-program rather than per-thread state.
BND0 could then be adjusted downwards as necessary when new safe stacks are allocated.

The runtime library currently ignores all attribute settings passed to pthread_create; it allocates each safe stack itself at a high address. Furthermore, the runtime library currently does not support expansion of the safe stacks, nor does it free the safe stacks that it allocates.

If the BND0 upper bound happens to fall below some ordinary data, then attempted accesses to that data by instrumented stores will violate the associated bound checks.

The runtime library checks for MPX support when it initializes and falls back to the default (ASLR-based) safe stack protections if MPX is unavailable. However, the (inactive) bound check instructions in the program may still impose size and performance overheads.

There could conceivably be a situation in which an instrumented program passes a function pointer for an instrumented callback to an uninstrumented library. If that library allocates an object on the (safe) stack and passes its pointer to the callback, then a bound check violation could result. This is due to an assumption in the pass that instruments the code: it assumes that all pointer arguments, except those with the byval or readnone attributes, point to the unsafe stack. I think this is a valid assumption when all stack frames correspond to instrumented functions, but it is not valid in the scenario described above. Such bound check violations can be avoided by not instrumenting such callbacks, as well as any functions to which they pass pointers to allocations on the safe stack.

Comments appreciated.
Thanks,
Michael

[1] [safestack] Add runtime support for MPX-based hardening: https://reviews.llvm.org/D29657
[2] [X86] Add X86SafeStackBoundsChecking pass: https://reviews.llvm.org/D29649
[3] [X86] Add X86SafeStackBoundsCheckingCombiner pass: https://reviews.llvm.org/D29652
[4] [X86] Add -mseparate-stack-seg: https://reviews.llvm.org/D17092
[5] [X86] Link safestacksepseg runtime: https://reviews.llvm.org/D29655
[6] [X86] Add separate-stack-seg feature: https://reviews.llvm.org/D29646
[7] [x86] Fix getAddressFromInstr: https://reviews.llvm.org/D27169
Kostya Serebryany via llvm-dev
2017-Feb-08 00:11 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
(explicitly CC-ing more folks, just in case)

On Tue, Feb 7, 2017 at 4:05 PM, LeMay, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
Kostya Serebryany via llvm-dev
2017-Feb-08 04:02 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
On Tue, Feb 7, 2017 at 4:05 PM, LeMay, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> My patches use it for the latter purpose, so they make no use of the
> table-related instructions in MPX.

That's a relief :)

> The runtime library [1] simply initializes one bounds register, BND0, to
> have an upper bound that is set below all safe stacks and above all
> ordinary data.

So you enforce that safe stacks and other data are not intermixed, as you explain below.
What are the downsides? Performance? Compatibility?

> A pre-isel patch instruments stores that are not authorized to access the
> safe stack by preceding each such instruction with a BNDCU instruction.

My understanding is that BNDCU is the cheapest possible instruction, just like XOR or ADD, so the overhead should be relatively small.
Still my guesstimate would be >= 5% since stores are very numerous.
And such overhead will be on top of whatever overhead SafeStack has.
Do you have any measurements to share?

> However, if either the base or the index register is redefined by an
> instruction between the two checks, then my pass is currently unable to
> compare the two addresses.

The usual question in such situation: how do we verify that the optimizations are not too optimistic?
If we remove a check that is not in fact redundant, we will never know, until clever folks use it for an exploit (and maybe not even then).

> [...]

Thanks!
--kcc
LeMay, Michael via llvm-dev
2017-Feb-09 00:51 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
On 2/7/2017 20:02, Kostya Serebryany wrote:
> > The runtime library [1] simply initializes one bounds register, BND0,
> > to have an upper bound that is set below all safe stacks and above all
> > ordinary data.
>
> So you enforce that safe stacks and other data are not intermixed, as
> you explain below.
> What are the downsides? Performance? Compatibility?

I think the main downside is that only a limited number of threads can be created before the safe stacks would protrude below the bound. Extending the proposed runtime library to deallocate safe stacks when they are no longer needed may help with this. The safe stacks are also prevented from expanding, since they are allocated contiguously at high addresses.

> > A pre-isel patch instruments stores that are not authorized to access
> > the safe stack by preceding each such instruction with a BNDCU
> > instruction.
>
> My understanding is that BNDCU is the cheapest possible instruction,
> just like XOR or ADD, so the overhead should be relatively small.
> Still my guesstimate would be >= 5% since stores are very numerous.
> And such overhead will be on top of whatever overhead SafeStack has.
> Do you have any measurements to share?

I'm working on getting approval to release some benchmark results.
> > However, if either the base or the index register is redefined by an
> > instruction between the two checks, then my pass is currently unable
> > to compare the two addresses.
>
> The usual question in such situation: how do we verify that the
> optimizations are not too optimistic?
> If we remove a check that is not in fact redundant, we will never know,
> until clever folks use it for an exploit (and maybe not even then).

The pre-emit pass is able to verify that some checks are redundant by inspecting the operands used to specify an address. For example, consider the following test for the pre-emit pass:

  0: %rax = MOVSX64rr32 killed %edi
  1: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 4, _
     ; CHECK: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
  2: MOV32mi _, 8, %rax, @x, _, 0
  3: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
     ; CHECK-NOT: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
  4: MOV32mi _, 8, killed %rax, @x + 4, _, 0

The pass verifies that the only difference between the memory operands in instructions 1 and 3 is that they use a different offset from the global variable, so the checks can be combined. The pass also tracks register definitions, so it would know not to combine the checks in this example if an instruction between instructions 1 and 3 had redefined %rax.

On the other hand, some of the optimizations described in the next couple of paragraphs may be optimistic, so I especially welcome feedback on them:

> > The pre-emit pass also erases checks for addresses that do not specify
> > a base or index register as well as those that specify a RIP-relative
> > offset with no index register. I think that the source code would need
> > to be quite malformed to corrupt safe stacks using such address types.

> > The pre-emit pass also erases bound checks for accesses relative to a
> > non-default segment, such as thread-local accesses relative to FS. [...]
Thanks,
Michael

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170208/3c7e4f66/attachment.html>
LeMay, Michael via llvm-dev
2017-Feb-18 01:27 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
On 2/7/2017 20:02, Kostya Serebryany wrote:
> ...
> My understanding is that BNDCU is the cheapest possible instruction,
> just like XOR or ADD,
> so the overhead should be relatively small.
> Still my guesstimate would be >= 5% since stores are very numerous.
> And such overhead will be on top of whatever overhead SafeStack has.
> Do you have any measurements to share?

Here are estimated SPECint_base2006 component runtimes for some
relevant test configurations:

Runtime in seconds:
+--------------+---------+---------+---------+-------+
|Benchmark     |Unpatched|Unpatched|Patched  |MPX    |
|              |Base     |SafeStack|SafeStack|       |
+--------------+---------+---------+---------+-------+
|400.perlbench |   430.82|   443.07|   442.42| 456.34|
+--------------+---------+---------+---------+-------+
|401.bzip2     |   711.43|   716.59|   717.35| 750.06|
+--------------+---------+---------+---------+-------+
|403.gcc       |   333.76|   334.11|   334.95| 336.13|
+--------------+---------+---------+---------+-------+
|429.mcf       |   371.48|   375.75|   373.50| 377.93|
+--------------+---------+---------+---------+-------+
|445.gobmk     |   677.80|   686.12|   685.50| 702.87|
+--------------+---------+---------+---------+-------+
|456.hmmer     |   534.94|   533.68|   534.37| 553.40|
+--------------+---------+---------+---------+-------+
|458.sjeng     |   633.69|   641.21|   641.81| 655.94|
+--------------+---------+---------+---------+-------+
|462.libquantum|   362.82|   367.00|   367.38| 382.14|
+--------------+---------+---------+---------+-------+
|464.h264ref   |   701.37|   682.13|   683.41| 699.93|
+--------------+---------+---------+---------+-------+
|471.omnetpp   |   397.04|   407.38|   407.33| 411.36|
+--------------+---------+---------+---------+-------+
|473.astar     |   611.51|   610.46|   610.19| 624.78|
+--------------+---------+---------+---------+-------+
|483.xalancbmk |   291.66|   295.61|   296.42| 298.29|
+--------------+---------+---------+---------+-------+
|SUM           |  6058.32|  6093.10|  6094.62|6249.16|
+--------------+---------+---------+---------+-------+

These runtimes are estimates, as benchmark runs for research
purposes built with patched/experimental compilers cannot be benchmark
compliant. Compilation flags that aren't yet fully documented also
cannot be compliant.

Percentage changes in runtime relative to Unpatched Base:
+--------------+---------+---------+-----+
|Benchmark     |Unpatched|Patched  |MPX  |
|              |SafeStack|SafeStack|     |
+--------------+---------+---------+-----+
|400.perlbench |     2.84|     2.69| 5.93|
+--------------+---------+---------+-----+
|401.bzip2     |     0.73|     0.83| 5.43|
+--------------+---------+---------+-----+
|403.gcc       |     0.10|     0.36| 0.71|
+--------------+---------+---------+-----+
|429.mcf       |     1.15|     0.54| 1.74|
+--------------+---------+---------+-----+
|445.gobmk     |     1.23|     1.14| 3.70|
+--------------+---------+---------+-----+
|456.hmmer     |    -0.24|    -0.11| 3.45|
+--------------+---------+---------+-----+
|458.sjeng     |     1.19|     1.28| 3.51|
+--------------+---------+---------+-----+
|462.libquantum|     1.15|     1.26| 5.32|
+--------------+---------+---------+-----+
|464.h264ref   |    -2.74|    -2.56|-0.21|
+--------------+---------+---------+-----+
|471.omnetpp   |     2.60|     2.59| 3.61|
+--------------+---------+---------+-----+
|473.astar     |    -0.17|    -0.21| 2.17|
+--------------+---------+---------+-----+
|483.xalancbmk |     1.35|     1.63| 2.27|
+--------------+---------+---------+-----+
|SUM           |     0.57|     0.60| 3.15|
+--------------+---------+---------+-----+

These measurements were collected on an Intel NUC6i5SY with an Intel
Core i5-6260U CPU and 32G RAM running Clear Linux 13330. Intel
Hyper-Threading, Intel Turbo Boost, and the LAN were all disabled. I
used SPEC CPU2006 v1.2 and started the Clang/LLVM port from the gcc
4.6 Linux x86 example file included in the SPEC CPU 2006 kit.

Here is the legend for the various test configurations:

- Unpatched Base: Unpatched compiler with SafeStack disabled. This is
  the reference configuration.
- Unpatched SafeStack: Unpatched compiler with SafeStack enabled.
- Patched SafeStack: Patched compiler with SafeStack enabled. However,
  MPX-based hardening is not enabled in this configuration. This
  configuration is intended to show the effect of the Compiler-RT
  patches on programs that do not enable MPX-based hardening.
- MPX: Patched compiler with MPX-hardened SafeStack enabled.

The unpatched compiler was built from the following SVN IDs:

- LLVM: 292171 from January 16, 2017
- Clang: 292141 from January 16, 2017
- Compiler-RT: 291346 from January 7, 2017

The patched compiler was built with the current posted versions of my
patches applied on top of the SVN IDs listed above.

The following compiler settings in the SPEC CPU2006 cfg files were
used for each configuration:

COPTIMIZE:
- Unpatched Base: -std=gnu89 -O2 -fno-strict-aliasing -march=skylake
  -mtune=skylake
- Unpatched/Patched SafeStack: -std=gnu89 -O2 -fno-strict-aliasing
  -march=skylake -mtune=skylake -fsanitize=safe-stack
- MPX: -std=gnu89 -O2 -fno-strict-aliasing -march=skylake
  -mtune=skylake -mseparate-stack-seg -fsanitize=safe-stack

CXXOPTIMIZE:
- Unpatched Base: -O2 -fno-strict-aliasing -march=skylake
  -mtune=skylake
- Unpatched/Patched SafeStack: -O2 -fno-strict-aliasing
  -march=skylake -mtune=skylake -fsanitize=safe-stack
- MPX: -O2 -fno-strict-aliasing -march=skylake -mtune=skylake
  -mseparate-stack-seg -fsanitize=safe-stack

The FOPTIMIZE settings are irrelevant, since none of the SPECint tests
use Fortran.
Here are measurements of the absolute sizes of the .text sections for
the programs as well as percentage changes in those sizes:

.text section size in bytes:
+--------------+---------+---------+---------+-------+
|Benchmark     |Unpatched|Unpatched|Patched  |MPX    |
|              |Base     |SafeStack|SafeStack|       |
+--------------+---------+---------+---------+-------+
|400.perlbench |   884769|  1003041|  1003233|1131769|
+--------------+---------+---------+---------+-------+
|401.bzip2     |    79393|   175297|   175489| 235577|
+--------------+---------+---------+---------+-------+
|403.gcc       |  2420209|  2545041|  2545233|2727913|
+--------------+---------+---------+---------+-------+
|429.mcf       |    10977|   105345|   105537| 155705|
+--------------+---------+---------+---------+-------+
|445.gobmk     |   633953|   743585|   743777| 823993|
+--------------+---------+---------+---------+-------+
|456.hmmer     |   258593|   358033|   358225| 432249|
+--------------+---------+---------+---------+-------+
|458.sjeng     |    96593|   192929|   193121| 251545|
+--------------+---------+---------+---------+-------+
|462.libquantum|    32441|   127065|   127257| 177545|
+--------------+---------+---------+---------+-------+
|464.h264ref   |   539713|   638705|   638897| 736729|
+--------------+---------+---------+---------+-------+
|471.omnetpp   |   403521|   527345|   527537| 597801|
+--------------+---------+---------+---------+-------+
|473.astar     |    31169|   126225|   126417| 178105|
+--------------+---------+---------+---------+-------+
|483.xalancbmk |  2358241|  2725921|  2726113|2936841|
+--------------+---------+---------+---------+-------+

Percentage changes in .text section size relative to Unpatched Base:
+--------------+---------+---------+-------+
|Benchmark     |Unpatched|Patched  |MPX    |
|              |SafeStack|SafeStack|       |
+--------------+---------+---------+-------+
|400.perlbench |    13.37|    13.39|  27.92|
+--------------+---------+---------+-------+
|401.bzip2     |   120.80|   121.04| 196.72|
+--------------+---------+---------+-------+
|403.gcc       |     5.16|     5.17|  12.71|
+--------------+---------+---------+-------+
|429.mcf       |   859.69|   861.44|1318.47|
+--------------+---------+---------+-------+
|445.gobmk     |    17.29|    17.32|  29.98|
+--------------+---------+---------+-------+
|456.hmmer     |    38.45|    38.53|  67.15|
+--------------+---------+---------+-------+
|458.sjeng     |    99.73|    99.93| 160.42|
+--------------+---------+---------+-------+
|462.libquantum|   291.68|   292.27| 447.29|
+--------------+---------+---------+-------+
|464.h264ref   |    18.34|    18.38|  36.50|
+--------------+---------+---------+-------+
|471.omnetpp   |    30.69|    30.73|  48.15|
+--------------+---------+---------+-------+
|473.astar     |   304.97|   305.59| 471.42|
+--------------+---------+---------+-------+
|483.xalancbmk |    15.59|    15.60|  24.54|
+--------------+---------+---------+-------+

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170217/97ffc8cd/attachment.html>
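For reference, the percentage columns in the tables above follow the usual relative-change formula. A minimal sketch of the arithmetic (the helper name is mine, and the rounding convention of two decimal places is inferred from the tables):

```cpp
#include <cmath>

// Percentage change of a configuration's value (runtime or .text size)
// relative to the Unpatched Base value, rounded to two decimal places.
// E.g. 400.perlbench under Unpatched SafeStack: 443.07 s vs. 430.82 s
// base, a 2.84% increase.
double pctChange(double Base, double Config) {
  double Pct = (Config / Base - 1.0) * 100.0;
  return std::round(Pct * 100.0) / 100.0;
}
```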